Running Data Pipelines Locally Using Containerized Ceph S3, Kafka & NiFi

The source data is a simple salary dataset (paychecks.csv):

YearsExperience,Salary
1.1,39343.00
1.3,46205.00
1.5,37731.00
...
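The pipeline we will build fetches this CSV from S3 with NiFi, turns each row into a record, and publishes the records to Kafka. Assuming the flow emits one JSON record per CSV row (the exact output depends on the processors you configure), the transformation amounts to something like this minimal Python sketch, using the field names from the CSV header above:

```python
import csv
import io
import json

SAMPLE = """YearsExperience,Salary
1.1,39343.00
1.3,46205.00
1.5,37731.00
"""

def csv_to_records(text):
    """Convert CSV text into a list of JSON strings, one per row."""
    reader = csv.DictReader(io.StringIO(text))
    return [json.dumps({"YearsExperience": float(row["YearsExperience"]),
                        "Salary": float(row["Salary"])})
            for row in reader]

for record in csv_to_records(SAMPLE):
    print(record)
```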

Prerequisites

  • A computer running Podman/Docker
  • If using Podman, install the podman-docker package

Setting Up The Infrastructure

$ docker network create data-pipeline

Running A Single-Node Ceph Cluster For S3

$ mkdir -p /data/etc/ceph/ 
$ mkdir -p /data/var/lib/ceph/
$ docker run -d --privileged --name ceph --net data-pipeline -e NETWORK_AUTO_DETECT=4 -v /data/var/lib/ceph:/var/lib/ceph:rw -v /data/etc/ceph:/etc/ceph:rw -e CEPH_DEMO_UID=nifi -e CEPH_DEMO_ACCESS_KEY=nifi -e CEPH_DEMO_SECRET_KEY=nifi -p 8080:8080 registry.redhat.io/rhceph-alpha/rhceph-5-rhel8@sha256:9aaea414e2c263216f3cdcb7a096f57c3adf6125ec9f4b0f5f65fa8c43987155 demo
$ pip3 install awscli
$ aws configure
AWS Access Key ID [****************nifi]:
AWS Secret Access Key [****************nifi]:
Default region name [None]:
Default output format [None]:
$ aws s3 mb s3://nifi --endpoint-url http://127.0.0.1:8080
make_bucket: nifi
$ aws s3 cp paychecks.csv s3://nifi/ --endpoint-url http://127.0.0.1:8080
upload: ./paychecks.csv to s3://nifi/paychecks.csv

Running A Single-Node Kafka Cluster

$ docker run -d --name zookeeper --net data-pipeline -e ZOOKEEPER_CLIENT_PORT=2181 -e ZOOKEEPER_TICK_TIME=2000 -p 22181:2181 confluentinc/cp-zookeeper:latest
The broker advertises two listeners: kafka:9092 for containers on the data-pipeline network (such as NiFi) and localhost:29092 for clients running on the host (such as kcat).

$ docker run -d --name kafka --net data-pipeline -e KAFKA_BROKER_ID=1 -e KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092,PLAINTEXT_HOST://localhost:29092 -e KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT -e KAFKA_INTER_BROKER_LISTENER_NAME=PLAINTEXT -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 -p 29092:29092 confluentinc/cp-kafka:latest
$ docker exec -it kafka kafka-topics --create --topic data-pipeline --replication-factor 1 --partitions 1 --bootstrap-server kafka:9092
Created topic data-pipeline.

Running A Single-Node NiFi

$ docker run -d --net data-pipeline --name nifi -p 8443:8443 apache/nifi:latest
$ docker logs nifi | grep -i generated
Generated Username [xxxxxx]
Generated Password [xxxxxx]

Configuring The Data Pipeline

NiFi reaches the Ceph S3 endpoint by container IP, which we can find with docker inspect:

$ docker inspect ceph | grep IPAddress
"IPAddress": "",
"IPAddress": "10.89.3.5",

The non-empty address (here 10.89.3.5) is the container's address on the data-pipeline network; use it, with port 8080, as the S3 endpoint in NiFi's processor configuration.
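Grepping for IPAddress returns one entry per network the container is attached to, which is why an empty one appears as well. A more precise approach is to read the inspect JSON and pick the address on the network you care about; the sketch below uses a simplified (hypothetical) excerpt of `docker inspect ceph` output:

```python
import json

# Simplified excerpt of `docker inspect ceph` output (hypothetical values)
INSPECT = json.loads("""[{"NetworkSettings": {"Networks": {
  "data-pipeline": {"IPAddress": "10.89.3.5"}}}}]""")

def network_ip(inspect, network):
    """Extract a container's address on a specific named network."""
    return inspect[0]["NetworkSettings"]["Networks"][network]["IPAddress"]

print(network_ip(INSPECT, "data-pipeline"))
```

The same lookup can be done directly with a Go template, e.g. `docker inspect -f '{{(index .NetworkSettings.Networks "data-pipeline").IPAddress}}' ceph`.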

Running Our Data Flow

Validating Our Transformation

$ kcat -b localhost:29092 -t data-pipeline
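Beyond eyeballing the kcat output, we can check each message programmatically. Assuming the flow publishes one JSON record per CSV row (as in the earlier transformation), a small validator might look like this:

```python
import json

def validate(lines):
    """Count well-formed JSON records (one per line, as printed by kcat)
    that carry the fields from paychecks.csv."""
    ok = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if "YearsExperience" in record and "Salary" in record:
            ok += 1
    return ok

# Messages as they might appear on the topic (hypothetical payloads):
sample = ['{"YearsExperience": 1.1, "Salary": 39343.0}',
          '{"YearsExperience": 1.3, "Salary": 46205.0}']
print(validate(sample), "valid records")
```

To run it against the live topic, feed it kcat's output, e.g. `kcat -b localhost:29092 -t data-pipeline -e` piped into a script that passes sys.stdin to validate().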

Testing Our Flow’s Consistency

$ aws s3 cp paychecks2.csv s3://nifi/ --endpoint-url http://127.0.0.1:8080
upload: ./paychecks2.csv to s3://nifi/paychecks2.csv
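After the second upload, only the new file's records should appear on the topic. Assuming the flow lists the bucket with a stateful processor (NiFi's ListS3, for example, tracks which objects it has already listed), the consistency property being tested boils down to this:

```python
def new_objects(listed_keys, seen):
    """Return keys not processed before and update the seen-state,
    mimicking how a stateful listing processor avoids re-emitting
    objects it has already handled."""
    fresh = [key for key in listed_keys if key not in seen]
    seen.update(fresh)
    return fresh

seen = set()
print(new_objects(["paychecks.csv"], seen))                    # first listing
print(new_objects(["paychecks.csv", "paychecks2.csv"], seen))  # after second upload
```

The second call returns only paychecks2.csv, so re-listing the bucket does not duplicate the records already published to Kafka.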

Conclusion
