Introduction to Real-Time Streaming
- Real-time streaming here means a data pipeline that ingests data from different sources using Apache NiFi, Apache Kafka, Apache Spark, and Apache Cassandra.
- Apache NiFi provides a web UI dashboard and helps automate the data workflow.
Real-Time Streaming Architecture: Data Pipeline Components
- Data Workflow Automation – Apache NiFi
- Messaging System – Apache Kafka
- Stream Processing Engine – Apache Spark Streaming
- REST API and Twitter Dashboard for Real-Time Tweets
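
To make the flow between these components concrete, the sketch below walks a handful of tweets through the same three stages using only the Python standard library. It is an illustration under stated assumptions, not the actual NiFi, Kafka, or Spark APIs: `collect` stands in for the NiFi flow, a `deque` stands in for a Kafka topic, and `micro_batches` mimics the fixed-interval batching a stream processing engine performs. All function names and the record fields `ts` and `text` are hypothetical.

```python
from collections import Counter, deque


def collect(raw_tweets):
    """Collector stage (stand-in for a NiFi flow): drop records with no text."""
    return [t for t in raw_tweets if t.get("text")]


def micro_batches(events, batch_seconds):
    """Group events into fixed time windows keyed by their timestamp."""
    windows = {}
    for event in events:
        window = int(event["ts"] // batch_seconds)
        windows.setdefault(window, []).append(event)
    return [windows[w] for w in sorted(windows)]


def hashtag_counts(batch):
    """Processing stage: count case-insensitive hashtags in one batch."""
    counts = Counter()
    for event in batch:
        for word in event["text"].split():
            if word.startswith("#"):
                counts[word.lower()] += 1
    return counts


def run_pipeline(raw_tweets, batch_seconds=2):
    """Collector -> queue (stand-in for a Kafka topic) -> batched processing."""
    topic = deque(collect(raw_tweets))  # producer side appends to the queue
    consumed = []
    while topic:                        # consumer side drains it in order
        consumed.append(topic.popleft())
    return [hashtag_counts(b) for b in micro_batches(consumed, batch_seconds)]
```

Each element of the returned list is the hashtag count for one time window, which is the kind of aggregate a real-time dashboard would poll through the REST API.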
Business Challenges for Building the Data Pipeline
- Benchmark the data pipeline built on NiFi and Kafka across different message sizes and test durations.
- Handle real-time streaming, memory management, scalability, and concurrency.
- Implement an interactive dashboard with real-time data analytics and visualization using D3.js charts and React.js.
- Provide an end-to-end delivery guarantee and error handling for data moving from the Twitter agent to the processing engine.
- Use Apache Hadoop cluster logs and the Twitter Streaming API as test data.
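
The benchmarking item above, which varies message size and duration, can be sketched as a small harness. This is a minimal sketch, assuming the component under test is exposed as a `send` callable that accepts one bytes payload; `benchmark` and the keys of the dictionary it returns are hypothetical names chosen for illustration, and a real run would pass a function that publishes to NiFi or Kafka instead of a local stub.

```python
import time


def benchmark(send, message_size, duration_s, clock=time.perf_counter):
    """Send fixed-size payloads for roughly `duration_s` seconds and report throughput."""
    payload = b"x" * message_size
    sent = 0
    start = clock()
    while clock() - start < duration_s:
        send(payload)  # assumption: returns once the message is handed off
        sent += 1
    elapsed = clock() - start
    return {
        "message_size": message_size,
        "messages": sent,
        "elapsed_s": elapsed,
        "msgs_per_s": sent / elapsed if elapsed > 0 else 0.0,
        "mb_per_s": sent * message_size / elapsed / 1e6 if elapsed > 0 else 0.0,
    }
```

Calling `benchmark` in a loop over several values of `message_size` and `duration_s` yields one row per combination, which can then be compared across pipeline configurations.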
Solution Offered for Building the Real-Time Streaming Data Pipeline
- A real-time streaming platform with Apache NiFi acting as both collector and producer for data ingestion.
- Apache NiFi as the collector and Apache Kafka as the producer, with Apache Spark Streaming and Apache Spark Structured Streaming for stream processing.
- Apache Cassandra deployed as a cluster, both in a microservices architecture on Kubernetes and on EC2 instances, for scaling and guaranteed delivery of data across the data pipeline.
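
One common way to reason about guaranteed delivery is at-least-once sending combined with idempotent writes: the sender retries until a write is acknowledged, and the store tolerates the duplicates that retries create. The sketch below illustrates that idea only and is not the configuration used in this solution. `deliver` and `IdempotentSink` are hypothetical names; the sink models a store keyed by primary key, in the way a Cassandra insert overwrites an existing row that has the same key.

```python
class IdempotentSink:
    """In-memory stand-in for a table keyed by primary key (upsert semantics)."""

    def __init__(self):
        self.rows = {}
        self.writes = 0

    def write(self, record):
        self.writes += 1
        self.rows[record["id"]] = record  # a duplicate id overwrites the same row


def deliver(record, send, max_retries=3):
    """Retry `send` until it succeeds; return the number of attempts used."""
    for attempt in range(1, max_retries + 1):
        try:
            send(record)
            return attempt
        except ConnectionError:
            continue  # treat as a lost acknowledgement and resend
    raise RuntimeError(f"record {record['id']} not delivered after {max_retries} attempts")
```

If an acknowledgement is lost after the write has already landed, the retry writes the record a second time, but the sink still holds a single row for that `id`. Records that exhaust their retries raise an error, so the caller can route them to an error-handling path instead of dropping them silently.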