Azure Data Analytics Pipeline with Apache Spark

Chandan Gaur | 27 May 2017

Introduction to Real-Time Streaming

  • Real-Time Streaming involves a data pipeline that ingests data from different sources using Apache NiFi, Apache Kafka, Apache Spark, and Apache Cassandra.
  • Apache NiFi provides a web UI dashboard and helps automate the data workflow.
 

Real-Time Streaming Architecture: Data Pipeline Components

  • Automate Data Workflow - Apache NiFi
  • Messaging System - Apache Kafka
  • Stream Processing Engine - Apache Spark Streaming
  • REST API & Twitter Dashboard for Real-Time Tweets
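
The way these components hand data to one another can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: a `queue.Queue` stands in for a Kafka topic, a plain function stands in for the NiFi collector, and a hashtag counter stands in for the Spark Streaming job. None of the names below (`collect`, `micro_batches`, `count_hashtags`, `run_pipeline`) come from NiFi, Kafka, or Spark; they are hypothetical helpers.

```python
import queue
from collections import Counter


def collect(raw_tweets, topic):
    """Collector stage (the role NiFi plays): push each raw record onto the topic."""
    for tweet in raw_tweets:
        topic.put(tweet)


def micro_batches(topic, batch_size):
    """Drain the topic in fixed-size micro-batches, similar in spirit to a streaming engine."""
    batch = []
    while not topic.empty():
        batch.append(topic.get())
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def count_hashtags(batch):
    """Processing stage (the role Spark Streaming plays): count hashtags in one batch."""
    counts = Counter()
    for tweet in batch:
        for word in tweet.split():
            if word.startswith("#"):
                counts[word.lower()] += 1
    return counts


def run_pipeline(raw_tweets, batch_size=2):
    topic = queue.Queue()  # in-memory stand-in for a Kafka topic (assumption)
    collect(raw_tweets, topic)
    totals = Counter()  # running state that a REST API / dashboard could serve
    for batch in micro_batches(topic, batch_size):
        totals.update(count_hashtags(batch))
    return dict(totals)


if __name__ == "__main__":
    sample = ["Loving #Spark", "#spark and #Kafka", "plain tweet"]
    print(run_pipeline(sample))
```

In a real deployment each stage runs as a separate service, and the queue is replaced by a durable, partitioned topic; the sketch only shows the order in which data flows.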
 

Business Challenges for Building the Data Pipeline

  • Benchmarking the data pipeline with NiFi and Kafka across different message sizes and durations.
  • Real-time streaming, memory management, scalability, and concurrency.
  • Implementing an interactive dashboard with real-time data analytics and visualization using D3.js charts and React.js.
  • End-to-end delivery guarantees and error handling for data from the Twitter agent to the processing engine.
  • Test data will be Apache Hadoop cluster logs and the Twitter Streaming API.
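
A benchmark over message size and duration can be sketched as a small harness like the one below. It is a minimal sketch, not the harness used in this project: `benchmark` is a hypothetical helper, and the `send` callable is an assumption standing in for whatever producer call is under test (for example, a Kafka producer's send method). The demo uses an in-memory list as the sink, so its numbers say nothing about NiFi or Kafka themselves.

```python
import time


def benchmark(send, message_size, duration_s):
    """Call send(payload) repeatedly for duration_s seconds and report throughput."""
    payload = b"x" * message_size
    sent = 0
    start = time.perf_counter()
    deadline = start + duration_s
    while time.perf_counter() < deadline:
        send(payload)
        sent += 1
    elapsed = time.perf_counter() - start
    return {
        "message_size": message_size,
        "messages": sent,
        "msgs_per_sec": sent / elapsed,
        "mb_per_sec": sent * message_size / elapsed / 1e6,
    }


if __name__ == "__main__":
    sink = []  # hypothetical in-memory sink; swap in a real producer's send call
    for size in (100, 1_000, 10_000):
        sink.clear()
        print(benchmark(sink.append, size, duration_s=0.2))
```

Running the same loop for each message size and for several durations gives a throughput table that can be compared across pipeline configurations.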
 

Solution Offered for Building a Real-Time Streaming Data Pipeline

  • Real-Time Streaming Platform with Apache NiFi as both collector and producer for data ingestion.
  • Apache NiFi as collector and Apache Kafka as producer, with Apache Spark Streaming and Apache Spark Structured Streaming as the processing engines.
  • Apache Cassandra deployed as a microservices architecture on Kubernetes, as well as a cluster on EC2 instances, for scaling and guaranteed delivery of data across the data pipeline.
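
Guaranteed delivery across the pipeline is commonly approximated with at-least-once delivery: retry each write until it is acknowledged, and park records that keep failing so they are not silently lost. The sketch below illustrates that idea under stated assumptions; it is not the project's actual implementation. `deliver` and `FlakyStore` are hypothetical names, and the store is an in-memory dictionary that mimics a keyed upsert (the way a Cassandra write to an existing primary key overwrites the row), which is what makes retries safe.

```python
def deliver(records, write, max_attempts=3):
    """Try to write each (key, value) record; retry on failure, then dead-letter it."""
    dead_letters = []
    for key, value in records:
        for attempt in range(1, max_attempts + 1):
            try:
                write(key, value)
                break  # write acknowledged; move on to the next record
            except ConnectionError:
                if attempt == max_attempts:
                    dead_letters.append((key, value))  # keep for later replay
    return dead_letters


class FlakyStore:
    """Hypothetical sink that fails the first write of every key, then succeeds.

    Writes are keyed upserts, so a retried or replayed write cannot create a
    duplicate row.
    """

    def __init__(self):
        self.rows = {}
        self.seen = set()

    def write(self, key, value):
        if key not in self.seen:
            self.seen.add(key)
            raise ConnectionError("simulated transient failure")
        self.rows[key] = value


if __name__ == "__main__":
    store = FlakyStore()
    failed = deliver([("t1", "hello"), ("t2", "world")], store.write)
    print(store.rows, failed)
```

Replaying the same records a second time leaves the stored rows unchanged, which is the property that lets at-least-once delivery behave like exactly-once from the reader's point of view.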