Building a Data Flow Pipeline

StreamSets Data Collector includes connectors to many systems that can act as origins or destinations. These cover not only traditional sources such as relational databases and files, but also Kafka, HDFS, and cloud tools. It also provides a graphical interface for building pipelines, which are organized into the following stages:


  • Data Acquisition
  • Data Transformation
  • Data Storage
  • Data Flow Triggers


Steps to Build a Data Flow Pipeline Using StreamSets


  • StreamSets Data Collector Installation
  • Creation of a Java Database Connectivity (JDBC) Connection
  • Create Data Flow Pipeline
  • Discard Useless Fields from Pipeline
  • Modification of fields through Expression Evaluator
  • Stream Selector to pass data to streams
  • View Data Pipeline States and Statistics
  • Automate through Data Collector Logs and Pipeline History
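
The three processing steps in the list above (discarding fields, modifying fields with an expression, and routing records to streams) can be pictured with a small plain-Python sketch. This is only a conceptual illustration and does not use any StreamSets API: the function names, the field names (debug_info, amount_cents, amount_usd), and the "large" and "default" stream names are all hypothetical, chosen to show what each stage does to a record.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A record is modeled as a flat dictionary of field name -> value (assumption).
Record = Dict[str, object]


def discard_fields(record: Record, unwanted: Iterable[str]) -> Record:
    """Return a copy of the record without the unwanted fields."""
    unwanted_set = set(unwanted)
    return {k: v for k, v in record.items() if k not in unwanted_set}


def evaluate_expression(
    record: Record, field: str, expr: Callable[[Record], object]
) -> Record:
    """Return a copy of the record with `field` set to the result of `expr`."""
    updated = dict(record)
    updated[field] = expr(record)
    return updated


def select_stream(
    record: Record,
    conditions: List[Tuple[str, Callable[[Record], bool]]],
    default: str = "default",
) -> str:
    """Return the name of the first stream whose condition matches the record."""
    for name, predicate in conditions:
        if predicate(record):
            return name
    return default


def run_pipeline(records: Iterable[Record]) -> Dict[str, List[Record]]:
    """Apply the three stages in order and group records by selected stream."""
    streams: Dict[str, List[Record]] = {}
    for record in records:
        # Stage 1: drop a field that later stages do not need.
        record = discard_fields(record, ["debug_info"])
        # Stage 2: derive a new field from an existing one.
        record = evaluate_expression(
            record, "amount_usd", lambda r: round(r["amount_cents"] / 100, 2)
        )
        # Stage 3: route the record to a stream based on a condition.
        stream = select_stream(
            record, [("large", lambda r: r["amount_usd"] >= 100)]
        )
        streams.setdefault(stream, []).append(record)
    return streams


if __name__ == "__main__":
    sample = [
        {"id": 1, "amount_cents": 12500, "debug_info": "x"},
        {"id": 2, "amount_cents": 999, "debug_info": "y"},
    ]
    for stream_name, routed in run_pipeline(sample).items():
        print(stream_name, routed)
```

In Data Collector itself these stages are configured in the graphical interface rather than written as code; the sketch only mirrors the order in which a record passes through them.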


Advantages of StreamSets


  • Efficient Pipeline Development
  • Pipeline Ingestion
  • Change Data Capture
  • Continuous Data Integration
  • Timely Data Delivery
  • Detection of Anomalies at every stage throughout the pipeline