Best Practices for Building a Data Warehouse

 

Apache Hive is a data warehouse system used for querying, summarising and analysing data. Hive translates HiveQL queries into MapReduce jobs. A data warehouse implementation comprises an ETL process, i.e. Extract, Transform and Load.
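
To make the idea of a query becoming a MapReduce job more concrete, here is a minimal conceptual sketch in plain Python. It mimics how an aggregation such as `SELECT page, COUNT(*) FROM page_views GROUP BY page` could be split into map, shuffle and reduce phases. The table name, the rows and the function names are all hypothetical, and this is not Hive's actual execution plan, only an illustration of the pattern.

```python
from collections import defaultdict

# Hypothetical rows of a "page_views" table: (user, page)
rows = [
    ("alice", "home"),
    ("bob", "home"),
    ("alice", "cart"),
    ("carol", "home"),
]

def map_phase(records):
    """Emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    for _user, page in records:
        yield page, 1

def shuffle_phase(pairs):
    """Collect all emitted values under their key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Aggregate the values for each key; summing the 1s gives COUNT(*)."""
    return {key: sum(values) for key, values in grouped.items()}

counts = reduce_phase(shuffle_phase(map_phase(rows)))
print(counts)
```

In a real cluster the map and reduce steps run in parallel across many machines; the single-process version above only shows how the work is divided.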

 

Apache Spark is an analytics framework for data processing that supports real-time, batch and advanced analytics. Spark helps companies ingest data and run machine learning models.

 

Data Warehouse using Apache Hive

 

  • Import Data
  • Design Data Warehouse
  • Build Data Warehouse with Hive using Data Library
  • Run Queries

 

Data warehousing involves several steps:

 

  • Extraction of Data
  • Transformation of Data
  • Loading of transformed data
  • Data Visualisation
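
The steps above can be sketched end to end in a few lines of Python. This is a minimal illustration under stated assumptions, not a Hive implementation: an in-memory SQLite database stands in for the warehouse, the CSV text and the `orders` table are hypothetical sample data, and a printed summary stands in for the visualisation step.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a source system.
RAW_CSV = """order_id,customer,amount
1, Alice ,10.50
2,bob,not_a_number
3,Carol,7.25
"""

def extract(text):
    """Extract: parse raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records):
    """Transform: normalise names, convert amounts, skip rows that fail to parse."""
    cleaned = []
    for rec in records:
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # drop rows with an invalid amount
        cleaned.append((int(rec["order_id"]), rec["customer"].strip().lower(), amount))
    return cleaned

def load(conn, rows):
    """Load: write the transformed rows into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for Hive
load(conn, transform(extract(RAW_CSV)))

# Stand-in for visualisation: a per-customer summary printed as text.
summary = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
for customer, total in summary:
    print(f"{customer}: {total:.2f}")
```

Keeping extract, transform and load as separate functions means each stage can be tested or replaced on its own, for example by swapping the SQLite load for a load into a Hive table.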