Building a Query Platform with Presto and Apache Hive


Presto is a distributed SQL query engine for running analytic queries. The infrastructure is automated with Ansible and Terraform, which handle auto launching, auto scaling, and auto healing of the Presto and Hive clusters on AWS On-Demand EC2 and AWS Spot Instances.


Presto Features

  • Presto queries tables registered in the Hive Metastore and is optimized for low latency.
  • Presto uses a push-based data processing model, like traditional DBMS implementations.
  • Presto enforces memory limits on query tasks, which matters for daily and weekly report queries that require a large amount of memory.

Apache Hive Features

  • Hive runs batch processing against data sources of all sizes, from gigabytes to petabytes.
  • Hive is optimized for query throughput.
  • Hive uses a pull-based data processing model.

Top Business Challenges for Big Data Processing

  • Build a data processing and query platform with cluster management.
  • Query large datasets on remote storage, using Presto for data discovery and Apache Hive and Tez for ETL jobs.
  • Automate cluster management and deployment for Presto and Hive using AWS Spot Instances.

Solution Offerings for Infrastructure Automation

  • Simplify, speed up, and scale big data analytics workloads.
  • Process data from external storage using fast execution engines like Presto and Hive.
  • Run large and complex queries.
  • Keep costs down by using AWS Spot Instances by default, and heal the cluster when its size falls below the minimum cluster size.
  • Automatically scale the cluster up and down according to CPU load.
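
The healing and scaling rules in the last two points can be sketched as a small decision function. This is only an illustration, not the automation actually implemented with Ansible and Terraform: the class name `ScalingPolicy`, the function `desired_worker_count`, and every threshold and node count below are hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical policy values; none of these numbers come from the source."""
    min_nodes: int = 3            # minimum cluster size that healing restores
    max_nodes: int = 20           # upper bound on cluster size
    scale_up_cpu: float = 75.0    # average CPU % above which workers are added
    scale_down_cpu: float = 25.0  # average CPU % below which workers are removed
    step: int = 1                 # workers added or removed per decision


def desired_worker_count(current_nodes: int,
                         avg_cpu_percent: float,
                         policy: ScalingPolicy = ScalingPolicy()) -> int:
    """Return the worker count the cluster should converge to."""
    # Heal first: if nodes were lost (for example, reclaimed Spot Instances)
    # and the cluster is below its minimum, restore the minimum size.
    if current_nodes < policy.min_nodes:
        return policy.min_nodes

    # Otherwise scale on CPU load.
    if avg_cpu_percent > policy.scale_up_cpu:
        target = current_nodes + policy.step
    elif avg_cpu_percent < policy.scale_down_cpu:
        target = current_nodes - policy.step
    else:
        target = current_nodes

    # Keep the result inside the allowed cluster size range.
    return max(policy.min_nodes, min(policy.max_nodes, target))
```

Under these assumed defaults, a cluster that has dropped to one node is healed back to three regardless of load, while a five-node cluster at 90% average CPU grows to six. Keeping the decision as a pure function separates the policy from whatever mechanism actually launches or terminates instances.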
