Big Data Processing with Presto and Apache Hive

Building a Query Platform with Presto and Apache Hive

Presto, a distributed SQL query engine, runs analytic queries. Infrastructure automation is implemented with Ansible and Terraform to provide auto launching, auto scaling and auto healing of the Presto and Hive clusters on AWS On-Demand EC2 and AWS Spot Instances.

Presto Features

Presto queries data registered in the Hive Metastore and is optimized for latency.
Presto uses a push data processing model, like traditional DBMS implementations.
Presto enforces a memory limit on query tasks, which constrains daily and weekly report queries that require a large amount of memory.

Apache Hive Features

Hive runs batch processing against data sources of all sizes, ranging from gigabytes to petabytes.
Hive is optimized for query throughput.
Hive uses a pull data processing model.
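
The push and pull terms used in the two feature lists can be illustrated with a toy pipeline. The sketch below is only an illustration of the general idea, not how Presto or Hive is implemented; the operator names and sample rows are hypothetical. In the pull version each operator asks its upstream for the next row, and in the push version each operator hands rows downstream as soon as they are produced.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, int]

# Hypothetical sample data standing in for a table scan.
SAMPLE_ROWS: List[Row] = [
    {"id": 1, "bytes": 120},
    {"id": 2, "bytes": 900},
    {"id": 3, "bytes": 450},
]


def is_large(row: Row) -> bool:
    """Predicate shared by both pipelines."""
    return row["bytes"] > 400


# Pull model: the consumer drives execution by requesting the next row.
def pull_scan(rows: Iterable[Row]) -> Iterator[Row]:
    for row in rows:
        yield row


def pull_filter(upstream: Iterator[Row],
                predicate: Callable[[Row], bool]) -> Iterator[Row]:
    for row in upstream:
        if predicate(row):
            yield row


# Push model: the producer drives execution by calling its downstream operator.
def push_scan(rows: Iterable[Row], downstream: Callable[[Row], None]) -> None:
    for row in rows:
        downstream(row)


def make_push_filter(predicate: Callable[[Row], bool],
                     downstream: Callable[[Row], None]) -> Callable[[Row], None]:
    def on_row(row: Row) -> None:
        if predicate(row):
            downstream(row)
    return on_row


def run_pull() -> List[Row]:
    # list() repeatedly pulls rows through filter and scan.
    return list(pull_filter(pull_scan(SAMPLE_ROWS), is_large))


def run_push() -> List[Row]:
    collected: List[Row] = []
    # The scan pushes every row into the filter, which pushes matches into the sink.
    push_scan(SAMPLE_ROWS, make_push_filter(is_large, collected.append))
    return collected
```

Both pipelines return the same rows; what differs is which side controls when the next row is processed.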

Common Challenges for Big Data Processing

Building a data processing and query platform together with cluster management.
Handling large datasets on remote storage, using Presto for data discovery and Apache Hive on Tez for ETL jobs.
Automating infrastructure for cluster management and deployment of Presto and Hive using AWS Spot Instances.
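
The split described above, Presto for data discovery and Hive on Tez for ETL, could be expressed as a small routing rule. This is a minimal sketch under stated assumptions: the `Workload` type, the engine labels and the 50 GB memory ceiling are all hypothetical and are not taken from the use case.

```python
from dataclasses import dataclass

# Hypothetical per-query memory ceiling for the Presto cluster, in gigabytes.
PRESTO_QUERY_MEMORY_LIMIT_GB = 50.0


@dataclass(frozen=True)
class Workload:
    name: str
    kind: str                    # "discovery" or "etl" (hypothetical labels)
    estimated_memory_gb: float   # rough estimate supplied by the caller


def choose_engine(workload: Workload,
                  memory_limit_gb: float = PRESTO_QUERY_MEMORY_LIMIT_GB) -> str:
    """Pick an engine label for a workload.

    ETL jobs go to Hive on Tez. Discovery queries go to Presto unless their
    estimated memory exceeds the assumed Presto memory limit.
    """
    if workload.kind == "etl":
        return "hive-on-tez"
    if workload.estimated_memory_gb > memory_limit_gb:
        return "hive-on-tez"
    return "presto"
```

For example, `choose_engine(Workload("ad hoc lookup", "discovery", 4.0))` selects Presto, while a nightly ETL job or a very memory-hungry report is sent to Hive on Tez.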

Solution for Infrastructure Automation

Simplify, speed up and scale big data analytics workloads.
Process data from external storage using fast execution engines like Presto and Hive.
Run large and complex queries.
Stay cost effective by using AWS Spot Instances by default, and heal the cluster when its size falls below the minimum cluster size.
Automatically scale the cluster up and down according to CPU load.
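
The healing and scaling rules in this list can be sketched as a single decision function. The thresholds, step size and node limits below are hypothetical defaults chosen for illustration; the use case does not state the actual values, and a real deployment would feed the result into its Ansible and Terraform automation rather than use this code directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    # All values are illustrative assumptions, not figures from the use case.
    min_nodes: int = 3            # minimum cluster size that healing restores
    max_nodes: int = 20           # upper bound on cluster size
    scale_up_cpu: float = 0.75    # average CPU at or above this adds nodes
    scale_down_cpu: float = 0.30  # average CPU at or below this removes nodes
    step: int = 2                 # nodes added or removed per decision


def desired_node_count(current_nodes: int,
                       avg_cpu: float,
                       policy: ScalingPolicy = ScalingPolicy()) -> int:
    """Return the node count the cluster should converge to."""
    # Auto healing: a cluster below the minimum is restored to the minimum,
    # for example after Spot Instances have been reclaimed.
    if current_nodes < policy.min_nodes:
        return policy.min_nodes

    # Auto scaling: adjust by one step according to average CPU load.
    if avg_cpu >= policy.scale_up_cpu:
        target = current_nodes + policy.step
    elif avg_cpu <= policy.scale_down_cpu:
        target = current_nodes - policy.step
    else:
        target = current_nodes

    # Keep the result inside the allowed range.
    return max(policy.min_nodes, min(policy.max_nodes, target))
```

Checking the minimum size first means healing takes priority over load-based scaling, so a cluster that has lost nodes is restored even when CPU load is low.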

Building Real Time Applications

Presto queries data from sources including Hive, Cassandra and relational databases. It separates computation from storage, so each can scale independently. It combines data from multiple sources in a single query for analytics.
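
Combining sources works because Presto addresses every table as `catalog.schema.table`, so one statement can join tables that live in different systems. The sketch below only builds such a statement as a string; the catalog names `hive` and `cassandra` and the schemas, tables and columns are hypothetical placeholders, and submitting the query to a cluster is left out.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build a fully qualified Presto table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query() -> str:
    """Return a SQL statement joining a Hive table with a Cassandra table.

    Every identifier here is an assumed example, not taken from the use case.
    """
    orders = qualified_name("hive", "sales", "orders")
    users = qualified_name("cassandra", "app", "users")
    return (
        "SELECT u.country, count(*) AS order_count\n"
        f"FROM {orders} AS o\n"
        f"JOIN {users} AS u ON o.user_id = u.user_id\n"
        "GROUP BY u.country"
    )


if __name__ == "__main__":
    print(build_federated_query())
```

The catalog prefix is the only part of the statement that names the storage system, which is what allows the same engine to query several backends at once.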

Real-Time Applications of Presto on AWS

Presto as a service with security features.
SQL-on-anything with the Presto query engine.
Cost-based query optimisation.
Autoscaling, workload monitoring and predictable performance.
Gaining insights through Apache Superset.

Real-Time Applications of Hive on AWS

Statistical functions on the Hadoop ecosystem.
Structured and semi-structured data processing.
Data warehouse tool on Hadoop.
Real-time data ingestion with HBase.
ETL and data warehousing.
SQL-like environment, with queries written in HiveQL.
Deployment of custom map and reduce scripts for specific client requirements.
