
March 20, 2015

Presto versus Hive: What You Need to Know

Ron Zvagelsky
  • Customer Data Strategy
There is much discussion in the industry about analytic engines and, specifically, which engines best meet various analytic needs. This post looks at two popular engines, Hive and Presto, and assesses the best uses for each.

How Hive Works

Hive translates SQL queries into multiple stages of MapReduce, and it is powerful enough to handle huge numbers of jobs. (As Arun C Murthy pointed out, modern Hive runs on Tez, whose computational model is similar to Spark's.) MapReduce is fault-tolerant because it writes intermediate results to disk, which also makes it well suited to batch-style data processing. Many of our customers issue thousands of Hive queries to our service every day. A key advantage of Hive over newer SQL-on-Hadoop engines is robustness: other engines, such as Cloudera's Impala and Presto, require careful optimization when two large tables (100M rows and above) are joined. Hive can join tables with billions of rows with ease, and if a job fails, Hive retries it automatically. Furthermore, Hive itself is becoming faster as a result of the Hortonworks Stinger initiative.
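
To make the fault-tolerance point concrete, here is a minimal Python sketch of staged, disk-backed execution with automatic retry. It is an illustration only, not how Hive or Hadoop are implemented: the function names, the JSON files, and the simulated failure are all assumptions made for this example.

```python
import json
import os
import tempfile


def run_stage_with_retry(stage_fn, input_path, output_path, max_attempts=3):
    """Run one stage that reads its input from disk and writes its output to disk.

    The input file survives a failed attempt, so a retry just re-reads it;
    no earlier stage has to run again. Returns the number of attempts used.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            with open(input_path) as f:
                rows = json.load(f)
            result = stage_fn(rows)
            with open(output_path, "w") as f:
                json.dump(result, f)
            return attempt
        except RuntimeError:
            if attempt == max_attempts:
                raise


def make_flaky(fn, failures):
    """Wrap fn so its first `failures` calls raise, simulating lost tasks."""
    state = {"left": failures}

    def wrapper(rows):
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("simulated task failure")
        return fn(rows)

    return wrapper


def map_stage(rows):
    # Emit one [key, 1] pair per input row.
    return [[row["country"], 1] for row in rows]


def reduce_stage(pairs):
    # Sum the values for each key.
    counts = {}
    for key, value in pairs:
        counts[key] = counts.get(key, 0) + value
    return counts


def run_job(rows, workdir, flaky_failures=0):
    """Run map then reduce, materializing every intermediate result on disk."""
    names = ("input.json", "mapped.json", "reduced.json")
    paths = [os.path.join(workdir, name) for name in names]
    with open(paths[0], "w") as f:
        json.dump(rows, f)
    run_stage_with_retry(map_stage, paths[0], paths[1])
    attempts = run_stage_with_retry(
        make_flaky(reduce_stage, flaky_failures), paths[1], paths[2]
    )
    with open(paths[2]) as f:
        return json.load(f), attempts
```

Calling `run_job(rows, workdir, flaky_failures=2)` still produces the final counts: the reduce stage fails twice, then succeeds on the third attempt by re-reading `mapped.json`. The price of this robustness is the disk round trip between stages.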

How Presto Works

In some instances simply processing SQL queries is not enough: queries must run as quickly as possible so that data scientists and analysts can use Treasure Data to gain insights from their data quickly. For these instances Treasure Data offers the Presto query engine. Presto is an in-memory distributed SQL query engine developed by Facebook and open-sourced in November 2013. Treasure Data adopted Presto for its usability and performance.

Best of Hive
  • Large data aggregations
  • Large Fact-to-Fact joins
  • Large distincts (aka de-duplication jobs)
  • Batch jobs that can be scheduled

Best of Presto
  • Interactive queries (where you want to wait for the answer)
  • Quickly exploring the data (e.g. what types of records are found in the table)
  • Joins with a large Fact table and many smaller Dimension tables

How to Best Use Hive and Presto

  • Optimized for
    • Hive: Throughput
    • Presto: Interactivity
  • SQL standard fidelity
    • Hive: HiveQL (subset of common data warehousing SQL)
    • Presto: Designed to comply with ANSI SQL
  • Window functions
    • Hive: Yes
    • Presto: Yes
  • Large JOINs
    • Hive: Very good for large Fact-to-Fact joins
    • Presto: Optimized for star schema joins (1 large Fact table and many smaller Dimension tables)

Hive is optimized for query throughput, while Presto is optimized for latency. Presto limits the amount of memory that each task in a query can use, so a query that needs more memory than that limit simply fails. Failing fast in this way is acceptable for interactive queries, but it is ill-suited to daily or weekly reports that must run reliably. For such tasks, Hive is the better alternative.

In terms of data-processing models, Hive is often described as a pull model, since each MapReduce stage pulls data from the preceding tasks. Presto follows the push model, the traditional DBMS approach, in which a SQL query is processed by multiple stages running concurrently. Each stage streams its output directly to the stage that consumes it, so intermediate data is passed along without being written to disk. If the query consists of multiple stages, Presto can be 100 or more times faster than Hive.
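
The two Presto behaviors described above, pushing rows between stages in memory and failing outright when a memory limit is hit, can be sketched in a few lines of Python. This is a toy model and not Presto's code: the operator classes, the `max_groups` parameter (a stand-in for a per-task memory limit), and the `QueryMemoryExceeded` exception are hypothetical names chosen for this example.

```python
class QueryMemoryExceeded(Exception):
    """Raised when an operator would exceed its (hypothetical) memory budget."""


class Collect:
    """Final operator: gathers result rows in memory."""

    def __init__(self):
        self.rows = []

    def push(self, row):
        self.rows.append(row)

    def finish(self):
        pass


class Filter:
    """Pushes each matching row straight to the next operator; nothing hits disk."""

    def __init__(self, predicate, consumer):
        self.predicate = predicate
        self.consumer = consumer

    def push(self, row):
        if self.predicate(row):
            self.consumer.push(row)

    def finish(self):
        self.consumer.finish()


class CountByKey:
    """Grouped count held entirely in memory, capped at max_groups groups."""

    def __init__(self, key, consumer, max_groups):
        self.key = key
        self.consumer = consumer
        self.max_groups = max_groups
        self.counts = {}

    def push(self, row):
        k = row[self.key]
        if k not in self.counts and len(self.counts) >= self.max_groups:
            # No spill to disk and no retry: the whole query fails.
            raise QueryMemoryExceeded(f"more than {self.max_groups} groups")
        self.counts[k] = self.counts.get(k, 0) + 1

    def finish(self):
        for k, n in sorted(self.counts.items()):
            self.consumer.push((k, n))
        self.consumer.finish()


def run_query(rows, max_groups):
    """Count successful requests per path by pushing rows through the pipeline."""
    sink = Collect()
    pipeline = Filter(
        lambda r: r["status"] == 200,
        CountByKey("path", sink, max_groups),
    )
    for row in rows:
        pipeline.push(row)
    pipeline.finish()
    return sink.rows
```

With a generous `max_groups` the query returns quickly because every row flows from operator to operator without being materialized. With a budget that is too small, `run_query` raises `QueryMemoryExceeded` partway through, which is tolerable at an interactive prompt but not in a scheduled report.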

Hive vs. Presto

Learn how Treasure Data customers can utilize the power of distributed query engines without any configuration or maintenance of complex cluster systems. Still curious about Presto? Join us for a webinar with fellow Presto contributor Teradata, The Magic of Presto: Petabyte Scale SQL Queries in Seconds.

Topics Covered

  • Customer Data Strategy

©2026 Treasure Data, Inc. (or its affiliates) All rights reserved.