Treasure Data CDP Resources

Four Reasons Presto is the Best SQL-on-Hadoop (That You Haven’t Heard Of)

Presto is an in-memory distributed SQL query engine developed by Facebook and open-sourced in November 2013. Presto has a number of key advantages over other SQL-on-Hadoop engines, yet these benefits are not widely recognized or understood. Reason #1: Presto is Plenty Fast. Unlike MapReduce, which was designed for very high throughput at the ...

Eliminating Schema Rot in MPP Databases Like Redshift

The MPP database is an incredible piece of technology. These databases run large-scale analytic queries very quickly, making them great tools for iterative data exploration. With a cloud offering like Redshift in the market, MPP databases are enjoying increasing adoption today outside of enterprise IT. However, like any other great technology, they excel in some ...

Managing the Data Pipeline with Git + Luigi

One of the common pains of managing data, especially for larger companies, is that a lot of data gets dirty (which you may or may not even notice!) and becomes scattered around everywhere. Many ad hoc scripts run in different places, and these scripts silently generate dirty data. Further, if and when a script results ...

Learn SQL by Calculating Customer Lifetime Value Part 2: GROUP BY and JOIN

This is the second installment of our SQL tutorial blog series. In the first part, we set up the data source with SQLite and learned how to filter and sort data. This time, we will learn two other key concepts in SQL: GROUP BY and JOIN. ...

12 Open Source Software Innovations from Treasure Data Engineers

TD is proud to have some of the best technical minds in the world working on our unique managed service. When they’re not working on the TD Service or supporting our customers, many of our engineers continue to support technological innovation by...

Amazon Recommends Fluentd as “Best Practice for Data Collection” over Flume and Scribe

This month, Parviz Deyham from Amazon Web Services promoted Fluentd as the best data collection tool for Amazon Elastic MapReduce (EMR), a hosted Hadoop framework running on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3)...

Treasure Data’s Plazma: Columnar Cloud Storage

TD has been developed by Hadoop experts. We get Hadoop, and, in many ways, it’s part of our core. As we have built out the platform, we noticed that the storage layer needs to be multi-tenant, elastic, and easy to manage while keeping the scalability...

Fluentd + Hadoop: Instant Big Data Collection

Many companies choose the Hadoop Distributed File System (HDFS) for big data storage. Until recently, however, its only API was in Java. This changed with the new WebHDFS interface, which allows users to interact with HDFS via...

Understanding the Book-Crossing Dataset: Setup

I'm a data scientist at TD. In a series of blog entries, I want to introduce how to use our platform by interacting with a concrete dataset. I chose the publicly available Book-Crossing Dataset as our base data...