Glossary of Data Terms

All of the basic terms you need to know when you’re working with data.


A/B Testing
Analysis in which randomized target customers receive variations of an item, e.g. an application interface element or web page, to measure the effect on a desired outcome, such as conversions. Also called Split Testing or Multivariate Testing.
Algorithm
An automated series of steps that executes a mechanizable function, e.g. a mathematical function in a computer program.
Anomaly Detection
Analysis in which the analyst is alerted when events occur that fall outside of an established norm. Examples include Fraud Detection and Bot Detection.
Anonymization
Removal from a database of information that could be used to identify individuals.
Application Programming Interface (API)
A set of commands provided for a software application that allows programmers to interact with it.
Artificial Intelligence (AI)
Software that carries out complex tasks that humans would normally be required to perform, such as pattern recognition and decision-making.
Attribution
Analysis that seeks to discover what prior event caused the event being analyzed.
Average
See Mean.


Big Data
Data that, because of volume or complexity, is beyond the processing capacity of ordinary analytics tools.
Biometric Data
Data about a user's body, collected by digital tools designed to measure health or athletic performance.
Bot
A computer program that operates autonomously to carry out tasks for a user and/or to mimic the behavior of a person.
Bulk Data
Data that is uploaded to storage all at once, e.g. historical data. In contrast with Streaming Data.
Business Operations
Management of tangible and intangible business assets with the intention of deriving more value from them.


Clickstream Analytics
Analysis of user behavior based on their clicks on web pages.
Cloud Computing
Computer services that are remotely hosted and accessed through the internet.
Clustering Analysis
The identification of groups of events linked by close proximity to each other, e.g. for the purpose of anomaly detection.
Click-Through Rate (CTR)
The ratio of users who click on an item to the number of users who view that item.
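As a minimal sketch of the ratio described above (the function name and the sample figures are illustrative, not taken from any particular analytics tool):

```python
def click_through_rate(clicks: int, views: int) -> float:
    """Return clicks divided by views, or 0.0 when there were no views."""
    if views == 0:
        return 0.0
    return clicks / views


# Hypothetical numbers: 25 users clicked out of 1,000 who viewed the item.
ctr = click_through_rate(clicks=25, views=1000)
print(f"CTR: {ctr:.1%}")  # prints "CTR: 2.5%"
```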
Columnar Database
A database management system that stores data in columns rather than rows so as to speed query performance.
Connector
An API that enables a user to send data to and/or receive data from different computer programs.
Content Management System (CMS)
A computer program that provides a simplified interface for creating and editing pages and/or records on a website.
Conversion
A user action defined as a desired outcome to be measured, e.g. a purchase of a product or a signup for a membership.
Cost Per Click (CPC)
Advertising payment method that charges a certain amount every time a user clicks on an advertisement.
Cost Per Thousand (CPM)
Advertising payment method that charges a certain amount for every thousand views of an advertisement.
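To make the difference between the two payment methods concrete, here is a rough sketch that prices the same hypothetical campaign both ways. The prices, view count, and click count are made-up values for illustration only.

```python
def cpc_cost(clicks: int, price_per_click: float) -> float:
    """Cost Per Click: pay a fixed amount for each click."""
    return clicks * price_per_click


def cpm_cost(views: int, price_per_thousand: float) -> float:
    """Cost Per Thousand: pay a fixed amount for every 1,000 views."""
    return views / 1000 * price_per_thousand


# Hypothetical campaign: 50,000 views that produced 400 clicks.
views, clicks = 50_000, 400
print(cpc_cost(clicks, price_per_click=0.50))    # 200.0
print(cpm_cost(views, price_per_thousand=3.00))  # 150.0
```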
Cross-Channel Analytics
Analysis of behavioral data from multiple channels (e.g. online, mobile, in-store) to provide a more complete picture of customer preferences for purposes of targeting and promotion.
Customer Data Platform (CDP)
An analytics platform that centralizes First Party Data and enriches it with Second and/or Third Party Data for a more accurate and actionable view of customer behavior.


Database
A collection of information organized to allow efficient access, management, and retrieval.
Data Management Platform
An analytics platform that centralizes First Party Data for better tracking of advertising campaign performance.
Data Model
A conceptual or logical diagram of how data will need to flow in a computer program to fulfill its requirements.
Data Silo
A data source that is difficult to connect with other data because of dependency on Engineering resources or other constraints.
Data Warehouse
A central repository of business data for an organization.


Federated Database
A system in which multiple databases are linked together and can be interacted with as if they were a single database.
First Party Data
Information collected by a company about its own customers, from sources such as web and mobile usage tracking, CRMs and Business Analytics tools.
Funnel Analysis
Analysis of customer behavior in stages, typically starting with Awareness and ending with purchase or signup (Acquisition).


Geolocation Data
Device sensor data that tracks the physical location of a user.
Governance
Management of the quality, accessibility and security of data within an enterprise.
Growth Hacking
The use of rapid experimentation across marketing and product channels to drive growth.
Growth Marketing
See Growth Hacking.


Ingestion
The process of receiving data from a source and converting it into a format that can be accessed in a Data Warehouse.
Internet of Things (IoT)
The web of communication between devices equipped with internet connectivity (Smart Devices).


JavaScript Object Notation (JSON)
A common data format consisting of strings of Key Value Pairs.


Key Value Pair
A common data format in which the data, or “Values”, are identified with text labels called “Keys”. Used for some NoSQL databases.
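A short sketch of how Keys and Values look in practice, using Python's standard json module. The record itself (user_id, event, amount) is an invented example.

```python
import json

# A JSON string: each Key is a text label, each Value is the data it identifies.
text = '{"user_id": 42, "event": "purchase", "amount": 19.99}'

# Parse the string into a Python dictionary of Key Value Pairs.
record = json.loads(text)
for key, value in record.items():
    print(key, "=", value)

# Convert the dictionary back into a JSON string.
round_trip = json.dumps(record)
```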
Keyword
A word used as Metadata to identify a web page or other content in a search.


Live Data
Data that is Connected with other relevant data, Current, and Easily Accessible to the people and processes in an organization that need to use it.
Lock-In
Inability to easily move data from one platform to another due to limitations imposed by a data storage provider, Business Application, or CRM.
Log
A text string that contains information about the state of a computer program or an event. Computer Data is composed of Logs.
Lookalike Modeling
Analytic process for targeting online advertising to website visitors with similar interests as a company’s existing customers.


Machine Learning
A category of Artificial Intelligence (AI) software in which the behavior of a program changes and improves based on exposure to new data.
MapReduce
A type of algorithm that breaks a data set apart so it can be processed on separate systems (Map) and then combines the results returned by those processes (Reduce) to create a report.
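The split-then-combine idea above can be sketched in a single process as a word count. This is only an illustration of the pattern: a real deployment would run the map step on separate machines, and the function names and sample text here are assumptions.

```python
from collections import Counter
from functools import reduce


def map_chunk(lines):
    """Map step: count the words in one chunk of the data set."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def combine(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


# The data set, already broken apart into three chunks.
chunks = [["click view click"], ["view buy"], ["click buy buy"]]

partials = [map_chunk(chunk) for chunk in chunks]  # could run in parallel
totals = reduce(combine, partials, Counter())      # combined report
print(dict(totals))
```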
Mean
The sum of all the numbers in a set divided by the count of numbers in the set. Also called the Average value of a set.
Median
The middle point of a number set, for which half the numbers in the set will be above and half below.
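The two definitions above can give very different answers when a set contains an outlier. A small sketch using Python's standard statistics module, with made-up values:

```python
import statistics

# Hypothetical set with one very large value.
values = [2, 3, 5, 8, 100]

mean = statistics.mean(values)      # (2 + 3 + 5 + 8 + 100) / 5
median = statistics.median(values)  # middle value of the sorted set

print(mean)    # 23.6
print(median)  # 5
```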
Metadata
Data that gives information about other data. On a web page, the Keywords for the page are metadata.
Migration
The process of moving a company’s existing data from one platform to another.
Model
See Data Model.
MongoDB
A popular open source NoSQL database.
Multi-Touch Attribution
A method of scoring customer touchpoints that tells you the likelihood that any given touchpoint, in any channel, contributed to a sale.
Multivariate Testing
See A/B Testing.
MySQL
A popular open source SQL database.


Normal Distribution
A symmetric, bell-shaped distribution in which values of a random variable cluster around the Mean and become less frequent the farther they fall from it.
NoSQL
A general term that describes databases with structures and rules that differ from the row-and-column based SQL format.


Omnichannel Marketing
A marketing strategy that uses data from different sales channels to drive sales and optimize customer experience across other channels. An example is using data about a customer’s in-store shopping behavior to provide targeted email offer coupons.


Portability
Ability to move data, especially large amounts of data in different formats, from one platform to another.
Predictive Analytics
A marketing function that uses machine learning to extrapolate customer traits to predict future purchasing behavior.
Presto
A SQL-based query engine that allows very fast queries of large and distributed databases.


Recommendation Engine
Software that uses data derived from customers’ purchase habits to determine products to recommend, as on an e-commerce site.
Relational Database
A database organized in tables of rows and columns.
Return On Advertising Spend (ROAS)
Advertising spend metric, expressed as the revenue earned as a percentage of the dollars spent on a campaign. Formula: Revenue / Cost x 100. If you spent $200 and made $300 on a campaign, your ROAS would be 300 / 200 x 100 = 150%.
Return On Investment (ROI)
Spend metric, expressed as the profit on an investment as a percentage of its cost. Formula: (Revenue - Cost) / Cost x 100. If you spent $200 and made $300 on a campaign, your ROI would be 100 / 200 x 100 = 50%.
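The ROAS and ROI examples from the two entries above can be checked with a few lines of code. The function names are illustrative; the $200 spend and $300 revenue are the figures used in the entries.

```python
def roas(revenue: float, cost: float) -> float:
    """Return On Advertising Spend, as a percentage."""
    return revenue / cost * 100


def roi(revenue: float, cost: float) -> float:
    """Return On Investment, as a percentage: profit divided by cost."""
    return (revenue - cost) / cost * 100


# Spent $200, made $300.
print(roas(300, 200))  # 150.0
print(roi(300, 200))   # 50.0
```

Note the parentheses in roi: without them, Cost / Cost would be evaluated first and the result would be wrong.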
Retargeting
Online advertising in which ads are shown to users based on their browsing history.


Scalability
Ability of a platform to automatically scale to meet user needs. If a platform stops working or performance degrades once usage goes beyond a certain threshold, it is not scalable.
Schema
In computer programming, the structure of a database.
Schema-on-Read
A type of data processing that allows data with different schemas, including Schemaless data, to be inserted into the same database, aiding in connectivity.
Schemaless
Data that is not structured in rows and columns, e.g. data stored in NoSQL databases.
Segmentation
Division of a group of customers into subgroups based on shared characteristics in order to target them more precisely in marketing campaigns.
Second Party Data
Data on customers obtained from partner companies.
Software as a Service (SaaS)
Business software that is hosted in the cloud, rather than being installed locally on a user’s computer.
Split Testing
See A/B Testing.
SQL
Structured Query Language, a programming language that can be used to access data in a relational database. Also refers to the specific type of relational database system that can be queried using that language.
Standard Deviation
A measure of how widely the values in a group are spread around the Mean.
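As a rough worked example, the sketch below computes the population standard deviation two ways: step by step, and with Python's standard statistics module. The sample values are invented, and the population form (dividing by the number of values) is an assumption, since the entry does not say which form is meant.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]

# Step by step: average the squared distances from the mean, then take the root.
mean = sum(values) / len(values)
variance = sum((v - mean) ** 2 for v in values) / len(values)
by_hand = math.sqrt(variance)

# The same calculation using the standard library.
from_library = statistics.pstdev(values)

print(by_hand)       # 2.0
print(from_library)  # 2.0
```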
Statistical Significance
A strong enough correlation between two or more variables to be confident the correlation is not due to chance.
Streaming Data
Data that consists of logs generated dynamically by software activity. In contrast with Bulk Data.


Third Party Data
Aggregated Customer Data from an external vendor used to enrich a company’s First Party Data for better Segmentation in a Customer Data Platform.
Treasure ML
A library for implementing Machine Learning in Treasure Data.
Treasure Workflow
A Treasure Data feature that allows complex workflows to be constructed, scheduled and reused.