Collect & Cleanse Customer Data

There’s a common saying among marketing evangelists and consultants: “Data is the new oil.” The idea is that data has become a business asset, valuable in its own right, to be collected, stored and traded.

However, just like oil, data doesn’t do you any good if it’s buried out of your reach. You have to dig it up, refine and deploy it before you can realize the value within it.

Right now, many businesses are sitting on top of a wealth of data, but it’s buried treasure instead of useful fuel. Different departments collect customer data and store it in isolated systems. Customer service doesn’t share data with the marketing department. Marketing keeps its database separate from the sales team.

The challenge is to bring your marketing data sources (as well as other customer-related data sources) together to create a single source of truth, a virtual refinery where you can cleanse, enrich, and analyze your data. This is the process that turns crude data into jet fuel for your business.

In this section, we’ll talk about why high-quality data matters, what types of data you should be collecting, and how to clean and enrich that data to get it ready for analysis.


Why Data Quality Matters

It’s not hard to convince a marketer that customer data is important. But it’s possible to underestimate the importance of complete, clean, and trustworthy data. If you’re making decisions based on incomplete or faulty data, you may be worse off than just going with your gut.

Consider the typical customer journey. Our research found that two-thirds of customers have at least three touchpoints before they purchase, while 33% reported six or more.

Now imagine if your data only showed two out of three touchpoints, or three out of six. Your attribution models would be faulty. Your personalization efforts would miss the mark. You would be missing out on opportunities to engage with potential customers and would risk losing them entirely.
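To make the attribution problem concrete, here is a minimal sketch of an equal-credit ("linear") attribution model. The channel names and journeys are hypothetical, and linear attribution is just one simple model chosen for illustration. The point is only to show how credit shifts when one touchpoint never reaches your data.

```python
from collections import defaultdict

def linear_attribution(journeys):
    """Split each conversion's credit equally across its recorded touchpoints."""
    credit = defaultdict(float)
    for touchpoints in journeys:
        if not touchpoints:
            continue  # nothing recorded, so no credit can be assigned
        share = 1.0 / len(touchpoints)
        for channel in touchpoints:
            credit[channel] += share
    return dict(credit)

# Hypothetical journeys: the channels each customer touched before buying.
full_journeys = [
    ["in_store_visit", "organic_search", "email"],
    ["in_store_visit", "paid_ad", "email"],
]

# The same journeys when in-store visits are never captured.
partial_journeys = [
    [channel for channel in journey if channel != "in_store_visit"]
    for journey in full_journeys
]

full_credit = linear_attribution(full_journeys)
partial_credit = linear_attribution(partial_journeys)
```

With the complete data, in-store visits and email each earn the same share of credit. With the in-store touchpoint missing, that credit is silently redistributed to the remaining channels, so email appears to drive far more conversions than it actually does.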

For example: Say you suddenly had a boost in traffic to your website, and Google Analytics showed you were ranking for new keywords. You might invest more heavily in website development, doubling efforts in SEO and cranking out new content. But in reality, the CEO had appeared on a hugely popular podcast and promoted the website. That influx of new traffic was driving the rankings, not the other way around.

Or perhaps you set an editorial calendar for the blog based on traffic reports you compiled at the end of last quarter — Q4 of 2019. How much of the strategic content planning you did will be relevant to customers right now, after what’s happened in the first half of 2020? In this case, while you had plenty of data on hand, it was too out of date to guide decision-making.

With these examples in mind, we can define four characteristics of a high-quality source of truth for customer data:

  • Completeness – It should bring together data from every department that deals with customers or potential customers.
  • Timeliness – Data streams should be live and updated in real time, not copied and pasted manually.
  • Relevance – Data should be fit for its intended use; e.g., you don’t need your customer’s shoe size if you’re selling hats.
  • Reliability – Data should be cleansed, verified and enriched to ensure accuracy and trustworthiness.


Types of Data to Integrate

Mapping the data landscape in your organization can feel like a daunting task. It’s important to focus on customer data that will be relevant to improving customer experience, driving personalization, and developing relationships with customers.

Here is our short list of must-have data sources:

  1. In-store and Online Sales Data. If your organization has brick-and-mortar locations, it’s essential to connect both online and offline sales data to your CDP. You’re certain to uncover opportunities to influence online behavior through offline promotions, and vice versa.
  2. Web Browsing Data. Dig deeper than traffic and time on page – your CDP data should include how traffic is referred to your site, how deep the average browser clicks into the site, and what their next steps are.
  3. Survey Data. First-hand customer data is invaluable. If you’re tracking Net Promoter Score, product satisfaction, or customer experience data, make sure that goes into the CDP.
  4. Customer Service Data. Marketers are still on the hook for nurturing customers after they make a purchase. So it’s important to know how people are experiencing the brand through customer service.
  5. Sales Department Data. Sales has multiple data streams that are useful to marketers. There’s simple sales data, such as purchases made or deals closed. But there are also prospect lists, which can guide content creation, and missed-opportunity data you can use to refine messaging.
  6. Advertising Platforms. Connect data from Google Ads, AdRoll, and other accounts to your CDP. Combined with your other data streams, this gives you a more accurate picture of which ads inspire which next steps.
  7. Marketing Automation Platforms. Your automation platform is a treasure trove of data already, but it gains far more value when placed in context with the other streams on this list. Connecting HubSpot, Marketo, Salesforce, etc. to your CDP makes both sources more valuable.
  8. Loyalty Data. If you have a loyalty program, it’s generating plenty of data on what customers are buying, when and where. Combine this data with the others on this list, and you can zero in on your most valuable customers.
  9. Legacy Data. It’s likely you have historical customer data in offline formats, whether it’s on hard drives or in filing cabinets. This old data can be useful in forming a picture of the evolving customer journey. For example, one Treasure Data client used 80 years of collected data to drive exceptional results.
  10. Wearables and IoT Data. These sources are still in their infancy, but marketers should be keeping an eye on their potential. Smart watches, connected appliances, and home automation devices are generating a wealth of data that can be useful for marketers.


Customer Data Cleaning

Connecting your first-party data sources to the CDP is the first step. The next step is to make sure the data is clean, complete and trustworthy.

The process of customer data cleaning helps identify data that is incomplete, corrupt, or redundant. Since you’re combining data from multiple sources, it’s important to standardize all of it before you start analysis. Follow these steps:

  1. Validate and Sanitize. Use your CDP to detect missing, false or irrelevant data — for example, form fills from “Mickey Mouse,” or people who work at “/./aqh;laweb” or email addresses like “”
  2. Merge Duplicates. In order to create customer profiles (in the next phase), it’s crucial to get rid of duplicate data. You don’t want one entry for J. Smith who lives on NE Melody Dr, and one for John Smith who lives on Melody Dr. NE. Your CDP can do much of this work automatically and continuously as new data is added.
  3. Standardize Formatting. Does your zip code field take five digits or nine? Is gender relevant to your offering, and if so, what type of input will you accept? This process includes discussing standards with the sources of your data streams, as well as standardizing how data is displayed in the CDP.
  4. Purge Out-of-Date Information. Email addresses, job titles, and addresses all change over time. If you have records that are more than six months to a year old, it’s worth updating the data, either through direct interaction and verification, or by enriching with more current third-party data.
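
The four steps above can be sketched in a few lines of Python. This is an illustration only, not how any particular CDP works: the field names, the placeholder-name blocklist, the five-digit ZIP choice, and the crude address-matching rule are all assumptions made for the example. Note that the sketch standardizes formatting before merging, since duplicates are easier to spot once values are normalized.

```python
import re
from datetime import date, timedelta

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PLACEHOLDER_NAMES = {"mickey mouse", "test test"}  # hypothetical blocklist

def is_valid(record):
    """Validate: reject records with a junk name or a malformed email."""
    name = record.get("name", "").strip().lower()
    email = record.get("email", "").strip()
    return bool(name) and name not in PLACEHOLDER_NAMES and bool(EMAIL_RE.match(email))

def standardize(record):
    """Standardize: normalize formatting so equivalent values compare equal."""
    clean = dict(record)
    clean["email"] = record["email"].strip().lower()
    clean["zip"] = re.sub(r"\D", "", record.get("zip", ""))[:5]  # assumes 5-digit ZIPs
    # Crude rule: drop punctuation and sort the words, so that
    # "NE Melody Dr" and "Melody Dr. NE" produce the same key.
    tokens = re.sub(r"[^\w\s]", "", record.get("address", "")).lower().split()
    clean["address_key"] = " ".join(sorted(tokens))
    return clean

def merge_duplicates(records):
    """Merge: collapse records sharing a first initial, last name and address."""
    merged = {}
    for rec in records:
        parts = rec["name"].replace(".", "").lower().split()
        key = (parts[0][0], parts[-1], rec["address_key"])
        current = merged.get(key)
        if current is None:
            merged[key] = rec
            continue
        # Keep the newer record's fields, but prefer the fuller name.
        newer, older = (rec, current) if rec["updated"] >= current["updated"] else (current, rec)
        combined = dict(newer)
        combined["name"] = max(newer["name"], older["name"], key=len)
        merged[key] = combined
    return list(merged.values())

def is_stale(record, today, max_age_days=365):
    """Purge or refresh: flag records not updated within the chosen window."""
    return (today - record["updated"]) > timedelta(days=max_age_days)

raw = [
    {"name": "J. Smith", "email": "JSmith@example.com", "zip": "97201-1234",
     "address": "NE Melody Dr", "updated": date(2020, 1, 15)},
    {"name": "John Smith", "email": "jsmith@example.com", "zip": "97201",
     "address": "Melody Dr. NE", "updated": date(2020, 6, 1)},
    {"name": "Mickey Mouse", "email": "mm@example.com", "zip": "00000",
     "address": "1 Main St", "updated": date(2020, 6, 1)},
]

valid = [standardize(r) for r in raw if is_valid(r)]
profiles = merge_duplicates(valid)
needs_refresh = [p for p in profiles
                 if is_stale(p, today=date(2020, 7, 1), max_age_days=180)]
```

Here the "Mickey Mouse" entry is dropped during validation, and the two Smith records collapse into a single profile that keeps the fuller name and the most recent update date. Real matching logic would need to be far more careful than sorting address words, which is why this work is usually left to the CDP.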


How to Enrich Customer Data

Data enrichment is the process of acquiring second-party and third-party data and combining it with your first-party customer data.

  1. First-party data is proprietary data that your brand has collected directly from customers.
  2. Second-party data refers to another company’s first-party data, bought directly or through a data marketplace.
  3. Third-party data is data aggregated by companies that have no direct relationship to the consumer.

Each of these types of data is a necessary part of a complete customer profile.

Here’s an example of how data enrichment expands your customer view:

[Figure: Data enrichment expands your customer view]

The right CDP can handle different types of data feeds from hundreds of second- and third-party suppliers. Most importantly, your CDP should be able to automatically associate your enrichment data with your existing first-party data, combining them into actionable customer profiles.
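
As a rough sketch of what "associating" enrichment data with first-party data means, the example below joins two small datasets on a normalized email address. The function name, the field names, and the choice of email as the matching key are assumptions made for illustration; a real CDP would typically match on several identifiers.

```python
def enrich_profiles(first_party, enrichment, key="email"):
    """Attach enrichment attributes to first-party profiles that share a key.

    First-party values win on conflict, so supplier data only fills gaps.
    """
    # Index the enrichment rows by normalized key for fast lookup.
    lookup = {row[key].strip().lower(): row for row in enrichment if row.get(key)}
    enriched = []
    for profile in first_party:
        extra = lookup.get(profile.get(key, "").strip().lower(), {})
        combined = {**extra, **profile}  # later keys (first-party) override
        combined["enriched"] = bool(extra)
        enriched.append(combined)
    return enriched

# Hypothetical first-party records collected directly from customers.
first_party = [
    {"email": "jsmith@example.com", "name": "John Smith", "last_purchase": "hat"},
    {"email": "adoe@example.com", "name": "A. Doe", "last_purchase": "scarf"},
]

# Hypothetical third-party feed with one matching record.
third_party = [
    {"email": "JSmith@example.com", "name": "J Smith", "household_income_band": "75-100k"},
]

enriched_profiles = enrich_profiles(first_party, third_party)
```

The first profile gains the supplier's income band while keeping its own name and email, and the second is passed through unchanged and marked as not enriched. Letting first-party values take precedence is a design choice here, based on the assumption that data collected directly from the customer is the more trustworthy source.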

For more training, see our list of CDP courses.
