10 common ways your revops data enrichment might be failing

Picture this: you have a million contact records to fix and need to find a title match based on email and determine each contact’s seniority. Or perhaps you’re focused on accounts and have tens or hundreds of thousands of companies to enrich to reach a Minimum Viable Record ready to be used in your data environment. In either case, you need a system of data enrichment that takes the first-party data you already know and appends second-party or third-party data until each record meets your requirements.

Data enrichment is one of the key tools at your disposal, enabling you to use an existing dataset to add data columns to your records based on a known key. For contacts, this is likely to be an email address or a LinkedIn URL; for accounts, it might be some combination of the company name, website, and country. If you use an external source like ZoomInfo or Clearbit, you can use their existing ID to re-enrich with new information collected by those teams.

This all works great when there are no problems with data. But since that’s a silly proposition (there are always some problems with data), let’s talk about 10 of the most common data enrichment problems you’re likely to have and some approaches to resolve or remediate them.

Problem 1: Ambiguous Matches for Common Names

When you get an exact match for a person or a company, great! But what happens when you have two records that seem pretty similar, like a contact record with the same first name, last name, and company but a different email address? Or you might find a company record for the same company in a different geographic region, and need to determine if it’s the same account.

For people, companies, or other objects, it helps to define a unique compound key for each record. For example, email + first name + last name + country might be enough for a person, while website + name + country might be enough for an account. If you do this you’ll need to handle common changes, like when someone changes their email and you need to decide whether to create a new contact or update an existing one. Either way, you’ll need a standard way of handling identity resolution.
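
As a rough illustration, here is a minimal Python sketch of building normalized compound keys for contacts and accounts. The field names and normalization rules are assumptions, not a prescription for your data model:

```python
import re

def normalize(value: str) -> str:
    """Lowercase, trim, and collapse whitespace so cosmetic differences don't break matching."""
    return re.sub(r"\s+", " ", (value or "").strip().lower())

def contact_key(record: dict) -> tuple:
    """Compound identity key for a contact: email + first name + last name + country (assumed field names)."""
    return (
        normalize(record.get("email", "")),
        normalize(record.get("first_name", "")),
        normalize(record.get("last_name", "")),
        normalize(record.get("country", "")),
    )

def account_key(record: dict) -> tuple:
    """Compound identity key for an account: website domain + name + country (assumed field names)."""
    domain = re.sub(r"^(https?://)?(www\.)?", "", normalize(record.get("website", ""))).split("/")[0]
    return (domain, normalize(record.get("name", "")), normalize(record.get("country", "")))
```

Two records that produce the same key are candidates for a merge; a changed email produces a new key, which is exactly the case where your identity-resolution policy has to decide between “new contact” and “update existing contact.”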

Problem 2: Outdated External Sources

Many enrichment projects start like this: “let’s update all the records and kick off the project,” with the knowledge that new records will be enriched on a schedule. Existing records might get less attention unless the enrichment date is captured and a re-enrichment date is automated. Great, you say - let’s make sure every record is enriched every N months.

But if you think about the real value of enrichment and getting the best data possible in the place where it will do the most good, simply placing a date field on the record and having an automated cycle to re-enrich might not be the best choice. You might want to prioritize recent enrichment for accounts that have recent product activity or contacts that are hand-raisers. One way to do this is to combine the qualification process from SDRs with light data improvement. (Even clarifying the contact’s title goes a long way to making them feel like you know what’s going on.)
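
One way to express that prioritization is a simple score so that hand-raisers and recently active accounts jump the re-enrichment queue. This is a sketch only, and the field names, weights, and thresholds are invented:

```python
from datetime import datetime, timedelta, timezone

def enrichment_priority(record: dict, now: datetime | None = None) -> int:
    """Higher score = enrich sooner. Field names, weights, and thresholds are illustrative assumptions."""
    now = now or datetime.now(timezone.utc)
    score = 0
    last_enriched = record.get("last_enriched_at")  # assumed to be a timezone-aware datetime
    if last_enriched is None or now - last_enriched > timedelta(days=180):
        score += 1  # stale or never enriched
    if record.get("is_hand_raiser"):
        score += 3  # the contact asked to talk to us
    last_activity = record.get("last_product_activity_at")
    if last_activity and now - last_activity < timedelta(days=30):
        score += 2  # recent product activity on the account
    return score

# queue = sorted(records, key=enrichment_priority, reverse=True)
```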

Problem 3: Conflicting Information

Even when you use only one enrichment source (most organizations use more than one), you will face the challenge of receiving conflicting information. When you get new information, what wins? If you store the source and the time each value was received along with the enrichment, that history will help you decide.

Which source wins in a conflict? Usually, the one closest to the original. When a person enters their new phone number into your system, it’s probably the best phone number. If you haven’t talked to someone before and you get a number attributed to them, you might not trust that number until you make a call and validate that they answered. There’s an important edge case here - when you are dealing with company names, you will often find many aliases (e.g. for JP Morgan Chase, you might hear JPMorgan, JPMC, and others), so you may need a separate canonical field like “Legal Company Name” in addition to a friendly company name.
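
A minimal sketch of a “which value wins” rule, assuming you store each candidate value with its source and the time it was received; the source ordering below is an invented example, not a recommendation:

```python
# Lower rank = closer to the original source. The ordering is illustrative only.
SOURCE_RANK = {"user_submitted": 0, "sales_rep": 1, "enrichment_vendor": 2, "inferred": 3}

def resolve_conflict(candidates: list[dict]) -> dict:
    """Pick a winner from candidates shaped like {"value": ..., "source": ..., "received_at": datetime}.

    Prefers the most trusted source; breaks ties by taking the most recently received value.
    """
    return min(
        candidates,
        key=lambda c: (SOURCE_RANK.get(c["source"], 99), -c["received_at"].timestamp()),
    )
```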

Problem 4: Loss of Data Granularity

Enriching data also includes matching or appending product activity to accounts and contacts. When you do this, you need to consider the time grain of the information you’re adding and confirm you’re not asking that data to work too hard through interpolation. Here’s what I mean: when you have weekly or monthly data, trying to turn it into a daily metric is not very accurate. You could fall back on an average daily amount, but it’s more effective to know the time frame of the metric and report at that grain.

Help yourself out here by labeling your metrics clearly. If you’re counting activity in the last 7 days, label it accordingly. If you’re capturing a weekly, monthly, yearly, or cumulative rollup, use a convention and name it so that the next person knows what they’re looking at. Scaling metrics up (monthly to yearly) is fine; inferring a smaller time grain from a larger one is fraught with peril.
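
For example, a naming convention plus explicit rollup helpers makes the grain impossible to miss. The names below are one possible convention, not a standard:

```python
# Metric names carry their grain and window explicitly, e.g.:
#   logins_last_7d, active_users_monthly, revenue_cumulative

def rollup_weekly_to_monthly(weekly_counts: list[int]) -> int:
    """Summing weekly counts into a larger window is a safe scale-up."""
    return sum(weekly_counts)

def approx_daily_from_monthly(monthly_total: int) -> float:
    """Going the other way only yields an average; label it as an average, never as observed daily data."""
    return monthly_total / 30
```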

Problem 5: Personally Identifiable Information

If you have contact data, you have PII. Do yourself a favor when you enrich data using this information: use as little of it as possible. If you need to take the data and analyze it outside of a system, use an identifying key rather than the raw data and join it back in using a query when finished.
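
One common pattern, sketched here with a hypothetical salt and field name, is to carry a hashed surrogate key into the external analysis instead of the raw email:

```python
import hashlib

SALT = "replace-with-a-secret-salt"  # hypothetical; keep a real salt out of source control

def surrogate_key(email: str) -> str:
    """Deterministic key you can analyze externally and later join back on, without exporting the raw email."""
    normalized = email.strip().lower()
    return hashlib.sha256((SALT + normalized).encode("utf-8")).hexdigest()

# Export (surrogate_key(email), metrics...) for the outside analysis, keep the email-to-key map
# inside your system, and join the results back in with a query when you're done.
```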

This also applies to identity resolution - you’ll do everyone in your org a favor if you create a standard way to link, merge, or separate records for the same person that use personal and business emails. There are good reasons to combine or separate these records, but you should know how to proceed consistently.

Problem 6: An Incorrect Enrichment Match

[Image: a messy enrichment match, discussed below]

Another common problem is an incorrect enrichment match on data that appears to be correct. In the example pictured above, ZoomInfo erroneously combined my “Greg Meyer” contact record with several other similar records to create a Frankenrecord of bad data. If you get a lot of new values all at once, you might not have the right person anymore.

Using a combination of email and another key is one way to fight this problem, creating an index key of values that you can compare to the new record. Another solution is to keep a flag on the record that is set when a large percentage of a record’s data changes at once, then use a queue to review and remediate records that fall into that condition. If this is too noisy, raise the threshold of the automation that sets the flag.
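
A rough sketch of that flag, assuming flat dictionaries of field values and an illustrative threshold:

```python
def needs_review(existing: dict, incoming: dict, threshold: float = 0.5) -> bool:
    """Flag a record for the review queue when a large share of its known fields changes at once."""
    shared_fields = [f for f in existing if f in incoming]
    if not shared_fields:
        return False
    changed = sum(1 for f in shared_fields if existing[f] != incoming[f])
    return changed / len(shared_fields) >= threshold
```

Raising the threshold means more of the record has to change before a review is triggered, which is how you tune down the noise.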

Problem 7: The Cost of Enriching Frequently

Enrichment services aren’t cheap. They usually bill you by credits and the number of records that you enrich in a monthly time period. And you want your data to be as fresh as possible. How do you balance the need to update data and enrich new records with the cost of frequent enrichment?

After your initial enrichment pass, setting a cadence for how often you want entities to be refreshed helps with this process, e.g. a goal that every active contact gets updated every 180 or 365 days. Adding a secondary check on contacts or accounts to detect trigger events (title change, company change, email change) will also help you determine the optimal cadence for pulling new enrichment data.
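
A minimal sketch of that refresh selection, where the 180-day cadence and the trigger event names are assumptions:

```python
from datetime import datetime, timedelta, timezone

REFRESH_AFTER = timedelta(days=180)  # illustrative cadence
TRIGGER_EVENTS = {"title_change", "company_change", "email_change"}  # illustrative event names

def due_for_enrichment(record: dict, now: datetime | None = None) -> bool:
    """Refresh when a record is past its cadence or a trigger event happened since the last enrichment."""
    now = now or datetime.now(timezone.utc)
    last = record.get("last_enriched_at")  # assumed to be a timezone-aware datetime
    if last is None or now - last > REFRESH_AFTER:
        return True
    return bool(TRIGGER_EVENTS & set(record.get("recent_events", [])))
```

Only the records this returns True for spend credits, which is the lever for balancing freshness against cost.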

Problem 8: Lack of a Standard Data Model

If you don’t know what a “good record” looks like, enrichment is not going to prove particularly valuable. Use a standard like the Minimum Viable Record to outline the most important fields that need to be populated on an object, and to understand when cross-field data integrity (think city, state, and ZIP for US addresses) is breaking.

Creating a model for each common entity will also help team members identify obvious bad data, from picklist values that don’t match, to data formatting errors that point to a bad enrichment result or other data problems.
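
As an illustration, a Minimum Viable Record check might look like the sketch below; the required fields and the US city/state/ZIP rule are assumptions about one possible account model:

```python
import re

REQUIRED_ACCOUNT_FIELDS = ["name", "website", "country", "industry"]  # one possible MVR definition

def mvr_gaps(record: dict) -> list[str]:
    """Return the problems that keep this record below the Minimum Viable Record bar."""
    problems = [f"missing {field}" for field in REQUIRED_ACCOUNT_FIELDS if not record.get(field)]
    # Cross-field integrity: a US address should carry city, state, and a plausible ZIP together.
    if record.get("country") == "US":
        if not (record.get("city") and record.get("state")):
            problems.append("US address missing city/state")
        if record.get("zip") and not re.fullmatch(r"\d{5}(-\d{4})?", record["zip"]):
            problems.append("malformed US ZIP")
    return problems
```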

Problem 9: Matching Non-Equivalent Data

This is a sneaky question: when is a contact title of “Vice President” not the same as another title of “VP”? Answer: when you are comparing two companies with different title structures, like Amazon.com vs. a bank. Understanding equivalencies in data like this requires additional context, such as an adjacent data value like company industry.
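
A simplified sketch of that kind of equivalence mapping, where both the title patterns and the industry adjustment are invented examples rather than production rules:

```python
import re

# Naive title patterns; real rules would be far richer and probably data-driven.
TITLE_PATTERNS = [
    (r"\bchief\b.*\bofficer\b|\bc[eiotf]o\b", "exec"),
    (r"\bvp\b|\bvice[ -]?president\b", "vp"),
    (r"\bdirector\b", "director"),
    (r"\bmanager\b", "manager"),
]

def seniority(title: str, industry: str | None = None) -> str:
    """Map a raw title to a seniority band, nudged by industry context."""
    normalized = title.strip().lower()
    level = next((lvl for pattern, lvl in TITLE_PATTERNS if re.search(pattern, normalized)), "individual_contributor")
    if level == "vp" and industry == "banking":
        level = "manager"  # illustrative adjustment: a bank VP rarely means executive leadership
    return level
```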

This is one area where AI might be able to help quite a bit by deriving the level and responsibility related to a title when combined with the data on the company where that person works. Imagine a search for “director +” that actually gives you the right segment of contacts you’re seeking in an outbound campaign or in a segmentation exercise of internal contacts.

Problem 10: Not Using External Standards

By this point in the post, you’ve probably recognized that most of the common problems listed above could be improved with external standards. The meta-problem? There are no established external standards for “good” RevOps data. Savvy operators will use proxy data (e.g. LinkedIn’s industry ID) as a way to find relevant sets of clean data. But we’re missing something like an open-source standard that, by entity type, suggests standard ways to gather and clean data based on type and composition.

What would such a solution look like? It’s unlikely to be an across-the-board solution and more like a cookbook of recipes helping you to “clean contact data” or “build account model data for companies with headquarters in a single country” or similar. The point here is that as a data quality professional, it’s incumbent upon you to have an opinionated view of your data and build systems and tests to deliver more of that data.

What’s the takeaway? We made a list of 10 common data quality problems here - there’s probably a long tail of 1,000 more. The right fix for your org involves understanding what “good” looks like, building enrichment to reinforce that, and setting up monitoring and tests to ensure you catch records that don’t look right.

