Bad Data Causing Big Problems
Photo: https://unsplash.com/@mbaumi


Some of the strongest headwinds facing companies today have nothing to do with the economy. They’re due to bad data.


Low-quality data – and what I call “data decay” – is draining resources and opportunities from large and small businesses alike. Data is such an integral part of business that it’s hard to put an exact price tag on the lost revenue, but IBM pegs it at $3.1 trillion in the U.S., with individual organizations losing an average of $15M or more. And those numbers are from 2016 and 2017; the data ecosystem has been booming since then, with exponential growth year over year. These days, whatever a company’s day-to-day business may be, its value is often determined by its data. For example, Fanatics has grown into a global digital sports platform with a $31 billion valuation, partly due to its massive database of over 100 million sports fans.


I came up with the term “data decay” because that’s exactly what data does over time: it loses accuracy and, by extension, value. Think of data like a new car: the moment you drive it off the lot, its depreciation begins. Even the most robust datasets can become meaningless, leading to inaccurate risk assessments and suboptimal decision-making. Reality is dynamic; people and organizations are in constant states of flux. All too often, though, data is static.
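To make the depreciation analogy concrete, here is a minimal sketch of a simple decay model, assuming each record goes stale independently at a constant annual rate. The 30% annual churn figure is a hypothetical illustration, not a measured statistic.

```python
# A minimal sketch of "data decay", assuming a constant annual churn rate.
# The 30% rate below is a hypothetical illustration, not a measured figure.

def usable_fraction(annual_churn_rate: float, years: float) -> float:
    """Fraction of records still accurate after `years`, if each record
    independently goes stale at a constant annual rate."""
    return (1.0 - annual_churn_rate) ** years

if __name__ == "__main__":
    churn = 0.30  # hypothetical: 30% of contact records go stale per year
    for months in (0, 6, 12, 24):
        frac = usable_fraction(churn, months / 12)
        print(f"{months:>2} months: ~{frac:.0%} of records still usable")
```

Even under that modest assumption, roughly half the records are stale within two years, which is why a one-time purchase without ongoing verification loses value so quickly.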


Let’s say your organization has paid tens of thousands of dollars for an email list for newsletters or outbound marketing. At one point in time, every email address and contact name on that list may have been accurate. But people switch jobs and companies, emails change, and people’s behaviors and preferences evolve. Without real-time updates, your expensive data loses accuracy – and value – almost as soon as it’s collected. You won’t get actionable contact information for all the names you paid for, and you’ll miss opportunities that accurate data could have provided.


This is not just a backend problem that develops after the data has been collected: frontend errors like inaccurate information and data-entry mistakes compound data decay. Low-quality data is a big problem, and it’s only going to get bigger. It’s a not-so-secret – but not often discussed – issue that plagues both blue chips and bootstrappers.


I know, because it happened to me. And this is why I want to help turn the tide of data decay.


I started realizing the scope of the problem a couple of years ago, when I found that the data I was relying on for sales funnels – purchased from one of the big providers – was only 5-10% accurate. In our first email campaign using the data, so many of the emails were caught by spam filters that the provider’s own tools broke down: they couldn’t handle all the bad addresses.
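For anyone who wants to sanity-check a purchased list the same way, here is a rough sketch of estimating deliverability from a campaign’s delivery results. The field names and sample numbers are hypothetical, and deliverability is only a floor on accuracy – an address can accept mail and still belong to the wrong person.

```python
# A rough sketch of estimating list quality from campaign delivery results.
# Field names and the sample numbers are hypothetical illustrations; real
# email providers report bounces and rejections in their own formats.

from dataclasses import dataclass

@dataclass
class CampaignStats:
    sent: int
    hard_bounces: int      # permanently undeliverable addresses
    spam_rejections: int   # blocked or filtered before reaching an inbox

def estimated_deliverable_share(stats: CampaignStats) -> float:
    """Share of addresses that were at least deliverable."""
    undeliverable = stats.hard_bounces + stats.spam_rejections
    return max(0.0, (stats.sent - undeliverable) / stats.sent)

stats = CampaignStats(sent=10_000, hard_bounces=6_200, spam_rejections=2_900)
print(f"Estimated deliverable share: {estimated_deliverable_share(stats):.0%}")  # ~9%
```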


I decided to pull the plug on my contract with the provider. I’d only been using the data for about a week, figured I could absorb the prorated cost, and wasn’t about to throw good money after bad.


Within days, a debt collector called. “You signed a two-year contract,” they told me. “No cancellations.” Strictly speaking, this wasn’t entirely true: the contract had been signed by a former employee. But that aside, the speed and efficiency with which the collections process began seemed, to me, evidence that collections were just part of being a big data provider – and that enforcing non-cancellable contracts was a necessity. Apparently I wasn’t the only customer who felt the ROI was lacking.


The data provider came after me for more than I had in the bank at the time. Our attorneys went back and forth for about six months before one of the provider’s account managers reached out and tried to negotiate a deal, a step that – to me – should have come long before a call from the debt collector.


In the end, I had to pay almost 20% of the contract for my week’s use of data that didn’t work. And when I started looking around I discovered that I was far from alone. The data market is dominated by a few big players, onerous contracts, and decayed data.


That’s my story, but you don’t have to look far to find other signs that low-quality data is creating big problems.


Take JPMorgan Chase’s ill-fated $175 million acquisition of Frank, the startup that called itself “an Amazon for higher education.” Frank’s value proposition was that it could simplify the student-loan application process for American college students – and it had a database of four million users.


According to the lawsuits and criminal charges filed by JPMorgan, the SEC, and the Justice Department, Frank’s database was mostly fabricated, with just a few hundred thousand genuine contacts. The problem came to light only after an email JPMorgan sent to a sample of Frank’s alleged users produced a 70% bounce-back rate.


Jamie Dimon called the deal a “huge mistake,” and now the world’s largest bank is reportedly looking to recoup some of its losses through insurance.


Part of the reason for the data decay crisis is inertia: the data market is a volume business dominated by a few big names, so it’s not easy to disrupt. And at companies that license this data, it’s hard to admit that an investment of tens of thousands of dollars isn’t delivering the ROI it should. I can’t tell you how frustrating it is to put months of work into a project, only to have it fail because the third-party data is mostly inaccurate.


What about AI? The explosive growth of AI tools – such as GPT models – presents both opportunities and potential obstacles. These models are trained on vast amounts of data, and if that data is decayed or inaccurate, their effectiveness will be limited. At worst, they’ll produce false or fictional results.
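One basic hygiene step – offered here only as a sketch, since the right approach depends on the dataset – is to filter records by how recently they were verified before they feed a model or a campaign. The record structure and the roughly 18-month cutoff below are assumptions for illustration.

```python
# A minimal sketch of filtering records by a "last verified" timestamp before
# using them downstream. The record structure and the ~18-month cutoff are
# assumptions for illustration, not a prescription.

from datetime import datetime, timedelta

records = [
    {"email": "a@example.com", "last_verified": datetime(2024, 11, 3)},
    {"email": "b@example.com", "last_verified": datetime(2021, 2, 17)},
]

def fresh_records(rows, max_age_days=540, now=None):
    """Keep only rows verified within the allowed window."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=max_age_days)
    return [r for r in rows if r["last_verified"] >= cutoff]

kept = fresh_records(records, now=datetime(2025, 1, 1))
print(f"{len(kept)} of {len(records)} records pass the freshness check")
```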


So what can be done about data decay, and low-quality data in general?


In future posts I’m going to examine case studies in detail, explore the role big data plays in modern business, and dig down to the roots of data decay. It’s not an intractable problem, and I don’t think anyone needs to dump their big-data provider. But to solve this issue we need to shine a light on it, understand how it happens and peel back the curtain to find solutions.


