How poor data quality destroys enterprise value
Source: Lightup.ai

There is no way to really know how bad your data is.

Until, of course, something breaks. Businesses lose money. Trust. Credibility.

Ask British Airways, which once offered a $40 fare from the US to India. Or American Express, which priced a hotel Presidential Suite at $51. Le Méridien had offered an oceanfront villa for $33. Best of all, Expedia Group listed rooms at the Hilton Tokyo for $3. Yes, three bucks.

Poor data quality leads to significant loss of value, penalties, gaffes, loss of trust and brand image erosion. And it's not just such silly, obvious and expensive errors. As the ML overlords take over our lives, bad data feeds into the giant maws of these models, causing drift, errors and flawed outcomes. A lot of bad data stays hidden and we never know; its impact is like a silent killer, slowly draining our resources. Many organizations do not realize they may be suffering lost revenue, high operating costs, poor customer service, brand damage and compliance costs.

When a CEO cannot trust their own data, how can they feel comfortable presenting financial results? How can revenue projections and product decisions be made? How can Foxconn or Apple trust its projections on raw materials, supply chain and logistics? Can Walmart trust its shelf inventory data? Or its claims, returns and ordering data? Can a CMO trust their Marketing Qualified Leads (MQLs)? How do enterprises address compliance reporting under GDPR?

At every level, data quality plays a role, yet this is an area into which we have very little insight. P&L leaders and LOB managers, watch out. The first step in the data-driven and AI journey is to adopt a quality-check mindset. How do we define data quality? How do we measure it? And most importantly, how do we autonomously course-correct when bad data creeps in?
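
To make that concrete, here is a minimal, illustrative sketch of that define-measure-correct loop in Python with pandas. The table, the column names (order_id, amount, created_at) and the thresholds are hypothetical assumptions made for illustration; this is not how Lightup or any particular vendor implements its checks.

```python
# A minimal sketch of "define, measure, course-correct" for data quality.
# Hypothetical orders table and thresholds; illustrative only.
import pandas as pd

def measure_quality(df: pd.DataFrame) -> dict:
    """Compute a few basic data quality indicators for an orders table."""
    return {
        "null_rate_amount": df["amount"].isna().mean(),       # completeness
        "duplicate_rate": df.duplicated("order_id").mean(),   # uniqueness
        "negative_amounts": (df["amount"] < 0).mean(),        # validity
        "stale_rows": (pd.Timestamp.now() - pd.to_datetime(df["created_at"]))
                          .dt.days.gt(7).mean(),              # freshness
    }

# The thresholds define what "good enough" means; breaching one triggers course-correction.
THRESHOLDS = {"null_rate_amount": 0.01, "duplicate_rate": 0.0,
              "negative_amounts": 0.0, "stale_rows": 0.05}

def check(df: pd.DataFrame) -> list[str]:
    metrics = measure_quality(df)
    return [f"{name}={value:.3f} exceeds {THRESHOLDS[name]}"
            for name, value in metrics.items() if value > THRESHOLDS[name]]

if __name__ == "__main__":
    df = pd.DataFrame({
        "order_id": [1, 2, 2, 4],
        "amount": [19.99, None, 5.00, -3.0],
        "created_at": ["2024-01-01"] * 4,
    })
    for violation in check(df):
        print("DATA QUALITY ALERT:", violation)
```

In practice the thresholds encode what acceptable quality means for each metric, and a breach would raise an alert or quarantine the offending rows rather than just print to the console.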

Poisoned food, poisoned data

Manu Bansal, Founder and CEO of Lightup.ai, a leading data quality company (in which I have invested, so yes, I am biased), today announced its Series A led by Andreessen Horowitz. Manu educated me using a simple analogy from the real world. Dark chocolate, our latest addiction, makes a great gift and is my perpetual indulgence. In a study published by Consumer Reports, 23 of 28 leading dark chocolate brands, including Ghirardelli, Lindt, and Hershey's, were found to contain lead or cadmium above acceptable levels. We, the consumers, would never have known until a non-profit organization decided to do a quality check. And it is the same with bad data in the enterprise. We never know what is silently poisoning us, and often it is a small leak that grows until it is too late.

As we define the quality parameters of data, enterprises have to adopt efficient solutions to run data quality checks. The fire-hose flow of data arrives from multiple sources, in multiple formats and at multiple frequencies. According to an O'Reilly survey (N=1,900 respondents), the major data quality challenges companies face include:

  • Too many data sources
  • Disorganized data stores
  • Poor data controls at data entry

Data quality issues generally appear at three levels:

  1. Data source level: unreliability, lack of trust, data copying, inconsistency, multiple sources, and data domain challenges.
  2. Generation level: human data entry, sensor readings, social media, unstructured data, and missing values.
  3. Process level: acquisition, collection, and transmission (see the sketch below).
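
As a rough illustration of how checks might map onto those three levels, the sketch below uses hypothetical counts, sensor readings and limits; a real pipeline would compute these from its warehouses and logs rather than from literals.

```python
# A hedged sketch of checks at the three levels above (hypothetical values).

# 1. Source level: the same entity reported by two systems should agree.
crm_customer_count = 10_482        # e.g., count from the CRM export
warehouse_customer_count = 10_390  # e.g., count after loading to the warehouse
source_mismatch = abs(crm_customer_count - warehouse_customer_count)

# 2. Generation level: human entry and sensors produce missing or out-of-range values.
sensor_readings = [21.4, 22.0, None, -999.0, 23.1]  # -999 is a common "bad read" sentinel
missing = sum(r is None for r in sensor_readings)
out_of_range = sum(r is not None and not (-40 <= r <= 60) for r in sensor_readings)

# 3. Process level: rows should not be silently dropped during transmission or loading.
rows_extracted, rows_loaded = 1_000_000, 998_750
process_loss = rows_extracted - rows_loaded

for name, value, limit in [("source_mismatch", source_mismatch, 100),
                           ("missing_readings", missing, 0),
                           ("out_of_range_readings", out_of_range, 0),
                           ("rows_lost_in_transit", process_loss, 0)]:
    status = "FAIL" if value > limit else "OK"
    print(f"{status}: {name}={value} (limit {limit})")
```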

The business case for a data quality platform

In this video, senior software engineers Preetam Joshi and Vivek Kaushal share how they have tackled data quality at Netflix. If Netflix got the basic Kids/Adults classification data set wrong, imagine the chaos. Instead of recommending Beat Bugs or Sesame Street, the recommendation engine might serve kids Reservoir Dogs or No Country for Old Men, and Netflix's brand and stock might take a hit. When a business like Netflix depends in its very DNA on data and its quality, you operate from a higher-order data quality mindset.
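
For a flavor of what such a guardrail could look like, here is a small, illustrative validity check on a content classification column. The titles, ratings and field names are invented for this example and are not Netflix's actual schema or method.

```python
# Illustrative only: reject catalog rows where the audience label contradicts
# the maturity rating before a recommendation engine consumes the table.
ALLOWED_KIDS_RATINGS = {"TV-Y", "TV-Y7", "G", "PG"}

catalog = [
    {"title": "Beat Bugs", "audience": "kids", "rating": "TV-Y"},
    {"title": "Sesame Street", "audience": "kids", "rating": "TV-Y"},
    {"title": "Reservoir Dogs", "audience": "kids", "rating": "R"},  # mislabeled row
]

violations = [row for row in catalog
              if row["audience"] == "kids" and row["rating"] not in ALLOWED_KIDS_RATINGS]

for row in violations:
    print(f"Quarantine: '{row['title']}' labeled kids but rated {row['rating']}")
```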

With the rise of the Chief Data Officer (CDO) role, data filters and purity checks are becoming critical. Yet the current state of data quality is so bad that at least 33% of CDOs do not trust their own data. In a CDO survey conducted by Jill (Greenberg) Chase of CapitalG, fifty leading Chief Data Officers shared their views on the evolution of the data stack and highlighted some critical challenges:

  • Productivity challenges: 50% of analyst time is spent searching for the right data. A separate Forrester survey found that 40% of analysts spend a third of their time validating and fixing data quality issues. A lot of time is simply lost hunting for data.
  • Siloed access: 47% of respondents say the biggest challenge is access to data. Politics? The tech stack? It could be all of the above.
  • Higher costs: Poor data quality increases the cost of storing, processing, and analyzing data. While storage is no longer a major cost driver, poor quality hurts efficiency. Data lakes become data swamps.

Bad models, bad behavior

If 2023 is the dawn of generative AI, data quality is a critical foundational element. Bad model behavior, anyone? I anticipate line-of-business managers taking the lead in asking for data quality checks. Just as the SaaSification of enterprise apps gave us domains of control over applications, the next obvious step is data quality by LOB. With high-quality data come higher confidence in our predictions and analytics, superior ML models and, above all, the trust of the customer.

And the fastest way to lose trust is to have zero quality checks. Ask the chocolate vendors. With no quality checks, we thought we were giving chocolate. Now we know we were poisoning our loved ones with lead and cadmium.

Kazuki Notsu

Founder & GP @ Incubate Fund | B2B x Software

1y

Wow, impressive article! Thank you very much Mahendra-san!

Jennifer Drayton

Content Marketing | Content Strategy | Content Marketing Writer/Editor

1y

Congratulations, Manu Bansal! Remarkable progress since we talked in March.

Daniel H L.

Data Engineer ⊕ Technologist | Legacy ETL/Data-to-Cloud Modernization

1y

Quite a few levels down from the CDO, we engineers struggle with data quality. It cripples our ETL pipelines and ingestion flows, and skews the data going downstream. In my space, I see it in modernization efforts where data is moved to a cloud DWH and linked to BI dashboards for the business, but behind the curtains the struggle to get data quality right, so pipelines flow smoothly and the numbers both look and are correct, is real. Great news. Congratulations to the team at Lightup Data. Wishing you great success!
