"Garbage In, Garbage Out" Is a Lame Excuse
If we start with bad inputs, we’ll end up with bad outputs. Consider these scenarios:
You get the point.
When people use the phrase "garbage in, garbage out" in the context of data, it often sparks a negative, even fatalistic reaction. It’s treated as the final word on the matter - a full stop in the discussion. It’s a shrug, or a resignation. C’est la vie. Shit happens.
But that’s a lame excuse. Don’t fall into that trap. I want to challenge you to think differently. “Garbage in, garbage out” shouldn’t be the punchline at the end of the conversation - it should signal the beginning.
The Origins
The first published uses of the phrase “garbage in, garbage out” were observed in the 1950s. It entered the Oxford English Dictionary as an acronym, GIGO, in 1993. It has its roots in early computing, when developers realized that flawed inputs led to flawed outputs, regardless of the sophistication of the computations. As computers were used to solve complex mathematical problems, the importance of accurate data became evident - leading to GIGO being a cautionary tale for all those who work with data.
We used the phrase in the past, we use it now, and it will keep its place in our vocabulary for years to come. As we move into an era of AI, GIGO is more relevant than ever. Business leaders are sensitive to the fact that data inconsistencies can cause major errors in AI-driven processes. As data teams’ roles are expanding to cover both data and AI, I suspect that the “garbage in, garbage out” refrain will become more frequent and acute. Buckle up.
Cake: An Analogy
Imagine I bake you a chocolate cake. I start with a proven recipe that’s been passed down from my mother. But this time, my ingredients are less than ideal - my eggs are rotten, my flour is infested with beetles, and I mistakenly use carob instead of chocolate. Even if I frost it nicely and serve it to you on a fancy plate, there’s no disguising the fact that the cake is ruined. You won’t enjoy it, and it may leave you wondering whether you want to eat my cooking again.
It’s the same with data. When a data team starts with flawed inputs like inaccurate data or incomplete information, the end product will fail to deliver, no matter how much effort goes into the analysis or presentation. As a result, business stakeholders lose trust in the insights provided, which in turn erodes the perceived value of the data team’s work. This lack of trust can snowball, leading to reduced influence within the organization, missed opportunities for strategic decision-making, and ultimately, a diminished role for data in driving business outcomes. Like a bad cake, even the most sophisticated models and dashboards can’t hide the flaws of poor data.
How it Comes Up
Do you hear “garbage in, garbage out” in your business? Pay attention to who’s saying it and what they really mean.
Within the data team: Imagine a data scientist saying, "Garbage in, garbage out." What they might really mean is, "I’m paid to do the fancy math, but if the data I’m given is flawed, it limits the quality of my work. This is frustrating, and it’s not something I can fix on my own. Someone else needs to address this." It’s a sign of exasperation, but it can also come across as a form of blame-shifting. I get it. Poor data quality disrupts your work, but a mindset of simply passing the buck won’t lead to meaningful improvement. Don’t play the victim.
Within business leadership: Now consider when a marketing VP receives an analysis that’s clearly off the mark and says, "Garbage in, garbage out." What they really mean is, "Data team, this isn’t working. How can I trust or value what you give me when half the time what you give me is crap? I need reliable insights to make informed decisions. Go fix the issues, and don’t let this happen again. But I won’t hold my breath." It’s a harsh rebuke, reflecting frustration with ongoing data quality issues that undermine the value of the data team’s work. Hey, marketing VP, we’re doing the best we can over here.
Among data producers: And then there are the people who create or manage the source of the “beetle-infested flour.” They are likely silent. They might not even know about the disturbance. If they do know, they might not fully grasp the downstream impact of the flaws. The disconnect between data producers and data consumers is a key part of the problem.
What Can We Do About It
Within the data team, there are several actions we can take to fight GIGO. We can conduct code reviews, run rigorous validation checks, manage data quality proactively, set up alerts on critical data sources, and increase bandwidth devoted to data cleaning and preprocessing. Some of these actions are easier than others, but most fall under the umbrella of operational excellence. The real challenge is transforming “best intentions” into ingrained, robust processes. We also need to consider solutions that go beyond the basics, such as rearchitecting systems to be more resilient. Maybe our cake should be gingerbread, not chocolate - or perhaps we don’t even need to serve dessert at all.
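To make "validation checks" and "alerts" a little more concrete, here is a minimal sketch of what a batch check might look like. Everything in it is an assumption for illustration: the `check_batch` and `failed` helpers, the field names, and the 5% missing-value threshold are all hypothetical, and a real pipeline would likely lean on a dedicated data quality tool rather than hand-rolled code.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_batch(rows, required, min_rows=1, max_missing_rate=0.05):
    """Run basic quality checks on a batch of records (a list of dicts).

    The thresholds are illustrative defaults, not recommendations.
    """
    results = []

    # Check 1: did we receive enough rows at all?
    results.append(CheckResult(
        "row_count",
        len(rows) >= min_rows,
        f"{len(rows)} rows (minimum {min_rows})",
    ))

    # Check 2: for each required field, how often is it missing or empty?
    for field in required:
        missing = sum(1 for r in rows if r.get(field) in (None, ""))
        rate = missing / len(rows) if rows else 1.0
        results.append(CheckResult(
            f"missing:{field}",
            rate <= max_missing_rate,
            f"{missing}/{len(rows)} missing",
        ))

    return results


def failed(results):
    """Return only the checks that did not pass."""
    return [r for r in results if not r.passed]


# Hypothetical sample data: two of four orders lack a customer_id.
orders = [
    {"order_id": 1, "customer_id": "a17", "amount": 25.0},
    {"order_id": 2, "customer_id": None, "amount": 40.0},
    {"order_id": 3, "customer_id": "b02", "amount": 12.5},
    {"order_id": 4, "customer_id": "", "amount": 8.0},
]

results = check_batch(orders, required=["order_id", "customer_id", "amount"])
for r in failed(results):
    # In practice, this is where an alert would go to whoever owns the source.
    print(f"ALERT {r.name}: {r.detail}")
```

The point of a check like this is less the code than the routing: a failed check names the field and the size of the problem, which gives the data team something specific to bring to the data producer instead of a shrug.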
For business leadership, the first step is to give the data team the benefit of the doubt. Recognize that GIGO isn’t just a data team problem; it’s often symptomatic of a larger issue within the organization. The data team is part of the solution, but we’re not the sole owners. Help by bringing other leaders into the conversation, and give us funding so we can tackle the problem effectively.
As for data producers, it’s time to own up to it. Make the effort to understand why data quality matters and how it impacts the business. Proactively collaborate with your data consumers to address GIGO issues. It’s not just the right thing to do - it’s critical for the success of the business.
What’s the BEST we can do about it?
The best we can do is foster collaboration and accountability across all parties involved:
Next time you hear someone say, “Welp, garbage in, garbage out,” don’t let those words echo as a lame excuse. Instead, turn it into the beginning of a meaningful conversation about how to improve the situation.
Helping companies use customer behavior data and AI to drive business impact.
(6 months ago) Another great article. Love the cake ingredient analogy.
Founder, SaaS Pimp and Automation Expert, Intercontinental Speaker. Not a Data Analyst, not a Web Analyst, not a Web Developer, not a Front-end Developer, not a Back-end Developer.
(6 months ago) It's a thin line between data from a bad implementation and correct data that the stakeholders don't understand. El gran classico is stakeholders adding up daily unique visitors over a month period, and comparing the total with the monthly unique visitors for the same date range. Still today, I bet that a large number of stakeholders will dismiss the digital analytics tool the data comes from as junk. Even after it's explained to them, many of them are thinking out loud: "I'm gonna catch them at fault one day! I'M GONNA!"
Simplifying and Automating Marketing Tag QA
(6 months ago) There are two options in this case: be the change or find a place that cares. If you can make a difference and have enough authority to, then go for it. If the org is resistant, move on. Staying in a resistant org is not an option.
VP, Data Science at Thumbtack
(6 months ago) Agreed this is a very poor excuse! There is something structurally broken, though, about accountability. If you'll indulge an extension of the metaphor: let's say you're tasked with baking a cake. You go to the store, and discover ALL the ingredients are spoiled. What do you do? To make matters worse, you expect that no one will really trust you that there are NO unspoiled eggs to be found nearby. Surely you can find enough to make something edible? You may say in the abstract that a data team is responsible for developing the quality of its upstream sources. And certainly at the leadership level I would agree with you. But I push back on the idea of "last touch accountability" where the data team gets held accountable for producing bad results in a context where sufficiently plentiful good data does not really exist. There needs to be a simple and clear statement from the data team up front. Decline to bake that cake if you don't have a supply chain that can deliver you fresh ingredients in time! Don't let yourself be held hostage by being the last person to touch the report before execs see it; it's not unreasonable to say "I'm not able to do X because the dependencies are not available."