Behind Every Great Analysis Lies Great Data Wrangling
Data analytics sits behind good business decisions. Today more than ever. Businesses generate enormous amounts of data from their activities. They also purchase additional data from vendors to enrich their own data. Companies collate and analyse it and produce insights that then drive higher sales, better efficiencies, and lower costs. Businesses that can effectively turn raw data into insights generate substantial benefits to their operations. So, it’s no surprise that most companies increasingly dedicate significant time and resources to data analytics.
Yet, what most of them end up spending the time and money on is not actual analysis and insights. Unfortunately. What they expend the vast majority of their resources on is data wrangling, done in a predominantly manual way.
Data wrangling
Data wrangling is the most challenging aspect of data analytics. It’s the process of cleaning, structuring, and enriching raw data into a desired format for decision making and analysis. This task, often quite meticulous, is a foundation for any data analytics project. It ensures the data is of high quality and suitable for exploration, analysis, and modelling.
Cleaning: Removing or correcting inaccuracies, inconsistencies, and errors in data. This may include handling missing values, correcting typos, or removing duplicates.
Transforming: Changing the format or structure of data to make it more suitable for analysis. This could involve converting data types, normalising data, or aggregating data points.
Merging: Combining data from different sources to create a more comprehensive dataset. This may involve joining different tables or datasets based on a common key.
Enriching: Adding external data to enhance the existing dataset. This can provide additional context or insights that were not previously available.
Filtering: Selecting a subset of the data based on certain criteria. This helps in focusing on data that is relevant to the specific analysis or task at hand.
Validating: Ensuring the data meets certain criteria or quality standards. This step is crucial to make sure the data is reliable and suitable for analysis.
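To make the six steps above concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the sample rows, the field names (`order_id`, `customer_id`, `amount`, `region`), the lookup tables and the function names are all hypothetical, and a real pipeline would more likely use a dedicated data tool than hand-rolled functions.

```python
# Hypothetical raw sales rows; every name and value here is made up for illustration.
raw_sales = [
    {"order_id": "1001", "customer_id": "C1", "amount": "120.50", "region": "north "},
    {"order_id": "1002", "customer_id": "C2", "amount": "", "region": "South"},
    {"order_id": "1002", "customer_id": "C2", "amount": "", "region": "South"},
    {"order_id": "1003", "customer_id": "C3", "amount": "75", "region": "SOUTH"},
    {"order_id": "1004", "customer_id": "C1", "amount": "15.25", "region": "North"},
]
customers = {"C1": "Acme Ltd", "C2": "Birch & Co", "C3": "Cedar plc"}  # second source
region_managers = {"north": "Priya", "south": "Tom"}  # external enrichment data


def clean(rows):
    """Cleaning: drop duplicate orders and rows with no amount; tidy region text."""
    seen, out = set(), []
    for row in rows:
        if row["order_id"] in seen or not row["amount"].strip():
            continue
        seen.add(row["order_id"])
        out.append({**row, "region": row["region"].strip().lower()})
    return out


def transform(rows):
    """Transforming: convert text fields to proper numeric types."""
    return [{**r, "order_id": int(r["order_id"]), "amount": float(r["amount"])} for r in rows]


def merge(rows, customer_lookup):
    """Merging: join the customer name on the common key customer_id."""
    return [{**r, "customer_name": customer_lookup.get(r["customer_id"])} for r in rows]


def enrich(rows, manager_lookup):
    """Enriching: add external context (the manager responsible for each region)."""
    return [{**r, "manager": manager_lookup.get(r["region"])} for r in rows]


def filter_rows(rows, min_amount):
    """Filtering: keep only the orders relevant to this analysis."""
    return [r for r in rows if r["amount"] >= min_amount]


def validate(rows):
    """Validating: fail loudly if a row breaks the agreed quality rules."""
    for r in rows:
        if r["amount"] <= 0 or r["customer_name"] is None or r["manager"] is None:
            raise ValueError(f"Row failed validation: {r}")
    return rows


def wrangle(rows, min_amount=50.0):
    """Run the six steps in order so the whole job is repeatable, not manual."""
    rows = transform(clean(rows))
    rows = enrich(merge(rows, customers), region_managers)
    return validate(filter_rows(rows, min_amount))


if __name__ == "__main__":
    for row in wrangle(raw_sales):
        print(row)
```

With this sample data, `wrangle(raw_sales)` keeps orders 1001 and 1003: order 1002 is dropped in cleaning because its amount is missing, and order 1004 falls below the illustrative 50.0 threshold. Each step is a small function, so the same sequence can be rerun whenever new raw data arrives.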
Why do we spend so much time on data wrangling?
Data preparation, or wrangling, is a critical step in the data analysis process. Poor data quality and formatting can significantly impact the outcomes of the analysis. Remember, garbage data “in” means your boss will have egg on his face when presenting the latest trading figures to the board. So, it’s very important to get it right. The problem is that a lengthy and largely manual data wrangling effort limits the full benefits from the data insights.
The amount of time dedicated to data wrangling is substantial, consuming a significant portion of the overall data analytics workflow. It is not uncommon for developers and analysts to spend 50-80% of their time just preparing data. Before it ever gets to the analytical stage. It’s crazy.
Why are businesses spending so much effort on it? Well, let’s consider the following:
What’s in the effort?
Most companies are entrenched in spreadsheet analytics. There is nothing wrong with that. But what that means is that many of the reports are maintained by a single individual, manually. I’ve been that individual myself. The raw data arrived in the warehouse once a month. And it was my job to clean it, make sure it made sense, chase any errors and outliers, etc. I then enriched it with other data and created a summary dashboard that was consumed by various parties in the business.
I got pretty good at maintaining it, but I could still spend a day or two chasing anomalies. I hated doing it. The work took me away from my day job of delivering actual value. But we never got to productionising it since the engineering resources were stretched thin doing BAU and big-ticket projects.
Manual data wrangling is very expensive
The problem is that such an approach to analytics leads to a cost creep. I wasn’t the only one in the company doing manual updates. There were dozens of reports, dashboards and tools maintained by the analysts and SMEs across many departments. Individually, it was a day or two to clean the data. Collectively, it was months of manhours wasted on something that should have been done automatically.
Then you get to the ad hoc stuff, of which there is a never-ending stream. Can we just add more data to this report and build a new chart? We just finished a meeting and need to action the following points. There is a board meeting tomorrow and we need the latest data.
So, you spend a ton of time searching for relevant data, collating, validating and cleansing it before actually analysing it properly. The time it takes to wrangle the data before you get it to a working state is disproportionately long. We weren’t unique by any means. It happens everywhere. So, the manual wrangling is still proliferating and costing companies millions in wasted time every year. The worst thing is that the businesses all hate it, but the practice is notoriously difficult to eradicate.
Misalignment of interests
The responsibility of data wrangling falls predominantly on data professionals, including data scientists, data analysts, and data engineers. These individuals possess the technical expertise required to navigate through the complexities of data transformation. They employ a variety of tools and programming languages such as Python, R, SQL, and specialised software to manipulate large datasets effectively.
There is a certain sense of satisfaction in being able to do things others cannot. It feels great to get praise after delivering a complicated data request. Like an artist, you can throw in lines of custom code and stitch several technologies together in harmony to produce a data masterpiece. It makes you feel more valued and thus secure about your role in the organisation. Understandable. Self-serving, but completely human. The fact that the said masterpiece is a generic summary table in a warehouse is irrelevant.
But putting a “business hat” on, this is a massive waste of valuable time and resource on a menial task. The value lies in the insight, not manual data wrangling. Data wrangling is an unfortunate by-product of inefficient data analytics.
Companies must understand that to be data-driven means to derive insights quickly and efficiently. What it doesn’t mean is to engage in the constant data wrangling effort using an army of data experts who spend 80% of their time preparing data for consumption. Contrary to what you may believe, it is not the necessary cost of being a data-driven organisation.
Adopt automation across the “data” board
There has never been a better time to view data analytics in a new light. We have a plethora of efficient architectures, the power of the cloud and fantastic modern tools that make manual data wrangling a thing of the dark past. Cutting-edge innovation from the thriving start-up data community is especially strong. Start-ups have vast experience of working through data inefficiencies. Their founders have gone through the wrangling pain themselves. They saw ways to help others avoid it by creating better tools and practices to deal with data wrangling.
The data world evolves very quickly. Companies face increasing pressures to use more data and more variety of data to support decision-making. Businesses are pushed to become more agile in how they work with data. GenAI is a prime example of that. The introduction of this tech now makes organisations scramble to get GenAI working in their settings. However, they face a mountain of all the legacy data practices and tech debt they have been collecting through the years. If they continue to wrangle the data manually as they always did, they will fail. Expensively.
As businesses increasingly rely on data-driven decisions, turning vast amounts of raw data into meaningful information will only grow in importance. The more time your organisation spends on preparatory work instead of analysis, the less value you generate for the bottom line. It’s pointless to hoard the data if you cannot put it to use effectively.
The solution is staring you in the face: automation. Automate as many manual data wrangling tasks as possible, within the constraints of your data estate. Streamline your data architecture (or even just put one in place!). Put in practical data governance policies. Make your data easily available to those who work with it, so they can focus on value-add insights. Don’t be afraid to try new tech and approaches to get there. Just don’t stagnate. Remember, it costs you a lot of money to idle in your manual BAU world.