What is Data Wrangling?
Nicola Askham
DataIQ 100 2022 | Award Winning Data Governance Training | Consultant | Coaching | Data Governance Expert | D.A.T.A Founding Committee
If you are a regular follower of my videos and articles, you will know that one of my key aims is to help explain the vast, and sometimes confusing, amount of terminology that is found within Data Governance.
Terms often have different meanings depending on the organisation you work within, and can even vary from person to person, which is why I want to say first and foremost: there is no such thing as a stupid question when it comes to Data Governance. The person who sent me today's question actually apologised for asking it, but there was no need.
If you feel that you need to ask the question, then that means that somebody hasn't explained it well enough to you. So, the question we’re dealing with in this article is not a stupid one.
What is Data Wrangling?
Now, the person who sent me this e-mail felt stupid because they felt that perhaps it was something they should be doing, but they didn't understand what it was, and they didn't want to look stupid by asking.
The short answer is this: yes, the chances are you probably do have to do Data Wrangling in your job, whatever your job is, but whether you should be doing it is a different matter entirely.
I've actually heard the term Data Wrangling quite a lot over the past year or so, and I think people are using it to describe the situation where data isn't perhaps where you would like it to be, or it isn't good enough quality for you.
So, people tend to use the term to mean gathering data from various sources and doing something to it so that you can use it.
What could that be? Well, it might be amalgamating it into a spreadsheet; it could be cleansing and fixing the data; it could even be running around various people asking them to fill in the gaps that you've got on your spreadsheet.
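For readers who like to see things concretely, here is a minimal sketch of what that kind of wrangling might look like if you did it in Python rather than in a spreadsheet. Everything in it is an assumption made up for illustration: the two sources (`crm_rows` and `billing_rows`), the field names, and the rule of preferring whichever source has a non-empty value. It is not a recommended approach, just a picture of the activities described above: amalgamating, cleansing, and finding the gaps you would otherwise chase people to fill.

```python
# Hypothetical records from two sources; names and values are illustrative only.
crm_rows = [
    {"customer_id": "001", "name": " Alice Smith ", "email": "ALICE@EXAMPLE.COM"},
    {"customer_id": "002", "name": "Bob Jones", "email": ""},
]
billing_rows = [
    {"customer_id": "002", "name": "Bob Jones", "email": "bob@example.com"},
    {"customer_id": "003", "name": "carol white", "email": None},
]


def clean(row):
    """Cleanse one record: trim whitespace and normalise letter case."""
    return {
        "customer_id": row["customer_id"].strip(),
        "name": (row.get("name") or "").strip().title(),
        "email": (row.get("email") or "").strip().lower(),
    }


def wrangle(*sources):
    """Amalgamate sources by customer_id, filling empty fields where possible."""
    merged = {}
    for source in sources:
        for raw in source:
            row = clean(raw)
            # The first record seen for an ID becomes the base record.
            current = merged.setdefault(row["customer_id"], row)
            for field, value in row.items():
                # Only fill a field that is still empty; never overwrite.
                if not current[field] and value:
                    current[field] = value
    return [merged[key] for key in sorted(merged)]


def find_gaps(rows):
    """List the (customer_id, field) pairs that are still empty."""
    return [(r["customer_id"], f) for r in rows for f, v in r.items() if not v]


combined = wrangle(crm_rows, billing_rows)
gaps = find_gaps(combined)
print(combined)
print(gaps)
```

The list returned by `find_gaps` is the part that would still need a human to go and ask somebody, which is exactly the manual effort the rest of this article is about.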
That all means that Data Wrangling is unfortunately a necessary thing if you have poor quality or missing data, and it is very common in organisations that haven't yet got a proper Data Governance initiative in place or are very early on in their journey.
It’s part of the problem – not the solution
Data Wrangling also tends to be used to describe the frustration of doing these activities: bringing together data from disparate systems or spreadsheets, or fixing data before you can do what you need to do with it.
Therefore, I don't think Data Wrangling is necessarily a good thing. It’s not a skill you should aspire to have; what you should be aspiring to have is complete and accurate business data with a proper Data Governance initiative in place. Data Wrangling is not the solution – it’s a temporary fix for a much wider problem within your organisation. That is especially true if you find yourself having to do it regularly. At that point you should really stop and ask yourself: ‘Why am I having to do this so often? What data quality issues is my organisation facing, and how can we find long-term solutions to address them?’
Data Wrangling is just something that unfortunately we have to do a lot of in our jobs at the moment, but it should be one of the things we should be looking to eradicate by having Data Governance in place.
Get in touch
Don't forget, if you have any questions you’d like covered in future videos or articles, please email me - [email protected].
Originally published at https://www.nicolaaskham.com/
Data and Analytics Leader (2 years ago)
Nice article, Nicola, and some very good points. I will volunteer another definition of Data Wrangling that has become fashionable in recent times: preparing our data for reporting by transforming it from what lies in the source systems into structures that are suitable for reporting. ETL, or "Extract, Transform and Load", is the industry standard terminology for these operations, but Data Wrangling is perhaps more easily understood by the masses (questionably!). It inevitably includes some data cleansing (which should be unnecessary), but it also includes adding calculations and shaping our data into performant, easy-to-understand data models that are optimised for reporting and analytics. We actually market one of our training courses as Power BI Data Wrangling, as we felt that was more consumable than other potential names: https://www.altisconsulting.com/uk/training/private-data-analytics-training/power-bi-data-wrangling/ Thoughts?
Perfect: “Data Wrangling is not the solution – it’s a temporary fix for a much wider problem within your organisation.” And this final paragraph: “Data Wrangling is just something that unfortunately we have to do a lot of in our jobs at the moment, but it should be one of the things we should be looking to eradicate by having Data Governance in place.” This is everything right there - unimprovable explanation and even business case!
Nicola, I've never thought about data wrangling that way, but you make a good point. It may just be a symptom of a deeper problem. Thanks.