Data Wrangling is Career Strangling

Data wrangling is a necessary process when working with big data; with most data, in reality. This opinion piece is not meant to diminish its importance. Nor is it to be confused with Data Engineering. But I will argue that data wrangling is career strangling, in that it is holding you back in your career progression. Let me explain...

Firstly, let's agree that the whole basis of big data is to whittle it down to little data, which we call "Insights". The point of any data analysis is to identify a trend or anomaly. The point of a machine learning model is to find a set of defined patterns or assign a probability.

Observe any Data Scientist or Analyst presentation and the only pieces that get talked about are the Insights and the model. Zero time is spent explaining how the data was wrangled, despite that being 60-80% of the effort.

My argument is that data wrangling is low-level, tedious work, and that an expensive resource such as a Data Scientist, Data Engineer, or Analyst is wasted when they decide to take it on.

The best consultants know that:

You don't get paid for the hour. You get paid for the value you bring to the hour.

The more time you spend on lower value work, the more you diminish your value.

And if you're an Analyst / Data Scientist spending the greater portion of your time wrangling data, that's much less time you're spending on understanding the data, much less time you're spending on analyzing the data, and much less time you're spending on delivering business value from the data.

When it comes to big data, I believe that folks are starting to realize that robust software engineering practices need to be put in place to ensure quality of the data pipeline and #datagovernance. ...Cue the Data Engineer.

In today's episode (Aug 14) of the Digital Analytics Power Hour (a wonderful podcast, btw), there was a great discussion about raw data and data virtualization. I didn't feel that there was any consensus, so I'll throw in my 2 cents.

A company must adopt a tool or process to virtualize the raw data for the Data Scientists and Analysts. Drawing from software principles, the solution, whether built in-house or purchased, must be robust, scalable, extensible, and reusable.

This will save an immense amount of time (and headache).

For example, when working with raw clickstream data, you have billions of atomic events. In most cases, identity resolution is required over a specified period of time. If every Data Scientist or Analyst starts from the raw data, I guarantee that each will resolve identity in a different manner (different "code"). This leads to multiple, inconsistent "truths". For the vast majority of cases, the Analysts / Data Scientists should work only from a consistent, consolidated schema.
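To make that concrete, here is a minimal sketch of what "one shared rule" could look like. Everything in it is an assumption for illustration: the event fields (`device_id`, `user_id`, `ts`), the function name `resolve_identities`, and the rule itself (a device belongs to the first logged-in user seen on it). Real identity resolution is usually far more involved; the point is only that the rule lives in one place.

```python
from typing import Any, Dict, Iterable, List

Event = Dict[str, Any]  # hypothetical layout: device_id, user_id (or None), ts


def resolve_identities(events: Iterable[Event]) -> List[Event]:
    """Attach a resolved_id to every event using a single, shared rule.

    Illustrative rule (an assumption, not a standard): each device is
    attributed to the first logged-in user observed on it; devices that
    never log in keep an anonymous id derived from the device id.
    """
    ordered = sorted(events, key=lambda e: e["ts"])

    # First pass: learn which user each device maps to.
    device_to_user: Dict[str, str] = {}
    for e in ordered:
        if e["user_id"] is not None:
            device_to_user.setdefault(e["device_id"], e["user_id"])

    # Second pass: stamp every event, including the ones before login.
    resolved: List[Event] = []
    for e in ordered:
        resolved_id = device_to_user.get(e["device_id"], "anon:" + e["device_id"])
        resolved.append({**e, "resolved_id": resolved_id})
    return resolved


raw_events = [
    {"device_id": "d1", "user_id": None, "ts": 1},
    {"device_id": "d1", "user_id": "u42", "ts": 2},
    {"device_id": "d2", "user_id": None, "ts": 3},
]
resolved_events = resolve_identities(raw_events)
```

If every Analyst imports this one function (or, better, queries a table that was built with it), they all get the same answer for who "d1" is. If each writes their own version, they won't.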

So, when I say "Data wrangling is career strangling", it's because you're devoting too much time to work with a lower-assigned value.

[Tangential anecdote: I use Salesforce a lot in my work. If I'm to be diligent, the data entry could take up to 4 hrs a week. I hired a VA on my own dime to handle this. This allows me to spend more time on higher-value (and quite frankly, more fun) tasks. I value my time.]

In the end, businesses are results-oriented. If you can produce more positive business results in a shorter time frame, then your career trajectory will move up-and-to-the-right at an accelerated pace.

And it's a compounding factor. Those who produce results are given more opportunities. The sooner you produce results, the sooner those opportunities present themselves.

Focus on value delivered.

The faster you iterate, the faster you grow.

--

And if you found some value from this, please LIKE or SHARE this so that others may potentially benefit as well.

If you would like to discuss this further, I’m happy to share thoughts/ideas around the tools (commercial and open-source) available on the market today.

My consultation is free. :) I’m happy to help you, in any way I can.

And, of course, I would welcome opposing views, as I continue to gather information to make informed opinions. Let’s be civil, though, and not resort to any name-calling.

Jennifer Comisford

Revenue Operations | Analytics & Data Management | Process Improvement | Technology Enthusiast

6 years ago

I completely agree with this and feel like I’m experiencing it in my current role. I spend entirely too much time wrangling data in Excel rather than analyzing and providing value. Unfortunately, resources are too scarce to reallocate this to a more appropriate party. What advice would you give to someone who is building the case to hire engineering resources to undo the strangling?

Kerry Hew

Sales, Customer Success, and Product Management. Curious and Growth-Minded. Aspiring GOAT Husband & Father

6 years ago

I've already had a couple of folks PM me. Of course, data wrangling is necessary for research or one-off projects. But for business applications where the opportunity for repeatability is high, that repeatability should be taken advantage of.
