Hiring useful data scientists - breaking the magician's code

Idea in brief

The single biggest mistake people make when hiring data scientists is insisting that they be, well, scientists. In a hyped-up, competitive skill market, this leads to more demand chasing the wrong, limited supply. I'll share what I learned about hiring data scientists from managing a large team of them over a long enough period of time. If you don't work for an internet monopoly, this article is for you. :-)

Wait, wait, wait. Did you just say being a scientist is not a requirement for being a data scientist? Are you stupid?

I know, and this is precisely why you will make a mistake. Let's break it down. Why are you looking to get a data scientist? If you're like most of us, you're doing one of two things: you're either building data products or trying to extract insights from data in a corporation, or both. In the old days (before this), you would use a software engineer to build a data product and a business analyst for data insights. Let's split the discussion into these two activities: what happened, why we now call these people data scientists, and what you absolutely must insist on to succeed.

Building data products - engineering is not enough

Data products are truly where the term data scientist got popularized, and for good reason. When you build a product that uses data to define its logic (read: machine learning), the quality of your product depends on that data. This means that, unlike your engineering activities, this one will resemble research, in the sense that your ability to predict the result of the work before seeing representative data is zero. This is a very important distinction: building data products requires different processes than building pre-programmed-logic products. Another aspect that can be very uncomfortable for software engineers is that the robustness of a data product depends on the attributes of the data: one day your product is well behaved, the next it might not be. This makes understanding the fundamentals of probability theory important, and getting into the habit of starting with the data is something that needs to be learned.

So, you will need a special development process. Does the process require a PhD to run it? Don't be silly.

However, as you can easily tell from the above, the really critical requirement from a hiring perspective is intelligence. Your data scientist in product needs to be smart, really smart, very very smart. Are PhDs smart? Yeah, sure. Are they all smart? I can't judge. Can you find smart people who are not PhDs?

What about statistics and machine learning algorithms? In my experience this is all learnable, which leads me to the second quality you will be looking for: passion. Passion will get your employee to learn whatever is needed; if they're as smart as I told you to hire, this should be no big deal. Are scientists passionate? Some are... not necessarily about building products.

Data Scientists for Insights

So... when did business analytics become data science? I see two culprits. The first is market hype: every statistician knows (sorry guys) that if you call yourself a data scientist your salary will jump drastically, especially if you have a PhD because, well, you really are a scientist.

The other, less cynical one is big data. You must have seen this one before:

Why does a business analyst now need hacking skills? Big data! Data exploded on us quite quickly, and the tools business analysts and statisticians used didn't keep up with the open source world, so to produce insights from big data you really needed to be a hacker. At the same time, big data opened up new inferences that weren't reachable in the small-data world (think about A/B testing in a marketing context and being able to detect a 0.1% improvement), so stats became more important to analysts (though it always has been important).
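To make the A/B testing point concrete, here is a rough sketch of why a 0.1% improvement only becomes detectable with a lot of data. It uses the textbook normal-approximation formula for comparing two conversion rates. The 5% baseline rate, the 5% significance level, the 80% power, and reading "0.1%" as an absolute lift of 0.1 percentage points are all assumptions of mine for illustration, not numbers from a real campaign, and the function name is made up.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_base, lift, alpha=0.05, power=0.80):
    """Approximate users needed in each arm of a two-sided A/B test.

    Normal-approximation formula for comparing two proportions:
    p_base is the control conversion rate, lift is the absolute
    improvement we want to be able to detect.
    """
    p_new = p_base + lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)          # desired power
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return math.ceil((z_alpha + z_power) ** 2 * variance / lift ** 2)


# Hypothetical numbers: a 5% baseline conversion rate.
small_lift = sample_size_per_arm(0.05, 0.001)  # 0.1 percentage point lift
large_lift = sample_size_per_arm(0.05, 0.01)   # 1 percentage point lift
print(f"0.1 point lift: {small_lift:,} users per arm")
print(f"1 point lift:   {large_lift:,} users per arm")
```

Under these assumptions the small lift needs hundreds of thousands of users per arm, while the larger lift needs only thousands. The required sample grows with the inverse square of the lift, which is why this kind of inference only shows up once the data gets big.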

I do think the world has moved on, and hacking is no longer a requirement in big data analytics. Can your business analysts produce better insights if they have a robust understanding of statistics? Sure. Do they need a PhD in statistics for that? Really?

One requirement that favors a scientifically disciplined mind is rigor in analysis: not being misled by confounding factors, and making sure an analysis has no mistakes in its logic. But this has been a requirement of business analytics since long before we stamped Science on it.

In my experience, the non-teachable part in the insights world is big-picture thinking: understanding what matters and what doesn't in the business. As I keep asking my data scientists in this space: what do you want us to change in the world with this analysis? So what? Some PhDs do this well, some don't; I haven't seen a strong correlation.

The other part that I found hard to teach is influence, you know, the people side. For insights to matter, somebody has to advocate for them. You can't be an introverted scientist and really create business value with insights; you really need to have a little politician in you. Are PhDs good politicians? Some are amazing, but I think on this one they're mostly at a disadvantage.

Are you saying anybody can do this?

OK, clearly you haven't been reading carefully.

For a data product, look for engineers with high intelligence and passion. Help them with a research-friendly process, and get them trained in machine learning.

For data insights, look for big-picture thinking, rigorous analytical thinking, and extroversion. Help them with tooling that spares them the big data hacking, and teach them your business domain and organizational structure.

Are you saying I should avoid PhDs?

Absolutely not. If, like me, you're lucky enough to find super smart scientists who are passionate about product and willing to tolerate you after writing this article ( :-) ), go for it. All I'm saying is that it's not as hard a requirement as the term Data Science implies. I'm not sure how we got here, but it sure sounds sexy, and you wouldn't have read this article without it. Get good people, treat them well, create value, have fun.

Let me know what you think!

-- Shadi

Aalia Sultana

Product Owner at Nationwide Building Society

4 years ago

Totally agree with this post!! I believe it’s anyone who can immerse themselves in data and tell a great story using it.

Stephen F. Heffner

President / Owner at XTRAN, LLC

5 years ago

Shadi -- great article! I'm late to the party, but...

"...your ability to predict the result of the work, prior to seeing representative data is zero." Exactly; among other things, the testing burden shifts from compile-time to run-time. As we get more "meta", this will be more and more the case.

"...robustness of data products will depend on the attributes of the data" -- which means that, to the extent possible, programs must make both legality and reasonableness tests on data they input. This has always been good practice, but as our "intelligence" shifts more and more from code to data, it becomes even more important. A corollary involves traceability; we can examine the code, but it's harder to watch the code process our data, so that code should, as much as possible, explain why it did what it did in terms of the data it processed.

I agree with you 100% about passion. I've identified 4 properties that virtually guarantee an ideal staffer -- 1) functioning intelligence, 2) passion, 3) good character, and 4) ability to communicate. (Work ethic? Passion takes care of that.) Unfortunately our collapsed educational system is creating fewer and fewer people with that combination; the intelligence may be there, but it hasn't been taught how to function.

"...can you find smart people who are not PhDs?" You bet! Super-smart people frequently don't have the patience to endure the grind to get to a PhD; they'd rather create something and make it work. Are PhDs bad? Not inherently, but in my experience they're not a reliable indicator. One thing -- the harder the science involved, the more the PhD tends to mean, because hard science's results are measurable and repeatable. (Full disclosure -- I don't even have a Bachelor's, much less a PhD, but I've taught PhDs.)

I would add one thing to your excellent treatise -- the ability to work with Entity Relationship (ER) analysis, which undergirds everything we can understand about our data. If the ER structure is wrong, things go rapidly downhill. (Data science shares this with Enterprise Architecture; see my LinkedIn Pulse articles about EA.)

Marco Deluca

#webtechnologies · #entrepreneur · #inventor · #engineering · #masscommunications

6 years ago

100% agree and really well written Shadi. Let me just add that finding smart people that are really passionate is the key to successful hiring for all jobs :). Motivated people can "walk on water"

Pranav Parekh

Advisory IT Architect at IBM

7 years ago

Thank you for this article, and for making it clear to me and many others that we need not chase every hyped buzzword to pursue a career in data science, and can instead keep working passionately on data and problem-solving skills.

Irina Fedulova

Principal Data & AI Lead @ Philips | ex-IBM

7 years ago

Great article, Shadi! I can also add that young data scientists (those fresh MSc or even PhD graduates who are applying for data science positions) will expect to focus on doing *science*, or "building models", and may get disengaged and upset shortly after being immersed in the realities of building a data product. Pure ML is only about 5% of the typical production machine learning system; the remaining 95% is the data pipeline (getting data, loading, cleaning, verification, feature extraction, resource management, serving, testing, etc.). So I totally agree that for building a data product one should look for a team of engineers (with an understanding of how ML works). I think there is a profession/specialty called "machine learning engineer", and the engineering part needs to be explicitly communicated in a job description.
