Advice from an Old Data Scientist
Lessons for Data Scientists (Particularly Those Just Coming Up in the Ranks)
1. The term “data science” is similar to what the term “big data” was ten years ago. It means something different to each person who hears it. Furthermore, the general field is so important and the jobs are so well paying that anything that can be termed data science is termed data science. Always find out what someone means when they throw the phrase “data science” or “data scientist” into the conversation. Ask them to be specific. At the companies I have worked for, we have spent a good deal of time defining what data science is and who is a data scientist, and the result is narrower than people would think.
2. Be an all-arounder with a specialty. In a very simplistic model of the data science world, there are those who specialize in the coding or computer science aspects of the role and those who specialize in the modeling or statistical aspects. The best data scientists are reasonably good at both but specialize in one side over the other. For example, while I am a good coder, I am a statistician at heart. Models do it for me. But I still need to keep up with the other side in order to be successful.
3. Oh, in addition to those aspects, you also need to be able to interpret and communicate your results. It seems like a lot of things to be good at. The good news is that these two skills typically come with time and experience. Most junior data scientists aren’t great at them, although you will find some who are. Experience teaches you not only how to make your results meaningful to the interested parties but also how to communicate complicated ideas clearly to those who are not as technical as you may be. Any role you take should pair you with a senior member of the field who can teach you these things.
4. Learning models and techniques to apply to data is critically important. However, particularly when you are in a junior role, the vast majority of your time is spent getting data ready to use. If you are lucky, you will be handed a ready-made dataset, but that is usually not the case, particularly for interesting problems. So understanding what data resides in the public sphere that can be used in addition to the private data your company may provide is essential. Furthermore, knowing how to put together datasets that may have different structures and purposes is key to success. Finally, understanding data hygiene gets you to the point where you can do interesting things with the data.
5. Better isn’t always better – Part 1. You will find that some data scientists get termed purists. This isn’t always a great tag to have. These folks are looking for the best model possible using whatever technique they can. Once they have determined what this is, any other model is viewed with derision and disdain. The positive aspect is that purists move the ball forward, innovating and getting the best result possible. This is an amazing thing, and you need purists around. But when it comes to getting a model into production, or relying on a result that folks can’t necessarily replicate for themselves, the best model may not always work from a practical standpoint. Let me give you an example. In my line of business we sometimes have missing data that we need to take care of. My expertise is in imputation. So when I joined my current company, I had a ton of ideas for handling the missing data and waved my hand at the simpler type of model being used. The best imputation models, even to this day, take a while to run. Furthermore, multiple imputation (doing the imputation process multiple times and averaging the results) is the way to go, in purist terms, for an imputation model. My company's production timelines wouldn’t allow for even one run of the advanced imputation models, much less multiple runs. In addition, the results would have had to be adjusted or rounded, as decimal places for things like units are not acceptable to clients. This would have added some error into the process, in some cases a lot of error. Finally, the results from the more advanced techniques were not that much better than the results from the simpler techniques. Better wasn’t better in this case because we couldn’t implement the better model. And besides, there are always better models. You may think you have come up with the best model, but wait a minute and the field will come up with a better way of doing things.
Your team needs to have purists and pragmatists to be successful. You need to innovate but you also need to get stuff done.
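For readers who have not met multiple imputation, here is a minimal sketch of the idea described in point 5, assuming a single numeric variable `y` with missing values and a fully observed predictor `x`. The imputation model (a linear regression with random residual noise, refit on a bootstrap sample each time to reflect parameter uncertainty) and the function names are illustrative choices of mine, not the model used in the story. Each of the `m` completed datasets yields an estimate; the estimates are averaged, and their spread is folded into the variance using Rubin's rules.

```python
import numpy as np


def impute_once(x, y, rng):
    """One stochastic regression imputation of missing y given fully observed x (illustrative model)."""
    obs = ~np.isnan(y)
    # Bootstrap the observed rows so each imputation also reflects uncertainty in the fitted parameters.
    idx = rng.choice(np.flatnonzero(obs), size=int(obs.sum()), replace=True)
    X = np.column_stack([np.ones(idx.size), x[idx]])
    beta, *_ = np.linalg.lstsq(X, y[idx], rcond=None)
    resid_sd = np.std(y[idx] - X @ beta, ddof=2)
    y_imp = y.copy()
    miss = ~obs
    # Fill each missing value with its prediction plus a random residual draw.
    y_imp[miss] = beta[0] + beta[1] * x[miss] + rng.normal(0.0, resid_sd, int(miss.sum()))
    return y_imp


def multiple_imputation_mean(x, y, m=20, seed=0):
    """Estimate the mean of y from m imputed datasets, pooled with Rubin's rules."""
    rng = np.random.default_rng(seed)
    n = y.size
    estimates, within = [], []
    for _ in range(m):  # run time grows linearly with m
        y_imp = impute_once(x, y, rng)
        estimates.append(y_imp.mean())
        within.append(y_imp.var(ddof=1) / n)  # squared standard error of the mean
    q_bar = np.mean(estimates)           # pooled point estimate: the average
    w = np.mean(within)                  # average within-imputation variance
    b = np.var(estimates, ddof=1)        # between-imputation variance
    total_var = w + (1 + 1 / m) * b      # Rubin's total variance
    return q_bar, total_var


if __name__ == "__main__":
    # Hypothetical simulated data with roughly 25% of y missing completely at random.
    rng = np.random.default_rng(1)
    x = rng.normal(size=500)
    y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=500)
    y[rng.random(500) < 0.25] = np.nan
    est, var = multiple_imputation_mean(x, y, m=20)
    print(f"pooled mean = {est:.3f}, standard error = {var ** 0.5:.3f}")
```

The loop makes the production argument concrete: every extra imputation is another full model fit, so `m` imputations cost roughly `m` times a single run, before any rounding of the results for client delivery.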
6. Better isn’t always better – Part 2. Earlier in my career, I served on or chaired many conference committees. We were always on the lookout for the new thing. We showcased many new, cool techniques, models, and products that never took off, even though they all promised better results and deeper insights. Heck, I even produced things that were “better” but never took off. Looking back, I started to ask myself why these things never took off.
To explain what I found, let me tell you another story. I was in the audience of a panel at a technical conference. The panel was on weighting data, something of great interest to me. One paper was really well done from a technical point of view. The author had created a new variance estimator that he could show was better at the fifth or sixth decimal place. It was demonstrable. During the questions, I raised my hand and pointed out that the variables in the data being analyzed were not precise enough to take advantage of the “better” variance estimator. In effect, you got no benefit from the estimator being better, even though it was. It was also complicated to calculate. So I asked why one would use it over standard methods that were easy to implement and effectively gave the same answer. The author just kept saying “it’s better,” not getting that its being better was effectively meaningless. Unfortunately, this is also the answer to why those new, cool techniques, models, and products didn’t take off. They may have been better from a technical standpoint, but to displace the existing way of doing things, you need to demonstrate that you are getting something worthwhile out of the new approach that you aren’t getting out of the old one. This is particularly true because the new thing typically costs more, either in actual costs or in computational costs. The standard ways are standard for a reason. By now they are pretty good, and it will take a lot to displace them. Being “better” simply isn’t enough.
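The precision point in the panel story can be put in numbers. The article does not say how precisely the variables were recorded, so the sketch below assumes one decimal place (`h = 0.1`) and an improvement of about `1e-6`; both are hypothetical. It uses the standard approximation that rounding to the nearest multiple of `h` adds an error that is roughly uniform on [-h/2, h/2], contributing about h²/12 to the variance (the basis of Sheppard's correction). The function names are my own.

```python
import numpy as np


def rounding_variance(h: float) -> float:
    """Approximate variance added by rounding to the nearest multiple of h (uniform rounding error)."""
    return h ** 2 / 12


def variance_shift_from_rounding(values: np.ndarray, h: float) -> float:
    """Empirical change in the sample variance when values are recorded only to precision h."""
    recorded = np.round(values / h) * h
    return float(recorded.var(ddof=1) - values.var(ddof=1))


if __name__ == "__main__":
    # Hypothetical setting: data recorded to one decimal place, estimator improved by about 1e-6.
    h = 0.1
    improvement = 1e-6
    noise = rounding_variance(h)
    print(f"rounding noise ~ {noise:.6f}, improvement = {improvement:.6f}, "
          f"ratio ~ {noise / improvement:.0f}x")
```

Under these assumptions, the noise introduced just by recording the data is several hundred times larger than the claimed improvement, which is the sense in which being “better” bought nothing.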
7. This is the big one; remember this if you forget everything else. (But don’t forget everything else; those points are important as well.) Your clients are trying to make an important decision, answer questions, or gain insight. Most couldn’t care less how you do things. They just need to make the decision in front of them correctly, get the insights, or get answers to their questions. They also need to have confidence in the results. You can never forget this. We often get caught up in how we are doing things, and it matters, but not in the end. The only thing that matters in the end is whether you enabled the client to do what they needed to do.
Data science is an exciting career. I was a statistician when it was a lonely profession and it wasn’t the cool thing to do. I am so excited that the profession has become so valued. It is a wonderful life. I hope my thoughts are helpful. Of course, really excellent people whom I know will disagree with some or all of this, as this is strictly my point of view. Take it for what it is.