Is Coding the Least Important Core Skill of Data Science?

Jehan Gonsal

Principal Product Manager at Atlassian

Published: February 18, 2018

Don’t get me wrong. Coding is a core skill of data science. But it is overhyped. Or, at least, overemphasized.

I see job postings for data scientists and data analysts that look for candidates with fluency in SQL, R, Python, C++ etc.

What’s the problem?

Listing languages does nothing to ensure you don’t hire someone who knows just enough about data science and analytics to be dangerous. Data science and analytics are fields that require deep technical expertise, and the coding component is just the tip of the iceberg.

There’s nothing more dangerous than someone who can run an algorithm but has no idea how it works or whether it worked.

So, what are the more important core skills in data science? Here are my thoughts in order of importance. And, yes, this is my opinion so you are more than welcome to disagree with me!

Business Context

There is no value to data science without context. You need to know the company and the sector for analysis to make sense. If you work in the fast-moving consumer goods (FMCG) sector and want to measure changes in spend over time, you’ll need to factor seasonality into your analysis. If you work in a subscriber-based industry or product category (like postpaid mobile phones, where you make fixed monthly payments), you’ll need to figure out which inputs are symptoms of churn and which are causes of churn. Although much of this work will be solved analytically, the initial context will more likely be set by the business. This is the framework around which initiatives are proposed and creativity thrives.

It also provides a clear understanding of the business impact of each project. In many cases, data science initiatives require too much time to be worth the investment.

In the absence of business context, analytics teams can spend long periods of time creating models that deliver negative returns, by either failing to solve a worthwhile problem or requiring more resources than the problem warrants.
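
To make the seasonality point from this section concrete, here is a minimal sketch in Python. The spend figures are invented purely for illustration, and `pct_change` is a hypothetical helper, not part of any library. It compares a month-over-month change with a year-over-year change for the same December.

```python
# Hypothetical monthly spend figures (illustrative only, not real data).
spend = {
    (2016, 11): 100.0,
    (2016, 12): 150.0,  # seasonal December peak
    (2017, 11): 104.0,
    (2017, 12): 153.0,  # seasonal December peak again
}


def pct_change(new, old):
    """Percentage change from `old` to `new`."""
    return (new - old) / old * 100.0


# Naive comparison: December against the previous month.
month_over_month = pct_change(spend[(2017, 12)], spend[(2017, 11)])

# Seasonally aware comparison: December against the previous December.
year_over_year = pct_change(spend[(2017, 12)], spend[(2016, 12)])

print(f"Month-over-month change: {month_over_month:.1f}%")
print(f"Year-over-year change:   {year_over_year:.1f}%")
```

With these made-up numbers, the month-over-month figure looks like a large jump, while the year-over-year figure shows that most of it is the usual December peak. Knowing that the sector has such a peak is business context, not coding.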

Statistical Thinking/Applied Mathematics

Data science and analytics are about separating the signal from the noise. Business context gives you a clear picture of the problem; mathematical knowledge enables the solution. I’d argue that there are three core components here:

Probability: Understanding how to compare any result to what you would expect by chance. Given the popularity of significance testing, people often trawl through data looking for significant effects. This leads to chance effects being offered as “insights” and a loss of trust in analytics when the effect is never replicated. This also extends to research design (such as stratified A/B testing with control groups). Probability is one of those things that the human mind lacks an intuition for, and so it requires study. It’s not data science per se, but it is a consideration for many analytical techniques.
Modelling Techniques: Knowing how modelling techniques work is crucial when it comes to fitting a mathematical function to an analytical problem. In supervised learning, you need to have some hypotheses about how the input variables map onto the output variable and the sample size considerations for the approach you choose. The more complex algorithms (Gradient Boosting Machines etc.) can map non-linear interaction effects, which is useful when they are present but wasteful if the relationships are largely linear and bivariate. Moreover, more complex algorithms can map more complexity but require more statistical power. You need larger sample sizes so they can fit to the sparser regions of the feature space. In unsupervised learning, you need to consider what your data looks like. Do you have highly skewed data with plenty of outliers? If so, K-Means will result in clusters that misrepresent your data and do not generalize. The bottom line is that understanding how these methods work will help you figure out which techniques to apply to the problem.
Model Evaluation: Knowing how models work is the first piece of the puzzle. The next is how to validate your models. The problem is that these tools will fit to your data and pump out results no matter what you put in them. That doesn’t mean they fit anything other than noise. There are many things that can go wrong here. For example, cross-validation can overestimate how well your model describes reality (for instance, when the same folds are also used to tune it), leading to a false sense of security regarding your model. For classification, accuracy metrics can be misleading if you predict one class better than another. This is only scratching the surface; smarter people than me have written entire books on this.
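
Two of the pitfalls in the list above can be illustrated with plain arithmetic. The sketch below is only an illustration under stated assumptions: the first function assumes the significance tests are independent and that no real effect exists in any of them, and the churn labels are hypothetical. All function names are made up for this example.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across `num_tests`
    independent tests when there is no real effect in any of them."""
    return 1.0 - (1.0 - alpha) ** num_tests


def accuracy(y_true, y_pred):
    """Share of all predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Share of truly positive cases that the model also flags as positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


# Trawling: run 20 tests on pure noise at the 5% significance level.
fwer = familywise_error_rate(20)

# Hypothetical churn labels: 95 customers stay (0), 5 churn (1).
y_true = [0] * 95 + [1] * 5
# A "model" that simply predicts that nobody churns.
y_pred = [0] * 100

print(f"Chance of at least one spurious 'insight': {fwer:.0%}")
print(f"Accuracy: {accuracy(y_true, y_pred):.0%}")
print(f"Recall on churners: {recall(y_true, y_pred):.0%}")
```

Under these assumptions, trawling through 20 unrelated tests is more likely than not to surface a "significant" effect by chance, and a model that never predicts churn still scores high accuracy while catching no churners. Neither problem shows up as a coding error.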

I’d personally much rather have a data scientist with terrible coding skills but exceptional analytical skills. They may need to lean on IT to deploy an algorithm but will be better able to navigate the mathematical complexity of data science.

Interpersonal Skills

Articles often cite communication as a key skill in analytics and data science, but often gloss over the details. I’d argue that this has led to misunderstandings about what is required. Let’s be a bit more detailed:

Trust-Building: Analytics teams work with other teams (data managers, marketing departments etc.), and close relationships and trust are required for great work to happen. If you have a team of PhDs that don’t get on well with the data warehousing teams and confuse the marketing teams, then you’ll struggle to get what you need and sell in great initiatives. It is very easy for business stakeholders and analytics professionals to develop adversarial relationships due to this process breaking down.
Bargaining Skills: Once trust is established, you’ll need to learn how to negotiate mutual victories. The problem with analytics teams is that they often get hit with lists of poorly scoped requests and non-analytical tasks that other teams can’t do. It’s not practical to simply say no every time, but lines will need to be drawn on occasion. It’s about picking your battles and avoiding the traps of pushing back too often or not often enough.
Empathy: Although being able to communicate complexity is very important, I’d say it plays second fiddle to understanding other people’s perspectives and speaking to them. For many subject-matter experts, analytics is a threat. They may feel that data-driven insights will work against their recommendations. Once trust is established, partnering with them by understanding and speaking to their perspective will make them an ally rather than a roadblock.

Information Systems

Each company stores information in different ways. The more technologically developed companies have non-relational, big data architectures, whereas most have the more common relational databases. Good analysts understand where information comes from and what it refers to, and know how to combine these conceptually. They also know how to ensure the environment is looked after. A common issue in these environments is tech debt, where poor code that is not deployment-ready is pushed into deployment environments and leads to inefficient systems that may have bugs causing outages and inaccuracies.

For example, a data scientist may be looking to create some derived fields for a simple dashboard based on them being strong predictors of customer attrition. Although these could be manually created in Tableau or Shiny, this could lead to the interface becoming slower and, depending on how fields are updated, result in the dashboard being out of sync with the rest of the system, leaving end-users wondering why they are seeing different numbers in different places. Or changes to the data ecosystem could lead to the dashboards developing bugs where certain fields are coming up with nulls (or worse). Knowing when a change should be briefed to IT/data management and when it should be done in a non-scalable way requires sound judgement, and this comes from understanding both the information systems and the priorities of the business.

Coding Skills

So, here is where I put coding. Once you have the above, you are in a good position to improve your ability to execute by further developing your coding skills. In reality, it’s hard to have the skills above and not have a reasonable grasp of coding (it’s almost always learned in tandem). However, it’s worth mentioning that programming is the easiest part. That’s not to say it’s easy. It is incredibly difficult. When I was learning to program in R, I wrote some awful code. And I still get schooled every time I log onto Stack Overflow. Good data scientists write efficient code that is easy to read, check and modify, but let’s not fall into the trap of thinking that great data science will happen from coding skills alone.

When I see people looking for data scientists who only mention programming skills, I ask myself:

“Are they looking for a software developer who dabbles in data science or a data scientist who can program?”

Because these two things are not the same.

But those are just my thoughts. What do you think?

Anandam Sarcar

Data & AI | People | 20+ years of Global Experience in BOTH Business & Engineering roles for multiple Microsoft businesses

7 years ago

Both are required. A good balance of core technical and business skills is required to be a good data scientist. That’s why the field is niche and it is hard to find folks who are great at both ends of the spectrum.

Ranvijay Singh

Sr. Data and Reporting Engineer

7 years ago

I agree, but both skills are vital components for a data scientist, and that is the reason the world is moving from hard-core analytics to machine learning and deep learning. The tools are capable of handling complex analytics implicitly without much core knowledge of statistics.
