Categories of data scientists – where do you want to be?

Subrata Das

Gen AI Professor & Principal AI & Data Scientist

Published: September 16, 2019

I lay out three broad choices before any aspiring data scientist seeking advice: do you want to be a slave to a chief scientist, do you want to be among the masses, or do you want to go deep into the foundations?

The first category covers mostly those who are limited self-learners and may only have been exposed to an online course. They know the jargon of machine learning and the inputs and outputs of a handful of techniques. Some moved from traditional internal IT departments to newly formed data science divisions, hoping for a more rewarding and cooler career. Their managers use them mainly for the time-consuming and laborious process of preparing data, tell them which package and which specific techniques to learn, and instruct them in setting the appropriate parameters. They often come with a good background in generating reports, mostly by querying existing data sources. They cannot explain what a p-value is, even for a simple regression model, but they know how to discard irrelevant attributes based on its value. They also know what a neural network or a decision tree looks like. For these professionals, deep linguistic processing still means searching for words in text.
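To make the p-value remark concrete, here is a minimal sketch of what the number means. It is not the t-test that regression packages usually report. It assumes a permutation test instead, because that needs only the standard library: shuffle the target many times and count how often a shuffled target gives a slope at least as large as the observed one. The toy data, the function names, and the 0.05 cutoff are all hypothetical choices made for illustration.

```python
import random


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Estimate how often a slope this large arises when xs and ys are unrelated."""
    rng = random.Random(seed)
    observed = abs(slope(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real link between xs and ys
        if abs(slope(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical toy data: one attribute tracks the target, the other does not.
target = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 20.1]
relevant = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
irrelevant = [5, 1, 9, 3, 7, 2, 10, 4, 8, 6]

p_relevant = permutation_p_value(relevant, target)
p_irrelevant = permutation_p_value(irrelevant, target)

# The mechanical step described above: keep attributes whose p-value is small.
kept = [name for name, p in (("relevant", p_relevant), ("irrelevant", p_irrelevant)) if p < 0.05]
print(p_relevant, p_irrelevant, kept)
```

A small p-value says that a slope this large would be rare if the attribute had nothing to do with the target. That is the reasoning behind the threshold that first-category practitioners apply without being able to explain it.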

The second category constitutes most of the so-called professional data scientists. They come from a variety of disciplines in which they started their careers, and subsequently obtained a master’s degree from one of the mushrooming university departments that now offer degrees in data science. They have studied algorithms and know the parameters that affect algorithmic performance. They usually get goal-oriented assignments from their managers, meaning the problem statement is provided but the laborious process of collecting and preparing data is still left to them. They can run predictive algorithms in R and Python using various packages, vary the values of the parameters knowing their effect at a very high level, and then select the combination that gives the best performance. They think of analytics as a “bag of tricks,” meaning they adopt whatever technique solves the current problem. But they still think that tensors are just blocks of data, that AI is all about machine and deep learning, and that “Bayesian” is just a buzzword. They know eight or so popular machine learning techniques, but they are unlikely to have any knowledge of gradient-boosting-type algorithms. They know well how to code deep learning in Keras/TensorFlow, on AWS, and on similar platforms, but they have difficulty explaining dropout, vanishing gradients, and the like, or the appropriateness of different activation functions under different circumstances.
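The "vary the parameters and select the best combination" workflow described above is essentially a grid search. The sketch below is only an illustration under stated assumptions: the k-nearest-neighbour classifier, the synthetic two-cluster data, the parameter grid, and the choice to score on a held-out validation set are all hypothetical and are not taken from any particular package.

```python
import itertools
import random


def knn_predict(train, point, k, p):
    """Majority vote among the k nearest training rows under the Minkowski-p distance."""
    def dist(a, b):
        return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1 / p)

    nearest = sorted(train, key=lambda row: dist(row[0], point))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def make_data(n, rng):
    """Hypothetical data: two Gaussian clusters centred at (0, 0) and (2, 2)."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 0.0 if label == 0 else 2.0
        rows.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return rows


def accuracy(train, valid, k, p):
    """Fraction of validation rows whose label is predicted correctly."""
    correct = sum(knn_predict(train, x, k, p) == y for x, y in valid)
    return correct / len(valid)


rng = random.Random(42)
train = make_data(80, rng)
valid = make_data(40, rng)

# Try every combination of the two parameters (odd k avoids tied votes).
grid = {"k": [1, 3, 5, 7], "p": [1, 2]}
results = {}
for k, p in itertools.product(grid["k"], grid["p"]):
    results[(k, p)] = accuracy(train, valid, k, p)

# Select the combination with the highest validation accuracy.
best_params = max(results, key=results.get)
best_score = results[best_params]
print(best_params, best_score)
```

The loop finds a good setting without requiring any understanding of why it works, which is exactly the gap this category is described as having.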

Fewer than two percent of all professionals are in the third category, though its share will continue to increase as the data science field matures. Professionals in the third category are those who have degrees in the foundational disciplines, such as mathematics, probability and statistics, linear algebra, and more broadly the theory of computer science and artificial intelligence. Most know not only some of the algorithms well but also the foundational mathematics behind them, such as the cost function formulation and techniques for convex optimization, along with their geometric interpretations. Many come with strong publication backgrounds and tend to solve everything with only the handful of techniques they have mastered. They are therefore highly biased in their approach to solving problems. But they are likely to be aware of all the latest and greatest in the field. Many in this category lack a sense of practical usability and fail to explain the results to naive users. We have not yet reached the stage where an analytics system configures and adapts itself; hence the value of professionals in this category for building the most efficient models.

The purpose of this broad, subjective categorization is not to rank the usefulness of one category of professionals against another, but rather to help you assess the strengths and shortfalls of your existing team against your needs. In fact, you need a mixture of all three to run a data science unit successfully. You cannot make someone without a proper mathematical background do the job of the third category. Conversely, someone with a deep algorithmic background who is interested in building the best model cannot be asked to spend all their time on routine modeling and data preparation.

Now the question is – in which category do you belong?

Raunak Sinha

Consultant at General Mills

5 years ago

Great read sir. Thanks for sharing the same.

Michael K.

5 years ago

Basically you categorized them by experience.


