Data Science requires a heavy dose of statistics, not less

Recently, a narrative has been doing the rounds that there ought to be two separate courses:

1) B.Sc./M.Sc. Statistics

2) B.Sc./M.Sc. Data Science

The proponents of this narrative imply that Data Science somehow requires less statistical rigor, and hence that a more watered-down course (B.Sc./M.Sc. Data Science) would suffice.

This narrative needs to be nipped in the bud.

But first we need to understand why people push such narratives.

  1. They want to sell their 'watered-down' courses
  2. It is easier to teach things with less rigor than more
  3. It facilitates the sale of low-code DS libraries, since they are marketed as "you don't need to know advanced stat/math to use our tools"

Why does this topic rile me up? And why should it rile you up too?

(Hint: skin in the game)

My company, like many others, is in the business of Applied Data Science. Applied Data Science has maximum skin in the game. Why? Because if we apply the wrong model, we don't get repeat business. Our business survival depends on applying the right statistical/data science algorithm to business problems.

This calls for a workforce that is rigorously trained in statistics and mathematics.

Institutes that push 'watered-down' courses have less skin in the game. When was the last time you heard an institute say, "Hey, if you don't get a job after completing the course, we'll give your money back"?

I certainly haven't, and this is exactly what 'less skin in the game' means.

Why people think Applied Data Science is easy.

The theory that applied data science in industry requires 'less depth/rigor' is quite bewildering.

Perhaps it is because the .fit() function has made all our lives easy. People conflate the ease of fitting a model through .fit() with the idea that ".fit() is all there is to applying data science to business problems".

It is also because of this apparent ease that data engineers and software engineers browbeat data scientists. They think data scientists have it easy and that .fit() is all they do.

The majority of Data Science projects fail because people don't know how the algorithms work under the hood.

Anyone can .fit() on a dataset, but it requires deep knowledge to:

  • Know when not to .fit() on the data.
  • What to do when the model breaks down.
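
To make the point concrete, here is a minimal sketch in plain Python. The class LeastSquaresLine and the toy data are hypothetical, written only for illustration; the class just mimics the familiar .fit() convention. The data follows y = x², so y is fully determined by x, yet a straight line fitted on x runs without any error or warning and explains nothing.

```python
class LeastSquaresLine:
    """Hypothetical one-feature linear model with a scikit-learn-style fit()."""

    def fit(self, x, y):
        # Ordinary least squares for a single feature.
        n = len(x)
        mean_x = sum(x) / n
        mean_y = sum(y) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        self.slope = sxy / sxx
        self.intercept = mean_y - self.slope * mean_x
        return self

    def predict(self, x):
        return [self.intercept + self.slope * xi for xi in x]

    def r_squared(self, x, y):
        # Share of the variance in y explained by the fitted line.
        mean_y = sum(y) / len(y)
        ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, self.predict(x)))
        ss_tot = sum((yi - mean_y) ** 2 for yi in y)
        return 1 - ss_res / ss_tot


# Toy data (an assumption for illustration): y depends on x, but not linearly.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

# .fit() "works": no exception, no warning, just a useless model.
naive = LeastSquaresLine().fit(x, y)
naive_r2 = naive.r_squared(x, y)

# Someone who inspects the residuals sees the curvature and changes the feature.
x_squared = [xi ** 2 for xi in x]
better = LeastSquaresLine().fit(x_squared, y)
better_r2 = better.r_squared(x_squared, y)

print(f"straight line on x:   R^2 = {naive_r2:.3f}")
print(f"straight line on x^2: R^2 = {better_r2:.3f}")
```

Both calls to .fit() succeed. Nothing in the API tells you that the first model is the wrong one; that judgment comes from knowing what the algorithm assumes.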

Real expertise is realized when things break down.

Advocating for a 'watered-down' Data Science course is a slippery slope

Advocating for a 'watered-down' data science course is a slippery slope because one really can't be sure which domains such candidates will end up working in.

God forbid they go on to work in mission-critical areas like medicine.

  • Would you trust a medicine cleared by a person who graduated from such a course?
  • Would you trust the results of a scan which a data scientist with "only minimum stat/math knowledge" developed?

I am sure the answer is NO to the above.

Also, every domain is mission-critical in its own right, be it marketing, manufacturing, or banking and finance. Wrong models and poorly trained data scientists can do more harm than good.

Not to mention that such institutes push poorly prepared candidates into the job market. Companies face a herculean task finding good talent, and the candidates get dejected after facing rejection in many interviews. The only ones laughing their way to the bank are the people who create such watered-down courses.

Avoiding AI winter

The AI winter was a period in the 1970s and 1980s when companies and research institutes lost interest in AI and significantly reduced spending on AI initiatives.

The reasons ranged from hype and unmet expectations to insufficient computing capacity and an empty pipeline.

In our current context, poorly trained data scientists applying the wrong ML algorithms could create deep disenchantment among organizations. And, déjà vu, you have another AI winter.

The way forward

Let us not separate statistics from data science. This New Year, let us shun mediocrity and strive for excellence. Data Science as a field requires a heavy dose of statistics, not less.

Happy New Year All.

Your comments are welcome.

Rahul Gupta

Consultant - Data Science @ EY

2 years ago

Well said

Stanislav Sykora

CTO at Extra Byte srl

2 years ago

Once there were people who "went into life-sciences because they did not really like math". Now we know that they were fools. Today there are people who "go into data-science because they do not really like statistics, or the whole math". In short time, they will become known as fools.

Ben Jepson

Statistical Data Scientist

2 years ago

"Real expertise is realized when things breakdown" YES.

Stanford S.

Parmanoo | Moonshot Innovations

2 years ago

Please add problem solving to this too. Will complete your thoughts perfectly. Many folks focus too much on modelling techniques with zero understanding of problem solving.

Shoaib Raza

Business Intelligence Engineer at Amazon

2 years ago

I support the idea you've presented here. I want to do masters in data science with my job going on. I'm confused due to the plethora of courses being offered. Your advice would help me a lot Venkat Raman.
