No ML Algorithms Cheat Sheet, Please

Venkat Raman

Co-Founder & CEO at Aryma Labs | Building Marketing ROI Solutions For a Privacy First Era | Statistician

Published: October 1, 2020

What is a Cheat Sheet?

Wikipedia defines a cheat sheet as a concise set of notes used for quick reference. The phrase to emphasize here is ‘quick reference’.

In programming, cheat sheets are OK because no one can remember all the syntax of a programming language, especially if the language constantly evolves (like Python) or if the programmer moves in and out of different programming languages.

A quick reference like a cheat sheet helps the programmer save time and focus on the larger problem.

Data Scientist, What are you in a hurry for?

Learning and implementing machine learning algorithms was never supposed to be a 100 m dash. Each machine learning implementation is supposed to be mulled over, thought through carefully, and then implemented. A data science solution takes time; it is an exploratory and experimental endeavor.

Following a cheat sheet makes you less experimental, and you fail to explore all the options. The ‘dive straight into the problem’ attitude might help you win some Kaggle competitions, but it won’t take you far in real-life machine learning use cases.

OK, now let’s get to the crux of the matter…

Why are ML algorithm cheat sheets a bad idea?

Data and Assumptions

Even within a company, one department’s business problem differs from another’s. From case to case, the variety and complexity of the data are so vast that no single approach can be prescribed. But an ML cheat sheet does exactly that.

For example:

If data < 1k, choose algorithm X

Else If data > 1k, choose algorithm Y

As for assumptions, every machine learning algorithm rests on many of them, ranging from assumptions about the data-generating process to assumptions about the model. When you follow a cheat sheet, these assumptions are simply not studied or evaluated in detail.
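To make the point about unexamined assumptions concrete, here is a minimal sketch of one informal check that a size-based rule would never prompt you to run. It assumes a straight-line model, fits it by least squares to two made-up datasets, and counts the runs of same-signed residuals. The function names and the data are hypothetical, and counting runs is only a rough diagnostic, not a formal test.

```python
def least_squares_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residual_sign_runs(xs, ys):
    """Count runs of same-signed residuals, ordered by x.

    Many short runs suggest residuals scattered around the line.
    A few long runs suggest the line misses a systematic pattern.
    """
    slope, intercept = least_squares_line(xs, ys)
    signs = [y - (slope * x + intercept) >= 0 for x, y in sorted(zip(xs, ys))]
    return 1 + sum(a != b for a, b in zip(signs, signs[1:]))


# Hypothetical data: same size, same x values, very different structure.
xs = list(range(-10, 11))
roughly_linear = [2 * x + 1 + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]
curved = [x * x for x in xs]

linear_runs = residual_sign_runs(xs, roughly_linear)
curved_runs = residual_sign_runs(xs, curved)
print("runs of residual signs, roughly linear data:", linear_runs)
print("runs of residual signs, curved data:", curved_runs)
```

Both datasets have the same number of rows, so a rule keyed on data size would treat them identically, yet the linearity assumption is plausible for only one of them.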

Cheat sheets put you on a path with no U-turns or detours

Much like hard coding in programming, cheat sheets for ML algorithms constrain your options. They put you on a path that you merrily tread, and by the time you realize that the path is wrong, it is often too late!

No opportunity to Innovate

If you go by cheat sheets, you are not taking the road less traveled. Needless to say, innovation happens by taking the road less traveled. Cheat sheets don’t tell you to apply learnings from one domain to another. Nor do they tell you to try an ensemble technique or some amalgamation of different algorithmic techniques. You are more or less like a horse with blinkers.

We (data scientists) stand on the shoulders of giants. Be it Legendre’s OLS or Geoffrey Hinton’s various deep learning techniques, none of them was invented by following a cheat sheet.

Cheat sheets make your decision making ‘machine-like’

Well, coding machine learning algorithms does not mean you become the machine yourself! Cheat sheets often make your decisions binary at every stage.

For example, suppose a cheat sheet’s yes/no branches lead you to the k-means algorithm, and you run it.

Voila, you have your clusters... or do you?

A naïve or aspiring data scientist would be happy with the clusters he/she got and just move on.

But here is the catch…

One of the pitfalls of the k-means algorithm is that it will cluster almost anything. Just because we now have clusters does not mean we have accomplished anything! It is just a Pyrrhic victory.

I would urge readers to read the excellent blog post by David Robinson on the drawbacks of k-means.

So one can clearly see that, while a cheat sheet has led the data scientist down the path of the k-means algorithm, it gives a false sense of task completion when further probing is required.
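The claim that k-means will cluster almost anything is easy to reproduce. The sketch below is a bare-bones, one-dimensional version of Lloyd's algorithm written for illustration only; the function name, the starting centroids, and the data are all assumptions made for this example. The data is 100 evenly spaced values, which have no group structure whatsoever.

```python
def kmeans_1d(points, centroids, max_iterations=100):
    """Plain Lloyd's algorithm in one dimension (illustrative, not optimized)."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break  # Converged: the assignments can no longer change.
        centroids = updated
    return centroids, clusters


# Hypothetical data: evenly spaced values, so there is nothing to discover.
points = list(range(100))
centroids, clusters = kmeans_1d(points, centroids=[10.0, 50.0, 90.0])
sizes = [len(c) for c in clusters]
print("cluster sizes:", sizes)
print("centroids:", centroids)
```

The algorithm converges and hands back three tidy, similarly sized clusters, even though the input is a uniform ramp. Getting clusters back says nothing about whether clusters exist; that judgment still requires the further probing the cheat sheet never asks for.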

No free lunch theorem: The final nail in the coffin

Perhaps the ‘no free lunch theorem’ is the final nail in the coffin for ML cheat sheets. Informally, the no free lunch theorem states that

“There is no one model that works best for every problem. The assumptions of a great model for one problem may not hold for another problem”.

If there is no one model or algorithm which works best for every problem, then does it really make sense to have an ML algorithm cheat sheet?
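A toy comparison can illustrate the spirit of this statement. The sketch below is not a demonstration of the formal theorem, only an example under stated assumptions: two simple models (a least-squares line and a 1-nearest-neighbour predictor) are scored on two made-up targets using the same train/test split. All names and data here are hypothetical.

```python
def fit_line(train):
    """Least-squares line through (x, y) pairs; returns a predictor."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_nearest_neighbour(train):
    """1-nearest-neighbour: predict the y of the closest training x."""
    return lambda x: min(train, key=lambda pair: abs(pair[0] - x))[1]


def mse(model, test):
    """Mean squared error of a predictor on (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in test) / len(test)


def compare(target):
    """Train on even x, test on odd x, and score both models."""
    train = [(x, target(x)) for x in range(-10, 11, 2)]
    test = [(x, target(x)) for x in range(-9, 10, 2)]
    return {
        "line": mse(fit_line(train), test),
        "1-NN": mse(fit_nearest_neighbour(train), test),
    }


linear_scores = compare(lambda x: 2 * x + 1)  # straight-line target
curved_scores = compare(lambda x: x * x)      # curved target
print("straight-line target:", linear_scores)
print("curved target:", curved_scores)
```

On the straight-line target the fitted line has the lower error, while on the curved target the nearest-neighbour model does. The ranking flips because the line's linearity assumption holds for one problem and not the other, which is exactly the information a fixed lookup table cannot carry.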

So, please refrain from using ML algorithm cheat sheets. Try to arrive at a solution organically. Let your mind intuit and connect the dots!

Your comments and opinions are welcome.

Thank you.

David Knickerbocker

Chief Scientist, Co-founder, Author

3 years ago

Excellent article!

Chin Fang

Founder & CEO at Zettar Inc.

4 years ago

It is true that with deep understanding of a subject, one may be able to distill the essence into something "short and sweet". Nevertheless, to learn basics from such "short and sweet" materials is almost impossible. True understanding takes time and deliberation, lots of both!
