
7 Common Data Science Mistakes and How to Avoid Them

Gregory Piatetsky-Shapiro

Part-time philosopher, Retired, Data Scientist, KDD and KDnuggets Founder, was LinkedIn Top Voice on Data Science & Analytics. Currently helping Ukrainian refugees in MA.

Published: February 1, 2016

By Khushbu Shah, DeZyre.

“Mistakes are the portals of discovery.” - James Joyce (famous Irish novelist). This holds in most fields, and data science is no exception: making mistakes can help data scientists discover new trends and find more patterns in the data. Even so, data scientists have a very small margin for error. They are hired after a lot of deliberation and at a high cost, and organizations cannot afford to overlook bad data practices and repeated mistakes. Such mistakes can cost a data scientist his or her career. It is vital for data scientists to track all their experiments, learn from their mistakes and avoid them in future projects.

A famous line from Sherlock Holmes captures how closely the role of a data scientist in business resembles that of a detective:

“My name is Sherlock Holmes. It is my business to know what other people don’t know.”

To stay competitive, a business has to do more than just big data analytics. Unless it assesses the quality of its data, the kind of outcome it wants and the profit it expects from the analysis, it is difficult to tell which data science projects will be profitable and which will not. A data science mistake is acceptable once, given the learning curve, but if the same mistakes happen more than twice, they can cost the business.

Learn Data Science in Python to become an enterprise data scientist

Common Data Science Mistakes to Avoid

Confusion between Correlation and Causation

Mistaking correlation for causation can be costly for any data scientist. The best-known example comes from Freakonomics: confusing correlation with causation prompted a plan in Illinois to send books to every student in the state, because analysis had shown that having books at home is directly correlated with high test scores. Further analysis showed that students from homes with many books performed better academically even if they had never read them. This corrected the earlier assumption with the insight that homes where parents routinely buy books tend to offer a more stimulating learning environment.

Many data scientists working with big data assume that correlation directly implies causation. Using big data to understand the correlation between two variables is good practice; however, always reading that correlation as cause and effect can produce false predictions and unproductive decisions. To get the best results from big data, data scientists must understand the difference between correlation and root cause. Correlation means that X and Y tend to be observed together, whereas causality means that X causes Y. These are two completely different things, yet many data scientists ignore the difference. A decision based on correlation alone may be good enough to act on without knowing the cause, but that depends entirely on the kind of data and the problem being solved.

A lesson every data scientist must learn is that correlation is not causation. If two items appear to be related to each other, it does not mean that one causes the other.
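The distinction can be made concrete with a small simulation. The sketch below is a toy model, not the Freakonomics analysis: the hidden `home` factor, the noise levels and the sample size are all assumptions chosen for illustration. Books and scores are both driven by the hidden factor, so they correlate strongly, yet books handed out at random show almost no relationship with scores.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def simulate(n=5000, seed=0):
    """Return (observational correlation, correlation under random assignment)."""
    rng = random.Random(seed)
    # Assumed hidden factor: how much the home environment supports learning.
    home = [rng.gauss(0, 1) for _ in range(n)]
    # Books at home and test scores each depend on the hidden factor,
    # but neither depends on the other.
    books = [h + rng.gauss(0, 0.5) for h in home]
    scores = [h + rng.gauss(0, 0.5) for h in home]
    # Intervention: books are mailed at random, independent of the home.
    mailed = [rng.gauss(0, 1) for _ in range(n)]
    scores_after = [h + rng.gauss(0, 0.5) for h in home]
    return pearson(books, scores), pearson(mailed, scores_after)


observed_r, intervention_r = simulate()
print(f"books vs scores (observed):       {observed_r:.2f}")
print(f"mailed books vs scores (assigned): {intervention_r:.2f}")
```

In this toy setup the observed correlation is strong only because both variables share a cause. Randomly assigning the supposed cause, as the second measurement does, is one common way to check whether a correlation reflects a causal effect.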

Not Choosing the Right Visualization Tools

Many data scientists concentrate on learning the technical aspects of analysis and neglect to explore the data with different visualization techniques, which can help them derive insight much faster. The value of even the best machine learning model is diluted if a data scientist does not choose the right kind of visualization to support exploratory data analysis, to monitor model development or to present the results. In fact, many data scientists choose a chart type based on aesthetic taste rather than on the characteristics of their dataset. This can be avoided by defining the goal of the visualization as the first step.

Even if a data scientist develops an optimal machine learning model, it will not shout “Eureka” on its own. Effective visualization of the results is what bridges the gap between a pattern existing in the data and recognizing it so that it can be used for business outcomes. As the popular saying goes, “A picture is worth a thousand words.” Data scientists need not only to familiarize themselves with data visualization tools but also to understand the principles of effective data visualization, so that they can present results in a compelling way.

A crucial step in solving any data science problem is to gain insight into what the data is about by representing it through rich visuals that form the foundation for analysis and modelling.
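One way to see why summary numbers cannot replace a plot is Anscombe's quartet, a classic dataset that is not part of the original article. The sketch below uses only the Python standard library and takes the first two of its four series: their means and their correlations with x agree to two decimal places, yet a rough text scatter plot shows two very different shapes. The helper `ascii_scatter` is a hypothetical stand-in for a real plotting tool.

```python
from math import sqrt

# First two series of Anscombe's quartet (they share the same x values).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def mean(values):
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sqrt(sum((a - mx) ** 2 for a in xs))
    sy = sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


def ascii_scatter(xs, ys, width=30, height=10):
    """Render points as a rough text scatter plot, largest y at the top."""
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for a, b in zip(xs, ys):
        col = round((a - x_lo) / (x_hi - x_lo) * (width - 1))
        row = round((b - y_lo) / (y_hi - y_lo) * (height - 1))
        grid[height - 1 - row][col] = "*"
    return "\n".join("".join(line) for line in grid)


for name, ys in (("series I", y1), ("series II", y2)):
    print(f"{name}: mean={mean(ys):.2f}  r={pearson(x, ys):.2f}")
    print(ascii_scatter(x, ys))
    print()
```

The two printed summaries match, while the plots show a noisy linear trend for the first series and a smooth curve for the second. In practice a plotting library would replace the text renderer; the point is only that the chart reveals what the statistics hide.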

Read the rest at KDnuggets: 7 Common Data Science Mistakes and How to Avoid Them

https://www.kdnuggets.com/2016/01/7-common-data-science-mistakes.html

Martina Pugliese, PhD

9 years ago

This will sound pretentious, but someone who makes mistake num. 1 is not a data scientist to me.

Dr. Mark A. Biernbaum

Developmental Psychologist; Researcher; Methodologist, Statistical Analyst, Teacher; Therapist; Painter; Journalist

9 years ago

I didn't know that Data Scientists made mistakes! This is shaking the ground beneath my feet!

Hena Jose

Genomics | Healthtech | Lifescience | Information Technology

9 years ago

Good points

pradeep sharma

junior data scientist

9 years ago

Gregory nice to hear our mistakes
