Python vs R – Who Is Really Ahead in Data Science, Machine Learning?

My recent analysis of the KDnuggets Poll results (Python overtakes R, becomes the leader in Data Science, Machine Learning platforms) has attracted a lot of attention and generated a large number of comments, much discussion, and the inevitable critique from proponents of both languages.

Some have complained that the poll is not scientific and that voters represent a self-selected sample. That is obviously true. But KDnuggets has conducted polls since 2001 and reaches a large audience of several hundred thousand visitors each month. In our experience, KDnuggets polls have been a good indicator of trends and developments in Data Mining and Data Science. We have tracked the R vs Python debate for several years, so unlike other sites we can compare the latest poll results with those from several previous years.

Let's examine other measures of Python vs R popularity among Data Scientists.

First, we analyze Google Trends (this was also done by DSC after the publication of our poll results).

Python is a much more popular language overall, and it is IEEE Spectrum's No. 1 language of 2017 (thanks to Martin Skarzynski for the link), so it would be unfair to compare Python and R searches directly. Instead, we can compare Google Trends for the search terms "Python data science" vs "R data science".

Here is the chart since Jan 1, 2012. Note that if you select a range that covers full months and starts in 2012, you get smoothed monthly trends rather than the more chaotic weekly trends.

Fig. 1: Google Trends, Jan 2012 - Aug 2017, "Python data science" vs "R data science".

We note that R was slightly ahead in 2014 and 2015, as Data Science was gaining popularity, but "Python data science" searches moved ahead of "R data science" in late 2016 and have been clearly ahead since January 2017.

Note: the statistics are the same regardless of how Data Science is capitalized: "Data Science" or "data science", but Google autocomplete suggests "data science" for both Python and R.

However, Machine Learning has recently become very popular (see my post Machine Learning overtaking Big Data?, May 2017), so let's examine Python vs R for "Machine Learning" in Google Trends.

Fig. 2: Google Trends, Jan 2012 - Aug 2017, "Python Machine Learning", "R Machine Learning", "Python data science", and "R data science".

We see that "Python Machine Learning" is far ahead of "Python data science", and both are significantly ahead of "R data science" and "R Machine Learning".

Relative search volume for Aug 2017 was:

  • Python Machine Learning: 100
  • Python data science: 49
  • R data science: 33
  • R Machine Learning: 32 
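
To make the gaps in this list concrete, here is a minimal Python sketch that turns the four values into ratios. The numbers are copied from the list above; the `volumes` dictionary and the `ratio` helper are illustrative names, not part of the original analysis. It also assumes the usual Google Trends convention that values within one comparison are scaled so the highest equals 100, which means only the ratios carry meaning, not the absolute numbers.

```python
# Relative search volumes for Aug 2017, copied from the list above.
# Assumption: the values share one Google Trends scale (peak = 100),
# so only ratios between terms are meaningful.
volumes = {
    "Python Machine Learning": 100,
    "Python data science": 49,
    "R data science": 33,
    "R Machine Learning": 32,
}


def ratio(numerator: str, denominator: str) -> float:
    """Return how many times larger one term's volume is than another's."""
    return volumes[numerator] / volumes[denominator]


if __name__ == "__main__":
    pairs = [
        ("Python Machine Learning", "R Machine Learning"),
        ("Python data science", "R data science"),
        ("Python Machine Learning", "Python data science"),
    ]
    for top, bottom in pairs:
        print(f"{top} vs {bottom}: {ratio(top, bottom):.2f}x")
```

On these figures, "Python Machine Learning" has roughly three times the relative volume of "R Machine Learning", while "Python data science" leads "R data science" by about one and a half times.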

Next, let's look at job ads on indeed.com. All numbers below are for jobs in the USA as of Sep 11, 2017.

We represent this relationship in a Venn Diagram.

Read the rest on KDnuggets:

Python vs R – Who Is Really Ahead in Data Science, Machine Learning? - Sep 12, 2017.

https://www.kdnuggets.com/2017/09/python-vs-r-data-science-machine-learning.html

Nuwan Thuduwage

Senior Software Engineer @ EPAM Systems

6 years ago

I think it depends on the type of question in Data Science: in statistical inference I think R is leading, while in predictive inference it is Python that leads over R. Is there any forum or blog that illustrates this?

zuko Mthwesi

computational scientist

6 years ago

What is more vital in data mining?

Deshan Lokuge

Senior Business Analyst at London Stock Exchange Group

7 years ago

Great post. I hope the insights hold true, since I'm new to data analysis and currently struggling to decide which language to pursue, R or Python.

Dillon R. Johnstone

Think Tank & Government Watchdog, Founder, and Helper (beep boop).

7 years ago

I really want to nitpick this, mostly because of the graphic alone. I can't stand how graphics are manipulated to seemingly increase the importance of an object from a single source. Python and R are both great, but from what I've seen (anecdotally), R analysts are often self-taught because classrooms often focus on other data software. It's also important to ask WHY there are differences in usage, i.e., are data scientists with certain backgrounds preferring one platform over the other for particular reasons?

Giovanni Bruni

Business Trainer, Consultant, Temporary Manager and Coach. On my own, but always looking for new adventures.

7 years ago

More articles by Gregory Piatetsky-Shapiro
