R vs Python: The Great Debate
Kanja Farnadis
Data Analytics|Data Science|Machine Learning|Data Storyteller|Content Creator
When I was doing the Google Data Analytics certificate, I noticed that Google teaches R as its language of choice. I had expected Google, as one of the greatest tech companies in the world, to teach Python, seeing as it is the more popular language in tech. It turns out there is an ongoing debate about which of the two languages is better for data analytics and data science.
In more ways than one, the two languages are pretty similar. Both are open source, free to use and quite easy to get started with. Both handle the whole data science process well, from data wrangling, cleaning and manipulation through to visualization and automation. Both hold their own when exploring big data as well.
What is R?
R is a programming language that was built by statisticians for the visualization and statistical analysis of data. It was created by Robert Gentleman and Ross Ihaka, who were both based at the University of Auckland in New Zealand. An interesting fact is that it was named R after the first letter of its creators’ first names. It is available for free under the GNU General Public License and can be installed on various operating systems such as Mac, Windows and Linux. R was built off of S, a language developed for people who had a stronger background in statistics than in programming. The greatest drawback of S was that one had to buy the S-PLUS package. Ross and Robert preferred something open source. And thus, R was born.
R has more than 10,000 packages, all of which can be used for exploring, analyzing and visualizing data. In particular, the statistical packages are powerful and can solve complex mathematical problems. R is also useful for building statistical models. It is used primarily by statisticians, data analysts and data miners.
What about Python?
Python is a multi-purpose language that can handle a variety of tasks. Think of it like Java or C++, but with an almost natural syntax that is easier to learn. It is very popular in data science because of its built-in libraries for math and statistics. It is also a darling of the machine learning world, especially because of its scalability. Created by Guido van Rossum a little over three decades ago, Python remains one of the favorite programming languages across the board.
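To give a feel for the built-in math and statistics support, here is a minimal sketch that uses only Python's standard library (the `math` and `statistics` modules). The `page_views` numbers are made up purely for illustration.

```python
import math
import statistics

# Hypothetical sample: daily page views for one week (made-up numbers).
page_views = [120, 135, 150, 110, 160, 145, 130]

mean_views = statistics.mean(page_views)
median_views = statistics.median(page_views)
stdev_views = statistics.stdev(page_views)  # sample standard deviation

# Standard error of the mean: sample standard deviation / sqrt(n).
standard_error = stdev_views / math.sqrt(len(page_views))

print(f"mean={mean_views:.2f}, median={median_views}")
print(f"stdev={stdev_views:.2f}, standard error={standard_error:.2f}")
```

Nothing needs to be installed for this; anything heavier usually calls for a third-party library.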
Python has numerous libraries, depending on the field in which one is working. scipy.stats, statsmodels and Pingouin are three popular statistical packages in Python. Matplotlib and seaborn are two popular Python packages for data visualization.
So which should you learn?
It depends on what you want to do. If the work is purely statistical, then R carries the day. Designed by statisticians who set out to build statistical software with computing capabilities, R does an exemplary job in statistical analysis. Python, on the other hand, was built for programmers, with the aim of letting them write fewer lines of code. Do not forget, though, that Python is a multi-purpose language and can be used beyond the world of data.
In my opinion, R has a steeper learning curve, especially for beginners. This is unlike Python, which is generally recommended for beginners as it’s quite beginner friendly. For instance, to load and manipulate your dataset in R, you would typically install dplyr, readr and tidyr, while in Python you only need pandas to load and manipulate your data. RStudio also has a difficult interface that takes a while to learn how to navigate.
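As a rough illustration of the pandas side of that comparison, the sketch below loads, cleans and summarizes a small dataset with pandas alone. The CSV content, column names and figures are hypothetical, and the data is read from an in-memory string rather than a real file so the snippet is self-contained. It assumes pandas is installed.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a real file on disk.
csv_text = """city,year,sales
Nairobi,2022,100
Nairobi,2023,150
Mombasa,2022,80
Mombasa,2023,
"""

# Load: pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Clean: drop rows where the sales figure is missing.
clean = df.dropna(subset=["sales"])

# Manipulate: total sales per city.
totals = clean.groupby("city")["sales"].sum()

print(totals)
```

In R, the same three steps would roughly correspond to `read_csv` from readr, `drop_na` from tidyr, and `group_by` with `summarise` from dplyr.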
Verdict
Methinks Google chose R since they figured Python resources are readily available. Any self-taught data scientist worth their name will teach themselves Python one way or another. Python’s popularity in data cannot be overstated, and yet the capabilities of R cannot be ignored.
My final take? Pick whichever language and get going!!
Data Scientist | Cybersecurity enthusiast | Rugby |
2 years ago: Definitely Python.
Data Governance @Diamond Trust Bank, Kenya | M.Sc. Data Analytics |
2 years ago: Python has more capabilities for me....
Data Scientist
2 years ago: Python