R Vs Python: The great debate

When I was doing the Google Data Analytics certificate, I noticed that Google teaches R as its language of choice. I had expected Google, as one of the greatest tech companies in the world, to teach Python, the more popular language in tech. It turns out there has been an ongoing debate about which of the two languages is better for data analytics and data science.

In more ways than one, these two languages are pretty similar. Both are open source, free to use and fairly easy to get started with. Both do a good job across the data science process, from data wrangling and cleaning to manipulation, visualization and automation. They both hold their own when it comes to exploring big data as well.

What is R?

R is a programming language built by statisticians for the statistical analysis and visualization of data. It was created by Robert Gentleman and Ross Ihaka, who were both based at the University of Auckland in New Zealand. Interestingly, it was named R because that is the first letter of both creators' first names. It is available for free under the GNU General Public License and can be installed on various operating systems such as macOS, Windows and Linux. R was built off of S, a language developed for people with a stronger background in statistics than in programming. The biggest drawback of S was that one had to buy the commercial package S-PLUS. Ross and Robert preferred something open source. And thus, R was born.

R has more than 10,000 packages, all of which can be used for exploring, analyzing and visualizing data. The statistical packages in particular are powerful and can solve complex mathematical problems. R is also useful for building statistical models. It is primarily used by statisticians, data analysts and data miners.

What about Python?

Python is a multi-purpose language that can handle a wide variety of tasks. Think of it like Java or C++, but with an almost natural syntax that is easier to learn. It is very popular in data science because of its libraries for math and statistics: some are built into the language (such as its math and statistics modules), and many more are available as packages. It is also a darling of the machine learning world, largely due to its scalability. Created by Guido van Rossum a little over three decades ago, Python remains one of the favorite programming languages across the board.
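As a small illustration of the statistics tools that ship with Python itself, here is a minimal sketch that uses only the standard library's statistics module. The page-view numbers are made up for the example; a real analysis would load them from a file or database.

```python
import statistics

# Made-up sample of daily page views, for illustration only
views = [120, 135, 150, 110, 160, 145, 125]

mean_views = statistics.mean(views)      # arithmetic mean
median_views = statistics.median(views)  # middle value of the sorted data
spread = statistics.stdev(views)         # sample standard deviation

print(mean_views, median_views, round(spread, 2))
```

No installation is needed for this; heavier work such as hypothesis tests or regression is usually handed off to third-party packages.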

Python has numerous libraries for whatever field one is working in. SciPy's scipy.stats module, statsmodels and Pingouin are three popular statistical packages in Python, while Matplotlib and seaborn are two of the most widely used packages for data visualization.
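To give a feel for one of these packages, the sketch below runs a two-sample t-test with scipy.stats. It assumes SciPy is installed (for example via pip install scipy), and the two groups of measurements are invented purely for illustration. It uses Welch's variant (equal_var=False), so the two groups are not assumed to share the same variance.

```python
from scipy import stats

# Hypothetical measurements for two groups, for illustration only
group_a = [12.1, 11.8, 12.5, 13.0, 12.2, 11.9]
group_b = [13.4, 13.1, 12.9, 13.8, 13.5, 13.2]

# Welch's two-sample t-test: does the mean differ between the groups?
result = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```

A small p-value here suggests the difference in group means is unlikely to be due to chance alone. In R, the built-in t.test() function plays the same role.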

So which should you learn?

It depends on what you want to do. If the work is purely statistical, then R carries the day. Designed specifically by statisticians who set out to build statistical software with programming capabilities, R does an exemplary job at statistical analysis. Python, on the other hand, was built for programmers, with a focus on readable code that needs fewer lines. Don't forget, though, that Python is a multi-purpose language and can be used well beyond the world of data.

In my opinion, R has a steeper learning curve, especially for beginners. Python, by contrast, is generally recommended for beginners because it is quite beginner-friendly. For instance, the common way to load and manipulate a dataset in R uses several packages (dplyr, readr and tidyr), while in Python pandas alone can load and manipulate your data. RStudio itself also has a difficult interface that takes a while to learn to navigate.
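To make the pandas point concrete, here is a minimal sketch of loading, cleaning and summarizing a small dataset with pandas alone. It assumes pandas is installed, and the inline CSV text is a hypothetical stand-in for a real file. In R, a similar workflow would typically use readr's read_csv() together with dplyr's group_by() and summarise().

```python
import io

import pandas as pd

# Hypothetical CSV data standing in for a real dataset file
csv_data = io.StringIO(
    "region,product,units\n"
    "East,Widget,10\n"
    "West,Widget,7\n"
    "East,Gadget,3\n"
    "West,Gadget,\n"  # missing 'units' value
)

# Load: read_csv also accepts a file path or URL
df = pd.read_csv(csv_data)

# Clean: drop rows where 'units' is missing
df = df.dropna(subset=["units"])

# Manipulate: total units per region
totals = df.groupby("region")["units"].sum()
print(totals)
```

The same three steps (load, clean, summarize) cover much of everyday data analysis, which is part of why pandas is often a beginner's first data library.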

Verdict

Methinks Google chose R because it figured Python resources are already widely available: any self-taught data scientist worth their salt will teach themselves Python one way or another. Python's popularity in data cannot be overstated, and yet the capabilities of R cannot be ignored.

My final take? Pick either language and get going!!


More articles by Kanja Farnadis

  • Self Joins

    As a data analyst, you will have to combine data from different tables. This is where joins come in.

  • How to Clean Your Data

    I’m sure we’ve all heard it, data is the new oil. But is there a better form of data? Should we use the data exactly as…

  • Why Should you Learn Python?

    It’s 2022. Everyone wants to learn how to code.

  • Five Life Lessons Coding Bootcamp has taught me

    I have never liked camps. The intensity of it, the short period, and the fast pace were not up my alley.
