R vs Python: What do Data Scientists prefer?

R and Python are the most common programming languages in the data science world, but what exactly is the difference between the two?

This remains a common topic of debate within the data science community, and both programming languages have their own strengths and limitations.

If you are a professional looking to start a career in this field, here are some key takeaways for both R and Python, along with trends I am seeing in the Singapore and Hong Kong markets.

History of the two programming languages

R

R is a language and environment for statistical computing and graphics. According to the R Project, it is a GNU project similar to the S language and environment, which was developed at Bell Laboratories. Like S, R provides a wide variety of statistical and graphical techniques.

Its functionalities include but are not limited to:

  • Linear and nonlinear modelling
  • Classical statistical tests
  • Time-series analysis
  • Classification and clustering.

According to the R Project, R’s strengths include:

  • It is free software that runs on a wide variety of UNIX platforms and similar systems (including Linux), as well as Windows and macOS
  • The ease with which well-designed, publication-quality plots can be produced
  • Design choices in graphics over which the user retains full control
  • Facilities for data manipulation, calculation and graphical display
  • Effective data handling and storage

Overall, it is a simple and effective programming language that gives data scientists conditionals, loops, user-defined recursive functions, and input and output facilities.

Python

Python is a widely used, general-purpose, high-level programming language. Created by Guido van Rossum and now managed by the Python Software Foundation, it was designed with a focus on code readability, letting programmers express concepts in fewer lines of code than languages such as Java, C++ and C. The objective is readable code and higher developer productivity.

Its functionalities include:

  • Developing and scripting code
  • Generation of code and software testing

Due to its elegance and simplicity, top technology-driven organisations like Dropbox, Google, Quora, Mozilla, Hewlett-Packard, Qualcomm, IBM, and Cisco have adopted Python. Python has also inspired the design of many other programming languages, such as Ruby, Cobra, Boo, CoffeeScript, ECMAScript, Groovy, Swift, Go, OCaml and Julia.

R vs Python: which is the preferred choice?

Dr Norm Matloff, Professor of Computer Science at the University of California, Davis, wrote a paper on the key differences between the two languages. He compared R and Python across several domains to determine which programming language was the better choice:

Elegance

Winner: Python

While this is subjective, Python greatly reduces the use of parentheses and braces when coding, making it sleeker, Matloff shared.
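As a small illustration of that point, here is a hypothetical function (the name, threshold and data are made up for this example and are not taken from Matloff's paper) in which Python marks every block with indentation alone:

```python
def classify_scores(scores, threshold=50):
    """Label each score 'pass' or 'fail'; the threshold is an illustrative default."""
    labels = []
    for score in scores:
        # Blocks are delimited by indentation: no braces, and no
        # parentheses are required around the loop or the condition.
        if score >= threshold:
            labels.append("pass")
        else:
            labels.append("fail")
    return labels


print(classify_scores([42, 50, 87]))  # prints ['fail', 'pass', 'pass']
```

The equivalent loop in a brace-based language would typically wrap the condition in parentheses and each branch in braces.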

Machine Learning

Winner: Python (but not by much)

Python's massive growth in recent years is partially fuelled by the rise of machine learning and artificial intelligence (AI). Python offers a number of finely tuned libraries for image recognition.

In Matloff’s words, the power of these libraries comes from setting certain image-smoothing operations.

Learning curve

Winner: R

According to Matloff, data scientists working with Python must learn a lot of material to get started, including NumPy, Pandas and matplotlib. In contrast, matrix types and basic graphics are already built into base R, so novices can be doing simple data analyses within minutes, since R's base packages load automatically.
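To make the point about matrix types concrete, here is a minimal sketch of what a matrix product looks like in plain Python, which has no built-in matrix type. The `matmul` helper is a hypothetical name written for this illustration; in practice most Python users would reach for NumPy instead.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows (plain Python lists)."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("inner dimensions do not match")
    columns_of_b = list(zip(*b))  # transpose b so each column becomes a tuple
    return [
        [sum(x * y for x, y in zip(row, column)) for column in columns_of_b]
        for row in a
    ]


print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # prints [[19, 22], [43, 50]]
```

In base R the same product is available out of the box through the `%*%` operator on matrices, while in Python it usually means learning NumPy arrays and the `@` operator first.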

Statistical correctness

Winner: R (by far)

In Matloff's view, advocates for Python, namely professionals working in machine learning, sometimes appear to have a poor understanding of the statistical issues involved in their work. R, on the other hand, was written by statisticians, for statisticians. This suggests that subject matter experts working in R are better placed to ensure that the math behind their analyses is as accurate as possible.

Parallel computation

Winner: It’s a draw

Matloff suggests that the base versions of R and Python do not have strong support for multicore computation. What he means is that neither R’s parallel package nor Python's multiprocessing package is a good workaround for this limitation. Nevertheless, external libraries supporting cluster computation are good in both languages, while Python has better interfaces to GPUs.
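For readers who have not used it, here is a minimal sketch of Python's multiprocessing package spreading a CPU-bound task across worker processes. The function names, the sum-of-squares task and the worker count are assumptions chosen for illustration; they are not taken from Matloff's paper.

```python
from multiprocessing import Pool


def sum_of_squares(chunk):
    """CPU-bound work done by one worker process on its share of the data."""
    return sum(value * value for value in chunk)


def split_into_chunks(values, n_chunks):
    """Deal the values out round-robin into n_chunks lists."""
    return [values[i::n_chunks] for i in range(n_chunks)]


def parallel_sum_of_squares(values, workers=2):
    """Sum the squares of values using a pool of worker processes."""
    chunks = split_into_chunks(values, workers)
    with Pool(processes=workers) as pool:
        partial_sums = pool.map(sum_of_squares, chunks)
    return sum(partial_sums)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block on import.
    print(parallel_sum_of_squares(list(range(10))))  # prints 285
```

Each worker is a separate process, so every chunk has to be pickled and sent across to it. That overhead is one reason this approach tends to pay off only for fairly heavy computations.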

Libraries

Winner: Python

Python’s machine learning library, Scikit-learn, is widely recognised as the ‘gold standard’. It provides a wide selection of supervised and unsupervised learning algorithms. Towards Data Science describes it as “by far the easiest and cleanest ML library”. Scikit-learn was created with a software engineering mindset. Its core API design revolves around being easy to use, yet powerful, while still maintaining flexibility for research endeavours. This robustness makes it well suited to any end-to-end ML project, from the research phase right down to production deployments.

What are the trends in Singapore and Hong Kong markets?

As a recruiter in this field, I find that around 90% of the Data Science and Analytics roles I am filling call for candidates who are well versed in Python. This is because Python offers a lot more flexibility than R.

My key advice: if you are looking to grow your career in this field, it is best to focus on becoming familiar with the full suite of Python tools. Additionally, other in-demand skills for data professionals include SQL, Spark, Hadoop, Java, Amazon Web Services (AWS), Scala, and Kafka.

Huxley can help!

If you are a Data Science and Analytics professional looking to add top-tier talent to your team, please reach out to me, as I am currently representing several star candidates in the market, especially female and local/PR profiles. Likewise, if you are looking to progress to the next stage in your career, do connect with me to find out about the exciting opportunities currently available within the most innovative organisations in the market. Feel free to connect with me on LinkedIn or via email at [email protected]. Do keep your eyes peeled for my upcoming article, where I will be covering the differences between Power BI and Tableau!

Ajit Patil

Executive Leader | Data Polymath | Payments | AI Strategy | Engineering | Product | Innovation | People

5 years ago

Fantastic articulation, Donnie Maclary! Preferences for any programming/modelling language change based on individuals and the Business Case they are looking to Deliver. Unless mandated by an Organization's Enterprise Architecture Group on Tech Roadmaps, developers are going to continue to develop in what works best for them.

Andy Tong

A Data Leader with a passion for Data & Analytics (a lifetime learner)

5 years ago

2 years ago, my manager asked me the same question. I said "Python" as it provides end-to-end solution implementation instead of just data analysis and modeling; however, my manager at that time didn't agree :-)

Loganathan Ponnambalam

Nurturing ENERGY Consciousness || Being the adult I needed as a KID

5 years ago

Very well written, Donnie Maclary! Python is my preference as well.

Donnie Maclary

#DataScienceDonnie | Improving companies one candidate at a time | US Army Veteran

5 years ago

Thank you, Loganathan Ponnambalam, for being my sounding board for this article. #SharingisCaring #Teamwork #AllAboutWhoYouKnow #BondingOverDataScience
