R or Python, that is the question!
Afshin Ashofteh, PhD
University Professor at NOVA IMS, Nova University of Lisbon (PhD, PGDip, MBA, MSc, BSc)
I want to answer the following questions, which I get a lot, especially in my data science classes and even from data analysts and developers who have been in the field for some years.
So, I want to tackle these questions really quickly in one sentence:
R and Python are both just tools! Use whatever language you like better!
R is more of a data science environment, whereas Python is a general-purpose, object-oriented programming language. Python is aimed more at software engineers and happens to support data science as one of its applications. However, as a data scientist you should learn both, because the choice between Python and R depends on the specific project and on the skills and preferences of the team, and you must be ready for either. Many data scientists use both languages and choose the one that best suits the task at hand. In fact, R and Python are complementary and there is no significant difference between them.
Why?
I came from SPSS/SAS/Splus/Minitab/Stata/Eviews/Matlab in undergrad (1996) and started working with R in 2002, Python in 2010, and PySpark in 2016. I am certified in Network+, Cisco Routing and Switching (CCNA), Windows Server (Microsoft Active Directory), and LPIC for Linux, and I started coding in 1996. I learned all of them with no documentation, no StackOverflow, and no GitHub repos. Only one reference book for each and some classes!
Nowadays, we have many resources available online! Whether you're just beginning or are a few years into your data journey, keep going with continuous learning and growth! Start with R if you would like to be a data analyst, or with Python if you are a programmer, and simply enjoy whatever is shared by their powerful communities! You might need to use only one of these tools, depending on the workflow, or combine them, depending on your final product. Remember that they are only tools and there are many others on the market! Focus more on data science concepts than on the software and tools, and never participate in the Python vs. R debates!
Q/As:
Our discussion group: https://www.dhirubhai.net/groups/12420006/
Q1: According to DZone, the popularity of Python has grown much more than R's. Could this indicate that in the future Python will have more libraries than R? Is this rise in Python's popularity driven by business?
A1: In my opinion, Python is a great option for data analysis and for learning the logic of computer programming. If you like programming, go for Python, and remember that R and Python are only vehicles to get you to your destination. Regarding popularity, I suspect these charts and online trends are not accurate. Why? (1) Whether they show R as the most popular before 2018 or Python taking over after that, the population, purpose, and business sector behind these comparisons are not clear! (2) More online activity or more keyword searches do not equal popularity! (3) High job demand is not all about software either! Companies might prefer employees with data science skills who could also work as a programmer, a developer, and an engineer! They want all-in-one! Their reasoning is that Python might be better because it is a multipurpose language! Of course, Python might cover all of these roles, but that is almost impossible for one human being! Data science is teamwork, and the software never matters to a professional team! Historically, we have seen different programming languages and professionals with diverse skills in one project! Knowledge of data science concepts is more important than its tools, and in my opinion, AI will manage the software part soon. So, use whatever language you like better!
Q2: I have heard that R libraries are often times peer-reviewed or not reviewed at all. Do you believe it to be a limitation of R and could that constitute a hindrance in terms of fact checking results in data analysis?
A2: Unfortunately, there is fake news about R and Python! Let me give you some resources. The following organizations run peer review of packages. rOpenSci for R: https://ropensci.org/packages/all/ . We also have pyOpenSci for Python (pyOpenSci was modeled after rOpenSci), and the first package was submitted to pyOpenSci in May 2019. pyOpenSci for Python: https://www.pyopensci.org/python-packages.html . You can check the peer-reviewed packages in both. For example, in 2019 we had 12,500 R packages, of which more or less 6,000 were beta versions. That means we still had 6,500 final R packages! Also, check the number of R packages available on CRAN and the number of Python packages in the online repository PyPI. These numbers were discussed only because of the question, but remember that nobody in the R or Python communities cares about these numbers! In Python, you can do most of your job with only NumPy and pandas! My point is that knowledge of data science concepts is more important than its tools. So, use whatever language you like better, and remember that R and Python are complementary and there is no significant difference between them! Finally, two good references if you are interested in this topic: Link1 , Link2
Summary of comments (Last update: 16/02/2023)
After exploring the use of both R and Python in financial data research, it has become clear that each language has its advantages and disadvantages. The optimal choice depends on the specific task and the user's preferences. R is highly suited to statistical modelling, analysis, and data visualization, with a robust ecosystem of libraries specifically designed for financial analysis. Python is a general-purpose programming language that offers various libraries for numerical calculus and data manipulation.
Some comments show that Python may seem cleaner and more intuitive for newcomers to programming languages, but R is highly efficient for plotting data with simple and concise code. Additionally, novice programmers often prefer Python because its code and documentation are more user-friendly. However, R can perform complex operations in a single line of code. In contrast, Python requires the user to spell out each step, which may be tedious for experienced programmers. R and Python have advantages and disadvantages, but individuals aspiring to become data scientists should learn both languages.
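The "single line versus spelled-out steps" contrast in these comments can be made concrete with a small sketch. It assumes a toy list of (sector, return) pairs invented purely for illustration; the names returns and mean_by_group are hypothetical. In R, a grouped mean like this would typically be one call such as aggregate(ret ~ sector, data = df, FUN = mean), where df, ret, and sector are again assumed names. The plain-Python version below writes out each step using only the standard library:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (sector, daily return in percent). Not real market data.
returns = [
    ("banking", 1.0),
    ("banking", 3.0),
    ("energy", -2.0),
    ("energy", 4.0),
    ("energy", 1.0),
]


def mean_by_group(rows):
    """Return {group: mean value}, spelling out each step explicitly."""
    groups = defaultdict(list)
    # Step 1: split the rows by their group key.
    for key, value in rows:
        groups[key].append(value)
    # Step 2: summarise each group with its mean.
    return {key: mean(values) for key, values in groups.items()}


summary = mean_by_group(returns)
print(summary)
```

With pandas, the same result is also close to a one-liner (for example df.groupby("sector")["ret"].mean() on a DataFrame with those assumed column names), which suggests the difference lies more in the libraries one reaches for than in the languages themselves.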
Using packages in R or Python enables the central aspects of algorithms and details to be incorporated into the code, eliminating the need for step-by-step programming. However, the comments emphasize that individuals must possess the required theoretical knowledge to use these packages effectively.
During the postgraduate program in Data Science for Finance, students gained knowledge of both R and Python. Some find it easy to look up new solutions online for these open-source programming languages. With R, packages are available online to help with the analysis, while with Python, one needs to create some structures from scratch.
Some comments show students' interest in coding in Python for data manipulation and visualizing data using R. They also agree that many programmers prefer Python, but data scientists are not necessarily programmers. Experts often contribute to both communities to create better tools.
Some comments show an interest in coding in SAS, Python, Spark, R, and SQL, which students found necessary for their different projects. They also mentioned MATLAB as another valuable tool in this field.
Some comments mentioned that Python has a higher preference in the worldwide market, but no concrete evidence or scientific study exists.
Investment analyst at Fidelidade | xBNP Paribas | CFA Level III Passed
10 months ago
In my opinion, R is more useful for data analysis and visualization. It has a wide range of packages available (though not very well documented), while Python is more versatile and excels in end-to-end production systems. The choice between Python and R depends on project requirements, infrastructure, and user preferences. (m20211132)
Advisor at Banco de Portugal
1 year ago
The text raises a pivotal choice between R and Python in the world of data science and programming. In my view, Python is an excellent starting point for aspiring programmers due to its adaptability and wide applicability. On the other hand, R is a strong choice for those looking to perform data analytics projects, in particular those with a deep theoretical knowledge of data analytics and how it works. Understanding both languages is essential for those pursuing a career in data science, as they complement each other. The choice between R and Python should depend on the specific task and align with one's interests and career objectives. Moreover, Python's versatility positions it as a cornerstone of software for businesses in today's fast-paced digital landscape. Simultaneously, AI's transformative potential transcends any single language, affecting not only Python and R but numerous others. Python's growing popularity hints at future library expansion, while R's dedication to peer-review processes underscores quality. This dynamic interaction emphasizes the vital roles of Python and R while highlighting the broader importance of mastering data science concepts in an evolving, AI-driven data analytics landscape.
Cash Operations Analyst at Brinks
1 year ago
In my opinion, I find it easier to work with R, since my first contact with both R and Python was in the class Data Science for Finance. I first started using Python, and it seemed easier and processed the data clearly. After focusing more on R, from my perspective it has more features that we have quick access to. R has a comprehensive collection of packages and libraries specifically designed for statistical analysis and data visualization, including "tidyverse" packages such as ggplot2 and dplyr.
Student of Financial Markets and Risks
1 year ago
In my opinion, the debate between using R or Python can be compared to the discussion of which is the best broker on the market (for example), i.e., there is no generic correct answer, but rather an answer that depends on the need that the individual in question wants to see satisfied. Considering my knowledge of and contact with the two programming languages in question, Python ends up being more versatile in all areas of work, with special emphasis on the development of websites and software. However, due to this characteristic, in matters related to statistics, hypothesis testing, and the visualization of graphs, the R language is more efficient in terms of syntax and ability to work with data. That said, in my opinion, having knowledge and experience in these two programming languages is a necessary condition for any professional connected to the world of Data Science, who, depending on the project, will certainly be able to identify and use the language best suited to solving their challenge.