R programming
Darshika Srivastava
Associate Project Manager @ HuQuo | MBA, Amity Business School
R is a programming language for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing. Created by statisticians Ross Ihaka and Robert Gentleman, R is used among data miners and statisticians for data analysis and developing statistical software. Users have created packages to augment the functions of the R language.
According to surveys like Rexer's Annual Data Miner Survey and studies of scholarly literature databases, R is one of the most commonly used programming languages in data mining.[6][citation needed] As of January 2022, R ranks 12th in the TIOBE index, a measure of programming language popularity.[7]
The official R software environment is an open-source free software environment that is part of the GNU Project, available under the GNU General Public License. It is written primarily in C, Fortran, and R itself (partially self-hosting). Precompiled executables are provided for various operating systems. R has a command line interface. Multiple third-party graphical user interfaces are also available, such as RStudio, an integrated development environment, and Jupyter, a notebook interface.
Contents
1 History
2 Features
2.1 Statistics
2.2 Programming
3 Packages
4 Milestones
5 Interfaces
6 Implementations
7 Communities
8 useR! conferences
9 The R Journal
10 Comparison with alternatives
11 Commercial support
12 Examples
12.1 Basic syntax
12.2 Structure of a function
12.3 Modeling and plotting
12.4 Mandelbrot set
13 See also
14 Notes
15 References
16 External links
History
R is an open-source implementation of the S programming language combined with lexical scoping semantics from Scheme, which allow objects to be defined in predetermined blocks rather than the entirety of the code.[1] S was created by Rick Becker, John Chambers, Doug Dunn, Jean McRae, and Judy Schilling at Bell Labs around 1976. Designed for statistical analysis, S is an interpreted language whose code can be run directly without a compiler.[8] Many programs written for S run unaltered in R.[9] Scheme was created by Gerald J. Sussman and Guy L. Steele Jr. at MIT around 1975.[10]
In 1991, statisticians Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, embarked on an S implementation.[11] It was named partly after the first names of the first two R authors and partly as a play on the name of S.[9] They began publicizing it on the data archive StatLib and the s-news mailing list in August 1993.[12] In 1995, statistician Martin Mächler convinced Ihaka and Gentleman to make R free and open-source software under the GNU General Public License.[12][13][14] The first official release came in June 1995.[12] The first official "stable beta" version (v1.0) was released on 29 February 2000.[15][16]
The Comprehensive R Archive Network (CRAN) was officially announced on 23 April 1997. CRAN stores R's executable files, source code, and documentation, as well as packages contributed by users. CRAN originally had 3 mirrors and 12 contributed packages.[17] As of January 2022, it has 101 mirrors[18] and 18,728 contributed packages.[19]
The R Core Team was formed in 1997 to further develop the language.[9] As of January 2022, it consists of Chambers, Gentleman, Ihaka, and Mächler, plus statisticians Douglas Bates, Peter Dalgaard, Kurt Hornik, Michael Lawrence, Friedrich Leisch, Uwe Ligges, Thomas Lumley, Sebastian Meyer, Paul Murrell, Martyn Plummer, Brian Ripley, Deepayan Sarkar, Duncan Temple Lang, Luke Tierney, and Simon Urbanek, as well as computer scientist Tomas Kalibera. Stefano Iacus, Guido Masarotto, Heiner Schwarte, Seth Falcon, Martin Morgan, and Duncan Murdoch are former members.[20] In April 2003,[21] the R Foundation was founded as a non-profit organization to provide further support for the R project.[9]
Features
Statistics
R and its libraries implement various statistical and graphical techniques, including linear and nonlinear modeling, classical statistical tests, spatial and time-series analysis, classification, clustering, and others. R is easily extensible through functions and through packages written for specific functions and applications, and its community is noted for contributing packages. Many of R's standard functions are written in R,[citation needed] which makes it easy for users to follow the algorithmic choices made. For computationally intensive tasks, C, C++, and Fortran code can be linked and called at run time. Advanced users can write C, C++,[22] Java,[23] .NET[24] or Python code to manipulate R objects directly.[25] Due to its S heritage, R has stronger object-oriented programming facilities than most statistical computing languages.[citation needed] Extending it is facilitated by its lexical scoping rules.[26]
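As a minimal sketch of the techniques listed above, the following fits a linear model and runs a classical test using only functions shipped with base R. The choice of the built-in mtcars dataset and the variable names fit and welch are assumptions made for illustration.

```r
# mtcars is a small example dataset shipped with base R (datasets package)
fit <- lm(mpg ~ wt, data = mtcars)      # linear model: fuel economy vs. weight
slope <- coef(fit)[["wt"]]              # estimated change in mpg per 1000 lb

# Classical test: Welch two-sample t-test of mpg by transmission type (am)
welch <- t.test(mpg ~ am, data = mtcars)
p_value <- welch$p.value

# Many standard functions are themselves written in R; deparse() returns
# the source of sd() as text (typing sd at the prompt prints the same code)
sd_source <- deparse(sd)
```

Printing fit or welch at the prompt shows the usual formatted summaries; summary(fit) gives standard errors and R-squared.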
Another of R's strengths is static graphics; it can produce publication-quality graphs that include mathematical symbols. Dynamic and interactive graphics are available through additional packages.[27]
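The mathematical annotation mentioned above can be sketched with base graphics and plotmath expressions. This is an illustrative example only: the curve, the labels and the variable names are assumptions, and the plot is sent to a null PDF device so that the code runs without opening a window or writing a file.

```r
# Open a PDF device that writes no file, so the sketch runs anywhere
pdf(file = NULL)

x <- seq(-pi, pi, length.out = 200)
main_label <- expression(y == sin(theta))   # plotmath: rendered as a formula

plot(x, sin(x), type = "l",
     xlab = expression(theta),
     ylab = expression(sin(theta)),
     main = main_label)

# Add a typeset formula inside the plot region
text(-2, 0.75, expression(bar(x) == frac(1, n) * sum(x[i], i == 1, n)))

invisible(dev.off())                        # close the device
```

Replacing pdf(file = NULL) with pdf("figure.pdf") or png("figure.png") would write the same figure to disk.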
Programming
R is an interpreted language; users typically access it through a command-line interpreter. If a user types 2+2 at the R command prompt and presses enter, the computer replies with 4.
Like languages such as APL and MATLAB, R supports matrix arithmetic. R's data structures include vectors, matrices, arrays, data frames (similar to tables in a relational database) and lists.[28] Arrays are stored in column-major order.[29] R's extensible object system includes objects for (among others): regression models, time-series and geo-spatial coordinates. R has no scalar data type.[30] Instead, a scalar is represented as a length-one vector.[31]
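The data structures described above can be sketched in a few lines of base R. The variable names and values are illustrative assumptions.

```r
total <- 2 + 2                      # what the prompt would echo as 4

s <- 42                             # a "scalar" is just a vector of length one
v <- c(1.5, 2.5, 3.5)               # numeric vector
doubled <- v * 2                    # arithmetic is element-wise on vectors

m <- matrix(1:6, nrow = 2)          # filled column by column (column-major)
first_row_second_col <- m[1, 2]     # the third value supplied, i.e. 3
flat <- as.vector(m)                # flattening returns the storage order 1:6
gram <- m %*% t(m)                  # matrix product, a 2 x 2 result

arr <- array(1:24, dim = c(2, 3, 4))                 # three-dimensional array
df  <- data.frame(id = 1:3, name = c("a", "b", "c")) # table-like structure
lst <- list(number = 1, text = "x", values = v)      # mixed-type container
```

Because m is filled down the columns, m[1, 2] is 3 rather than 2, which is the practical consequence of column-major storage.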
Many features of R derive from Scheme. R uses S-expressions to represent both data and code.[citation needed] Functions are first-class objects and can be manipulated in the same way as data objects, facilitating meta-programming that allows multiple dispatch. Variables in R are lexically scoped and dynamically typed.[32] Function arguments are passed by value, and are lazy; that is, they are evaluated only when they are used, not when the function is called.[33]
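The behaviours described above can be demonstrated with a short sketch. All function and variable names here (apply_twice, make_counter, ignore_second, modify) are hypothetical and chosen only for illustration.

```r
# First-class functions: a function is passed as an ordinary argument
apply_twice <- function(f, x) f(f(x))
twelve <- apply_twice(function(z) z * 2, 3)

# Lexical scoping: the inner function keeps access to `count`,
# which lives in the environment where the inner function was created
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    count
  }
}
counter <- make_counter()
first  <- counter()
second <- counter()

# Lazy evaluation: `b` is never used, so stop() is never run
ignore_second <- function(a, b) a
lazy_result <- ignore_second(1, stop("never evaluated"))

# Pass by value: modifying the argument does not change the caller's copy
modify <- function(v) { v[1] <- 100; v }
original <- c(1, 2, 3)
changed  <- modify(original)

# Code as data: an unevaluated call can be inspected and then evaluated
e <- quote(1 + 2 * 3)
e_class <- class(e)
e_value <- eval(e)
```

Here second is 2 because the closure retains its own count between calls, and original is left unchanged by modify().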
R supports procedural programming with functions and, for some functions, object-oriented programming with generic functions.[34] A generic function acts differently depending on the classes of the arguments passed to it. In other words, the generic function dispatches the method implementation specific to that object's class. For example, R has a generic print function that can print almost every class of object in R with print(objectname).[35]
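A minimal sketch of generic-function dispatch, using R's S3 system. The generic describe and the class circle are hypothetical names invented for this example; print and UseMethod are part of base R.

```r
# A generic function: UseMethod() picks a method from the argument's class
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method exists
describe.default <- function(x, ...) "a generic object"

# Method for objects of the hypothetical class "circle"
describe.circle <- function(x, ...) paste("circle with radius", x$radius)

# An S3 object is ordinary data with a class attribute attached
c1 <- structure(list(radius = 2), class = "circle")

circle_text  <- describe(c1)   # dispatches to describe.circle
default_text <- describe(10)   # dispatches to describe.default

# The built-in print generic can be extended in the same way
print.circle <- function(x, ...) {
  cat(sprintf("<circle r=%g>\n", x$radius))
  invisible(x)
}
printed <- capture.output(print(c1))
```

After print.circle is defined, typing c1 at the prompt uses the new method automatically.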
Although used mainly by statisticians and other practitioners seeking an environment for statistical computation and software development, R can also operate as a general matrix calculation toolbox, with performance benchmarks comparable to GNU Octave or MATLAB.[36]
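As an illustration of R used as a matrix calculation toolbox, the sketch below solves a small linear system with base R functions. The matrix and right-hand side are made-up values, and the example says nothing about performance.

```r
# Coefficient matrix, filled column by column: rows are (2, 1) and (1, 3)
A <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(3, 5)

x <- solve(A, b)            # solves A %*% x = b
residual <- A %*% x - b     # should be numerically zero

A_inv   <- solve(A)         # matrix inverse
A_det   <- det(A)           # determinant: 2 * 3 - 1 * 1
A_eigen <- eigen(A)$values  # eigenvalues of the symmetric matrix
```

Calling solve(A, b) directly is generally preferable to computing solve(A) %*% b, since it avoids forming the inverse explicitly.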
Packages
R's capabilities are extended through user-created[37] packages, which offer statistical techniques, graphical devices, import/export, reporting (RMarkdown, knitr, Sweave), etc. R's packages, and the ease of installing and using them, have been cited as driving the language's widespread adoption in data science.[38][39][40][41][42] The packaging system is also used by researchers to create compendia to organise research data, code and report files in a systematic way for sharing and archiving.[43]
Multiple packages are included with the basic installation. Additional packages are available on CRAN,[18] Bioconductor, Omegahat,[44] GitHub, and other repositories.[45][46][47]
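A short sketch of working with packages from the R prompt follows. It lists the packages bundled with a standard installation and checks whether a contributed package is available. The package name ggplot2 is used only as an example, and the install step is left commented out because it needs network access to a CRAN mirror.

```r
# Packages shipped with the basic installation
base_pkgs <- rownames(installed.packages(priority = "base"))

# Attach an installed package so its functions can be called by name
library(stats)

# Check for a contributed package without attaching it or raising an error
has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)

# Installing from CRAN (requires network access, so not run here):
# install.packages("ggplot2")
```

has_ggplot2 is TRUE or FALSE depending on the local installation.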
The "Task Views" on the CRAN website[48] list packages in fields including Finance, Genetics, High Performance Computing, Machine Learning, Medical Imaging, Social Sciences and Spatial Statistics. R has been identified by the FDA as suitable for interpreting data from clinical research.[49] Microsoft maintains a daily snapshot of CRAN that dates back to 17 September 2014.[50]
Other R package resources include R-Forge,[51] a platform for the collaborative development of R packages. The Bioconductor project provides packages for genomic data analysis, including object-oriented data-handling and analysis tools for data from Affymetrix, cDNA microarray, and next-generation high-throughput sequencing methods.[52]
A group of packages called the Tidyverse, which can be considered a "dialect" of the R language, is increasingly popular among developers.[note 1] It strives to provide a cohesive collection of functions to deal with common data science tasks, including data import, cleaning, transformation and visualisation (notably with the ggplot2 package).
R is one of five languages with an Apache Spark API, along with Scala, Java, Python, and SQL.