#ArtificialIntelligence No 65: Why R lost the R vs Python wars and what that tells you about where AI is going
Firstly, my apologies for the one-day delay in this newsletter
I am in NYC for the ‘tale of two cities’ hackathon
If you are here, come along
We are sharing the work we did at Oxford in partnership with Microsoft
It is great to be back in the USA post-COVID, and back in NYC!
I love New York!
Since I like animation and comics, I even met some friends in Times Square, as you can see from the pic!
Now to some more serious things :)
Despite its title, this blog is not about R vs Python
Python has clearly won.
But understanding why will give you a very good insight into where AI, machine learning and deep learning are going
I saw a post from Isaac Faber which is really insightful and succinct. It sums up the real difference:
Statistics: 4 parameters is way too many, we will totally overfit.
Machine Learning: 4 billion parameters is way too few, need at least 4 trillion.
That’s the crux of this blog, and it’s also, in my view, the real reason why R lost out.
The R community was not interested in deep learning
In my view, the battle of R vs Python was lost in the deep learning space. R users were just not interested in deep learning at all; their focus was mostly statistics.
In my course at Oxford, I also started off with R, but the R community was simply not interested in models like MLPs, CNNs, etc., with two notable exceptions: h2o.ai and Microsoft (with their acquisition of Revolution Analytics).
But sadly, they remained the only two with some support for deep learning in R.
The rest of the community was largely uninterested in models with a large number of parameters.
So, the real point we are trying to make is:
1) The world is going towards highly parameterised models
2) Deep learning (and also machine learning) leans towards highly parameterised models, while statistics does not (the sketch after this list gives a sense of how quickly parameter counts grow)
3) That, in my view, is the real reason why R never picked up on deep learning
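To make "highly parameterised" concrete, here is a rough back-of-the-envelope sketch in Python (the layer sizes are my own illustrative choices, not from any specific published model) of how quickly parameter counts grow, even for a very modest multi-layer perceptron:

```python
# A rough sketch of how parameter counts grow in deep learning.
# The layer sizes below are illustrative, not taken from any specific model.

def dense_params(n_in, n_out):
    """A fully connected layer has n_in * n_out weights plus n_out biases."""
    return n_in * n_out + n_out

# A small MLP for 28x28 images: 784 -> 512 -> 256 -> 10
layers = [(784, 512), (512, 256), (256, 10)]
total = sum(dense_params(n_in, n_out) for n_in, n_out in layers)
print(f"Total parameters: {total:,}")  # 535,818 - and this is a tiny network
```

A classical statistical regression, by contrast, might have a handful of coefficients, each of which the analyst is expected to interpret.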
This point was explored in my previous posts, where I quoted Leo Breiman's "Statistical Modeling: The Two Cultures":
There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools.
And from the same post:
In the machine learning community, we evaluate a variety of models, select the best-performing model, and empirically determine the loss on a test set, with the goal of predicting the outcome for new/unseen samples. In the statistical community, we try to understand the data generation process and select a model whose assumptions seem most reasonable for that distribution. Using goodness-of-fit tests, we use the model to explain the data generation process and understand the parameters.
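As a minimal sketch of that machine learning workflow (assuming scikit-learn and using its built-in diabetes dataset purely for illustration): fit several candidate models, compare them on a held-out test set, and keep whichever predicts best.

```python
# Sketch of the "algorithmic modelling" workflow: compare candidate models
# purely on held-out predictive performance. (Assumes scikit-learn; the
# dataset and candidate models are illustrative choices.)
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

candidates = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

for name, model in candidates.items():
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: test MSE = {mse:.1f}")

# The model with the lowest test error wins - no claim is made about the
# underlying data-generating process.
```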
If anything, this trend towards models with more parameters has accelerated dramatically with large language models like GPT-3, Bloom, etc. For example, researchers have open-sourced a neural network with 117B parameters.
In terms of the parameters of a model, it's useful to recap what we mean.
The end goal of machine learning is to learn a function f that maps input variables (X) to output variables (Y). Different algorithms make different assumptions or biases about the form of the function and how it can be learned.
A learning model that summarizes data with a set of parameters of fixed size (independent of the number of training examples) is called a parametric model. No matter how much data you throw at a parametric model, it won’t change its mind about how many parameters it needs.
— Artificial Intelligence: A Modern Approach, page 737
Algorithms that do not make strong assumptions about the form of the mapping function are called nonparametric machine learning algorithms. By not making assumptions, they are free to learn any functional form from the training data.
Nonparametric methods are good when you have a lot of data and no prior knowledge, and when you don’t want to worry too much about choosing just the right features.
— Artificial Intelligence: A Modern Approach, page 757
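To illustrate the distinction in the quotes above, here is a small sketch (assuming scikit-learn and synthetic data): a linear regression is summarised by a fixed number of coefficients no matter how much data it sees, while a k-nearest-neighbours model retains the training examples themselves, so what it has "learned" grows with the data.

```python
# Parametric vs nonparametric, on synthetic data (assumes scikit-learn).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

for n_samples in (100, 10_000):
    X = rng.normal(size=(n_samples, 3))
    y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=n_samples)

    # Parametric: summarised by a fixed number of parameters
    # (3 coefficients + 1 intercept), regardless of how much data we use.
    lin = LinearRegression().fit(X, y)
    n_params = lin.coef_.size + 1

    # Nonparametric: k-NN makes no fixed assumption about the form of f;
    # it keeps the training set itself, so the fitted model grows with the data.
    knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)

    print(f"n={n_samples:>6}: linear model has {n_params} parameters, "
          f"k-NN stores all {n_samples} training examples")
```

Deep neural networks are technically parametric (the architecture fixes the parameter count), but the counts are now so large that, in practice, they behave much more like the flexible, assumption-light algorithmic models Breiman describes.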
While the underlying meaning of a "parameter" is the same in both communities, the sheer number of parameters implies that machine learning and deep learning are diverging from statistics because of their ability to handle highly parameterised models.
Finally, I am not discussing ethical concerns here. Yes, the statistical community leans towards explainable models. I am just pointing out that the world is leaning towards models with an increasingly larger number of parameters.
While these models will have some issues, I believe that the overall trend is very clear.
If you want to learn with us at the University of Oxford, our AI course is about to open for the fall. Please sign up if you are interested.