#ArtificialIntelligence No 65: Why R lost the R vs Python wars and what that tells you about where AI is going

Ajit Jaokar

Published: July 21, 2022

Firstly, my apologies for the one-day delay for this newsletter

I am in NYC for the ‘tale of two cities’ hackathon

If you are here, come along

We are sharing the work we did at Oxford in partnership with Microsoft

It is great to be back in the USA post-COVID, and back in NYC

I love New York!

Considering I like animation and comics, I even met some friends in Times Square, as you can see from the pic!

Now to some more serious things :)

Despite the title of this blog, the blog is not about R vs Python

Python has clearly won

But understanding why will give you a very good insight into where AI, Machine learning and Deep Learning are going

I saw a post from Isaac Faber which is really insightful and succinct:

the real difference:

Statistics: 4 parameters is way too many, we will totally over fit.

Machine Learning: 4 billion parameters is way too few, need at least 4 trillion.

That’s the crux of this blog and, in my view, also the real reason why R lost out

Miguel Fierro also posted some insightful analysis where he said:

R community was not interested in deep learning: in my view, the battle of R vs Python was lost in the deep learning space. R users were just not interested in deep learning at all, it was mostly statistics.

In my course at Oxford, I also started off with R, but the R community was simply not interested in models like MLPs and CNNs, with two notable exceptions: h2o.ai and Microsoft (with its acquisition of Revolution Analytics).

But sadly, they remained the only two with some support for R and deep learning

The rest of the community was largely uninterested in models with a large number of parameters

So, the real point we are trying to make is:

1) The world is moving towards highly parameterised models

2) Deep learning (and also machine learning) leans towards highly parameterised models, while statistics does not

3) That’s the real reason why (in my view) R never picked up on deep learning

This point was explored in my previous posts

There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools.

And from the same post

In the machine learning community, we evaluate a variety of models, select the best performing model and empirically determine loss on a test set, with the goal of predicting the outcome for new/unseen samples. In the statistical community, we try to understand the data generation process and select a model whose assumptions seem most reasonable for that distribution. Using goodness of fit tests, we use the model to explain the data generation process and understand the parameters.
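The machine learning workflow described in that quote can be sketched in a few lines. This is only an illustrative sketch using the Python standard library: the three candidate models, the toy sine-wave data, the noise level and the 120/40/40 split are all assumptions made for the example, not something from the original post.

```python
import math
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Model = Callable[[float], float]


def fit_mean(train: List[Point]) -> Model:
    """Baseline: always predict the average training outcome."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train: List[Point]) -> Model:
    """Straight line fitted by ordinary least squares."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_nearest(train: List[Point]) -> Model:
    """1-nearest-neighbour: copy the outcome of the closest training point."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]


def mse(model: Model, data: List[Point]) -> float:
    """Mean squared error of a model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def select_model(train: List[Point], valid: List[Point],
                 test: List[Point]) -> Tuple[str, float]:
    """Fit every candidate, pick the best on validation data,
    then report its loss on the untouched test set."""
    candidates = {"mean": fit_mean, "line": fit_line, "nearest": fit_nearest}
    fitted = {name: fit(train) for name, fit in candidates.items()}
    best = min(fitted, key=lambda name: mse(fitted[name], valid))
    return best, mse(fitted[best], test)


# Toy data (hypothetical): a noisy sine wave, shuffled and split three ways.
rng = random.Random(0)
xs = [2 * math.pi * i / 200 for i in range(200)]
data = [(x, math.sin(x) + rng.gauss(0, 0.1)) for x in xs]
rng.shuffle(data)
train, valid, test = data[:120], data[120:160], data[160:]

best_name, test_loss = select_model(train, valid, test)
print(best_name, round(test_loss, 4))
```

Nothing in this loop asks whether the winning model's assumptions match the process that generated the data. The only criterion is loss on held-out samples, which is the contrast with the statistical approach that the quote draws.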

If anything, this trend towards models with more parameters has accelerated dramatically with large language models like GPT-3, BLOOM, etc. See: Researchers open-source neural network with 117B parameters
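To get a feel for how quickly parameter counts grow, here is a small sketch that counts the weights and biases of a fully connected network (an MLP). The function name and the layer sizes are hypothetical, chosen only for illustration. Each layer with `n_in` inputs and `n_out` outputs contributes `n_in * n_out` weights plus `n_out` biases.

```python
from typing import Sequence


def mlp_parameter_count(layer_sizes: Sequence[int]) -> int:
    """Count weights and biases in a fully connected network.

    layer_sizes lists the width of every layer, input first, output last.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total


# A straight-line fit: one input, one output, so slope + intercept.
print(mlp_parameter_count([1, 1]))  # -> 2

# A small image classifier shape (hypothetical sizes).
print(mlp_parameter_count([784, 512, 512, 10]))  # -> 669706
```

Even this modest network has several hundred thousand parameters, which is already far beyond what a classical statistical model would use.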

In terms of parameters for a model, it's useful to recap what we mean

The end goal of machine learning is to learn a function f that maps input variables (X) to output variables (Y). Different algorithms make different assumptions or biases about the form of the function and how it can be learned.

A learning model that summarizes data with a set of parameters of fixed size (independent of the number of training examples) is called a parametric model. No matter how much data you throw at a parametric model, it won’t change its mind about how many parameters it needs.

— Artificial Intelligence: A Modern Approach, page 737

Algorithms that do not make strong assumptions about the form of the mapping function are called nonparametric machine learning algorithms. By not making assumptions, they are free to learn any functional form from the training data.

Nonparametric methods are good when you have a lot of data and no prior knowledge, and when you don’t want to worry too much about choosing just the right features.

— Artificial Intelligence: A Modern Approach, page 757

Source: Parametric and Nonparametric Machine Learning Algorithms
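The distinction in the quotes above can be made concrete with a minimal sketch. It assumes a one-variable least-squares line as the parametric model and a k-nearest-neighbours regressor as the nonparametric one. The class names, the `size()` helper and the toy data are all hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LinearModel:
    """Parametric: always exactly two numbers (slope, intercept)."""
    slope: float
    intercept: float

    def predict(self, x: float) -> float:
        return self.slope * x + self.intercept

    def size(self) -> int:
        return 2  # fixed, however much data was used for fitting


def fit_linear(xs: List[float], ys: List[float]) -> LinearModel:
    """Ordinary least squares for one input variable (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return LinearModel(slope, mean_y - slope * mean_x)


@dataclass
class KNNRegressor:
    """Nonparametric: keeps every training point, so it grows with the data."""
    points: List[Tuple[float, float]]
    k: int = 3

    def predict(self, x: float) -> float:
        # Average the outcomes of the k training points closest to x.
        nearest = sorted(self.points, key=lambda p: abs(p[0] - x))[: self.k]
        return sum(y for _, y in nearest) / len(nearest)

    def size(self) -> int:
        return len(self.points)


def make_data(n: int) -> Tuple[List[float], List[float]]:
    """Toy data on the line y = 2x + 1 (hypothetical, noise-free)."""
    xs = [float(i) for i in range(n)]
    ys = [2.0 * x + 1.0 for x in xs]
    return xs, ys


for n in (10, 100, 1000):
    xs, ys = make_data(n)
    linear = fit_linear(xs, ys)
    knn = KNNRegressor(list(zip(xs, ys)))
    # The linear model stays at 2 stored values; kNN stores n points.
    print(n, linear.size(), knn.size())
```

The linear model "won't change its mind" about needing two parameters, while the nearest-neighbour model simply carries the whole training set around with it.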

While the meaning is the same, the sheer number of parameters implies that machine learning and deep learning are diverging from statistics due to their ability to handle highly parameterized models.

Finally, I am not discussing ethical concerns here. Yes, the statistical community leans towards explainable models. I am just pointing out that the world is leaning towards models with an increasingly large number of parameters.

While these models will have some issues, I believe that the overall trend is very clear.

If you want to learn with us at the University of Oxford, our AI course is about to open for the fall. Please sign up if you are interested.

Artificial Intelligence

114,172 followers

Miguel Fierro

I help people understand and apply AI

2 years ago

Nice summary Ajit Jaokar, and thanks for the citation

Mark Samuel Tuttle

Senior Data Scientist

2 years ago

Good post. Thank you.

Rakesh Mallick

NO headline

2 years ago

In a few years I guess you won't require any programming language. AI might be able to compile your code from natural language pseudocode.

Jack Witkowski

Data Science, Data insight and visualization expert

2 years ago

It is not a war. R is used for special operations
