
#Artificial Intelligence #25 - My challenges with the definition of data centric vs model centric

Ajit Jaokar

Published: October 12, 2021

Welcome to #Artificial Intelligence #25

Introduction

Like almost all of us, I got my first experience of AI through Andrew Ng’s flagship course. Recently, however, Andrew has been speaking about data-centric vs model-centric approaches and has proposed a definition of data-centric AI: “Data-centric AI is the practice of systematically engineering the data used to build AI systems.”

I have some concerns about this, and I see it as part of a wider issue with deep learning itself. I hope to start a wider discussion here that will benefit all of us as an industry.

Background

At first glance, the definition of data-centric AI sounds perfect. Of course, good models need good data, and MLOps helps maintain high-quality data throughout the lifecycle. All fine.

What’s not to agree with?

For me, the problem lies in the second part: data-centric vs model-centric.

Here are some of the reasons why:

1) Restricted dichotomy: in a nutshell, the problem is framed as A vs B when there could be options C, D or E, or various combinations thereof, as we see below.

2) Paradoxical starting point: the data-centric approach is supposed to hold the model fixed and iterate on the data. This raises a paradoxical question: how exactly do we choose the model? When do we fix it, and what are the criteria for fixing it? Ironically, to fix the model we need to know in advance about a range of different models, which makes the argument circular, since in that case the models come before the data.

3) Known unknowns: there are general guidelines, but no fixed way to choose a model. Are we saying that we choose a model (e.g. logistic regression or an SVM) and keep throwing data at it and tuning it, ignoring all other models, when there could be a better model downstream? This is the known unknowns problem.

4) The baseline approach: in “How to Get Baseline Results and Why They Matter”, Jason Brownlee says “You may need to collect more or different data from which to model. You may need to look into using different and perhaps more powerful machine learning algorithms or algorithm configurations. Ultimately, after rounds of these types of changes, you may have a problem that is resistant to prediction and may need to be re-framed.” This statement sounds more pragmatic to me, i.e. many things could change, including the model or indeed the problem itself.
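To make points 2 and 4 concrete, here is a minimal, stdlib-only Python sketch. It is not Andrew Ng’s or Jason Brownlee’s method. The toy dataset, the deliberately mislabelled point in `raw_train`, and the names `zero_rule`, `nearest_neighbour`, `model_centric` and `data_centric` are all hypothetical. The model-centric loop holds the data fixed and varies the model. The data-centric loop holds the model fixed and varies the training data. A zero-rule (majority-class) baseline anchors both comparisons.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical toy data: one numeric feature and a class label per example.
Example = Tuple[float, str]
Predictor = Callable[[float], str]
Trainer = Callable[[List[Example]], Predictor]


def zero_rule(train: List[Example]) -> Predictor:
    """Baseline: always predict the most frequent label seen in training."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: majority


def nearest_neighbour(train: List[Example]) -> Predictor:
    """1-NN on a single feature: copy the label of the closest training point."""
    return lambda x: min(train, key=lambda ex: abs(ex[0] - x))[1]


def accuracy(predict: Predictor, test: List[Example]) -> float:
    """Fraction of test examples whose label is predicted correctly."""
    return sum(predict(x) == y for x, y in test) / len(test)


def model_centric(trainers: Dict[str, Trainer], train: List[Example],
                  test: List[Example]) -> Dict[str, float]:
    """Hold the data fixed and compare models."""
    return {name: accuracy(fit(train), test) for name, fit in trainers.items()}


def data_centric(fit: Trainer, train_versions: Dict[str, List[Example]],
                 test: List[Example]) -> Dict[str, float]:
    """Hold the model fixed and compare versions of the training data."""
    return {name: accuracy(fit(train), test) for name, train in train_versions.items()}


test_set = [(0.1, "a"), (0.2, "a"), (0.8, "b"), (0.9, "b")]
raw_train = [(0.0, "a"), (0.15, "b"), (0.3, "a"), (0.7, "b"), (1.0, "b")]  # 0.15 is mislabelled
cleaned_train = [(0.0, "a"), (0.15, "a"), (0.3, "a"), (0.7, "b"), (1.0, "b")]

if __name__ == "__main__":
    trainers = {"zero_rule": zero_rule, "1-NN": nearest_neighbour}
    print(model_centric(trainers, raw_train, test_set))
    # {'zero_rule': 0.5, '1-NN': 0.5}
    print(data_centric(nearest_neighbour, {"raw": raw_train, "cleaned": cleaned_train}, test_set))
    # {'raw': 0.5, 'cleaned': 1.0}
```

On the raw data, 1-NN does no better than the baseline. Fixing the label lifts it to perfect accuracy on this toy set. The data-centric loop only makes sense once we have already picked `nearest_neighbour` to hold fixed, and in practice that means running something like the model-centric loop first. This is the circularity described in point 2.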

Other possible options

There are also a number of possibilities outside the data-centric vs model-centric framing.

1) Human in the loop: the data-centric vs model-centric argument ignores other strategies, such as keeping a human in the loop. In the article “An Inconvenient Truth About AI” (“AI won't surpass human intelligence any...”), Rodney Brooks argues for a human in the loop because AI alone cannot be trusted: “Just about every successful deployment of AI has either one of two expedients: It has a person somewhere in the loop, or the cost of failure, should the system blunder, is very low.” The data-centric approach puts our faith entirely in data and ignores strategies like this.

2) Models are themselves evolving in a number of directions: for example, causal models (see Judea Pearl’s work, such as The Book of Why) and Bayesian approaches in general.

3) There are also alternative views that are more critical of deep learning itself: for example, Gary Marcus on deep learning and his leaning towards symbolic ideas, and Max Welling’s “Do we still need models or just more data and compute?”
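Brooks’s two expedients can be expressed as a simple routing rule. The sketch below is a hypothetical illustration and does not come from Brooks’s article. It assumes three things:

* the model outputs a calibrated confidence;
* the cost of an error and the cost of a human review can be put on the same scale;
* the names `needs_human` and `route`, and the transaction data, are invented for the example.

```python
from typing import List, Sequence, Tuple

# (item id, predicted label, model confidence in that label); hypothetical format
Prediction = Tuple[str, str, float]


def needs_human(confidence: float, cost_of_error: float, cost_of_review: float) -> bool:
    """Defer to a person when the expected cost of an unchecked mistake
    exceeds the cost of having someone review the prediction."""
    return (1.0 - confidence) * cost_of_error > cost_of_review


def route(predictions: Sequence[Prediction], cost_of_error: float,
          cost_of_review: float) -> Tuple[List[str], List[str]]:
    """Split predictions into automatically accepted items and a human review queue."""
    automatic, review_queue = [], []
    for item_id, _label, confidence in predictions:
        if needs_human(confidence, cost_of_error, cost_of_review):
            review_queue.append(item_id)
        else:
            automatic.append(item_id)
    return automatic, review_queue


predictions = [("txn-1", "ok", 0.99), ("txn-2", "fraud", 0.70), ("txn-3", "ok", 0.95)]

if __name__ == "__main__":
    # High cost of failure: uncertain cases go to a person.
    print(route(predictions, cost_of_error=100.0, cost_of_review=2.0))
    # (['txn-1'], ['txn-2', 'txn-3'])
    # Low cost of failure: everything is handled automatically.
    print(route(predictions, cost_of_error=1.0, cost_of_review=2.0))
    # (['txn-1', 'txn-2', 'txn-3'], [])
```

The same model and data give different workflows depending only on the stakes. That decision sits outside both the data-centric and the model-centric loop.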


The risks of industry consensus

There is a risk of groupthink in the industry as a whole, and we see it in many ways. The deep learning industry wants us to believe a narrative: that lots of data and lots of compute alone will drive AI. But there is a dark side to this. Large companies want your data. They are then in a dominant position relative to anyone who has no data and cannot pay for compute. That was also the broader concern raised by Timnit Gebru. There is also a wider backstory to the data-centric vs model-centric question.

DeepMind argues that reward alone is enough. Yoshua Bengio gave a very detailed NeurIPS talk (Yoshua Bengio: From System 1 Deep Learning to System 2 Deep Learning, NeurIPS 2019). It attempts to show how neural network (connectionist) approaches will address even complex phenomena such as System 2 thinking, in Kahneman’s sense. A recent paper, Deep Learning for AI by Yoshua Bengio, Yann LeCun and Geoffrey Hinton, also says “The performance of deep learning systems can often be dramatically improved by simply scaling them up.” Collectively, these approaches aim to address the shortcomings of deep learning:

* the "bigger is better" mindset (breakthroughs are based on creating larger models and datasets);
* machines learn in a relatively narrow way and need much more data to learn than humans do;
* neural networks are vulnerable to small changes in data, for example adversarial attacks.

Proposed ways to achieve this include handling out-of-distribution data, attention mechanisms, transfer learning, the importance of priors, composability and self-supervised learning.

Conclusion and discussion

To conclude, the definition is agreeable, i.e. you agree with it intuitively. But it creates a restricted dichotomy (it ignores other options) and a contrived scaffolding (it ignores different viewpoints).

But beyond semantics, I think the wider industry viewpoint may need to evolve. I believe that the next generation of deep learning will be radically different from the last. My personal bet is on Bayesian models, causal models and their interplay with existing deep learning models. There are also more complex possibilities on the model development side, for example the Bayesian Active Learning library (BAAL). So, in a wider sense, models will play a key part in the future.
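Bayesian active learning gives a flavour of these model-side possibilities. Here is a stdlib-only sketch of the BALD (Bayesian Active Learning by Disagreement) acquisition score. It is not BAAL’s API. It assumes we already have several stochastic predictions per unlabelled example, for instance from Monte Carlo dropout. The names `bald_score` and `select_for_labelling`, and the pool data, are hypothetical. The score is the entropy of the averaged prediction minus the average entropy of the individual predictions. It is high when the sampled models disagree, and zero when they agree, even if each of them is uncertain.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def bald_score(mc_samples: Sequence[Sequence[float]]) -> float:
    """BALD: entropy of the mean prediction minus the mean entropy of the samples.

    mc_samples holds T class-probability vectors for ONE unlabelled example,
    e.g. from T forward passes with dropout left switched on (an assumption).
    """
    t = len(mc_samples)
    n_classes = len(mc_samples[0])
    mean = [sum(sample[c] for sample in mc_samples) / t for c in range(n_classes)]
    expected_entropy = sum(entropy(sample) for sample in mc_samples) / t
    return entropy(mean) - expected_entropy


def select_for_labelling(pool: Dict[str, List[List[float]]], k: int) -> List[str]:
    """Return the ids of the k pool items with the highest BALD score."""
    return sorted(pool, key=lambda item: bald_score(pool[item]), reverse=True)[:k]


# Hypothetical pool: two MC samples per unlabelled example, two classes.
pool = {
    "x1": [[0.5, 0.5], [0.5, 0.5]],      # uncertain, but the samples agree
    "x2": [[0.9, 0.1], [0.1, 0.9]],      # samples strongly disagree
    "x3": [[0.99, 0.01], [0.98, 0.02]],  # confident and nearly in agreement
}

if __name__ == "__main__":
    print(select_for_labelling(pool, k=2))  # ['x2', 'x3']
```

`x2` is chosen first because its samples disagree sharply. `x1` is equally uncertain in every sample, which reflects noise rather than model ignorance, so it scores zero. Here the model, not just the data, decides which data is worth collecting next.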

AI for Marketing conference

Finally, I am planning to chair an online conference at the guild. If you work in marketing, what topics would you suggest for AI and marketing? I know the AI side, so I am keen to hear from you if you have a marketing background and are using AI or plan to use it.

Interesting book

I read an interesting book about Fourier series and Fourier transforms, written by Japanese music students: Who Is Fourier?: A Mathematical Adventure. The book is unique and unclassifiable, spanning music, mathematics, physics, engineering and complex science. In parts it is hard reading, but I really liked the effort to combine domains like maths and music and create something new.

Many thanks

If you want to study with me at #universityofoxford see my courses

Digital Twins: Enhancing Model-based Design with AR, VR and MR

This course is for aspiring and seasoned simulation engineers that want to develop digital twin models of engineering components and incorporate these models into AR-VR-MR technologies.

and

Artificial Intelligence: Cloud and Edge Implementations

This is a pioneering full-stack AI course covering AI, MLOps and Edge.

The course helps developers to transition their careers to AI.

Artificial Intelligence (newsletter, 115,372 followers)

Comments

Nitin Malik

PhD | Professor | Data Science | Machine Learning | Deputy Dean (Research)

3 years ago

My few cents. Restricted dichotomy: ideally, I think the question of data-centric vs model-centric should not arise, as both data quality and model quality contribute to the overall objective to be achieved or optimized. Still, if forced into one of the camps, it's more data-driven, which will hopefully become model-driven in the future. Paradoxical starting point: data comes before the model, so it has to be rinsed first. After rinsing, hold the data fixed (unless it's meant for online learning) and then choose the model. The problem of known unknowns would still persist. However, it's easier to change the model than the data. Also, I agree with Dr Ajit that model hunting is an issue.


Nick Schifano

CEO @ FastCatalog.ai | Founder

3 years ago

Hi Ajit, good stuff. To take a slightly contrarian view, the focus should be on a data-centric approach most of the time. It does feel like folks tend to jump too quickly into ML analysis without first investing in an analysis of the underlying problem at hand: what features would help characterize the system I'm analyzing, are my data balanced, etc. In other words, it seems difficult to inject information into a solution through an ML algorithm alone if that information is not already encapsulated in the data, even if perhaps hidden.

Kristina Vega

Cyber Money Laundering in Real Estate Investigations Corp

3 years ago

Model-centric AI could be defined as an AI product development approach in which the model is selected by how well it fits the business use goals (in my opinion). For example, getting better predictions is one of the main business uses in enterprise AI. There are internal business prediction use cases (e.g. predicting energy use in manufacturing shop-floor operations). There are also customer- or supplier-facing business prediction use cases (e.g. predicting the risk level of a supplier and deciding accordingly which supplier is best to procure materials from). For the latter kind of AI product, you need to make sure you have an explainable model in place, so that the results of AI predictions can be explained and interpreted, especially in the event of a regulatory audit. By contrast, for an internal prediction use case such as energy consumption predictions for your manufacturing operations, an ML engineer doing model-centric AI would probably opt for random forests. They perform well in prediction accuracy and are great to use for most applications that do not need extensive reasoning behind predictions.


Matjaž Marussig

Independent Software Vendor, Certified Project Manager, DevOps Engineer, APEX Oracle developer, Oracle Forms & Reports developer, ERP specialist, Full-stack Developer, Mechanical Engineer, AI orchestrator

3 years ago

Excellent. I described my thoughts some time ago in this article. https://medium.datadriveninvestor.com/artificial-super-intelligence-asi-could-it-rise-spontaneously-d0df6fd110fc?source=friends_link&sk=79e5c446b6ca5843757395d36983f9e9

Ajit Jaokar

3 years ago

If you want to study with me at #universityofoxford see my courses Digital Twins: Enhancing Model-based Design with AR, VR and MR This course is for aspiring and seasoned simulation engineers that want to develop digital twin models of engineering components and incorporate these models into AR-VR-MR technologies. https://www.conted.ox.ac.uk/courses/digital-twins-enhancing-model-based-design-with-ar-vr-and-mr Artificial Intelligence: Cloud and Edge Implementations This is a pioneering full-stack AI course covering AI, MLOps and Edge. The course helps developers to transition their careers to AI. https://conted.ox.ac.uk/courses/artificial-intelligence-cloud-and-edge-implementations



