Artificial Intelligence #89: How can we incorporate domain knowledge from experts in machine learning / deep learning easily?

Happy new year :)

Last week, I spoke about the Erdos Research Institute as a community focussed on interdisciplinary research for learning AI.

There are many problems that could be addressed through this work.

One of them is: How can we incorporate domain knowledge from experts in machine learning / deep learning easily?

I have been exploring this question through the idea of extending domain-driven design for machine learning via the bias-variance trade-off.

1) Traditionally, domain-driven design (DDD) follows a set of steps that define a problem statement

2) This is valid, but applying it to ML and DL needs more than just defining the problem statement, because the solution is not static (unlike a conventional software-driven system): it depends on data

3) Learning from data is a trade-off (a Goldilocks concept: not too hot and not too cold)

4) In other words, you need to learn enough from the data but not learn so much that you overfit

5) Overfitting is avoided through techniques like regularization

6) This is the classic bias-variance trade-off

7) So, to enhance DDD for ML/DL, we need to cater for a data-driven solution

8) From the perspective of the domain expert, the system learns from data in the form of features

9) As feature selection progresses, it follows that the most accurate feature set corresponds to the best bias-variance trade-off

10) So, it's a question of mapping feature selection, feature extraction and feature transformation to the idea of the bias-variance trade-off (see the sketch after this list)
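As a rough sketch of point 10 (a minimal illustration using scikit-learn on a synthetic dataset; the feature counts and the logistic-regression baseline are assumptions chosen for illustration, not part of the original argument), varying the size of the selected feature set moves a model along the bias-variance spectrum:

```python
# Sketch: feature-set size vs the bias-variance trade-off.
# Too few features tends toward high bias (underfitting);
# keeping every noisy feature tends toward higher variance.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=500, n_features=50, n_informative=10,
                           n_redundant=5, random_state=0)

for k in (2, 10, 50):  # illustrative feature-set sizes
    pipe = Pipeline([
        ("select", SelectKBest(f_classif, k=k)),      # feature selection
        ("learn", LogisticRegression(max_iter=1000))  # simple baseline learner
    ])
    print(f"k={k:2d} features -> CV accuracy "
          f"{cross_val_score(pipe, X, y, cv=5).mean():.3f}")
```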

https://lnkd.in/eZy4MJDp

https://lnkd.in/ermpAXbH

https://lnkd.in/eyFTm_ep

Now, Matthew Kirk, who is also a part of the Artificial Intelligence: Cloud and Edge implementations course, added the following:

The core insight from Matthew is: Given that the model changes through time (generally from underfit to overfit, though not guaranteed), there are different evaluations at different times that matter. A lot of attention is focussed on empirical evaluations like precision/recall/F1 score versus predictive evaluations, which are more of a behavioural test. Also, we basically ignore any qualitative evaluations that come from domain experts (the gut feel).
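To make that contrast concrete, here is a minimal pytest-style sketch of a behavioural (predictive) evaluation; the `model` fixture and the fraud-detection case are hypothetical, chosen only to illustrate the idea:

```python
# Sketch: a behavioural (predictive) evaluation - a single domain-motivated
# case the model must get right, independent of any aggregate metric.
# Assumes a pytest fixture `model` that supplies the trained estimator.
def test_flags_obvious_fraud(model):
    # Hypothetical domain rule from an expert: a very large transaction
    # at 3am from a never-seen device must be flagged as fraud (label 1).
    obvious_fraud = [[9999.0, 3, 1]]  # [amount, hour, new_device]
    assert model.predict(obvious_fraud)[0] == 1
```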

So, we can think of machine learning as follows:

All models start somewhere, and most of the time that is a state of high bias: a known state, but one where there is no learning from the data yet.

Now, from this state, we have to move towards prediction. One method to bridge the gap is using the GOMS model (Goals, Operators, Methods, Selection rules) to build a predictive evaluation.

Goal: what is the goal that the model is aiming to achieve, stated in plain English? This is in line with the reverse Bloom taxonomy: starting with the end in mind.

Operator: what are the underlying algorithms available to machine learning practitioners? This is the main focus of the data scientist currently. In the beginning, it matters less which algorithm is used, until a baseline is achieved through a combination of feature engineering and algorithm choice.

Using the bias/variance scaffold above:

Start with model evaluation: CatBoost/XGBoost/logistic regression. "When I use {supervised algorithm}"

Then feature engineering: "When I use {feature selection}", "When I use {feature transformation}"

Preprocessing: "When I use {categorical transformer}"

Regularization: "When I use {regularizer}" (L2/L1)

Ensembles: "When I use {boosting/bagging}"

Methods: this is the key to ATDD (acceptance test-driven development) with ML. Methods are combinations of When steps. You can think of it as something like functional composition: F(X) = When("I use CatBoost") and G(X) = When("I use UMAP") can become F(G(X)) = When("I use {algorithm} and {dimensionality reduction}"). Somewhat like AutoML, but I feel it's more focused on testing the output.
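A minimal sketch of this composition idea using scikit-learn's Pipeline; PCA and gradient boosting stand in here for the UMAP and CatBoost operators named above (which plug in the same way if installed):

```python
# Sketch: composing "When" operators as a pipeline, i.e. F(G(X)).
# G = a dimensionality-reduction operator, F = a supervised algorithm.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# F(G(X)): "When I use {algorithm} and {dimensionality reduction}"
method = Pipeline([
    ("g_reduce", PCA(n_components=5)),          # G: the reduction operator
    ("f_learn", GradientBoostingClassifier()),  # F: the supervised algorithm
])
method.fit(X, y)
print(method.score(X, y))
```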

We can combine different operators into one method spec. It's more like a plain-English pipeline specification.

Selection: this is a testing grid, in pytest, Cucumber, or any testing suite that gives an output.
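One way such a grid could look in pytest (a sketch, with scikit-learn estimators standing in for the operators named earlier; the 0.5 score floor is an illustrative assumption):

```python
# Sketch: a testing grid over operator combinations, pytest-style.
import pytest
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Illustrative operator grid; CatBoost/XGBoost/UMAP would slot in the same way.
ALGORITHMS = {
    "gradient_boosting": GradientBoostingClassifier(),
    "logistic_regression": LogisticRegression(max_iter=1000),
}
REDUCERS = {"pca": PCA(n_components=5), "none": "passthrough"}

@pytest.mark.parametrize("algorithm", ALGORITHMS)
@pytest.mark.parametrize("reducer", REDUCERS)
def test_method_grid(algorithm, reducer):
    pipe = Pipeline([("reduce", REDUCERS[reducer]),
                     ("learn", ALGORITHMS[algorithm])])
    score = cross_val_score(pipe, X, y, cv=3).mean()
    assert score >= 0.5  # illustrative floor: each method must beat chance
```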

The predictive evaluation then becomes an empirical evaluation. This last step is where precision/recall/F1 score starts to become very important. Over the lifetime of a project, these metrics become more important later on; in the beginning, what matters is that the basis for learning and improvement is set up.
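A minimal sketch of that empirical step (scikit-learn metrics on a held-out split; the synthetic dataset and the model are illustrative):

```python
# Sketch: the empirical evaluation stage - aggregate metrics on held-out data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = GradientBoostingClassifier().fit(X_tr, y_tr)
pred = model.predict(X_te)
print(f"precision={precision_score(y_te, pred):.3f}  "
      f"recall={recall_score(y_te, pred):.3f}  "
      f"F1={f1_score(y_te, pred):.3f}")
```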

The diagram above depicts the approach somewhat (we don't know the exact source). As a model progresses, it generally goes from being merely on paper to being live, a process that involves going from high bias to low bias while avoiding high variance (overfitting). However, at each stage the methods of evaluation change; more specifically, we want to bring domain knowledge into the evaluation stage.

This is an example of a cross-disciplinary industry challenge we are developing as a community in the Erdos Research Institute.

We welcome your thoughts

Alex Karlsson

Computer Science graduate looking for entry-level work. Tennessee Tech Alum.

2y

A very interesting read! I have not heard of the GOMS model until now, and I'm glad it's on my radar. Thanks for sharing!

Nitin Malik

PhD | Professor | Data Science | Machine Learning | Deputy Dean (Research)

2y

To move from high bias to low bias, we need to use either boosting or stacking. And to ensure variance also remains low, boosting is preferred, more specifically gradient boosting algorithms such as gradient-boosted decision trees, LightGBM and CatBoost.

mohamed karim

Network Coordinator

2y

Thanks for sharing.

Fred Simkin

Developing and delivering knowledge based automated decisioning solutions for the Industrial and Agricultural spaces.

2y

Interesting approaches, and at least it shows an appreciation for the difference between "knowledge" and "data" and the centrality of domain knowledge in developing solutions. However, what I find is a missing step: knowledge acquisition. No real-life client is going to have a thorough, organized domain knowledge base ready to be digitized. The process of identifying domain knowledge sources (humans, plus policies, practices and procedures) is manual and dependent on the developer's understanding of the domain language and context. Particularly when working with human domain experts, the developer needs to be cognizant of how words are used, attitude, and what tone signifies within the domain. Ontologies alone are insufficient, and knowledge induction from data fails because it is subject to the quality of the data and, even under the best of circumstances, is devoid of all the context signifiers.

One way to express domain knowledge is in the form of a set of questions and worked answers, where the latter are given as a sequence of derivation steps, e.g. as used for Google AI's Minerva. A complementary approach is to express domain knowledge explicitly in terms of properties, relationships and implications. This involves a mapping between the latent semantic space and the token sequences used for explicit domain knowledge. That mapping needs to be trained in parallel with the network for problem solving that deals with sequences of transformations of working memory. A related approach can be used to support operations on latent semantics, as a stepping stone to integrating a sequential rule engine for System 2 cognition. Imagine an artist conversing with an image generator to iteratively improve the image composition.
