Artificial Intelligence #89: How can we incorporate domain knowledge from experts in machine learning / deep learning easily?

Happy new year :)

Last week, I spoke about the Erdos Research Institute as a community focussed on interdisciplinary research for learning AI.

There are many problems that could be addressed through this work.

One of them is: How can we incorporate domain knowledge from experts in machine learning / deep learning easily?

I have been exploring this question through the idea of extending domain-driven design for machine learning via the bias-variance trade-off.

1) Traditionally, domain-driven design (DDD) follows a set of steps that define a problem statement

2) This is valid, but applying it to ML and DL needs more than just defining the problem statement, because the solution is not static (unlike a conventional software-driven system): it depends on data

3) Learning from data is a trade-off (a Goldilocks concept: not too hot and not too cold)

4) In other words, you need to learn enough from the data but not learn so much that you overfit

5) Overfitting is avoided through techniques like regularization

6) This is the classic bias-variance trade-off

7) So, to enhance DDD for ML/DL, we need to cater for a data-driven solution

8) From the perspective of the domain expert, the system learns from data in the form of features

9) As feature selection progresses, it follows that the most accurate feature set corresponds to the best bias-variance trade-off

10) So, it's a question of mapping feature selection, feature extraction and feature transformation to the idea of the bias-variance trade-off (see the sketch after this list)
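As a rough sketch of point 10 (a minimal illustration using scikit-learn on a synthetic dataset; the feature counts and the logistic-regression baseline are assumptions chosen for illustration, not part of the original argument), varying the size of the selected feature set moves a model along the bias-variance spectrum:

```python
# Sketch: feature-set size vs the bias-variance trade-off.
# Too few features tends toward high bias (underfitting);
# keeping every noisy feature tends toward higher variance.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=500, n_features=50, n_informative=10,
                           n_redundant=5, random_state=0)

for k in (2, 10, 50):  # illustrative feature-set sizes
    pipe = Pipeline([
        ("select", SelectKBest(f_classif, k=k)),      # feature selection
        ("learn", LogisticRegression(max_iter=1000))  # simple baseline learner
    ])
    print(f"k={k:2d} features -> CV accuracy "
          f"{cross_val_score(pipe, X, y, cv=5).mean():.3f}")
```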

https://lnkd.in/eZy4MJDp

https://lnkd.in/ermpAXbH

https://lnkd.in/eyFTm_ep

Now, Matthew Kirk, who is also a part of the Artificial Intelligence: Cloud and Edge implementations course, added the following:

The core insight from Matthew is: Given that the model changes through time (generally from underfit to overfit, though not guaranteed), there are different evaluations at different times that matter. A lot of attention is focussed on empirical evaluations like precision/recall/F1 score versus predictive evaluations, which are more of a behavioural test. Also, we basically ignore any qualitative evaluations that come from domain experts (the gut feel).
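To make that contrast concrete, here is a minimal pytest-style sketch of a behavioural (predictive) evaluation; the `model` fixture and the fraud-detection case are hypothetical, chosen only to illustrate the idea:

```python
# Sketch: a behavioural (predictive) evaluation - a single domain-motivated
# case the model must get right, independent of any aggregate metric.
# Assumes a pytest fixture `model` that supplies the trained estimator.
def test_flags_obvious_fraud(model):
    # Hypothetical domain rule from an expert: a very large transaction
    # at 3am from a never-seen device must be flagged as fraud (label 1).
    obvious_fraud = [[9999.0, 3, 1]]  # [amount, hour, new_device]
    assert model.predict(obvious_fraud)[0] == 1
```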

So, we can think of machine learning as follows:

All models start somewhere, and most of the time that is a state of high bias: a known state, but one where there is no learning from the data yet.

Now, from this state, we have to move towards prediction. One method to bridge the gap is using the GOMS model (Goals, Operators, Methods, Selection rules) to build a predictive evaluation.

Goal: what is the goal that the model is aiming to achieve, stated in plain English? This is in line with the reverse Bloom taxonomy: starting with the end in mind.

Operator: what are the underlying algorithms available to machine learning practitioners? This is the main focus of the data scientist currently. In the beginning, it matters less which algorithm is used, until a baseline is achieved through a combination of feature engineering and algorithm choice.

Using the bias/variance scaffold above:

Start with model evaluation: CatBoost/XGBoost/logistic regression. "When I use {supervised algorithm}"

Then feature engineering: "When I use {feature selection}", "When I use {feature transformation}"

Preprocessing: "When I use {categorical transformer}"

Regularization: "When I use {regularizer}" (L2/L1)

Ensembles: "When I use {boosting/bagging}"

Methods: this is the key to ATDD (acceptance test-driven development) with ML. Methods are combinations of When steps. You can think of it as something like functional composition: F(X) = When("I use CatBoost") and G(X) = When("I use UMAP") can become F(G(X)) = When("I use {algorithm} and {dimensionality reduction}"). Somewhat like AutoML, but I feel it's more focused on testing the output.
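A minimal sketch of this composition idea using scikit-learn's Pipeline; PCA and gradient boosting stand in here for the UMAP and CatBoost operators named above (which plug in the same way if installed):

```python
# Sketch: composing "When" operators as a pipeline, i.e. F(G(X)).
# G = a dimensionality-reduction operator, F = a supervised algorithm.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# F(G(X)): "When I use {algorithm} and {dimensionality reduction}"
method = Pipeline([
    ("g_reduce", PCA(n_components=5)),          # G: the reduction operator
    ("f_learn", GradientBoostingClassifier()),  # F: the supervised algorithm
])
method.fit(X, y)
print(method.score(X, y))
```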

We can combine different operators into one method spec. It's more like a plain-English pipeline specification.

Selection: this is a testing grid, in pytest, Cucumber, or any testing suite that gives an output.
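One way such a grid could look in pytest (a sketch, with scikit-learn estimators standing in for the operators named earlier; the 0.5 score floor is an illustrative assumption):

```python
# Sketch: a testing grid over operator combinations, pytest-style.
import pytest
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Illustrative operator grid; CatBoost/XGBoost/UMAP would slot in the same way.
ALGORITHMS = {
    "gradient_boosting": GradientBoostingClassifier(),
    "logistic_regression": LogisticRegression(max_iter=1000),
}
REDUCERS = {"pca": PCA(n_components=5), "none": "passthrough"}

@pytest.mark.parametrize("algorithm", ALGORITHMS)
@pytest.mark.parametrize("reducer", REDUCERS)
def test_method_grid(algorithm, reducer):
    pipe = Pipeline([("reduce", REDUCERS[reducer]),
                     ("learn", ALGORITHMS[algorithm])])
    score = cross_val_score(pipe, X, y, cv=3).mean()
    assert score >= 0.5  # illustrative floor: each method must beat chance
```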

The predictive evaluation then becomes an empirical evaluation. This last step is where precision/recall/F1 score starts to become very important. Over the lifetime of a project, these metrics become more important later on; in the beginning, what matters is that the basis for learning and improvement is set up.
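A minimal sketch of that empirical step (scikit-learn metrics on a held-out split; the synthetic dataset and the model are illustrative):

```python
# Sketch: the empirical evaluation stage - aggregate metrics on held-out data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = GradientBoostingClassifier().fit(X_tr, y_tr)
pred = model.predict(X_te)
print(f"precision={precision_score(y_te, pred):.3f}  "
      f"recall={recall_score(y_te, pred):.3f}  "
      f"F1={f1_score(y_te, pred):.3f}")
```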

The diagram above depicts the approach somewhat (we don't know the exact source). As a model progresses, it generally goes from being merely on paper to being live, a process that involves going from high bias to low bias while avoiding high variance (overfitting). However, at each stage the methods of evaluation change; more specifically, we want to bring domain knowledge into the evaluation stage.

This is an example of a cross-disciplinary industry challenge we are developing as a community in the Erdos Research Institute.

We welcome your thoughts

Alex Karlsson

Computer Science graduate looking for entry-level work. Tennessee Tech Alum.

2y

A very interesting read! I have not heard of the GOMS model until now, and I'm glad it's on my radar. Thanks for sharing!

Nitin Malik

PhD | Professor | Data Science | Machine Learning | Deputy Dean (Research)

2y

To move from high bias to low bias, we need to use either boosting or stacking. And to ensure variance also remains low, boosting is preferred, more specifically gradient boosting algorithms such as gradient-boosted decision trees, LightGBM and CatBoost.

mohamed karim

Network Coordinator

2y

Thanks for sharing.

Fred Simkin

Developing and delivering knowledge based automated decisioning solutions for the Industrial and Agricultural spaces.

2y

Interesting approaches, and at least it shows an appreciation for the difference between "knowledge" and "data" and the centrality of domain knowledge in developing solutions. However, what I find is a missing step: knowledge acquisition. No real-life client is going to have a thorough, organized domain knowledge base ready to be digitized. The process of identifying domain knowledge sources (humans, plus policies, practices and procedures) is manual and dependent on the developer's understanding of the domain language and context. Particularly when working with human domain experts, the developer needs to be cognizant of how words are used, attitude, and what tone signifies within the domain. Ontologies alone are insufficient, and knowledge induction from data fails because it is subject to the quality of the data and, even under the best of circumstances, is devoid of all the context signifiers.

One way to express domain knowledge is in the form of a set of questions and worked answers, where the latter are given as a sequence of derivation steps, e.g. as used for Google AI's Minerva. A complementary approach is to express domain knowledge explicitly in terms of properties, relationships and implications. This involves a mapping between the latent semantic space and the token sequences used for explicit domain knowledge. That mapping needs to be trained in parallel with the network for problem solving that deals with sequences of transformations of working memory. A related approach can be used to support operations on latent semantics, as a stepping stone to integrating a sequential rule engine for System 2 cognition. Imagine an artist conversing with an image generator to iteratively improve the image composition.
