Artificial Intelligence No 14: Of Deep Learning and bias
Image source: https://www.rootstrap.com/blog/gentle-introduction-to-the-bias-variance-trade-off-in-machine-learning/


15K subscribers in two months

Firstly, we now have 15K subscribers in just over two months. Thanks for your support!

In this newsletter, I will cover an important topic: #deeplearning and #bias.

As usual, I hope to add a new perspective to your thinking.

This is a complex topic, and hence this edition will be quite long.

Also, I got a request for my last newsletter, on maths-based strategies to learn AI, to be translated into Chinese for students in China. It's nice to see this. In the UK, I have mentored many students from China, and it's always impressive to see their knowledge of maths and science and their overall dedication. Thanks Paul Lin for facilitating this.

Finally, before we start: Amsterdam gets a new Microsoft research lab led by the respected researcher Max Welling. This is great to see (and they are recruiting). By coincidence, Max Welling also features in this newsletter in another context.

Of deep learning and bias

Background

In the last edition, Dr Catherine Lopes commented: "We (human) are also making decisions everyday by balancing between #biases & #noise."

Bias is a topic of discussion in AI, and a very important one.

The textbook meaning of bias is to know something about a concept but to ignore something else which is also relevant (often with malicious intent).

The statistical meaning of bias also conforms to the colloquial meaning.

Statistically, high bias indicates a leaning towards a specific (often simplistic) viewpoint, ignoring other relevant inputs, which leads to underfitting. High variance means taking too many viewpoints into consideration, i.e. learning the noise, which leads to overfitting.

Reducing one tends to increase the other, leading to the well-known bias-variance tradeoff.
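To make this concrete: the expected error of a model can be decomposed into bias squared, variance, and irreducible noise. The following minimal sketch (my own illustration; the function, noise level and degrees are arbitrary choices) fits the same noisy data with polynomials of increasing degree, which acts as the bias-variance dial:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy observations of a smooth underlying signal
x = np.linspace(0, 1, 30)
y_true = np.sin(2 * np.pi * x)
y = y_true + rng.normal(scale=0.3, size=x.shape)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)          # least-squares polynomial fit
    y_hat = np.polyval(coeffs, x)
    train_err = np.mean((y - y_hat) ** 2)      # error on the noisy data
    true_err = np.mean((y_true - y_hat) ** 2)  # error against the clean signal
    print(f"degree {degree}: train MSE {train_err:.3f}, MSE vs clean signal {true_err:.3f}")
```

The degree-1 fit misses the curve entirely (high bias, underfitting); the degree-9 fit chases the noise (high variance, overfitting); the middle degree balances the two.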

What the media misses about bias

The discussion of bias has become a bit like that of privacy: there is a lot of heat (talk) but little light (insight).

Most media coverage misses some key points.

Bias in AI can be classified into two parts:

a) bias based on human intent, and

b) bias based on the limitations of the algorithm itself.

Most discussion is on the former.

The former is relatively easy to fix (through awareness, regulation, etc.).

It’s also a reflection of society – and it’s easier to blame ‘AI’ for it.

That's what happened with the Tay bot, which was created with good intentions that were then hijacked. Also, policy and governance are evolving thanks to some great work on AI ethics. For example, see this proposal for Identifying and Managing Bias in Artificial Intelligence by NIST; thanks Ana Chubinidze for sharing this.

Bitter lesson

But the second kind of bias is a lot harder to fix. In this case, it's not that you are selective about the training data (e.g. you train using only one ethnicity); rather, you do not have data to train for all possible cases. The effect is the same, i.e. the algorithm is biased. But this time, the problem is a lot harder to solve, and no amount of regulation can fix it. The sketch below illustrates the point.
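In the toy setup below (the groups, numbers and models are my own illustrative choices), nobody selects data with malicious intent; one group is simply under-represented in the training data, and the model performs far worse on it:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def sample(n, group):
    # The feature that predicts the label differs by group, so the model
    # can only learn group B's rule if it sees enough group B data.
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] > 0) if group == "A" else (X[:, 1] > 0)
    return X, y.astype(int)

# Training set: 950 examples from group A, only 50 from group B
X_a, y_a = sample(950, "A")
X_b, y_b = sample(50, "B")
model = LogisticRegression().fit(np.vstack([X_a, X_b]), np.concatenate([y_a, y_b]))

# Evaluate on fresh data from each group
for group in ("A", "B"):
    X_test, y_test = sample(2000, group)
    print(f"accuracy on group {group}: {model.score(X_test, y_test):.2f}")

# Group A scores well; group B ends up near chance. The algorithm is
# 'biased' purely because the data did not cover group B's cases.
```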

The important idea is: this is an inherent property of deep learning itself (or, in general, of algorithms that depend only on data directly for inference). The limitations of deep learning are best explained by an article from Max Welling, "Do we still need models or just more data and compute?", where he says:

“The most fundamental lesson of ML is the bias-variance tradeoff: when you have sufficient data, you do not need to impose a lot of human generated inductive bias on your model. You can “let the data speak”. However, when you do not have sufficient data available you will need to use human-knowledge to fill the gaps.”

The article is itself a response to another insightful but brief article called The Bitter Lesson by Richard Sutton, who takes the contrary view that:

“The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin.”

The compute-driven strategy and its limitations

The compute-driven vision (Sutton's view) says that one should work on scalable methods that can leverage compute: if you have enough compute, you do not need to model the world. The success of current and recent achievements in deep learning, for example AlphaGo, points to the validity of the compute strategy. But Max Welling points out that all these domains are very well defined, such that you can generate your own data (as in games) or there is ample data already available (speech or text).

The problem comes when

a) the domain is not well defined,

b) you cannot easily generate your own data, but

c) you need to extrapolate.

In this case, you do not have enough data because you have not encountered the situation before.

You cannot just 'let the data speak' because, by the (statistical) definition of bias, the outcome is not valid.

Both the assumptions made by the algorithm and the inductive bias (also known as learning bias) of a learning algorithm are relevant. The inductive bias is the set of assumptions that the learner uses to predict outputs for inputs that it has not encountered, as the sketch below illustrates.
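As an illustration (a minimal sketch with assumptions of my own choosing, not a canonical example), here are two learners that see the same training data but, because of their different inductive biases, make very different predictions outside the training range:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(2)

# Training data only covers x in [0, 5]; the true relation is linear (y = 2x)
X_train = rng.uniform(0, 5, size=(40, 1))
y_train = 2.0 * X_train[:, 0] + rng.normal(scale=0.5, size=40)

linear = LinearRegression().fit(X_train, y_train)               # assumes a line everywhere
knn = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train)  # assumes locality

X_far = np.array([[10.0]])  # well outside the training range
print("linear model predicts:", linear.predict(X_far)[0])  # ~20, keeps extrapolating the line
print("5-NN predicts:", knn.predict(X_far)[0])              # ~10, stuck at the data's edge
```

Neither answer comes from data at x = 10; there is none. Each follows from the model's built-in assumptions, and which one is 'right' depends on whether those assumptions match the world.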

So, you need some human knowledge to fill in the gaps, because there are many exceptions in the problem that cannot be captured in the data itself. Talk of adding human knowledge sounds like symbolic AI, but Max Welling offers another solution, one based on generative modelling.

Generative models to the rescue?

According to Max Welling, the key insight is that the world operates in the “forward, generative, causal direction”.

In other words,

  • Events occur and cause other events to occur and these are recorded in sensors.
  • Events are causal and follow the laws of physics.
  • Could we combine this background with generative models?

This could be relevant because

a) Generative models are far better in generalization to new unseen domains (see the sketch after this list).

b) Causality allows us to easily transfer predictors from one domain to the next.

c) Generative models allow us to learn from a single example because we can embed that example in an ocean of background knowledge.
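As a rough illustration of the first point (and of the 'forward, generative, causal direction'), the sketch below (entirely my own toy setup, not Welling's code) fits a projectile's observed flight in two ways: by inverting the known forward physics, and by an unconstrained polynomial. Only the generative fit survives extrapolation:

```python
import numpy as np

rng = np.random.default_rng(3)
g_true, v0_true = 9.81, 30.0

# Observe the projectile only during its first second of flight
t_obs = np.linspace(0.1, 1.0, 20)
h_obs = v0_true * t_obs - 0.5 * g_true * t_obs**2 + rng.normal(scale=0.2, size=20)

# Generative model: assume the forward physics h = v0*t - 0.5*g*t^2
# and estimate (v0, g) by least squares on that structure.
A = np.column_stack([t_obs, -0.5 * t_obs**2])
(v0_hat, g_hat), *_ = np.linalg.lstsq(A, h_obs, rcond=None)

# Purely data-driven alternative: an unconstrained degree-6 polynomial
poly = np.polyfit(t_obs, h_obs, 6)

t_far = 3.0  # well beyond the observed data
h_true = v0_true * t_far - 0.5 * g_true * t_far**2
h_gen = v0_hat * t_far - 0.5 * g_hat * t_far**2
print(f"truth at t=3s:            {h_true:.1f} m")
print(f"generative (physics) fit: {h_gen:.1f} m")
print(f"polynomial fit:           {np.polyval(poly, t_far):.1f} m")

# The physics model, carrying the right background knowledge, stays close
# to the truth; the polynomial can wander far from it outside the data.
```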

And Max Welling expands the case for generative models by pointing out that humans have an ability to generate counterfactual worlds. For example, we can imagine the consequences of our actions by 'playing out' a scenario in our mind. But these imaginary/counterfactual scenarios still follow the laws of physics.

Max Welling then goes on to say that we could combine these ideas:

  • For narrowly defined domains with enough data or a really accurate simulator, you can train a discriminative model and do very well.
  • But when you need to generalize to new domains, i.e., extrapolate away from the data, you will need a generative model.
  • The generative model will be based on certain assumptions about the world but is expected to generalize a lot better.
  • Moreover, it can be trained and/or fine-tuned using unsupervised learning, without the need for labels.
  • As you are collecting more (labelled) data in the new domain, you can slowly replace the inverse generative model with a discriminative model (a minimal sketch of this handover follows the list).
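Here is a minimal sketch of that handover (my own illustration; the models and the switching threshold are arbitrary choices): lean on an unsupervised generative model (a Gaussian mixture) while labels are scarce, and switch to a discriminative classifier once enough labelled data has accumulated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)

# Plenty of unlabelled data from two clusters (the two classes), few labels
X_unlabelled = np.vstack([rng.normal(-2, 1, (500, 2)), rng.normal(2, 1, (500, 2))])

def predict(X_lab, y_lab, X_new, switch_at=100):
    if len(y_lab) < switch_at:
        # Generative route: fit clusters without labels, then use the few
        # labels we do have to name each cluster (this sketch assumes each
        # cluster receives at least one labelled point).
        gmm = GaussianMixture(n_components=2, random_state=0).fit(X_unlabelled)
        clusters = gmm.predict(X_lab)
        name = {c: int(round(y_lab[clusters == c].mean())) for c in (0, 1)}
        return np.array([name[c] for c in gmm.predict(X_new)])
    # Discriminative route: enough labels to train directly
    return LogisticRegression().fit(X_lab, y_lab).predict(X_new)

# With only 10 labels, the generative route is used:
X_lab = np.vstack([rng.normal(-2, 1, (5, 2)), rng.normal(2, 1, (5, 2))])
y_lab = np.array([0] * 5 + [1] * 5)
print(predict(X_lab, y_lab, np.array([[-2.0, 0.0], [2.0, 0.0]])))
```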

Analysis

  • I like this approach because it's an attempt to solve a hard problem.
  • It does not equate generative models to symbolic AI; rather, it's a new approach.
  • There are other ways to inject human knowledge, such as knowledge graphs, probabilistic techniques, and physics-based simulators.
  • I am exploring hybrid cognitive architectures, as per my previous posts.
  • This points to a possible way towards AGI: start with an unsupervised model with little prior human knowledge, and complement it with computing resources.


To conclude

Bias is a hard problem to solve, as discussed above. But new ways of thinking are needed.


Farman S.

Sr. Software Quality Engineer | Python | javaScript | C# | HTML | CSS | SQL

3y

Excellent article. Bias really is a significant concept in AI. Thank you for sharing.

Julia Mann née Hoerner

Managing Director at RWTH Center for AI

3y

Congratulations on the 15K, Ajit Jaokar. This is an impressive number and shows that people are interested in what you have to say. Well done.

Karl Haviland

We build AI Powered, Next Gen Companies - Strategy, Software, and Sales | Principal at Haviland Software | UM Data Science

3y

Hi Ajit Jaokar! Another good one. Are there specific techniques you would use to rectify large movement/drift in the underlying data (thus, I *think*, introducing a high bias)? A situation where there is a timeframe when the model hasn't quite adapted to that drift and produces a large drop-off in predictive accuracy. In the article, are you saying that you would attempt to generate a body of data to reflect your anticipated (played-out scenario) data, to help smooth out results until the training data catches up? Appreciated, and keep it going! Thanks again - Karl


The generative approach is related to the idea of mirror neurons, which trigger both when an animal performs an action and when the animal perceives another animal performing that action. See, e.g., Kilner & Lemon, "What We Know Currently about Mirror Neurons" [1]. More generally, we try to understand what we observe in terms of which actions are needed to mimic the observed behaviour, and what intents would lead to those actions. Social mimicry is ubiquitous, starting with the way that babies learn to smile. It also relates to natural language and understanding the likely intentions of the speaker. [1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3898692/
