What makes NLP hard (and fun).
Chris Pedder
Chief Data Officer @ OBRIZUM | Board advisor | Data transformation leader | Posting in a personal capacity.
So it's 2020, and the much-anticipated AI-powered robot uprising is still very much in the indiscernible mists of the (possibly unreachable) future. No need to look nervously at your IoT kettle; we still have a long way to go before ML-enabled subjugation of the human race is anything more than a distant nightmare. In fact, based on my experiences in the field over the last few years, it feels like we are further from machine learning nirvana than we were a year ago. So, what happened?
Well, like all fields that suffer from early success, we seem to have run out of low-hanging fruit. And to be fair, in the five years (wow, is it really only five years?!) since the publication of the now-famous Nature paper on deep learning, a lot has happened. We have ceased to be the best at recognising images on ImageNet, we have lost our primacy at Go, and even before Hinton's tour de force, we had been outclassed at Jeopardy. So why on earth were these low-hanging fruit, and where are we now?
The crucial point to remember is that machine learning intelligence is still very much narrow intelligence. Whilst machines can comprehensively outplay us at Go, we are able to build mental models in a way they cannot - playing Go by moving pieces on a board with wavy lines would be completely beyond modern machine intelligence, but well within the capabilities of a six-year-old. And this is where things become complicated.
Ultimately, what makes human beings capable of doing more with fewer (or at the very least the same number of) computational units is the ability to generalise efficiently. In particular, we are pretty good at building Bayesian or even causal models of our world, in which we infer the relatedness of patterns. To give a simple example, show a three-year-old five cats from the front, and they will be able to spot a cat from behind - something that most image recognition models really struggle with. How do we do this? We have a model for what a cat looks like, and we have another model for the world in which we live - we know how things are likely to look if we rotate them, move them away from or towards us, light them differently, and so on. The interplay of this knowledge of the generalities of physics and the specifics of cats allows us to deal with a much broader array of cases than the current state of the art in machine learning can manage.
The recently released language model from OpenAI, with the catchy name of GPT-3, is a case in point. It's a very impressive piece of technology, with 175bn free parameters trained on around 2TB of textual data, comprising a significant chunk of the entire internet via Common Crawl. It's also remarkably good at finishing your writing - much better than GPT-2 and its four-horned unicorns. But it still can't manage this interplay of common sense and knowledge: Kevin Lacker discovered that if you ask the innocuous question "How many eyes does my foot have?", GPT-3's best guess is "your foot has two eyes". Statistically, not a bad guess, but maybe not the best $4.6M humanity has ever spent...
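GPT-3 itself sits behind a paid API, but if you want to poke at this failure mode yourself, here is a minimal sketch using the freely available GPT-2 through Hugging Face's transformers library (the prompt format is my own; expect similarly confident nonsense):

```python
# A minimal sketch of probing a language model with an out-of-domain
# question. GPT-2 stands in for GPT-3 here, since the latter sits
# behind a paid API; the Q/A prompt format is illustrative.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Q: How many eyes does my foot have?\nA:"
result = generator(prompt, max_length=30, num_return_sequences=1)
print(result[0]["generated_text"])
# There is no model of bodies anywhere in the network, so it simply
# continues with whatever is statistically plausible.
```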
So how do we fix this? Well, the issue clearly isn't data. There are 499bn tokens (roughly, words) in the training set for GPT-3 - about 4,000 years of continual reading for an average-speed reader. It's obviously also not compute power, since OpenAI have that in spades thanks to their backer Microsoft. So maybe it's what we're *doing* with the data. A lot of the techniques used in machine learning are inspired by how human beings appear to think. The attention mechanism in NLP is based on how we seem to read - we focus on particular passages, and give them more weight in our interpretation. If you do this over big enough spans of text, presumably you can compress whole books, right? Well, maybe, but the real problem is scale. Complexity is key here: if you have lots of objects interacting in different ways, your story can become factorially complex.
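For the curious, here is a minimal NumPy sketch of the scaled dot-product attention that sits at the heart of transformer models like GPT-3 - the weights computed below are exactly the "give some passages more weight" scores described above (shapes and values are purely illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Weight each value by how relevant its key is to each query."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # pairwise query-key similarity
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax -> attention weights
    return weights @ V                     # weighted sum of the values

# Toy example: a "sentence" of 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```

Note the pairwise query-key scores: every token attends to every other token, which is exactly why the cost blows up as the text gets longer.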
How do you get around complexity? Speaking as a physicist: you build models. A good model helps you concentrate on what is essential to solving a problem, and ignore what is irrelevant detail. When working out how to catch a ball, we should be much more concerned with accurately estimating the pitch and velocity with which it was thrown, or the wind speed, than with the changing gravitational field due to uneven ground. And we learn such estimation behaviour from just a few hundred experiences.
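To put some illustrative numbers on that (all of them mine): under the simple constant-gravity projectile model, a 5% error in your estimate of the launch speed moves the landing point by about a metre, while even a generous variation in g over uneven ground moves it by about a centimetre - the model tells you which estimate deserves your attention.

```python
import numpy as np

def landing_distance(speed, pitch_deg, wind=0.0, g=9.81):
    """Range of a thrown ball under the simple constant-g projectile
    model, with a crude additive wind-drift correction."""
    theta = np.radians(pitch_deg)
    flight_time = 2 * speed * np.sin(theta) / g
    return speed * np.cos(theta) * flight_time + wind * flight_time

# A 5% error in the estimated launch speed moves the landing point...
print(landing_distance(10.0, 45.0) - landing_distance(9.5, 45.0))            # ~0.99 m
# ...about a hundred times more than a generous variation in g does:
print(landing_distance(10.0, 45.0) - landing_distance(10.0, 45.0, g=9.82))   # ~0.01 m
```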
Fundamentally, how we do this is very poorly understood - there are many competing schools of thought on how humans develop their understanding of the world, and little in the way of consensus between them. Unfortunately, one thing is very clear: the way we do it as we grow from babes in arms is very different from how we are doing machine learning. The cynic in me says this might have something to do with the companies at the forefront of machine learning research (and be in no doubt, it's companies, not universities) having enormous data centres that they would like to utilise. With their business models at stake, it's unlikely we will see a shift to model-based learning unless there is a great and certain advantage to be had.
In NLP, there are particular issues with this: mental models of the physical and emotional world that humans inhabit, along with models of how language itself tends to be constructed, all play a significant role in decoding a sentence in a given context. I'm afraid that if you came here hoping for answers, I have little to offer (although I strongly encourage you to read about Hopfield networks, which are currently enjoying something of a renaissance - there's a minimal sketch of one below), other than some guidance about what to look for in upcoming developments. If you see a new model like GPT-3 come out, ask questions like "can it infer properties accurately out of domain?" (like "how many eyes does my foot have?") or "can it deal accurately with causality?" on top of the usual "does it write amazing extended prose?" or "can it reach state of the art on XYZ dataset?". Those first two questions are much harder than the latter two (which could have been copied straight from this MIT Technology Review article), but when the answer changes from "no" to "maybe", it's time to start getting really excited...
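And, as promised, a concrete starting point for that Hopfield reading: a minimal sketch of the classical binary version - Hebbian storage plus iterative recall (synchronous updates for simplicity; the patterns and sizes are purely illustrative):

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian learning: store +/-1 patterns in a symmetric weight matrix."""
    n = patterns.shape[1]
    W = sum(np.outer(p, p) for p in patterns) / n
    np.fill_diagonal(W, 0)                    # no self-connections
    return W

def recall(W, state, steps=10):
    """Iteratively settle towards the nearest stored pattern."""
    for _ in range(steps):
        state = np.sign(W @ state)
        state[state == 0] = 1                 # break ties consistently
    return state

# Store one 8-unit pattern, then recover it from a corrupted copy.
pattern = np.array([1, -1, 1, 1, -1, -1, 1, -1])
W = train_hopfield(pattern[None, :])
noisy = pattern.copy()
noisy[:2] *= -1                               # flip two bits
print(np.array_equal(recall(W, noisy), pattern))  # True
```

The charm is that the memory is content-addressable: a partial or corrupted cue settles into the stored pattern, a flavour of recall rather closer to how we seem to retrieve things than a lookup table is.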