Human-compatible AI
Tirthajyoti Sarkar
VP, AI/ML, building a Digital Nervous System with Data Science and AI | Author and Mentor
Asimov’s laws: Profound but powerless
For science-fiction nerds (and a bunch of others too), Isaac Asimov needs no introduction. His three laws of robotics (first appearing in the short story Runaround in March 1942) are among the most often-quoted maxims featuring in any intellectual discourse about robots, artificial intelligence for humanity in general, and the future of civilization.
Because we have all seen those laws quoted somewhere on the internet, right?
And we want those machines on our side, as our friends, philosophers, and guides, not on the other side as our enemies…
Here are the three laws, just as a recap:
- First Law — “A robot may not injure a human being or, through inaction, allow a human being to come to harm.”
- Second Law — “A robot must obey orders given to it by human beings except where such orders would conflict with the First Law.”
- Third Law — “A robot must protect its own existence, as long as such protection does not conflict with the First or Second Law.”
A set of (mostly) plot-generating devices (in English)
Asimov was neither a roboticist nor a computer scientist/AI researcher. He was a biochemist by training and profession and throughout his life, he placed a far higher value on his writing than doing regular university research and teaching. The literary world is, of course, richer because of this choice of his.
The plots of the amazing stories that he wove were supremely important to him, and he employed novel plot-generating devices, the aforementioned Laws of Robotics chief among them. They were not meant to be coded into some robotic device, or even into a computer program emulating an artificial-intelligence agent.
As roboticist and writer Daniel Wilson puts it succinctly — “Asimov’s rules are neat, but they are also bullshit. For example, they are in English. How the heck do you program that?”
Asimov’s laws were not designed to be coded into an AI program or inside the processor of a super-intelligent robot.
You can argue that a great many laws of natural philosophy (as physical science was called in earlier times) were framed in plain language and only later codified in mathematical terms.
If you are convinced that the same can be done for these laws, try coming up with precise definitions (and mathematical bounds) of key phrases — inaction, harm, obey, conflict, protect, existence.
Hard? As I mentioned, they are in English.
Asimov’s stories are proof that they don’t work
The majority of Asimov’s stories revolved around one theme: how robots might follow these apparently logical and ethically sound codes and yet go astray, leading to unintended and disastrous consequences for humanity and society.
The genesis story, Runaround, itself showed how easily these laws come into conflict with one another, and how the conflict had to be broken by clever maneuvering to reach a happy ending.
Truly logical, scientific laws are not supposed to come into conflict this easily. They are built to be followed in a sequence, often resulting in a clear, unambiguous, actionable solution. Remember Newton’s laws? Or Maxwell’s equations?
There is a plethora of articles on the web discussing the woeful inadequacy of these laws for building any functional super-intelligent agent (hardware- or software-centric); a few of them are listed in the further-reading section at the end.
So, is something being done to address this?
Human-compatible AI
Are you interested in artificial intelligence? I assume the answer is (somewhat) in the affirmative, because you have chosen to open this story.
Have you come across the textbook Artificial Intelligence: A Modern Approach?
There is a good chance that you have (even if you have not had the courage to plow through the nearly thousand pages of this compendium; I am at around the 30% mark myself), because it has been adopted as the standard text for the AI curriculum at hundreds of universities around the world.
This book has two authors.
Peter Norvig is the more familiar name in the student and learner community, teaching us the fundamentals of AI in a great online course and publishing those amazing Jupyter Notebooks on programming, statistics, and machine learning.
However, in this article, we will focus on the first author of the book, Dr. Stuart Russell, a professor and world-renowned AI researcher at the University of California, Berkeley.
He is working on something called ‘provably beneficial AI’ through his initiative, the Center for Human-Compatible Artificial Intelligence.
Important questions to ask
Mainstream AI researchers do not ask these questions often enough, but the questions are gaining the attention of many in the community, slowly but surely.
Should we be concerned about long-term risks to humanity from superintelligent A.I.? If so, what can we do about it?
You may ask, “What long-term risk?”
You may argue that if intelligent, rational, moral, and ethical human beings — scientists, engineers, social activists, politicians — come together to define the ‘objectives’ of an intelligent system carefully, then it cannot go the wrong way.
Because the objective function is important, right?
If you have read and understood the core principles of modern machine learning algorithms, on which the world seems to be running, you know almost all of them (the most useful ones anyway) work by maximizing an objective function.
Therefore, if you think that all we need for developing a superintelligent but moral and ethical AI system is a good objective function, you cannot be faulted.
Almost all modern, powerful A.I. systems and algorithms have objective functions at their core.
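To make that statement concrete, here is a minimal, purely illustrative sketch (the quadratic objective and the parameter name theta are invented for the example): at its core, a typical learning algorithm is just a loop that nudges parameters in whatever direction pushes an objective function up, with no notion of whether that objective is a “good” one.

```python
# A made-up, one-dimensional objective: the "agent" is rewarded for getting
# theta close to 3.0. Real systems use richer objectives (log-likelihood,
# expected reward, ad clicks...), but the training loop looks much the same.
def objective(theta):
    return -(theta - 3.0) ** 2

def gradient(theta, eps=1e-5):
    # Numerical derivative of the objective at theta
    return (objective(theta + eps) - objective(theta - eps)) / (2 * eps)

theta = 0.0            # initial parameter value
learning_rate = 0.1
for _ in range(200):
    theta += learning_rate * gradient(theta)   # gradient *ascent*: maximize the objective

print(round(theta, 3))   # converges to ~3.0
# The loop is utterly indifferent to what the objective means;
# it only knows how to push the number up.
```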
Surely, a powerful AI can do no harm to humanity if the objective function or goal is a harmless one.
And, what could be more harmless (and more frustratingly mundane) than setting a goal of maximizing the paperclip production?
The superintelligent paperclip maximizer
First described by Oxford philosopher Nick Bostrom in a 2003 essay, a paperclip maximizer is a system/agent endowed with Artificial General Intelligence (AGI) whose goal is to maximize the number of paperclips in its collection.
Sounds harmless enough? Let us dig a little deeper.
If it has been constructed with a roughly human level of general intelligence, the system might collect paperclips from all the sources possible on earth, earn money to buy paperclips, or even begin to manufacture paperclips.
That’s where it gets more interesting. Where would it get the raw material to manufacture paperclips from? It can buy it, of course. But what happens when the normal supply chains are exhausted? It might try unconventional supply chains, buying up the same raw materials (like aluminum) from sources generally reserved for other industries, say the airplane or automobile industries.
To do that, it would have to employ ingenious methods and earn money. And the cleverer it is, the more likely it is to accomplish those sub-goals.
The main goal would therefore generate sub-goals that were not explicitly programmed and that the human designers did not plan for.
It might also undergo an ‘intelligence explosion’: it would work to improve its own intelligence to satisfy the sub-goals, which in turn serve the same grand goal of maximizing paperclip production.
Having increased its intelligence, it would produce more paperclips and also use its enhanced abilities to further self-improve.
A purely goal-oriented AGI, even when given a perfectly safe and simple goal, is highly likely to generate sub-goals that cannot be foreseen and, therefore, cannot be planned for by its human designers.
Where does it stop?
According to Bostrom, at some point, it might transform “first all of the earth and then increasing portions of space into paperclip manufacturing facilities”.
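Bostrom’s argument is easy to reproduce in miniature. Here is a toy, back-of-the-envelope planner (every number in it is invented for illustration): at each step the agent can either produce paperclips at its current capability or upgrade itself, and its objective counts paperclips only.

```python
from itertools import product

HORIZON = 10  # number of time steps in this toy world

def total_paperclips(plan):
    """Paperclips produced by a sequence of PRODUCE/UPGRADE actions."""
    capability, clips = 1.0, 0.0
    for action in plan:
        if action == "UPGRADE":
            capability *= 2        # self-improvement: no clips this step
        else:                      # "PRODUCE"
            clips += capability
    return clips

# Exhaustive search over all 2^HORIZON plans for the one with the most paperclips.
best_plan = max(product(["PRODUCE", "UPGRADE"], repeat=HORIZON),
                key=total_paperclips)

print(best_plan)                    # eight UPGRADEs before it produces anything
print(total_paperclips(best_plan))  # 512.0
```

Nothing in the objective mentions self-improvement; the paperclip-optimal plan still spends most of its time upgrading itself, an instrumental sub-goal nobody asked for, which is exactly the point.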
‘Provably beneficial AI’
Designing a purely goal-oriented AI system, therefore, does not seem like a very good idea after all. Unexpected things may pop up.
Because, when we teach a machine to think and continuously improve itself, we cannot hope to predict how and when it may start to outthink us.
This is called the “Control Problem of AGI”. It is poorly defined and understood at the moment. And it is harder than you think. Way, way harder.
Because it is akin to a bunch of gorillas trying to figure out how to control humans.
We might well be like those gorillas, in front of a superintelligent AGI.
“If a machine can think, it might think more intelligently than we do, and then where should we be?” — Alan Turing, 1951.
In his highly enjoyable TED talk, Prof. Russell gives a few more such examples of goal-oriented robots (including an intelligent assistant whose goal is to keep your spouse happy all the time) and explains the potential dangers and limitations of such systems.
And then, he talks about the one approach which can address this problem — provably beneficial AI.
The goal of such a design approach is to ensure that eventualities like the paperclip maximizer cannot arise, by refocusing AI away from the capability to achieve arbitrary objectives and towards the ability and motivation to realize provably beneficial behavior.
And what is the meaning of being beneficial?
That, of course, depends on the properties and the features of humans’ and society’s collective behavior at large. This task, therefore, necessarily draws on the expertise and experience of a much larger pool of thinkers — social scientists, politicians, economists, psychologists — above and beyond the core community of AI researchers, machine learning engineers, and computer scientists.
This is the new way of thinking about designing safe and beneficial AI, and it has been christened Human-compatible AI.
A new trio of laws
Along with the formal concept, Russell introduces three simple ideas to sit at the center of the new design approach. You can call them the New Laws of Robotics if you wish. Roughly paraphrased from his TED talk, they are:
- The machine’s only objective is to maximize the realization of human values (pure altruism).
- The machine is initially uncertain about what those values are (humility).
- Human behavior provides the information about what those values actually are (learning by observing us).
Essentially, the intelligent agent is not very intelligent, or very sure of its goals, at the beginning. It starts with a lot of uncertainty, just like a human baby, and then slowly finds a sure footing, not only by performing randomized searches and blindly following Q-learning strategies (as all the reinforcement learning courses will teach you) but also by trusting its parents (us humans) and observing them.
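A minimal sketch of that idea, in the spirit of inverse reinforcement learning (the hypotheses, rewards, and observations below are all invented for illustration): the agent holds a belief over several candidate objectives, assumes the human tends to choose the actions they actually prefer, and updates that belief with Bayes’ rule as it watches us.

```python
import math

# Toy setup: two candidate objectives the human might actually care about.
hypotheses = {
    "human wants paperclips": {"make_clip": 1.0, "make_coffee": 0.0},
    "human wants coffee":     {"make_clip": 0.0, "make_coffee": 1.0},
}
belief = {h: 0.5 for h in hypotheses}   # uniform prior: genuine uncertainty

def likelihood(action, reward, beta=3.0):
    """Probability a noisily rational human picks `action` under a given reward function."""
    scores = {a: math.exp(beta * r) for a, r in reward.items()}
    return scores[action] / sum(scores.values())

# The agent watches the human act, and updates its belief with Bayes' rule.
observed_human_actions = ["make_coffee", "make_coffee", "make_clip", "make_coffee"]
for action in observed_human_actions:
    for h in belief:
        belief[h] *= likelihood(action, hypotheses[h])
    total = sum(belief.values())
    belief = {h: p / total for h, p in belief.items()}

print(belief)   # most of the probability mass ends up on "human wants coffee"
```

The point of the exercise is the prior: the agent starts out genuinely unsure of what we want, and evidence from our behavior, rather than a hard-coded objective, is what narrows it down.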
Summary and related ideas
This article has probably dragged on beyond its usefulness already, but it would be unfair not to mention some related ideas. You are encouraged to Google them and learn more if you are interested in this space.
- Inverse reinforcement learning (in a co-operative game)
- The off-switch problem (for a robot); a toy version of it is sketched just after this list
- AI safety initiatives (refer to Max Tegmark’s book Life 3.0)
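For the off-switch problem, a rough numerical intuition fits in a few lines (the belief distribution and all the numbers are invented; this is only a sketch of the idea behind the “off-switch game” studied by Hadfield-Menell and colleagues): a robot that is uncertain about how much the human values its proposed action does better, by its own lights, by leaving the off-switch in the human’s hands.

```python
import random

random.seed(0)

# The robot's belief about U, the true utility (to the human) of its proposed
# action: slightly positive on average, but genuinely uncertain either way.
belief_samples = [random.gauss(0.2, 1.0) for _ in range(100_000)]

# Option 1: act immediately, without giving the human a chance to intervene.
value_act_now = sum(belief_samples) / len(belief_samples)

# Option 2: defer to the human, who lets the action proceed only when it is
# actually good (U > 0) and presses the off-switch otherwise (value 0).
value_defer = sum(max(u, 0.0) for u in belief_samples) / len(belief_samples)

print(f"act without asking:         {value_act_now:.3f}")  # ~0.2
print(f"defer, keep the off-switch: {value_defer:.3f}")     # ~0.5
```

As long as the robot is uncertain about what we want, deferring scores at least as well on its own objective, so it has an incentive to let us switch it off; if it were perfectly certain its action is good, the two options would tie, and certainty is exactly what removes that incentive.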
Thinking hard about AGI and its potential implications should not be limited to the academic research community. Everybody can participate with their ideas and inputs about the value system that such a future system should learn to imbibe.
When AGI comes, we may still turn into those gorillas (compared to the AGI) but have a healthy relationship nonetheless.
Some further reading
- https://www.technologyreview.com/s/527336/do-we-need-asimovs-laws/
- https://www.cs.huji.ac.il/~feit/papers/Asimov06.pdf
- https://wiki.lesswrong.com/wiki/Paperclip_maximizer
- https://www.danieldewey.net/tedxvienna.html
- https://alfredruth.com/en/ai-and-the-control-problem/
- https://futureoflife.org/wp-content/uploads/2017/01/Stuart-Russell-conference.pdf?x90991
- https://people.eecs.berkeley.edu/~russell/papers/russell-bbvabook17-pbai.pdf
Note about the original publication
Originally published in "The Startup" magazine on Medium.