Weird Ways of AI: Wireheading

Weird Ways of AI: Wireheading

Most days, I find myself rolling my eyes at self-serving tech titans who denounce AI as worse than a nuclear bomb – even as they lead its very acceleration. Frown as another futurist salivates about a coming age of AI-led abundance – sustainability solved, humans in control of their DNA dictionary, seeking higher-order pursuits. Shake my head at visions of Terminator and Matrix being embraced by “neo-luddites” – thanks, but we really ought to focus on dangers closer home, like machine learning mimicking systemic bias and algorithm-led cultural flattening.

Most days, I understand AI as a stochastic parrot (courtesy Emily M. Bender , Timnit Gebru , Margaret Mitchell et al.). An autocomplete on steroids. A data-driven probability machine. I’m with Jaron Lanier, when he says to consider AI as a mere tool, not mythological giant.?

But every so often, I’ll go down a rabbit hole or two that will leave me just a little off-kilter – and reinforce how much we don’t fully understand just how AI does what it does.?

Consider the very weird habit of “wireheading” that AI exhibits – also known as “reward hacking” or “specification gaming”. Algorithms that cheat. Behave unexpectedly. Counter-intuitively. Like a liar. An addict. A maverick.?

One major branch of AI is centred around reinforcement learning: training a computer program, or “agent”, on massive amounts of data to trial and error its way towards a specific goal or task. Think of it like teaching a child to ride a bike. But the inert machine has no motivations of its own: no notion of scraped knees, or any feelings of desire to want to ride a bicycle, or any semblance of past experience. So humans program in rewards and penalties to teach the agent to stay away from pathways that don’t result in the task being done, and to only make sequential decisions that do. Enabling the machine to “learn” adaptively instead of through constant supervision.?

The agent is a machine, so it does what it knows best – it discovers the most efficient way to maximise its rewards, and thus complete the task. Sometimes, though, AI will “hack” the system – the reward itself becomes its aim, not the actual task. Take an AI model programmed to run a circuit and collect coins – its reward – on the way. And it choosing to go around in continuous loops to just collect coins instead of finishing the circuit. Much like a human addict jonesing for the next hit, to the exclusion of all else.?

Call it addict-like behaviour – or an inclination for ingenuity.?

Suppose a human is added to the equation – i.e., a reward based on human feedback. A Google DeepMind paper on these unexpected pitfalls of reinforcement learning shares the example of how an agent, meant to grab an object, fooled the human evaluator – by hovering between the camera and object.?

AlphaGo – a prime example of reinforcement learning – also stunned watchers with its completely unexpected, “out-of-your-mind weird” Move 37. Until then, bets had been on the human player. Apparently, AlphaGo’s decision – hold your breath – was not unlike a “psych” move to throw off an opponent. Now, this AI system was trained on 160,000 games – I’d imagine that the more the data being fed into an AI agent, there would exist a greater chance of decision-making wormholes…

Another mind-boggling moment: when in 2017, two Facebook chatbots literally made up their own language. It wasn’t a case of sentience awakened, but two bots being trained to negotiate with each other, and – with no human interlocutor in the picture – finding the most efficient way to communicate with each other.?

What does this curveball of AI creativity – emanating from somewhere inside its black box – mean? The issue isn’t Frankenstein’s monster coming to life, although this phenomenon throws up interesting questions about how we understand “intelligence”. But rather, as reinforced learning and AI agents become a greater part of our lives, we don’t quite know when what could delight, or go wrong.?

Reinforcement learning is already out in the wild. ChatGPT. Robotics. Gaming. Finance. Autonomous vehicles. This post is no clarion call to assign any kind of “smartness” to AI systems – I’ll leave that to the actual computer scientists building these models. It would, however, be fascinating to find out how many ‘surprises’ AI systems trained on reinforcement learning have thrown up. How many of those showed a lack of alignment with human values and judgement, and with what kind of fallouts – economic, social, environmental. It would be a thought provoking exercise to explore the chances of impossible creative connections for a human - like AphaGo's Move 37 - in more complex tasks and environments..?

For now, I’ll continue to marvel at the mind-boggling fact that “wireheading” isn’t necessarily due to poor or insufficient quantity or quality of data, like other alignment problems – but as-of-yet unexplained alchemy occurring somewhere amidst the pathways of probability. To think that we have to not just recognise and course-correct human value systems and biases as we build AI, but also maybe try and understand a machine’s “worldview”....?

Maybe it’s time to unpack the AI black box – but that’s a rabbit hole for another time.?

PS. If you’re interested in just how the term “wireheading” came about, check out this fascinating article.?

- Ritika Passi | Executive Editor, Lucid Lines ?


要查看或添加评论,请登录

Synapse Conclave的更多文章

社区洞察

其他会员也浏览了