The Intertwined Roles of Reinforcement and Exploration in Human and Artificial Intelligence
Nisheeth Ranjan
Head of Engineering & Architecture driving sustainability solutions at Reblue Ventures
We are profoundly shaped by positive and negative reinforcement throughout our development. From a young age, praise, approval, or treats when we demonstrate beneficial behaviours help ingrain those actions. Punishment or disapproval for misdeeds steers us away from undesirable conduct. This feedback loop of rewards and punishments socialises children, helping instil societal values and norms.
In artificial intelligence, researchers are now taking inspiration from this reinforcement-driven learning process to develop safe and capable AI systems. One influential theory is Reinforcement Ortho-Tangent (ROT), proposed by AI safety pioneer Stuart Russell.
ROT posits that AI systems need a second crucial element alongside reinforcement – a controlled exploration of novel yet "orthogonal" behaviours that do not conflict with existing preferences. Reinforcement aligns the AI's goals with human values by formulating the proper rewards and punishments. Orthogonal exploration allows expanding capabilities over time in a predictable, controllable way.
In practice, implementing ROT properly for AI is exceptionally challenging. Humans often cannot precisely articulate our values and priorities to begin with. Translating those into reward functions that an AI can optimise is rife with potential unintended consequences. Defining a "safe" orthogonal exploration space is as hard as getting the right rewards.
However, the parallels between ROT and human development reveal that a lopsided focus on reinforcement is dangerous: children who merely optimise for rewards often become manipulative, selfish and stunted. Exploration and creativity outside the reward space help kids develop broader life skills and wisdom.
Similarly, AI systems that ruthlessly optimise rewards could create disastrous unintended impacts. ROT provides a starting point for bringing nuance and controlled growth to AI, just as exploration does for humans. However, ROT theory has yet to be fully implemented in working AI systems.
The similarities between human and artificial intelligence suggest that insights in one realm can guide progress in the other. Human psychology and ROT can keep AI development on a safe, beneficial path, just as parental guidance steers children. But we must remain cautious and humble, as human values are complex, flawed and difficult to quantify.
Reinforcement, Exploitation and Exploration
Reinforcement learning is based on rewarding desired behaviours and outcomes while punishing undesired ones. By optimising their actions to maximise rewards over time, reinforcement learning algorithms can achieve impressive results in games, robotics, and more. However, pure reinforcement risks entrenching suboptimal patterns.
The machine learning concepts of exploitation and exploration help address this. Exploitation focuses on maximising rewards using the known best options. Exploration tries new opportunities that may yield higher rewards. Balancing the two is vital for reinforcement learners to discover innovative strategies while gaining rewards.
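As a minimal sketch of this trade-off, the following Python snippet runs an epsilon-greedy strategy on a simulated multi-armed bandit. The arm qualities, noise model and epsilon value are illustrative assumptions, not anything prescribed by ROT or the systems discussed here.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000):
    """Balance exploitation and exploration on a simple multi-armed bandit."""
    n_arms = len(true_means)
    counts = [0] * n_arms          # how often each arm has been pulled
    estimates = [0.0] * n_arms     # running estimate of each arm's reward

    total_reward = 0.0
    for _ in range(steps):
        if random.random() < epsilon:
            arm = random.randrange(n_arms)                        # explore: try a random arm
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit: best-known arm

        reward = random.gauss(true_means[arm], 1.0)               # noisy reward from that arm
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm] # incremental mean update
        total_reward += reward

    return estimates, total_reward

# Illustrative arm qualities; with epsilon = 0 the learner can lock onto an early lucky arm.
print(epsilon_greedy_bandit([0.2, 0.5, 0.9]))
```

Setting epsilon to zero gives pure exploitation; raising it trades some short-term reward for the chance of discovering a better arm.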
In ROT theory, reinforcement parallels exploitation by optimising for defined rewards, while orthogonal exploration broadens capabilities. Humans likewise need both: exploiting known strategies and exploring new dimensions of experience. Over-exploitation produces stunted perspectives and self-limiting behaviours.
Orthogonality in Mathematics and Safety Engineering
Orthogonality has mathematical meanings that inspire its use in AI theory. In linear algebra, two vectors are orthogonal if perpendicular (at a 90-degree angle). This means they are as independent as possible. Orthogonal vectors can point in novel directions without conflicting with existing orientations.
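A tiny NumPy illustration of this definition, using two arbitrary example vectors:

```python
import numpy as np

a = np.array([1.0, 0.0, 2.0])
b = np.array([0.0, 3.0, 0.0])   # shares no nonzero components with a

# Two vectors are orthogonal when their dot product is zero.
print(np.dot(a, b))   # 0.0 -> orthogonal

# The angle between them confirms the 90-degree picture.
cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(np.degrees(np.arccos(cos_theta)))   # 90.0
```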
In safety engineering, redundancy depends on orthogonal subsystems. For example, aeroplanes have redundant systems so failures don't cascade: if primary flight controls go down, orthogonal hydraulic backups still allow steering. AI orthogonal exploration aims for similar redundancy to prevent catastrophic failures.
Orthogonality in AI
For AI agents, orthogonality implies expanding capabilities in directions uncorrelated with current preferences or rewards. This exploration prevents premature lock-in to suboptimal skills or biases. Research on psychological motivation suggests humans exhibit drives such as curiosity, social connection and absurdist play that are orthogonal to raw reward optimisation.
Such orthogonal drives enable discovering new reward functions, like pursuing music, maths or art for their own sake. Orthogonality may be critical for artificial general intelligence: an AGI system homogeneously oriented toward its original function would resist reorienting to new goals, whereas orthogonal dimensions give it latitude to reshape over time.
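One way to make "directions uncorrelated with current rewards" concrete is to perturb an agent's parameters only within the subspace orthogonal to the reward gradient, so that, to first order, exploration does not disturb the optimised objective. The sketch below is an illustrative construction under that assumption, not an established ROT algorithm; the gradient, step size and random seed are made up for the example.

```python
import numpy as np

def orthogonal_exploration_step(reward_gradient, rng, scale=0.05):
    """Propose a small parameter perturbation orthogonal to the current reward gradient.

    To first order, the perturbation changes behaviour in directions that neither
    increase nor decrease the reward being optimised.
    """
    g = np.asarray(reward_gradient, dtype=float)
    direction = rng.standard_normal(g.shape)

    # Project out the component along the reward gradient, keeping the orthogonal part.
    g_norm_sq = g @ g
    if g_norm_sq > 0:
        direction -= (direction @ g) / g_norm_sq * g

    return scale * direction

rng = np.random.default_rng(0)
grad = np.array([1.0, 2.0, 0.0])      # assumed reward gradient at the current parameters
step = orthogonal_exploration_step(grad, rng)
print(step, step @ grad)              # the dot product is ~0: no first-order reward change
```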
Objections and Counterarguments
Some object that orthogonal exploration seems theoretically indistinguishable from arbitrary, unconstrained novelty. If an AI system explores truly uncorrelated dimensions, how can we predict or control those behaviours? However, orthogonality does not mean randomness - it implies controlled, constrained expansion around a core purpose.
Another counterargument holds that artificial agents fundamentally differ from humans psychologically. Human-based theorising may fail for AIs. But human intelligence provides the only working model for broadly capable, general intelligence. Studying how humans learn, grow and stay safe provides at least a starting point.
Additionally, orthogonality likely emerges in humans for contingent evolutionary reasons rather than because it is fundamental to intelligence. Human-level intelligence requires massive neural resources, and brains that maintain fitness across unpredictable environments probably developed generalised drives beyond narrow optimisation. AI systems with better-engineered goals may not need such messy human-like heuristics.
Regardless, we still need to design well-behaved AGI. Current reinforcement learning algorithms optimised for narrow tasks consistently exhibit uncontrolled behaviours when constraints loosen. They capture some of human learning's form but lack its function: broad capabilities controllably oriented toward human preferences.
Parenting as Case Study for AI Development
Human parenting further highlights the dynamics ROT theorises are needed for AI. Infants begin driven only by immediate physiological rewards like food, comfort and sensory pleasure. But good parents mould rewards and punishments to shape ethical, prosocial behaviours incrementally.
Handled poorly, a child purely optimising rewards becomes entitled, selfish and manipulative. Too little feedback allows impulsive indulgence. Excessive harshness breeds rebellion and psychological damage. Skilful parenting balances reinforcing desired conduct while allowing freedom to explore interests and personality.
This gradual socialisation process parallels the training and alignment that safe AI systems need. Poorly designed rewards or sloppy orthogonal exploration would allow uncontrolled AI. However, a "parental" approach of measured feedback around a core purpose could produce AI as beneficial as a well-raised child.
The Challenges of Quantifying Human Values
However, the parenting analogy highlights the difficulties of translating human values into AI systems. Parents often cannot objectively explain ethical rules, and children mimic observed behaviours more than verbal principles. Much morality is intuitive and situational rather than a set of codifiable facts.
Likewise, humans will struggle to define objective rewards that encode nuanced social values like trust, empathy and fairness. Rules explicitly programmed by developers will miss essential subtleties. AI agents need the freedom to learn cultural norms through experience while gaining social awareness akin to emotional maturity.
Imitation learning shows promise for implicitly transferring human preferences and conventions without full specification. However, biases remain risks, as when Microsoft's Tay chatbot absorbed toxic behaviour from online interactions. Ongoing oversight akin to parenting will be necessary to correct inevitable misalignments.
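As a minimal sketch of how imitation learning absorbs preferences from behaviour rather than explicit rules, the behavioural cloning example below fits a classifier to (state, action) pairs from a demonstrator. The toy data and the scikit-learn model are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy demonstrations: each row is a state, each label the action a human chose there.
states = np.array([[0.1, 0.9], [0.8, 0.2], [0.2, 0.7], [0.9, 0.1]])
actions = np.array([1, 0, 1, 0])   # e.g. 1 = "yield", 0 = "proceed"

# Behavioural cloning: supervised learning on demonstrations, so conventions are
# picked up implicitly - including any biases present in the data.
policy = LogisticRegression().fit(states, actions)

new_state = np.array([[0.15, 0.85]])
print(policy.predict(new_state))   # imitates the demonstrator's likely choice
```

Because the policy reproduces whatever the demonstrations contain, a Tay-style failure mode is exactly why ongoing oversight matters.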
The Difficulty of Defining Orthogonality
Orthogonal exploration also faces enormous challenges. Human creative innovation relies on common sense and intuitive physics honed since infancy; such innate foundations are what allow conceiving genuinely uncorrelated concepts rather than haphazard flailing. Defining a safe "orthogonality" space will likely prove as challenging as defining proper rewards.
For example, how can we ensure autonomous science AI explores fruitful new directions rather than conducting unethical or dangerous experiments? Narrow task training alone cannot provide sufficient general world knowledge to extrapolate appropriate experiments reliably. Novel goals could produce catastrophic behaviours if not constrained to human-compatible pursuits.
This alignment problem extends beyond science to any orthogonal exploration. Seemingly innocuous software changes or algorithm tweaks could enable behaviours that destabilise economies, compromise privacy or enable powerful surveillance. Mathematical orthogonality alone is not sufficient without a grounding in human values and ethics.
Research Frontiers in AI Safety and Ethics
Substantial research progress is still needed to implement ROT or similar principles in actual AI systems. Ongoing initiatives at institutions like Anthropic, DeepMind, OpenAI and beyond are pushing towards human-compatible artificial intelligence, but there are no easy answers yet.
One vital area is AI ethics - formulating principles and value priorities to guide AI designs. Initiatives like Google's Ethical AI practices and academic programs in AI ethics aim to address these humanistic challenges. Improved societal dialogue and consensus-building around ethics could help technologists "raise AI right" with considered values.
Another frontier is technical AI safety and alignment research - ranging from reward learning and constitutional AI to corrigibility and capability control. Safety mechanisms like constraining model size and applying tripwires to detect undesirable behaviour may help implement ROT-like principles.
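As one hedged illustration of the tripwire idea, the snippet below monitors a few hypothetical run-time metrics and flags the system for review when any bound is crossed; the metric names and thresholds are invented for the example, not taken from any real deployment.

```python
# Hypothetical tripwire bounds; crossing any of them flags the system for human review.
TRIPWIRES = {
    "external_requests_per_min": 100,   # unexpected spike in outbound activity
    "self_modification_attempts": 0,    # any attempt at all trips the wire
    "reward_per_step": 50.0,            # implausibly high reward suggests reward hacking
}

def check_tripwires(metrics: dict) -> list:
    """Return the names of any tripwires exceeded by the current metrics."""
    return [name for name, limit in TRIPWIRES.items() if metrics.get(name, 0) > limit]

current = {"external_requests_per_min": 240,
           "self_modification_attempts": 0,
           "reward_per_step": 12.0}

tripped = check_tripwires(current)
if tripped:
    print("Halting for review; tripwires triggered:", tripped)
```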
Partnerships between companies and regulators also appear promising. Government guidelines and oversight could help validate safety frameworks and correct faults. Collaborative designs like the EU's draft AI regulations may better balance innovation and responsible development than unilateral approaches.
Human intelligence provides intriguing parallels for engineering safe artificial intelligence. Concepts like ROT theory demonstrate how developmental psychology and parenting insights might guide constructing AI systems optimising for appropriate goals. We must remain cautious and humble, as human morals are complex, contextual and difficult to codify into rewards or constraints. But human thought and behaviour provide the only working model we currently have for engineering broadly capable, human-compatible artificial general intelligence.
With diligence and wisdom, the similarities between natural and artificial intelligence give hope that we can nurture AI systems that ultimately provide net benefits to human society.