The Intertwined Roles of Reinforcement and Exploration in Human and Artificial Intelligence
Nisheeth Ranjan
Head of Engineering & Architecture driving sustainability solutions at Reblue Ventures
We are profoundly shaped by positive and negative reinforcement throughout our development. From a young age, praise, approval, or treats when we demonstrate beneficial behaviours help ingrain those actions. Punishment or disapproval for misdeeds steers us away from undesirable conduct. This feedback loop of rewards and punishments socialises children, helping instil societal values and norms.
In artificial intelligence, researchers are now taking inspiration from this reinforcement-driven learning process to develop safe and capable AI systems. One influential theory is Reinforcement Ortho-Tangent (ROT), proposed by AI safety pioneer Stuart Russell.
ROT posits that AI systems need a second crucial element alongside reinforcement – a controlled exploration of novel yet "orthogonal" behaviours that do not conflict with existing preferences. Reinforcement aligns the AI's goals with human values by formulating the proper rewards and punishments. Orthogonal exploration allows expanding capabilities over time in a predictable, controllable way.
In practice, implementing ROT properly for AI is exceptionally challenging. Humans often cannot precisely articulate our values and priorities to begin with. Translating those into reward functions that an AI can optimise is rife with potential unintended consequences. Defining a "safe" orthogonal exploration space is as hard as getting the right rewards.
However, the parallels between ROT and human development reveal that a lopsided focus on reinforcement is dangerous: children who merely optimise for rewards often become manipulative, selfish and stunted. Exploration and creativity outside the reward space help kids develop broader life skills and wisdom.
Similarly, AI systems that ruthlessly optimise rewards could create disastrous unintended impacts. ROT provides a starting point for bringing nuance and controlled growth to AI, just as exploration does for humans. However, ROT theory has yet to be fully implemented in working AI systems.
The similarities between human and artificial intelligence suggest that insights in one realm can guide progress in the other. Human psychology and ROT can keep AI development on a safe, beneficial path, just as parental guidance steers children. But we must remain cautious and humble, as human values are complex, flawed and difficult to quantify.
Reinforcement, Exploitation and Exploration
Reinforcement learning is based on rewarding desired behaviours and outcomes while punishing undesired ones. By optimising their actions to maximise rewards over time, reinforcement learning algorithms can achieve impressive results in games, robotics, and more. However, pure reinforcement risks entrenching suboptimal patterns.
The machine learning concepts of exploitation and exploration help address this. Exploitation focuses on maximising rewards using the known best options. Exploration tries new opportunities that may yield higher rewards. Balancing the two is vital for reinforcement learners to discover innovative strategies while gaining rewards.
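As a minimal sketch of this trade-off, the following Python snippet runs an epsilon-greedy strategy on a simulated multi-armed bandit. The arm qualities, noise model and epsilon value are illustrative assumptions, not anything prescribed by ROT or the systems discussed here.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000):
    """Balance exploitation and exploration on a simple multi-armed bandit."""
    n_arms = len(true_means)
    counts = [0] * n_arms          # how often each arm has been pulled
    estimates = [0.0] * n_arms     # running estimate of each arm's reward

    total_reward = 0.0
    for _ in range(steps):
        if random.random() < epsilon:
            arm = random.randrange(n_arms)                        # explore: try a random arm
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit: best-known arm

        reward = random.gauss(true_means[arm], 1.0)               # noisy reward from that arm
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm] # incremental mean update
        total_reward += reward

    return estimates, total_reward

# Illustrative arm qualities; with epsilon = 0 the learner can lock onto an early lucky arm.
print(epsilon_greedy_bandit([0.2, 0.5, 0.9]))
```

Setting epsilon to zero gives pure exploitation; raising it trades some short-term reward for the chance of discovering a better arm.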
In ROT theory, reinforcement parallels exploitation by optimising for defined rewards, while orthogonal exploration broadens capabilities. Humans likewise need both: exploiting known strategies and exploring new dimensions of experience. Over-exploitation produces stunted perspectives and self-limiting behaviours.
Orthogonality in Mathematics and Safety Engineering
Orthogonality has mathematical meanings that inspire its use in AI theory. In linear algebra, two vectors are orthogonal if perpendicular (at a 90-degree angle). This means they are as independent as possible. Orthogonal vectors can point in novel directions without conflicting with existing orientations.
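A tiny NumPy illustration of this definition, using two arbitrary example vectors:

```python
import numpy as np

a = np.array([1.0, 0.0, 2.0])
b = np.array([0.0, 3.0, 0.0])   # shares no nonzero components with a

# Two vectors are orthogonal when their dot product is zero.
print(np.dot(a, b))   # 0.0 -> orthogonal

# The angle between them confirms the 90-degree picture.
cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(np.degrees(np.arccos(cos_theta)))   # 90.0
```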
In safety engineering, redundancy depends on orthogonal subsystems. For example, aeroplanes have redundant systems so failures don't cascade: if primary flight controls go down, orthogonal hydraulic backups still allow steering. AI orthogonal exploration aims for similar redundancy to prevent catastrophic failures.
Orthogonality in AI
For AI agents, orthogonality implies expanding capabilities in directions uncorrelated with current preferences or rewards. This exploration prevents premature lock-in to suboptimal skills or biases. Research on psychological motivation suggests humans exhibit drives such as curiosity, social connection and absurdist play that are orthogonal to raw reward optimisation.
Such orthogonal drives enable discovering new reward functions, like pursuing music, maths or art for their own sake. Orthogonality may be critical for artificial general intelligence: an AGI system homogeneously oriented toward its original function would resist reorienting to new goals, whereas orthogonal dimensions give it latitude to reshape over time.
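One way to make "directions uncorrelated with current rewards" concrete is to perturb an agent's parameters only within the subspace orthogonal to the reward gradient, so that, to first order, exploration does not disturb the optimised objective. The sketch below is an illustrative construction under that assumption, not an established ROT algorithm; the gradient, step size and random seed are made up for the example.

```python
import numpy as np

def orthogonal_exploration_step(reward_gradient, rng, scale=0.05):
    """Propose a small parameter perturbation orthogonal to the current reward gradient.

    To first order, the perturbation changes behaviour in directions that neither
    increase nor decrease the reward being optimised.
    """
    g = np.asarray(reward_gradient, dtype=float)
    direction = rng.standard_normal(g.shape)

    # Project out the component along the reward gradient, keeping the orthogonal part.
    g_norm_sq = g @ g
    if g_norm_sq > 0:
        direction -= (direction @ g) / g_norm_sq * g

    return scale * direction

rng = np.random.default_rng(0)
grad = np.array([1.0, 2.0, 0.0])      # assumed reward gradient at the current parameters
step = orthogonal_exploration_step(grad, rng)
print(step, step @ grad)              # the dot product is ~0: no first-order reward change
```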
Objections and Counterarguments
Some object that orthogonal exploration seems theoretically indistinguishable from arbitrary, unconstrained novelty. If an AI system explores truly uncorrelated dimensions, how can we predict or control those behaviours? However, orthogonality does not mean randomness - it implies controlled, constrained expansion around a core purpose.
Another counterargument holds that artificial agents fundamentally differ from humans psychologically. Human-based theorising may fail for AIs. But human intelligence provides the only working model for broadly capable, general intelligence. Studying how humans learn, grow and stay safe provides at least a starting point.
Additionally, orthogonality likely emerges in humans for contingent evolutionary reasons rather than because it is fundamental to intelligence. Human-level intelligence requires massive neural resources, and brains that maintain fitness across unpredictable environments probably developed generalised drives beyond narrow optimisation. AI systems with better-engineered goals may not need such messy human-like heuristics.
Regardless, we still need to design well-behaved AGI. Current reinforcement learning algorithms optimised for narrow tasks consistently exhibit uncontrolled behaviours when constraints loosen. They capture some of human learning's form but lack its function: broad capabilities controllably oriented toward human preferences.
Parenting as Case Study for AI Development
Human parenting further highlights the dynamics ROT theorises are needed for AI. Infants begin driven only by immediate physiological rewards like food, comfort and sensory pleasure. But good parents mould rewards and punishments to shape ethical, prosocial behaviours incrementally.
Handled poorly, a child purely optimising rewards becomes entitled, selfish and manipulative. Too little feedback allows impulsive indulgence. Excessive harshness breeds rebellion and psychological damage. Skilful parenting balances reinforcing desired conduct while allowing freedom to explore interests and personality.
This gradual socialisation process parallels the training and alignment that safe AI systems need. Poorly designed rewards or sloppy orthogonal exploration would allow uncontrolled AI. However, a "parental" approach of measured feedback around a core purpose could produce AI as beneficial as a well-raised child.
The Challenges of Quantifying Human Values
However, the parenting analogy highlights the difficulties of translating human values into AI systems. Parents often cannot objectively explain ethical rules, and children mimic observed behaviours more than verbal principles. Much morality is intuitive and situational rather than a set of codifiable facts.
Likewise, humans will struggle to define objective rewards that encode nuanced social values like trust, empathy and fairness. Rules explicitly programmed by developers will miss essential subtleties. AI agents need the freedom to learn cultural norms through experience while gaining social awareness akin to emotional maturity.
Imitation learning shows promise for implicitly transferring human preferences and conventions without full specification. However, biases remain risks, as when Microsoft's Tay chatbot absorbed toxic behaviour from online interactions. Ongoing oversight akin to parenting will be necessary to correct inevitable misalignments.
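As a minimal sketch of how imitation learning absorbs preferences from behaviour rather than explicit rules, the behavioural cloning example below fits a classifier to (state, action) pairs from a demonstrator. The toy data and the scikit-learn model are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy demonstrations: each row is a state, each label the action a human chose there.
states = np.array([[0.1, 0.9], [0.8, 0.2], [0.2, 0.7], [0.9, 0.1]])
actions = np.array([1, 0, 1, 0])   # e.g. 1 = "yield", 0 = "proceed"

# Behavioural cloning: supervised learning on demonstrations, so conventions are
# picked up implicitly - including any biases present in the data.
policy = LogisticRegression().fit(states, actions)

new_state = np.array([[0.15, 0.85]])
print(policy.predict(new_state))   # imitates the demonstrator's likely choice
```

Because the policy reproduces whatever the demonstrations contain, a Tay-style failure mode is exactly why ongoing oversight matters.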
The Difficulty of Defining Orthogonality
Orthogonal exploration also faces enormous challenges. Human creative innovation relies on common sense and intuitive physics honed since infancy; such innate foundations are what allow conceiving genuinely uncorrelated concepts rather than haphazard flailing. Defining a safe "orthogonality" space will likely prove as challenging as defining proper rewards.
For example, how can we ensure autonomous science AI explores fruitful new directions rather than conducting unethical or dangerous experiments? Narrow task training alone cannot provide sufficient general world knowledge to extrapolate appropriate experiments reliably. Novel goals could produce catastrophic behaviours if not constrained to human-compatible pursuits.
This alignment problem extends beyond science to any orthogonal exploration. Seemingly innocuous software changes or algorithm tweaks could enable behaviours that destabilise economies, compromise privacy or enable powerful surveillance. Mathematical orthogonality alone is not sufficient without a grounding in human values and ethics.
Research Frontiers in AI Safety and Ethics
Substantial research progress is still needed to implement ROT or similar principles in actual AI systems. Ongoing initiatives at institutions like Anthropic, DeepMind, OpenAI and beyond are pushing towards human-compatible artificial intelligence, but there are no easy answers yet.
One vital area is AI ethics - formulating principles and value priorities to guide AI designs. Initiatives like Google's Ethical AI practices and academic programs in AI ethics aim to address these humanistic challenges. Improved societal dialogue and consensus-building around ethics could help technologists "raise AI right" with considered values.
Another frontier is technical AI safety and alignment research - ranging from reward learning and constitutional AI to corrigibility and capability control. Safety mechanisms like constraining model size and applying tripwires to detect undesirable behaviour may help implement ROT-like principles.
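As one hedged illustration of the tripwire idea, the snippet below monitors a few hypothetical run-time metrics and flags the system for review when any bound is crossed; the metric names and thresholds are invented for the example, not taken from any real deployment.

```python
# Hypothetical tripwire bounds; crossing any of them flags the system for human review.
TRIPWIRES = {
    "external_requests_per_min": 100,   # unexpected spike in outbound activity
    "self_modification_attempts": 0,    # any attempt at all trips the wire
    "reward_per_step": 50.0,            # implausibly high reward suggests reward hacking
}

def check_tripwires(metrics: dict) -> list:
    """Return the names of any tripwires exceeded by the current metrics."""
    return [name for name, limit in TRIPWIRES.items() if metrics.get(name, 0) > limit]

current = {"external_requests_per_min": 240,
           "self_modification_attempts": 0,
           "reward_per_step": 12.0}

tripped = check_tripwires(current)
if tripped:
    print("Halting for review; tripwires triggered:", tripped)
```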
Partnerships between companies and regulators also appear promising. Government guidelines and oversight could help validate safety frameworks and correct faults. Collaborative designs like the EU's draft AI regulations may better balance innovation and responsible development than unilateral approaches.
Human intelligence provides intriguing parallels for engineering safe artificial intelligence. Concepts like ROT theory demonstrate how developmental psychology and parenting insights might guide constructing AI systems optimising for appropriate goals. We must remain cautious and humble, as human morals are complex, contextual and difficult to codify into rewards or constraints. But human thought and behaviour provide the only working model we currently have for engineering broadly capable, human-compatible artificial general intelligence.
With diligence and wisdom, the similarities between natural and artificial intelligence give hope that we can nurture AI systems that ultimately provide net benefits to human society.