The road to AGI (and beyond): it's all about human alignment
Javier Marin
AI Innovation Leader & Business Catalyst | Turning Complex Tech into Market-Moving Solutions | 20+ Years Building Tomorrow's Digital Infrastructure
"Being better at making decisions is not the same as making better decisions. No matter how excellently an algorithm maximizes, and no matter how accurate its model of the world, a machine's decisions may be ineffably stupid, in the eyes of an ordinary human, if its utility function is not well aligned with human values."
Stuart Russell, Professor of Computer Science, Director of the Center for Intelligent Systems, and Smith-Zadeh Chair in Engineering, UC Berkeley
What do you think about machines that think? (back in 2015)
edge.org describes itself as a place that arrives at the frontier of the world's knowledge, seeks out the most complex and sophisticated minds, puts them in a room together, and has them ask each other the questions they are asking themselves. Back in 2015, it posed an interesting annual question: What do you think about machines that think? Some notable scientists (known as "edgies") penned their responses at a time when AI was still at the edge of knowledge and LLMs and generative AI did not yet exist (the transformer architecture arrived two years later). It's worth reading some of these answers to see how the difficulties remain mostly unchanged, despite our great advances in this area. Even so, there are significant differences between today's reality and what was projected just nine years ago. For example, data privacy was not a big deal a decade ago, but it is one of the most important concerns in 2024 (and 2025, and beyond). Another striking difference is that nine years ago, few believed AI could be creative, given its deterministic behavior founded on classical physical laws. Today, however, it is widely accepted that advanced LLMs are capable of innovating and even creating (and, some would argue, of exercising something like free choice).
AI models aligned with human values
Stuart Russell, a consistent and insightful voice in AI, has been telling us for years that one of the most pressing concerns facing AI is its alignment with our interests (as humans, companies, etc.). Building complex, accurate, and not "stupid" machines remains a major challenge for AI model developers, and alignment is a top priority for the world's leading AI labs (including OpenAI and its superalignment program). Indeed, LLMs can produce unhelpful, toxic, or outright false outputs because they aren't primarily designed with end users in mind. The first models introduced in 2022 were rather easy to align. Since then, we have made progress on aligning language models by training them to act in accordance with the user's intention. In many applications, fine-tuning the models has been sufficient to ensure that their responses are well aligned with the prompter's requirements. As AI systems become more sophisticated (with more parameters and more complex algorithms), alignment becomes increasingly challenging. If we also include many agents interacting with one another to produce a response, the problem becomes harder still.
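To make "fine-tuning on the user's intention" concrete, here is a minimal sketch of supervised instruction tuning: the model is trained to maximize the likelihood of a desired response given a prompt, with the loss computed only on the response tokens. The toy model, token ids, and response_start index below are stand-ins of my own, not from any real system; in practice one fine-tunes a pretrained LLM on large datasets of (prompt, desired response) pairs.

```python
# Minimal sketch of supervised instruction fine-tuning (hypothetical toy).
import torch
import torch.nn as nn

vocab_size, dim = 100, 32
# Toy "language model": embedding + linear head. Real systems use a
# pretrained transformer LLM here.
model = nn.Sequential(nn.Embedding(vocab_size, dim),
                      nn.Linear(dim, vocab_size))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One (prompt + desired response) sequence of token ids (made up).
tokens = torch.tensor([5, 17, 42, 8, 23, 61, 2])
response_start = 3  # the desired response begins at index 3

logits = model(tokens[:-1])   # predict each next token
targets = tokens[1:]
loss_all = nn.functional.cross_entropy(logits, targets, reduction="none")
loss = loss_all[response_start - 1:].mean()  # mask out the prompt tokens
loss.backward()
opt.step()
```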
Beyond reinforcement learning from human feedback (RLHF)
Currently, RLHF (reinforcement learning from human feedback) is one of the most widely used methods. The basic idea is this: the model generates responses, humans rate how good or bad those responses are, and the model is then trained again to reinforce the good behavior and discourage the bad. In this way, it picks up on people's preferences. But this method will soon hit some limits. As Leopold Aschenbrenner argues in his insightful personal project SituationalAwareness.ai, as AI systems become more intelligent, current techniques like RLHF will fail. Consider this: we challenge a superintelligent model to solve a long-standing mathematical problem, such as Hilbert's thirteenth problem (roughly, whether the solution of the general seventh-degree equation can be expressed as a composition of continuous functions of two variables). Because we humans don't actually know the answer, we are unable to rate the output as good or bad; hence, we can't reward the model's good behavior or penalize its bad behavior. We could ask the world's best mathematicians about such problems, but there are probably no more than ten people today able to fully understand this one, so the verification process looks rather inefficient. Sooner or later, the models will generate responses so complex that no one will be able to validate or reject them. This technical challenge is currently a top priority in the most cutting-edge AI labs. Aschenbrenner explains it very clearly: "People often associate alignment with some complicated questions about human values, or jump to political controversies, but deciding on what behaviors and values to instill in the model, while important, is a separate problem. The primary problem is that for whatever you want to instill the model (including ensuring very basic things, like “follow the law”!) we don’t yet know how to do that for the very powerful AI systems we are building very soon."
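As a rough illustration of the RLHF recipe described above, the sketch below shows its core ingredient: a reward model trained from pairwise human preferences with a Bradley-Terry loss, so that responses humans preferred score higher than responses they rejected. The RewardModel class, feature vectors, and random data are hypothetical simplifications of my own; in practice the reward model is a full language-model backbone with a scalar head, and the policy is then optimized against it with an RL algorithm such as PPO.

```python
# Hypothetical, simplified sketch of the RLHF reward-modelling step.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Maps a feature vector for a (prompt, response) pair to a scalar reward.
    Real systems use a full LLM backbone; a linear head stands in here."""
    def __init__(self, feature_dim: int):
        super().__init__()
        self.score = nn.Linear(feature_dim, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.score(features).squeeze(-1)

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry pairwise loss: the human-preferred response should
    # score higher than the rejected one.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

torch.manual_seed(0)
model = RewardModel(feature_dim=16)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Random features standing in for encoded (prompt, response) pairs.
chosen = torch.randn(8, 16)    # responses human raters preferred
rejected = torch.randn(8, 16)  # responses human raters rejected

loss = preference_loss(model(chosen), model(rejected))
loss.backward()
opt.step()
print(f"pairwise preference loss: {loss.item():.4f}")
```

This is exactly the step that breaks down in Aschenbrenner's scenario: when no human rater can tell which response is better, there are no preference pairs to train on.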
A business vision
It appears there is now a race in the business world to see who can deploy the most powerful models. But that is not the real task. The challenge is to align the models, even for the simplest questions. Business scenarios often entail complex, multifaceted decisions that are difficult to distill into simple reward functions. The interactions of multiple stakeholders (employees, consumers, shareholders, and regulators) create a complex value landscape, as the sketch below illustrates. Business settings change quickly, making it difficult to stay aligned over time: models trained on historical data and existing knowledge may become mismatched as business contexts evolve. Many companies struggle to close the gap between AI expertise and domain-specific business knowledge, which can lead to a misalignment between AI capabilities and business requirements. Additionally, AI decisions might have far-reaching and unforeseen consequences, and current alignment approaches may not fully account for these long-term, systemic effects.
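A toy illustration of that complex value landscape (all outcome names and utility numbers here are invented for the sketch): collapsing several stakeholders' preferences into a single scalar reward makes the "best" decision depend entirely on the chosen weights, and two reasonable weightings can rank the same options in opposite order.

```python
# Toy illustration (hypothetical numbers): a scalar reward hides
# multi-stakeholder trade-offs.
from dataclasses import dataclass

@dataclass
class Outcome:
    name: str
    employee_utility: float
    customer_utility: float
    shareholder_utility: float
    regulator_utility: float

def scalar_reward(o: Outcome, weights: dict[str, float]) -> float:
    # A single weighted sum -- the kind of "simple reward function"
    # that business decisions rarely reduce to.
    return (weights["employees"] * o.employee_utility
            + weights["customers"] * o.customer_utility
            + weights["shareholders"] * o.shareholder_utility
            + weights["regulators"] * o.regulator_utility)

outcomes = [
    Outcome("aggressive automation", 0.2, 0.7, 0.9, 0.5),
    Outcome("gradual rollout", 0.8, 0.6, 0.6, 0.9),
]

# Two plausible weightings rank the same outcomes in opposite order.
for weights in ({"employees": 0.1, "customers": 0.2, "shareholders": 0.6, "regulators": 0.1},
                {"employees": 0.3, "customers": 0.2, "shareholders": 0.2, "regulators": 0.3}):
    best = max(outcomes, key=lambda o: scalar_reward(o, weights))
    print(weights, "->", best.name)
```

The first weighting picks "aggressive automation"; the second picks "gradual rollout". Nothing in the data changed, only the weights, which is precisely why a fixed reward function struggles in a shifting, multi-stakeholder environment.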
Current AI alignment methods have limits in corporate environments, and addressing these constraints demands a multifaceted response: new approaches to regulation, ethics, and organization, in addition to technological solutions. As AI systems grow increasingly powerful and ubiquitous in the corporate world, the emphasis must move from quick setup to careful, comprehensive alignment that accounts for the complex, ever-changing nature of business environments and the many stakeholders involved. Building AI systems that are in sync with corporate goals, ethical principles, and society's values matters more than simply deploying powerful AI.