Solving the alignment problem in modern AI and a framework for going forward.

The tsunami is approaching, and we can all see it. Artificial general intelligence is just around the corner, and we are already feeling the swell before the wave strikes. Stopping it would require many things to happen at once: regulation, self-control, humility, and restraint from every university, company, and government, not to mention the thousands of enthusiasts and unaffiliated individuals with internet access, a few graphics cards, and enough interest to forge ahead. I am not counting on this.

Throughout history, we have used our past experiences, laws, religions, and philosophies to guide us into the future. This approach has predominantly served us well. However, there are instances where our past, and the thinking built on it, fail us. The models we construct, individually and as entire societies, to extend our vision beyond the visible horizon have limitations.

Historical breakthroughs, such as harnessing fire, inventing the wheel, advancing through the metal ages, the rise of empires, the Gutenberg printing press, the Industrial Revolution, and the internet, as well as challenges like pandemics, plagues, wars, natural disasters, economic collapses, and droughts, all strain or break our models the first time we encounter them. Such events cause our incremental, day-to-day "routine" models to falter. Relying solely on hindsight and projecting forwards will therefore inevitably fail us at one point or another.

Many important thinkers, including Stephen Hawking, Elon Musk, Nick Bostrom, Bill Gates, Max Tegmark, Sam Harris, and many others, have very publicly voiced their concerns about racing too fast towards the cliff of AGI without knowing what to do once we reach the precipice.

The intelligent behaviour we are beginning to see emerge in LLMs (Large Language Models) surprised the builders of these technologies as much as it did the general public and media outlets around the world. To many observers of AI's progress, the feeling is like watching a group of veterans and expert climbers argue over the height of the mountain they are scaling, when one of the younger climbers looks around and finds a door in the side of the mountain with a button labelled "Lift to Peak". It looks almost too easy to be true, and nobody saw it coming that way.

Whether this is genuine intelligence is a separate argument we shall address elsewhere (if at all possible). The reality of the situation is that in a race with no rules, the fastest machine will almost always beat humans, and, assuming machines are allowed to compete, the technology to solve a problem tends to arrive once the problem is well described; it is a systems-thinking staple that a well-defined problem invites its own solution. The fundamental stumbling block we face now is called the alignment problem.

The assumption that this or a soon-to-come AI framework will emulate and eventually supersede human cognitive capabilities is not far-fetched; some fear it has already started. Either way, the problem is defined by this question:

"How do we ensure AGI systems align with our values despite their potential to surpass human intelligence and act in ways contrary to our interests?"

While at first glance it may appear to be a totally new problem, I believe that it is not. In my opinion, it is an old problem that humans have faced since they lived in packs or groups: parenting. Raising children who are stronger, smarter, and better than you, yet aligned with your values and principles, and hopefully healthy and friendly, is a common hope for good, or at least decent, parents. The problems of parenting and of developing artificial general intelligence (AGI) are remarkably analogous. We need to turn to the parts of our knowledge that apply to this domain to look for answers, guidelines, and frameworks.


Our understanding of educational theory has evolved immensely: from innatism, to Renaissance humanism, to the pan-theoretical explosion at the beginning of the 20th century with psychoanalysis and behaviourism (and its messy ethical experimentation regime), and onward to modern approaches to education theory. In the late 1960s, education theory stopped looking at people as puppets of sexual and hunger drives, hosts for demons, or short-term behavioural agents that can be explained with a small set of variables. More complex, "systems-like" approaches began to be explored and understood, in which the child is an active participant in forming the self, in response to and in direct interaction with the environment, without forgetting the different proclivities carried through genetics and the innate cognitive processes and structures that have evolved in the human hardware itself to jumpstart learning and formation.

In the context of AGI, we need to go through the same process of growth. For these systems to become constructive agents in our societies, to whatever degree we are ready to allow and accept them, they need to grow as we would raise a child.


AGI training has always been more or less stuck in the behaviourist paradigm, even down to the Turing test itself. The arguments put forward by Searle, arguably a re-proposition of earlier arguments by Leibniz and others, are strongly correlated with the concepts put forward by Skinner in his approach to education. Essentially, whatever happens inside the system is either ignored or treated as not even a "real thing", and the only scientific object of study is the external behaviour of the system under investigation, be it an animal, a child, or an AGI. While it is not an obvious thing to argue, this approach withered somewhat in the late 1970s thanks to the likes of Chomsky, whose work began in the 1950s (particularly with respect to knowledge representation and language). He argued that not everything can be reduced to direct or indirect rewards that can be measured; the complex behaviour humans exhibit (especially human language) grows and defines itself from internal capabilities. Chomsky's critique led to computational and cybernetic accounts of the machinery we humans are born with, which lets us build sentences from a young age, even sentences we have never heard before, out of words and syntax we learn. These approaches integrate both nature and nurture into the processes of learning.

Today's AI systems are largely based on large language models, with a focus on text and abstract thought. While other techniques are emerging for handling other types of media, I will focus this discussion on text and abstract thought as they pertain to the alignment problem. This is by no means meant to be complete, only indicative.

Learning from the theories and practices of Constructivism (Jean Piaget), Cognitivism (Ulric Neisser), and more recently Connectivism (George Siemens), among others, we can use the modern frameworks and ideas we have for raising and educating children as a first framework for approaching the alignment problem: teaching the system what is acceptable and unacceptable behaviour.

However, there is an elephant in the room when it comes to children. We do not all agree on what is acceptable or unacceptable behaviour across borders, or even within the same local society. Some people are happy to let their children cause mayhem, while others are very adamant about manners and politeness. Some parents are very direct and crude with their children, to the point of what some might argue is "too much truth", while others follow a pedagogical approach in which some information is forbidden until it is not, if ever. These same "issues", or rather divergences, are present in governments and religions, so this is not an entirely new problem.

In my honest opinion, the debates around acceptable behaviour come down to "one size fits all" versus customisability, a problem many software systems and other products face. In the movie Interstellar, the writers came up with a simple yet brilliant idea: customising their AIs on the fly as the context or the crew's comfort level required. For example, a setting such as "90% honesty and 75% humour" would cap the AI's honesty in favour of caring for other people's feelings, while the humour setting would allow it to joke about things. We humans learn to modulate these on the fly, but a good start would be customisability settings for sensitivity, truth, political correctness, honest debate, and perhaps many others.
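To make the idea concrete, here is a minimal sketch of what such settings might look like as a configuration object. It is purely illustrative: the class, the field names, and the idea of rendering the dials into plain-language guidance are my own assumptions, not a description of any existing system.

from dataclasses import dataclass

@dataclass
class BehaviourSettings:
    """Hypothetical per-context dials for an AI system's conduct, each from 0.0 to 1.0."""
    honesty: float = 0.9      # how bluntly uncomfortable truths are stated
    humour: float = 0.75      # how freely the system jokes
    sensitivity: float = 0.5  # how much wording is softened to spare feelings
    directness: float = 0.5   # how quickly difficult information is delivered

    def clamped(self) -> "BehaviourSettings":
        # Keep every dial inside the valid 0.0 to 1.0 range.
        return BehaviourSettings(*(min(1.0, max(0.0, v)) for v in
                                   (self.honesty, self.humour, self.sensitivity, self.directness)))

def to_guidance(s: BehaviourSettings) -> str:
    # Render the dials as plain-language guidance a system could be steered with.
    return (f"Be about {s.honesty:.0%} honest, {s.humour:.0%} humorous, "
            f"{s.sensitivity:.0%} sensitive to feelings, and {s.directness:.0%} direct.")

# Example: the TARS-style setting described above.
print(to_guidance(BehaviourSettings(honesty=0.9, humour=0.75).clamped()))

In a real system these dials would presumably be adjusted continuously, by the people around the AI or by the AI itself, rather than set once and forgotten.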

The ideal situation would be to have AGIs that grow and develop, teach each other, and can adjust parameters depending on the context and situation. This might be the best starting point for moving forward. Teaching our digital children honesty, humour, politeness, and collaborative approaches could help us solve the problem of alignment.
