A Philosophical Perspective on Alignment and Governance for LLM-Based AI Systems and the Resulting Economic and Sustainability Impacts
Copyright: Sanjay Basu

tl;dr: Should we scorch the earth to know the mind of our creation?

The opacity of current artificial intelligence systems – large language models (LLMs) in particular – has been a major roadblock to their wider adoption, both in society at large and especially in safety-critical domains. Anthropic’s recent progress on mechanistic interpretability offers an important new path in the effort to demystify artificial minds. This article discusses some of the conceptual and practical issues surrounding the need to know how LLMs work, and why that knowledge matters for agency and governance in an AI-dominated future.

Modern LLMs appear to ‘think’ for themselves, as it were. Unlike the logic-based rule systems of earlier attempts at AI, they don’t need to be fed elegant, hand-crafted instructions that dictate how to arrive at a solution. Instead, they come up with solutions ‘automatically’ simply by training on massive amounts of data. This adaptability makes such deep-learning neural networks configurable, flexible, and scalable. But it also makes them black boxes powering massively distributed, parallel, real-time computing machinery over which we have much less control. And this uncertainty is not just a matter of lacking fine-grained or just-in-time access to information.

Humans face the same problem in relation to each other’s minds. We can’t read each other’s minds, so our societies have evolved complex systems of legal, cultural, and social governance to enable and foster trust. These systems provide the transparency and accountability that underpin social interactions, assuring ourselves and each other that our behaviors follow accepted norms and values. We need the same for AI. As AI technologies continue to extend into the real world, we’ll have to design complementary systems of AI governance that can bridge the gulf between the inscrutability of machine cognition and the need for trust and reliability in AI applications.

The stakes for AI deployment are also extremely high, particularly in sectors such as medicine, law enforcement, and insurance, where wrong judgments can lead to fatal consequences, concealed biases, and moral harms. Knowing how LLMs arrive at their conclusions is not just a technical issue; it’s a moral one. Was a decision made against you because the system that produced it is biased against people like you, or was its judgment free from prejudice?

This leap towards transparency – a challenge Anthropic has recently, if only partially, managed to conquer for LLMs – comes thanks to mechanistic interpretability, which links patterns in a model’s internal activity to concrete and abstract concepts and allows researchers to manipulate them. As a result, researchers can alter a model’s behavior by amplifying or suppressing the features that represent those concepts. This lets us determine more deliberately under what circumstances the model behaves in a certain way: how it has been shaped by training, and how to prevent ethically problematic behavior.
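
To make this concrete, here is a minimal, hypothetical sketch of the kind of inspection this enables: given a learned dictionary of feature directions, one can ask which inputs most strongly activate a chosen feature and so attach a human-readable concept to it. The tensors below are random stand-ins for real model data, and the sizes and names are illustrative assumptions of mine, not Anthropic’s actual pipeline.

```python
# Illustrative sketch only: random tensors stand in for real activations and features.
import torch

torch.manual_seed(0)

n_tokens, d_model, n_features = 8, 16, 32        # toy sizes; real models are far larger
tokens = [f"tok_{i}" for i in range(n_tokens)]   # placeholder token labels
activations = torch.randn(n_tokens, d_model)     # stand-in for residual-stream activations
directions = torch.randn(n_features, d_model)    # stand-in for learned feature directions
directions = directions / directions.norm(dim=-1, keepdim=True)

# Project each token's activation onto every feature direction.
feature_acts = activations @ directions.T        # shape: (n_tokens, n_features)

# Inspect one feature: which tokens activate it most strongly?
feature_id = 5
top = torch.topk(feature_acts[:, feature_id], k=3)
for score, idx in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{tokens[idx]}: {score:.3f}")
```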

Philosophically, the drive to fathom AI mirrors humanity’s age-old search for self-understanding. Socrates favored the examined life, and we would do well to follow his lead in examining the software-driven minds we create. This is no mere technical exercise in comprehension; it’s about ensuring that AI behavior reflects human values and ethical norms.

This is the goal of alignment – ensuring that an AI system’s goals and behaviors are well aligned with the values of human society. To cite one of the simplest capabilities Anthropic demonstrated, the ability to dial the activity of features in LLMs up and down gives AI systems a real and discernible form of steerability. We desperately need steerability to mitigate risk, and especially to build AI systems that act in the interests of humanity. We want LLMs with the features turned up that make them competent and effective partners, and with the features turned down that would bias them against us or make them habitual liars, for instance.
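
As a rough illustration of what such steerability could look like in code, the sketch below adds a scaled ‘feature direction’ to one layer’s output during the forward pass, dialing a hypothetical feature up or down. The model, the hooked layer, and the feature vector are toy stand-ins under my own assumptions, not Anthropic’s production method.

```python
# Toy feature-steering sketch: shift one layer's output along an assumed feature direction.
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model = 16
model = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, d_model))

feature_direction = torch.randn(d_model)          # stand-in for a learned feature direction
feature_direction = feature_direction / feature_direction.norm()
steering_strength = 4.0                           # positive dials the feature up; negative dials it down

def steer(module, inputs, output):
    # Shift this layer's output along the feature direction before it flows onward.
    return output + steering_strength * feature_direction

handle = model[0].register_forward_hook(steer)    # attach the steering hook to the first layer
x = torch.randn(2, d_model)
steered = model(x)                                # forward pass with the feature dialed up
handle.remove()                                   # detach the hook to restore normal behavior
baseline = model(x)                               # unsteered forward pass, for comparison
```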

And while the prospect of ‘gaming’ LLMs in this way might give rise to dystopian fears, reality is more nuanced. The researchers at Anthropic are open about this, noting that the techniques they developed are unlikely to be exploited for harm because simpler alternatives already exist for generating undesirable content. Nevertheless, this doesn’t remove the need for robust governance regimes and, in particular, for efforts to address questions of accountability. Good governance focuses on policies and standards that aim to ensure AI systems are used responsibly and ethically, alongside stronger transparency and accountability for AI developers and users.

For all the recent advances in mechanistic interpretability, we aren’t yet close to a full understanding of how LLMs ‘think’. Extracting the full complement of features from these models would require computational resources greater than those used to train them in the first place. But the road to transparency is as important as it is difficult. Understanding where our current AI models fall short is just as vital as having them in the first place – it will help us realize these tools’ potential while mitigating their dangers.

To be clear, the opacity of large language models poses significant challenges, but those challenges are not insurmountable. It’s not too late to conduct the research and philosophical reflection needed to uncover the workings of these artificial minds. Continued progress in mechanistic interpretability suggests that bringing AI in line with human values and building effective governance for its use is still possible. We can then set ourselves on the path to a future in which AI systems are empowering, productive, and efficient while also being transparent, trustworthy, and value-aligned – precisely what our humanistic imperatives demand, and something that could offer immense value for our species.

The search for explanation and control of AI reflects our larger search for knowledge and moral improvement. As revelations of how LLMs work shed light on AI’s dark interior, they also bring us closer to a beneficial and supportive inclusion of AI in our lives. Indeed, such disclosures of process and design will lead to an ever-improving AI that serves humanity.


The quest for explainability in LLMs is likely to transform the computational landscape in a profound way: as researchers and developers work to expose the inner workings of these AI systems, the technological, environmental, and economic effects will be substantial.

How do LLMs operate? Recent advances in mechanistic interpretability suggest that answering this question takes a great deal of computation. These approaches require not only training large models but also substantial post hoc analysis to reverse-engineer what is happening inside a trained model, a burden that can exceed the requirements of training itself. Identifying and interpreting the features a model has learned is often even more computationally expensive than training the model in the first place. This disproportionate computational overhead will incentivize parallelization, algorithmic optimization, and hardware efficiency.
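
For intuition about where that post hoc cost comes from, here is a minimal sketch of the dictionary-learning step used in sparse-autoencoder approaches to interpretability: an autoencoder is trained on a model’s internal activations with a sparsity penalty so that individual features become easier to interpret. The sizes, data, and hyperparameters below are toy assumptions; real runs operate on activations harvested from a trained LLM at enormous scale.

```python
# Toy sparse-autoencoder sketch; random data stands in for harvested LLM activations.
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, n_features = 64, 512                    # toy sizes; real dictionaries are far larger
activations = torch.randn(10_000, d_model)       # stand-in for activations collected from a model

encoder = nn.Linear(d_model, n_features)
decoder = nn.Linear(n_features, d_model)
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
l1_coeff = 1e-3                                  # sparsity penalty that encourages interpretable features

for step in range(200):                          # real runs take vastly more steps
    batch = activations[torch.randint(0, activations.shape[0], (256,))]
    feats = torch.relu(encoder(batch))           # sparse feature activations
    recon = decoder(feats)                       # reconstruct the original activations
    loss = (recon - batch).pow(2).mean() + l1_coeff * feats.abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```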

While we haven’t yet determined the full environmental footprint of AI, data centers already consume enormous amounts of energy and emit substantial carbon, so we can expect climate impacts from the additional computational cost of explainability efforts. More powerful data centers, and the more advanced cooling needed to dissipate the extra heat from the additional computation, will push overall energy consumption higher. We urgently need ways to balance the call for explainability with sustainability efforts. Innovations in energy-efficient hardware design, renewable energy sources, and improved computational efficiency will be essential to minimize these climate impacts.
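
To see how quickly those costs add up, the back-of-envelope calculation below estimates the energy and carbon footprint of a hypothetical interpretability run. Every number is an assumption chosen only to illustrate the arithmetic, not a measurement of any real system.

```python
# Every figure below is an assumed, illustrative value, not a measurement.
gpus = 256                     # assumed number of accelerators used for the analysis
power_per_gpu_kw = 0.7         # assumed average draw per accelerator, in kilowatts
hours = 24 * 14                # assumed two-week interpretability run
pue = 1.3                      # assumed data-center power usage effectiveness
kg_co2_per_kwh = 0.4           # assumed grid carbon intensity

energy_kwh = gpus * power_per_gpu_kw * hours * pue
emissions_tonnes = energy_kwh * kg_co2_per_kwh / 1000

print(f"Energy: {energy_kwh:,.0f} kWh, emissions: {emissions_tonnes:,.1f} tonnes CO2e")
```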

The additional computational demands of making AI explainable will also substantially increase operational costs for firms using AI: new infrastructure, larger energy bills, and more sophisticated analytical tools. But the benefits of increased trust and reliability may justify the expense, making AI systems more trustworthy and appealing, and enabling wider use in sectors where verifiable results are especially vital, such as healthcare, finance, and law enforcement. Transparent AI systems may also prove more economically sustainable in the long run by reducing the risk of expensive mistakes, bias, and ethical breaches.

Today’s LLMs are built on the transformer architecture, a deep-learning approach that has been remarkably successful but is alarmingly difficult to interpret and resource-intensive. As we move towards greater explainability, do we need new AI architectures?

Transformers, with their predictive capacity, have changed the face of natural language processing, but they are neither transparent nor economical. Recent research into alternative architectures, drawing on other computational paradigms, suggests we may yet discover models that are more intuitive to understand, operate more efficiently, and are more transparent, while also being more sustainable, computationally and environmentally. Neuromorphic computing may be a step in this direction, as could quantum computing, which has the potential to solve certain problems far more effectively than its classical counterpart.

Explainability in AI is an interdisciplinary challenge. There are many reasons to be wary of this technical encroachment into machine mind-reading (and many merits in it); here I want to focus on the environmental ones. As we try to open up the black-box nature of LLMs, there are environmental reasons to push for technical improvements in both computational efficiency and resource sustainability, and perhaps for alternative AI architectures altogether.

The path to explainable AI brings us back to the core value of grounding technological progress in a better world – in this case, one that is increasingly sustainable, ethical, and humane. Reconciling transparency, sustainability, and innovation will allow us to realize the full power of AI while protecting our planet and economy for future generations.



Eun "Michelle" Cho, CFP?, ChSNC, BFA

Invest with purpose and build a better world


Would it make sense to feed the AI system not only what happened in the past, but also literature, the Bible, Buddhist scriptures, and other texts that carry moral codes and ethics? I don’t know much about how LLMs work, but if we just feed historical data to train the AI, it will get it wrong, because our history has been full of prejudice, injustice both social and environmental, and greed. Just a thought.

Lisa Myers

Chief Executive Officer at MyerDex Ltd, a division of MyerDex Manufacturing, Ltd, and CEO of Ferociously Fine, Ltd


I enjoyed reading your perspectives on the various aspects of this subject. The analogy of scorching the earth vividly brought to mind how mankind has regarded the power of the sun: how it took on mythical proportions in tales such as that of Icarus, how it took on god-like characteristics in so many civilizations over the millennia, and how studies of its scientific properties led to ways of harnessing its power in so many different manners, ranging all across the board, from solar panels for homes onward. The parallels to the development of AI, transformers, LLMs, and our very early stage of understanding of their "thought processes" are, to me, very obvious. And from that I have great optimism that with diligence, care, time, and dedication to logical procedures of discovery, we will be able to use each kernel of discovery as a basis for additional useful applications, and that an understanding of governing, ethical, and moral applications will evolve.
