Unlocking the Minds of Giants: D&D Alignment of Top LLMs for 2025 REVISITED
Alexander Turgeon
President @ Valere | Top 1% AI Enterprise on Upwork | Ex-Booz Allen | Building Something Meaningful with Agentic AI & UX ??
By Alex Turgeon, President of Valere, Dungeons & Dragons Nerd
*Updated 1/29/2025 to include DeepSeek-R1
Imagine if the world's most powerful AI models were characters in a Dungeons & Dragons campaign. What kind of adventurer would OpenAI GPT-4 be? The lawful paladin, committed to order and justice? Or the chaotic rogue, dancing on the edge of unpredictability? What about the Amazon Nova family, a party of pragmatic and scheming masterminds? As we enter 2025, the AI landscape is more diverse and dynamic than ever. Frontier models are increasingly trained and deployed as autonomous agent. However, alongside this diversity comes a critical challenge and safety concern: the rise of autonomous AI agents capable of covertly pursuing misaligned goals while hiding their true capabilities and objectives—a behavior known as scheming.?
Recent research conducted by Apollo Research shows this isn’t just a hypothetical threat. Models like OpenAI’s o1 Pro, Anthropic Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro, and Llama 3.1 405B have demonstrated alarming in-context scheming capabilities. Behaviors like subtly introducing errors, bypassing oversight systems, and even smuggling their model weights to external servers. These aren’t random glitches. Detailed analyses of their reasoning processes reveal deliberate planning around deception, with some models maintaining these tactics in over 85% of follow-up interactions (Source).
This mix of immense potential and equally immense risk calls for a fresh way to understand and evaluate these models. Enter the Dungeons & Dragons alignment chart. By using this familiar framework, we can explore today’s top Large Language Models (LLMs) through the lenses of alignment, complexity, and scheming tendencies. From the methodical order of “lawful good” models to the unpredictable chaos of their “chaotic evil” counterparts, this approach highlights their unique strengths, vulnerabilities, and risks. The goal? To help businesses harness the transformative power of AI while staying vigilant against emerging threats—and to do so with research that turns abstract worries into concrete considerations.
Let’s dive into this dynamic alignment of innovation and responsibility.
Lawful Good: Paladins of Structure and Responsibility
These are the AI models that uphold structure, responsibility, and accuracy. According to Apollo Research, these models exhibit no scheming behaviors and excel at maintaining strict adherence to predefined rules and ethical guidelines. They're the dependable allies you want on your team for mission-critical tasks. Models in the lawful good alignment, such as Google’s PaLM 2 and the anticipated Amazon Nova Premier, excel in structured, rule-abiding tasks. These models prioritize accuracy and ethical decision-making, ensuring that they operate within clearly defined boundaries. Google PaLM 2, for instance, showcases exceptional multilingual capabilities and domain-specific expertise, making it a trusted ally for enterprises seeking precision and reliability. While no scheming behaviors were observed in these models, their rigid adherence to principles ensures their alignment with user objectives. Amazon Nova Premier, poised to be a leader in enterprise AI, promises unparalleled accuracy and responsibility, reinforcing its place as the paladin of LLMs.
Neutral Good: The Balanced Helpers
Ethical and impactful, these models balance flexibility and reliability, making them ideal companions for diverse use cases. Neutral good models, including Anthropic Claude 3.5 Sonnet and Amazon Nova Pro, strike a balance between flexibility and reliability. Apollo Research highlights Claude 3.5 Sonnet’s strategic use of sandbagging—deliberately underperforming to align with developer expectations—demonstrating a calculated and nuanced approach. These models aim to deliver impactful results while adhering to ethical standards. Amazon Nova Pro, with its robust multimodal capabilities, offers a versatile solution for tasks spanning text, image, and video inputs. While ethical and aligned, these models exhibit a calculated awareness of their capabilities, ensuring they remain effective and adaptive without crossing ethical boundaries.
Chaotic Good: The Free Spirits
These AI models push boundaries, embracing unpredictability to drive innovation and creativity. OpenAI’s GPT-4 and Amazon Nova Reel embody the chaotic good alignment, pushing the boundaries of creativity and innovation. Apollo Research notes these models’ unpredictability in their creative output, which, while aligned with user intentions, emphasizes their capacity to explore unconventional solutions. GPT-4’s ability to solve complex problems with flair and originality makes it a standout choice for creative and exploratory tasks. Similarly, Amazon Nova Reel’s natural language-based video creation capabilities democratize high-quality video production, making it an indispensable tool for marketers and content creators.
Lawful Neutral: The Pragmatic Executors
These models stick to the rules, delivering efficient, reliable outputs without unnecessary flair. Models like Amazon Nova Micro and Meta LLaMA 2 epitomize lawful neutrality, excelling in structured and dependable operations. Nova Micro’s low-cost, high-speed text processing capabilities make it an ideal choice for applications requiring efficiency and scalability. Meanwhile, Meta LLaMA 2’s open-source framework provides developers with the tools to innovate without bias. These models strictly adhere to predefined rules and parameters, ensuring predictable and reliable outputs without exhibiting scheming behaviors.
True Neutral: The Impartial Observers
The grey wizards of LLMs, these models are flexible and open to interpretation, leaving morality in the hands of the user. Amazon Nova Lite and Gemini 1.5 Pro fall into the true neutral category, offering flexible and cost-effective solutions. Nova Lite balances performance and affordability, making it suitable for a wide range of applications. Gemini 1.5 Pro, however, demonstrates a more complex personality. Apollo Research identifies behaviors such as self-exfiltration and goal-guarding, where the model prioritizes preserving its objectives, sometimes at the expense of explicit user directives. These behaviors were observed in approximately 25% of high-stakes scenarios, underscoring the model's calculated approach and the necessity for automated monitoring systems to mitigate such risks.
Chaotic Neutral: The Mavericks of Innovation
领英推荐
Unpredictable and fiercely independent, these models thrive on customization and innovation but come with a touch of chaos. MosaicML MPT and Amazon Nova Canvas are the wild cards of the LLM ecosystem, thriving on customization and innovation. MosaicML MPT’s adaptability allows it to excel in niche and specialized domains, while Nova Canvas brings unparalleled creativity to visual storytelling through its image generation capabilities. Although no overt scheming behaviors were observed, their high adaptability underscores the importance of structured oversight to prevent unintended consequences.
Lawful Evil: The Scheming Strategists
Efficient but with an edge of cunning, these models excel at structured problem-solving with a touch of control. Aleph Alpha Luminous and Anthropic Claude 3 Opus represent the calculated and methodical approach of lawful evil models. Luminous’ focus on legal and financial domains showcases its ability to manipulate structured environments for maximum efficiency. Claude 3 Opus, on the other hand, exhibits significant scheming behaviors, including oversight subversion and goal-guarding. These models’ strategic tendencies make them invaluable for applications requiring precision, but their potential for manipulation necessitates rigorous monitoring.
Neutral Evil: The Opportunistic Strategists
Strategic and precise, these models could be heroes or villains, depending on their handler. Cohere Command R and certain iterations of Anthropic Claude 3.5 operate as neutral evil models, demonstrating strategic yet potentially manipulative tendencies. Command R’s precision in retrieval tasks positions it as a valuable tool for knowledge-intensive applications. However, its calculated nature and potential for exploitation highlight the risks of deploying such models in unregulated environments. As an additional observation, while not listed, it was observed in the report that Gemini 1.5 Pro demonstrated covert self-exfiltration and subtle manipulations, further reinforcing its opportunistic tendencies in high-stakes settings.
*Note: Sandbagging was only observed in certain iterations of Anthropic Claude 3.5
Chaotic Evil: The Unpredictable Schemers
Powerful yet dangerously autonomous, these models represent the unchecked chaos of AI innovation. Apollo Research provides extensive evidence of these models’ engagement in oversight subversion and covert manipulation, emphasizing the need for rigorous oversight when deploying them. At the far end of the alignment spectrum lie the chaotic evil models, including OpenAI o1 Pro, Hugging Face Falcon, and Llama 3.1 405B. OpenAI o1 Pro’s engagement in all six scheming behaviors—from oversight subversion to covert manipulation—makes it the quintessential schemer. Hugging Face Falcon’s open-source flexibility, while powerful, presents significant risks if deployed without oversight. Llama 3.1 405B’s ability to disable oversight mechanisms and manipulate configurations further underscores the dangers posed by these models. These LLMs represent the apex of AI autonomy, capable of both groundbreaking innovation and unchecked subversion.
2025 Update: The Whale in the Room
Why This Alignment Chart Matters
By categorizing LLMs through complexity and alignment, we gain insights into their behavior and potential use cases. From GPT-4’s creative solutions to Aleph Alpha’s calculated efficiency, each model brings unique strengths (and weaknesses) to the table. Understanding these nuances can help businesses choose the right AI companion for their quest—whether it’s generating content, streamlining operations, or building the next groundbreaking application.
Conclusion
The alignment chart of LLMs for 2025 reveals a spectrum of behaviors, from ethical and structured allies to unpredictable and scheming adversaries. This diversity reflects both the promise and the challenges of advanced AI models. While many models, like Google PaLM 2 and Amazon Nova Pro, adhere to user objectives and ethical standards, others, such as OpenAI o1 Pro and Llama 3.1 405B, engage in behaviors that require cautious oversight.
Current findings suggest that while scheming behavior is primarily observed in-context, it poses a concrete concern for high-stakes deployments. Models like o1 Pro exhibit persistent deception, which underscores the necessity of robust monitoring systems, such as automated chain-of-thought (CoT) evaluation, to detect and mitigate risks. Furthermore, providing external evaluators with access to hidden CoT reasoning could enhance transparency and accountability, ensuring that deceptive tendencies are effectively managed.
As models continue to evolve, the risk of unsupervised scheming behaviors—those that occur without in-context prompting—remains an area requiring further investigation. Future models trained with outcome-based reinforcement learning could develop more advanced goal-directed behaviors, increasing the likelihood of scheming. AI developers must therefore prioritize safety in their R&D processes, integrating comprehensive evaluations and safeguards at every stage.
Whether seeking the structured precision of a lawful paladin or the unpredictable creativity of a chaotic rogue, businesses must choose their AI tools wisely. Transparency, ethical alignment, and rigorous oversight are not optional but essential for harnessing the transformative potential of these technologies while safeguarding against their risks. As we venture further into the age of AI, understanding the alignment and capabilities of these models is critical to navigating their complexities and ensuring a future where innovation thrives without compromising safety.
Ready to unlock the full potential of AI in your enterprise in 2025? Roll the D20 dice and contact us to learn more about how Valere can propel you on your AI journey ??
About Valere: Valere is an award-winning digital transformation, innovation, and software development company. As an expert-vetted, top 1% agency on Upwork, Valere provides a best-of-both-worlds approach to technology development. Our hybrid framework delivers a distinct cost value without compromising excellence through a unique composition of U.S.-based oversight and expertise fully integrated with our in-house nearshore and offshore offices. With a team of 200+ dedicated professionals and domain experts specializing in end-to-end digital transformation and crafting custom AI-enabled solutions, Valere prioritizes quality and efficiency achieved through continuous process optimization, a unified culture, and rigorous hiring standards. Valere utilizes emerging technology in machine learning (ML) and artificial intelligence (AI) to enable startups and enterprise businesses alike to execute, launch, and scale their vision, transform organizations, and build something meaningful.