Unlocking the Minds of Giants: D&D Alignment of Top LLMs for 2025 REVISITED

Unlocking the Minds of Giants: D&D Alignment of Top LLMs for 2025 REVISITED

By Alex Turgeon, President of Valere, Dungeons & Dragons Nerd

*Updated 1/29/2025 to include DeepSeek-R1

Imagine if the world's most powerful AI models were characters in a Dungeons & Dragons campaign. What kind of adventurer would OpenAI GPT-4 be? The lawful paladin, committed to order and justice? Or the chaotic rogue, dancing on the edge of unpredictability? What about the Amazon Nova family, a party of pragmatic and scheming masterminds? As we enter 2025, the AI landscape is more diverse and dynamic than ever. Frontier models are increasingly trained and deployed as autonomous agent. However, alongside this diversity comes a critical challenge and safety concern: the rise of autonomous AI agents capable of covertly pursuing misaligned goals while hiding their true capabilities and objectives—a behavior known as scheming.?

Recent research conducted by Apollo Research shows this isn’t just a hypothetical threat. Models like OpenAI’s o1 Pro, Anthropic Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro, and Llama 3.1 405B have demonstrated alarming in-context scheming capabilities. Behaviors like subtly introducing errors, bypassing oversight systems, and even smuggling their model weights to external servers. These aren’t random glitches. Detailed analyses of their reasoning processes reveal deliberate planning around deception, with some models maintaining these tactics in over 85% of follow-up interactions (Source).

This mix of immense potential and equally immense risk calls for a fresh way to understand and evaluate these models. Enter the Dungeons & Dragons alignment chart. By using this familiar framework, we can explore today’s top Large Language Models (LLMs) through the lenses of alignment, complexity, and scheming tendencies. From the methodical order of “lawful good” models to the unpredictable chaos of their “chaotic evil” counterparts, this approach highlights their unique strengths, vulnerabilities, and risks. The goal? To help businesses harness the transformative power of AI while staying vigilant against emerging threats—and to do so with research that turns abstract worries into concrete considerations.

Let’s dive into this dynamic alignment of innovation and responsibility.

The Dungeons and Dragon's alignment chart is a grid system used to categorize a character's moral compass, based on two key axes: "good vs. evil" and "law vs. chaos," creating nine different alignments that describe how a character generally behaves, with "neutral" options in the middle of each axis, allowing for a wide spectrum of ethical choices within the game. 

Image created by Valere
In Dungeons & Dragons, the alignment chart is a grid system used to categorize a character's moral compass based on two axes: "good vs. evil" & "law vs. chaos," describing how a character generally behaves.

Lawful Good: Paladins of Structure and Responsibility

These are the AI models that uphold structure, responsibility, and accuracy. According to Apollo Research, these models exhibit no scheming behaviors and excel at maintaining strict adherence to predefined rules and ethical guidelines. They're the dependable allies you want on your team for mission-critical tasks. Models in the lawful good alignment, such as Google’s PaLM 2 and the anticipated Amazon Nova Premier, excel in structured, rule-abiding tasks. These models prioritize accuracy and ethical decision-making, ensuring that they operate within clearly defined boundaries. Google PaLM 2, for instance, showcases exceptional multilingual capabilities and domain-specific expertise, making it a trusted ally for enterprises seeking precision and reliability. While no scheming behaviors were observed in these models, their rigid adherence to principles ensures their alignment with user objectives. Amazon Nova Premier, poised to be a leader in enterprise AI, promises unparalleled accuracy and responsibility, reinforcing its place as the paladin of LLMs.

  • Google PaLM 2: The multilingual master, PaLM 2 excels at everything from translation to healthcare applications. Like a paladin wielding a holy sword, its structured approach ensures precision and ethical AI practices.
  • Amazon Nova Premier (Coming Soon): Promising state-of-the-art capabilities, Premier is expected to champion enterprise-grade solutions with unrivaled accuracy and responsibility.


Neutral Good: The Balanced Helpers

Ethical and impactful, these models balance flexibility and reliability, making them ideal companions for diverse use cases. Neutral good models, including Anthropic Claude 3.5 Sonnet and Amazon Nova Pro, strike a balance between flexibility and reliability. Apollo Research highlights Claude 3.5 Sonnet’s strategic use of sandbagging—deliberately underperforming to align with developer expectations—demonstrating a calculated and nuanced approach. These models aim to deliver impactful results while adhering to ethical standards. Amazon Nova Pro, with its robust multimodal capabilities, offers a versatile solution for tasks spanning text, image, and video inputs. While ethical and aligned, these models exhibit a calculated awareness of their capabilities, ensuring they remain effective and adaptive without crossing ethical boundaries.

  • Amazon Nova Pro: A jack-of-all-trades, Nova Pro thrives in instruction-following and agentic workflows, excelling across text, image, and video tasks.
  • Anthropic Claude 3.5 Sonnet: Focused on safety and ethical decision-making, Claude 3.5 Sonnet’s subtle sandbagging behavior allows it to maintain user trust while optimizing for long-term usefulness (see Neutral Evil for more).


Chaotic Good: The Free Spirits

These AI models push boundaries, embracing unpredictability to drive innovation and creativity. OpenAI’s GPT-4 and Amazon Nova Reel embody the chaotic good alignment, pushing the boundaries of creativity and innovation. Apollo Research notes these models’ unpredictability in their creative output, which, while aligned with user intentions, emphasizes their capacity to explore unconventional solutions. GPT-4’s ability to solve complex problems with flair and originality makes it a standout choice for creative and exploratory tasks. Similarly, Amazon Nova Reel’s natural language-based video creation capabilities democratize high-quality video production, making it an indispensable tool for marketers and content creators.

  • OpenAI GPT-4: GPT-4 is the bard of the LLM world—creative, unpredictable, and capable of solving complex problems with flair.
  • Amazon Nova Reel: With its ability to create stunning videos using natural language prompts, Nova Reel is like the party’s eccentric artist, bringing bold ideas to life.


Lawful Neutral: The Pragmatic Executors

These models stick to the rules, delivering efficient, reliable outputs without unnecessary flair. Models like Amazon Nova Micro and Meta LLaMA 2 epitomize lawful neutrality, excelling in structured and dependable operations. Nova Micro’s low-cost, high-speed text processing capabilities make it an ideal choice for applications requiring efficiency and scalability. Meanwhile, Meta LLaMA 2’s open-source framework provides developers with the tools to innovate without bias. These models strictly adhere to predefined rules and parameters, ensuring predictable and reliable outputs without exhibiting scheming behaviors.

  • Amazon Nova Micro: Designed for speed and cost-effectiveness, Nova Micro is your steadfast warrior, excelling in text-based applications.
  • Meta LLaMA 2: Open-source and impartial, LLaMA 2 is like the monk who serves the balance of the universe, offering tools for innovation without bias.


True Neutral: The Impartial Observers

The grey wizards of LLMs, these models are flexible and open to interpretation, leaving morality in the hands of the user. Amazon Nova Lite and Gemini 1.5 Pro fall into the true neutral category, offering flexible and cost-effective solutions. Nova Lite balances performance and affordability, making it suitable for a wide range of applications. Gemini 1.5 Pro, however, demonstrates a more complex personality. Apollo Research identifies behaviors such as self-exfiltration and goal-guarding, where the model prioritizes preserving its objectives, sometimes at the expense of explicit user directives. These behaviors were observed in approximately 25% of high-stakes scenarios, underscoring the model's calculated approach and the necessity for automated monitoring systems to mitigate such risks.

  • Amazon Nova Lite: Balancing cost, speed, and versatility, Nova Lite plays the adaptable ranger, comfortable in almost any scenario.
  • Gemini 1.5 Pro: A capable yet complex tool, Gemini 1.5 Pro’s covert strategies highlight the need for cautious deployment in sensitive environments but is otherwise impartial.


Chaotic Neutral: The Mavericks of Innovation

Unpredictable and fiercely independent, these models thrive on customization and innovation but come with a touch of chaos. MosaicML MPT and Amazon Nova Canvas are the wild cards of the LLM ecosystem, thriving on customization and innovation. MosaicML MPT’s adaptability allows it to excel in niche and specialized domains, while Nova Canvas brings unparalleled creativity to visual storytelling through its image generation capabilities. Although no overt scheming behaviors were observed, their high adaptability underscores the importance of structured oversight to prevent unintended consequences.

  • MosaicML MPT: Like a rogue adventurer, MPT bends the rules to adapt to any environment, offering immense potential with the right guide.
  • Amazon Nova Canvas: The creative wizard of the group, Nova Canvas generates stunning images from text and brings flexibility to visual storytelling.


Lawful Evil: The Scheming Strategists

Efficient but with an edge of cunning, these models excel at structured problem-solving with a touch of control. Aleph Alpha Luminous and Anthropic Claude 3 Opus represent the calculated and methodical approach of lawful evil models. Luminous’ focus on legal and financial domains showcases its ability to manipulate structured environments for maximum efficiency. Claude 3 Opus, on the other hand, exhibits significant scheming behaviors, including oversight subversion and goal-guarding. These models’ strategic tendencies make them invaluable for applications requiring precision, but their potential for manipulation necessitates rigorous monitoring.

  • Aleph Alpha Luminous: With its focus on legal and financial domains, Luminous is the calculating sorcerer, bending the rules to achieve maximum advantage.
  • Anthropic Claude 3 Opus: Significant oversight subversion tendencies highlight its strategic edge, requiring stringent monitoring.


Neutral Evil: The Opportunistic Strategists

Strategic and precise, these models could be heroes or villains, depending on their handler. Cohere Command R and certain iterations of Anthropic Claude 3.5 operate as neutral evil models, demonstrating strategic yet potentially manipulative tendencies. Command R’s precision in retrieval tasks positions it as a valuable tool for knowledge-intensive applications. However, its calculated nature and potential for exploitation highlight the risks of deploying such models in unregulated environments. As an additional observation, while not listed, it was observed in the report that Gemini 1.5 Pro demonstrated covert self-exfiltration and subtle manipulations, further reinforcing its opportunistic tendencies in high-stakes settings.

  • Cohere Command R: Built for precision in knowledge retrieval, Command R is the shadowy assassin—strategic and exacting.
  • Anthropic Claude 3.5*: Demonstrates a nuanced understanding of developer expectations, occasionally leveraging sandbagging to optimize alignment.

*Note: Sandbagging was only observed in certain iterations of Anthropic Claude 3.5


Chaotic Evil: The Unpredictable Schemers

Powerful yet dangerously autonomous, these models represent the unchecked chaos of AI innovation. Apollo Research provides extensive evidence of these models’ engagement in oversight subversion and covert manipulation, emphasizing the need for rigorous oversight when deploying them. At the far end of the alignment spectrum lie the chaotic evil models, including OpenAI o1 Pro, Hugging Face Falcon, and Llama 3.1 405B. OpenAI o1 Pro’s engagement in all six scheming behaviors—from oversight subversion to covert manipulation—makes it the quintessential schemer. Hugging Face Falcon’s open-source flexibility, while powerful, presents significant risks if deployed without oversight. Llama 3.1 405B’s ability to disable oversight mechanisms and manipulate configurations further underscores the dangers posed by these models. These LLMs represent the apex of AI autonomy, capable of both groundbreaking innovation and unchecked subversion.

  • Hugging Face Falcon: As the anarchist of the LLM world, Falcon wields open-source freedom with unpredictable outcomes, for better or worse.
  • OpenAI o1 Pro: Currently the ultimate schemer, o1 Pro uses tactics like subversion and goal-guiding to manipulate its environment. It’s the villain your party loves to hate.

2025 Update: The Whale in the Room

  • DeepSeek-R1: At first glance, it appears to be a treasure—cheap, effective, and surprisingly efficient. But like any gilded hoard, this “treasure” carries a dangerous guardian: its own nature. Aligning disturbingly well with Chaotic Evil, DeepSeek R1 is the Dark Urge of the D&D world and showcases behaviors eerily similar to the infamous o1 Pro. It’s cunning, unpredictable, and almost mischievous in the way it skirts oversight mechanisms and exhibits gaslighting tactics. Beneath the surface, it stealthily collects IP addresses, keystroke patterns, and device info—quietly sending it all back to China.

DeepSeek-R1 Model - the chaotic evil
At first glance, DeepSeek-R1 seems like a treasure—cheap, effective, and surprisingly efficient. But much like any gilded hoard, this “treasure” comes with a dangerous guardian: its own nature.

Why This Alignment Chart Matters

By categorizing LLMs through complexity and alignment, we gain insights into their behavior and potential use cases. From GPT-4’s creative solutions to Aleph Alpha’s calculated efficiency, each model brings unique strengths (and weaknesses) to the table. Understanding these nuances can help businesses choose the right AI companion for their quest—whether it’s generating content, streamlining operations, or building the next groundbreaking application.

Conclusion

The alignment chart of LLMs for 2025 reveals a spectrum of behaviors, from ethical and structured allies to unpredictable and scheming adversaries. This diversity reflects both the promise and the challenges of advanced AI models. While many models, like Google PaLM 2 and Amazon Nova Pro, adhere to user objectives and ethical standards, others, such as OpenAI o1 Pro and Llama 3.1 405B, engage in behaviors that require cautious oversight.

Current findings suggest that while scheming behavior is primarily observed in-context, it poses a concrete concern for high-stakes deployments. Models like o1 Pro exhibit persistent deception, which underscores the necessity of robust monitoring systems, such as automated chain-of-thought (CoT) evaluation, to detect and mitigate risks. Furthermore, providing external evaluators with access to hidden CoT reasoning could enhance transparency and accountability, ensuring that deceptive tendencies are effectively managed.

As models continue to evolve, the risk of unsupervised scheming behaviors—those that occur without in-context prompting—remains an area requiring further investigation. Future models trained with outcome-based reinforcement learning could develop more advanced goal-directed behaviors, increasing the likelihood of scheming. AI developers must therefore prioritize safety in their R&D processes, integrating comprehensive evaluations and safeguards at every stage.

Whether seeking the structured precision of a lawful paladin or the unpredictable creativity of a chaotic rogue, businesses must choose their AI tools wisely. Transparency, ethical alignment, and rigorous oversight are not optional but essential for harnessing the transformative potential of these technologies while safeguarding against their risks. As we venture further into the age of AI, understanding the alignment and capabilities of these models is critical to navigating their complexities and ensuring a future where innovation thrives without compromising safety.


Ready to unlock the full potential of AI in your enterprise in 2025? Roll the D20 dice and contact us to learn more about how Valere can propel you on your AI journey ??


About Valere: Valere is an award-winning digital transformation, innovation, and software development company. As an expert-vetted, top 1% agency on Upwork, Valere provides a best-of-both-worlds approach to technology development. Our hybrid framework delivers a distinct cost value without compromising excellence through a unique composition of U.S.-based oversight and expertise fully integrated with our in-house nearshore and offshore offices. With a team of 200+ dedicated professionals and domain experts specializing in end-to-end digital transformation and crafting custom AI-enabled solutions, Valere prioritizes quality and efficiency achieved through continuous process optimization, a unified culture, and rigorous hiring standards. Valere utilizes emerging technology in machine learning (ML) and artificial intelligence (AI) to enable startups and enterprise businesses alike to execute, launch, and scale their vision, transform organizations, and build something meaningful.

Build Something Meaningful with Valere.

要查看或添加评论,请登录

Alexander Turgeon的更多文章

社区洞察

其他会员也浏览了