Training Agentic Graph Systems for Orchestration: Beyond Hardcoded Workflows
Jon Brewton
Founder and CEO - USAF Vet; M.Sc. Eng; MBA; HBAP. Data Squared has created the only hallucination-resistant and fully explainable AI solution development platform in the world!
At Data2, we're confronting one of the most significant challenges in modern AI: the gap between true agency and the hardcoded workflows that merely mimic it. While much of the industry celebrates "AI agents," we've recognized that most of these systems are fundamentally constrained by predetermined orchestration patterns rather than exhibiting genuine decision-making autonomy.
The Illusion of Agency
Current AI systems labeled as "agents" typically operate within strict boundaries:
- They follow rigid, predetermined sequences of operations
- They lack the ability to dynamically determine when to use different capabilities
- They struggle with contradictory information from different sources
- They create the illusion of agency through complex but ultimately inflexible workflows
As we've learned through our work with knowledge graphs and AI integration, Large Language Models (LLMs) possess powerful capabilities like reasoning, search, memory, and planning, but they're not trained to orchestrate these abilities effectively. The result? Systems that appear intelligent but break down when facing novel scenarios requiring adaptive capability deployment.
From Prompt Engineering to Reward Engineering
The field of AI is witnessing a paradigm shift from traditional prompt engineering to a more powerful approach that could be called "reward engineering." This fundamental change in how AI systems learn to orchestrate their capabilities offers several significant advantages:
- Outcome-Focused Design: Rather than prescribing exact procedural steps, this approach defines what success looks like and allows the system to discover how to achieve it
- Experiential Learning: Models can discover optimal orchestration strategies through experience and iteration
- Adaptive Behavior Development: Reinforcement learning enables truly adaptive behaviors to emerge from simple reward signals
- Novel Situation Handling: The resulting systems can handle previously unseen scenarios without requiring explicit retraining
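The contrast above can be sketched in a few lines. This is a minimal, illustrative example, not Data2's implementation: a hardcoded workflow fixes the procedure up front, while reward engineering only defines what a good outcome looks like and leaves the procedure to the learner. All function names here are hypothetical.

```python
# Reward engineering: score only the final outcome, say nothing about
# the steps. A normalized exact-match check stands in for a real metric.
def outcome_reward(predicted: str, gold: str) -> float:
    normalize = lambda s: " ".join(s.lower().split())
    return 1.0 if normalize(predicted) == normalize(gold) else 0.0

# Prompt engineering, by contrast, prescribes the sequence itself:
def hardcoded_workflow(question: str) -> str:
    # step 1: always search; step 2: always summarize; step 3: answer.
    # The sequence never adapts, however simple the question.
    ...

print(outcome_reward("Paris ", "paris"))  # -> 1.0
```

Under reinforcement learning, the policy is free to discover its own sequence of search, reason, and answer actions, so long as the outcome reward stays high.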
The Search-R1 methodology exemplifies this approach, demonstrating how reinforcement learning can teach LLMs to autonomously decide when to search for information during reasoning. Instead of following explicit rules about when to search, models learn through experience to recognize situations where searching would provide value—a capability that proves difficult to encode through traditional prompting methods.
Link to video overview of the process: https://youtu.be/JIsgyk0Paic?si=ddPGN7LMWxudx_OY
Research Paper: https://arxiv.org/abs/2503.09516
Special thanks to Anthony Alcaraz for sharing this work.
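The interleaved reason-search-answer loop at the heart of this approach can be sketched as follows. This is a toy rollout under assumptions, not the actual Search-R1 code: `toy_model` and `toy_retriever` are stand-ins for a real LLM policy and retriever, and the `<search>`/`<information>`/`<answer>` tags mirror the paper's interaction format.

```python
# Sketch of a Search-R1-style rollout: the model decides when to emit a
# <search> call, the environment splices retrieved passages back into
# the context, and generation continues until an <answer> appears.

def toy_model(context: str) -> str:
    # Stand-in policy: search once, then answer from the retrieved text.
    if "<information>" not in context:
        return "<search>capital of France</search>"
    return "<answer>Paris</answer>"

def toy_retriever(query: str) -> str:
    corpus = {"capital of France": "Paris is the capital of France."}
    return corpus.get(query, "no results")

def rollout(question: str, max_turns: int = 4) -> str:
    context = question
    for _ in range(max_turns):
        step = toy_model(context)
        context += step
        if step.startswith("<answer>"):
            return step.removeprefix("<answer>").removesuffix("</answer>")
        if step.startswith("<search>"):
            query = step.removeprefix("<search>").removesuffix("</search>")
            context += f"<information>{toy_retriever(query)}</information>"
    return ""

print(rollout("What is the capital of France?"))  # -> Paris
```

In training, many such rollouts are scored by an outcome reward, and policy gradients teach the model which contexts make a search worthwhile.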
Why Knowledge Graphs Are Essential for True Agency
Our work with graph-based systems has revealed why they form the optimal foundation for agentic AI:
- Rich State Representation: Graphs provide clear state representations for RL algorithms, where nodes represent knowledge states and edges represent actions or transitions.
- Decision Pathway Modeling: Unlike linear sequences, graphs can represent branching decision paths, maintaining rich contextual relationships between options.
- Flexible Traversal: Graph structures support dynamic navigation through complex decision spaces, essential for adapting to novel scenarios.
- Context Preservation: Knowledge graphs maintain relationships between concepts, enabling more sophisticated reasoning about when to deploy specific capabilities.
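A small sketch makes the state/action framing concrete. The graph, node names, and Q-values below are invented for illustration and are not drawn from our systems: nodes stand for knowledge states, edges for capability invocations, and a learned policy would replace the fixed edge scores.

```python
# Knowledge graph as an RL state space: nodes are knowledge states,
# edges are actions (capability invocations) that move between them.
graph = {
    "question": {"search": "has_evidence", "answer": "answered_blind"},
    "has_evidence": {"reason": "has_conclusion", "search": "has_evidence"},
    "has_conclusion": {"answer": "answered_grounded"},
}

# A trained policy would score each (state, action) pair; fixed values
# stand in here so the traversal is deterministic.
q_values = {
    ("question", "search"): 0.9, ("question", "answer"): 0.1,
    ("has_evidence", "reason"): 0.8, ("has_evidence", "search"): 0.3,
    ("has_conclusion", "answer"): 1.0,
}

def greedy_traversal(state: str, max_steps: int = 5) -> list:
    """Follow the highest-value edge until reaching a terminal node."""
    path = [state]
    for _ in range(max_steps):
        actions = graph.get(state, {})
        if not actions:
            break
        best = max(actions, key=lambda a: q_values.get((state, a), 0.0))
        state = actions[best]
        path.append(state)
    return path

print(greedy_traversal("question"))
# -> ['question', 'has_evidence', 'has_conclusion', 'answered_grounded']
```

Because the graph represents branching paths rather than a fixed sequence, the same policy can route different questions through different capability chains.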
Our reView platform leverages these properties to create AI systems that go beyond mere tool use to achieve genuine orchestration intelligence.
Technical Innovations: Retrieved Token Masking
A critical innovation in our approach is "retrieved token masking," which prevents optimization of tokens from external sources while allowing the model to learn effective query generation and reasoning strategies. This technique solves a fundamental challenge in applying reinforcement learning to systems that incorporate external knowledge:
- It ensures policy gradient updates affect only the model's orchestration decisions
- It prevents the model from gaming the reward function by manipulating external content
- It maintains the integrity of retrieved information while optimizing how it's utilized
- It creates a clean separation between knowledge access and decision-making
This approach allows our systems to learn when and how to access external knowledge sources without compromising the quality or reliability of the information retrieved.
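The masking idea can be sketched in a few lines of plain Python. This is a simplified illustration under assumptions, not our production training code: the token log-probabilities, advantages, and mask are made up, and a real implementation would operate on tensors inside a policy-gradient framework.

```python
# Retrieved token masking: the policy-gradient loss is computed only
# over tokens the model generated, never over tokens spliced in from
# the retriever, so external content contributes no gradient.

def masked_pg_loss(log_probs, advantages, retrieved_mask):
    """Sum -logprob * advantage over model-generated positions only.

    retrieved_mask[i] == 1 marks a token copied from an external
    source; its contribution is zeroed so the model cannot be
    rewarded or punished for content it did not produce.
    """
    loss = 0.0
    for lp, adv, is_retrieved in zip(log_probs, advantages, retrieved_mask):
        if not is_retrieved:
            loss += -lp * adv
    return loss

log_probs = [-0.2, -0.5, -1.0, -0.1]  # per-token log-probabilities
advantages = [1.0, 1.0, 1.0, 1.0]     # outcome-based advantage signal
retrieved = [0, 0, 1, 0]              # third token came from retrieval

print(round(masked_pg_loss(log_probs, advantages, retrieved), 6))  # -> 0.8
```

Note that the retrieved token (index 2) carries the largest log-probability penalty, yet contributes nothing to the loss: optimization pressure falls entirely on the query-generation and reasoning tokens.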
Beyond Correctness: Outcome-Based Reward Functions
Our research has shown that simple outcome-based reward functions focused on final correctness can lead to the development of surprisingly sophisticated behaviors. Instead of trying to engineer every aspect of agent behavior, we define success metrics and allow the system to discover optimal pathways.
This approach produces AI systems that:
- Dynamically decide which capabilities to employ based on the specific situation
- Optimize resource utilization by only deploying expensive operations when necessary
- Adaptively respond to novel scenarios without explicit reprogramming
- Develop emergent strategies that human engineers might not have considered
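The resource-utilization point above can be captured with a cost-aware variant of the outcome reward. The weights below are illustrative assumptions, not tuned values from our systems: correctness dominates, but each expensive operation pays a small penalty, so the policy learns to deploy tools only when they actually help.

```python
# Cost-aware outcome reward: final correctness minus a per-call penalty
# for expensive operations such as searches or tool invocations.
def cost_aware_reward(correct: bool, num_tool_calls: int,
                      call_cost: float = 0.05) -> float:
    return (1.0 if correct else 0.0) - call_cost * num_tool_calls

# A correct answer with one search beats a correct answer with five:
print(cost_aware_reward(True, 1))   # -> 0.95
print(cost_aware_reward(True, 5))   # -> 0.75
# ...and a wrong answer is never rescued by skipping tool calls:
print(cost_aware_reward(False, 0))  # -> 0.0
```

Even a signal this simple gives rise to the behaviors listed above: the policy is never told which capability to use, only that unnecessary calls erode its reward.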
The Path Forward
As we continue to advance the state of the art in agentic AI, our focus remains on developing systems that exhibit true orchestration intelligence rather than merely following predefined workflows. By combining graph-based knowledge structures with reinforcement learning approaches to capability orchestration, we're creating AI agents that can:
- Autonomously decide when different capabilities would be valuable
- Effectively integrate information from multiple sources while handling contradictions
- Learn from experience to improve orchestration strategies over time
- Adapt to novel scenarios without requiring explicit reprogramming
Contact our team to learn how Data2 can help your organization move beyond hardcoded AI workflows to true agentic systems that deliver robust, adaptive intelligence for your most challenging use cases.
Jon Brewton: Automated workflows guided by graphs will be key to deploying "agents," or whatever we want to call them, into meaningful production.