Models and Agents

Foundation models continue to get better. We already knew that models improve as a function of size, compute and data. The recently open-sourced DeepSeek model also proved the benefit of engineering strategies, including a compute-efficient model architecture, efficient key-value storage, hyperparameter engineering to avoid mixture-of-experts model collapse, multi-token prediction, RL encoded in training data to get the model to 'think' as it works through the answer, efficient training via a mixed-precision framework, and hardware engineering including cross-chip communication and an optimized communication schedule. While these methods were published independently in various papers, compelling engineering combined their benefits into one model with impressive results across benchmarks. We continue to see accelerated releases of foundation models that are diverse in size, training cost, inference cost and capabilities. This is promising: diverse models will help us address use cases with varied RoI targets, and model commoditization will accelerate Gen AI adoption.

Given the capabilities of models, adoption depends on the benefits realized in the application layer. One of the promising paths to substantial value delivery is Gen AI Agents. To explore what Agents can do, let us first define what Agents are. Say:

Agents are digital entities that can autonomously accomplish goals via context understanding, planning, reasoning, acting, reflecting, collaborating and utilizing the tools at their disposal as required.

In this definition, the key Agent capabilities are context understanding, planning, reasoning, acting, reflecting, collaborating and utilizing tools. These concepts are loosely termed the "intelligence" of Agents. If Agents possess these capabilities then, in theory, they will have the ability to achieve goals autonomously as the context requires. Note that I specifically skipped memory, speed of information retrieval and the ability to develop new content via recombination, given that models excel at them.

Agents today have varying degrees of these "intelligence" components, with none possessing all of them in a general context. Let us delve into each component.

Context Understanding: Agents predominantly understand context from what the models learned during pretraining and from the information provided during configuration and via the prompt. Most enterprise contexts are novel for models because the models weren't exposed to them during pretraining. While the design of Agents provides a structure for supplying context during configuration and at the time of a specific inference call, there are limits to how much context can be provided and to how it can be persisted so that Agents improve through continuous learning and feedback over time. For accurate understanding of context, which is critical for acceptable Agent behavior, we are forced to divide complex enterprise contexts into smaller, easily describable modules. This turns the design of Agents into a pipeline with AI, Gen AI and non-AI components.

Planning: Current Agents don't plan in the traditional sense. But they can learn the specific steps involved in a goal, and their order, from pretraining data, in-context examples, detailed definitions of the tools at their disposal and other instructions. While this learning is feasible, Agents can't develop a repeatable, consistent and accurate plan every single time. Therefore, we must include methods to review the plan, challenge its elements and help the Agents refine the plan for accuracy. This forces us to limit the planning landscape of Agents so that we can incorporate automatic evaluation.
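To make the idea of a limited planning landscape concrete, here is a minimal sketch of an automatic plan review. It assumes the plan is simply an ordered list of tool names, and the tool names and prerequisite rules in `ALLOWED_TOOLS` are hypothetical examples of mine, not taken from any real Agent framework.

```python
# Hypothetical planning landscape: each allowed tool maps to the tools
# that must already appear earlier in the plan.
ALLOWED_TOOLS = {
    "lookup_order": set(),
    "check_inventory": {"lookup_order"},
    "issue_refund": {"lookup_order"},
}


def review_plan(plan: list[str]) -> list[str]:
    """Return the problems found in a plan; an empty list means it passes."""
    problems = []
    seen = set()
    for position, tool in enumerate(plan, start=1):
        if tool not in ALLOWED_TOOLS:
            # The step falls outside the landscape we agreed to evaluate.
            problems.append(f"step {position}: unknown tool '{tool}'")
            continue
        for prerequisite in sorted(ALLOWED_TOOLS[tool] - seen):
            problems.append(
                f"step {position}: '{tool}' requires '{prerequisite}' first"
            )
        seen.add(tool)
    return problems


if __name__ == "__main__":
    print(review_plan(["lookup_order", "issue_refund"]))  # passes
    print(review_plan(["issue_refund", "lookup_order"]))  # wrong order
```

The list of problems could be fed back to the Agent as a request to refine the plan. The review is only possible because the set of tools and their ordering rules is small and explicit.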

Reasoning and Reflecting: Effective Agents must accommodate new situations, understand uncertainties and evaluate the risks resulting from those uncertainties to determine a course of action. Models don't reason or reflect in the traditional sense, as they lack multiple concepts including mentalizing, understanding and alignment of values and vulnerabilities, mutual agreement on a common purpose, and a vested interest in exploring the consequence, risk and severity of uncertainties. Currently, we claim to have reasoning when the model can evaluate the accuracy of the work it has defined or executed thus far. This is very limited, as this type of reasoning is primarily learned via frameworks, which limits the range of uncertainties Agents can understand and for which they can determine mitigating actions. To ensure consistent, repeatable and accurate reasoning and reflecting that adjusts the course of plan and action, Agents need a substantial definition of possible scenarios, potential outcomes and their effect on the goal. Therefore, except for very well-defined goals with ample clarity, true reasoning and reflecting need to be offloaded to a Human-In-Loop (HIL).

Acting and Utilizing Tools: Current Agents can act on a user's behalf by calling functions, APIs, etc. Task complexity can cause ambiguity, which can result in less consistent and less accurate actions. Therefore, modularizing the action horizon to help Agents pick the right set of actions in the right order is key. Current Agent deployments overcome this with HIL review prior to action. Every HIL step in an Agent pipeline means collaboration between human and Agent and therefore brings the complexities detailed in the Collaboration section below.
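As an illustration of HIL review prior to action, here is a minimal sketch of an approval gate in front of tool calls. The function `run_action`, the `approve` callback and the `refund` tool are hypothetical names chosen for this example. A real deployment would replace the callback with an actual review step.

```python
from typing import Any, Callable


def run_action(
    name: str,
    args: dict[str, Any],
    tools: dict[str, Callable[..., Any]],
    approve: Callable[[str, dict[str, Any]], bool],
) -> dict[str, Any]:
    """Execute a tool only if it is registered and the reviewer approves."""
    if name not in tools:
        return {"status": "rejected", "reason": f"unknown tool '{name}'"}
    if not approve(name, args):
        # The human reviewer declined, so the tool is never called.
        return {"status": "rejected", "reason": "reviewer declined"}
    return {"status": "done", "result": tools[name](**args)}


if __name__ == "__main__":
    # Hypothetical tool and a stand-in reviewer that caps refund amounts.
    tools = {"refund": lambda amount: f"refunded {amount}"}
    reviewer = lambda name, args: args.get("amount", 0) <= 100
    print(run_action("refund", {"amount": 50}, tools, reviewer))
    print(run_action("refund", {"amount": 500}, tools, reviewer))
```

Keeping the registry of tools small is one way to modularize the action horizon. The gate itself adds a point of human and Agent collaboration at every call.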

Collaboration: Agents can interact with each other and with humans. Current models can support multimodal collaboration for information retrieval and for automation of specific, well-defined tasks. For effective collaboration with HIL, Agents must explain their work to help HIL understand, assess results, reason, reflect and course-correct the Agents. This collaboration is quite nuanced and far richer than sharing information. At a minimum, Agents need to understand what HIL sees and communicates, while ensuring that the differences in communication modalities and speed between Agents and HIL don't cause inefficiency that results in errors and frustration. More importantly, for continued collaboration, Agents must learn on the go and work with HIL to identify which new learnings should be retained and used in the future. There is much work to be done to achieve such Agents.

Given the current state of "intelligence", we can't yet drop an Agent into an enterprise environment and expect it to act autonomously as per our prior definition. Therefore, say, we define Agents as follows:

Agents are digital entities that accomplish goals by distilling them into steps, then executing and evaluating those steps using the AI and Gen AI models and tools they have access to, based on their knowledge and instructions and in collaboration with HIL.
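One way to picture this definition is as a loop over steps with a human fallback. The sketch below is only an illustration under my own assumptions: the goal is already distilled into a list of steps, and `execute`, `evaluate` and `ask_human` are hypothetical callables standing in for model calls, automatic checks and the HIL.

```python
from typing import Any, Callable


def run_goal(
    steps: list[str],
    execute: Callable[[str], Any],
    evaluate: Callable[[str, Any], bool],
    ask_human: Callable[[str, Any], Any],
) -> list[tuple[str, Any, str]]:
    """Run each step, evaluate its result, and hand off to a human on failure."""
    log = []
    for step in steps:
        result = execute(step)
        if evaluate(step, result):
            log.append((step, result, "agent"))
        else:
            # Evaluation failed: the human reviewer supplies the result.
            log.append((step, ask_human(step, result), "human"))
    return log


if __name__ == "__main__":
    # Stand-in callables: upper-casing plays the role of the model call.
    outcome = run_goal(
        steps=["draft", "send"],
        execute=str.upper,
        evaluate=lambda step, result: step != "send",
        ask_human=lambda step, result: "approved manually",
    )
    print(outcome)
```

The log records who produced each result, which is the kind of explanation a human collaborator needs in order to assess the work.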

Agents that satisfy this new definition are feasible today, provided they are designed with detailed configuration and effective HIL collaboration. There are plenty of use cases that gain substantially from such Agents. What goals will we consider for such Agents, and how will we configure, teach and collaborate with them? That is the focus of the next post.

Terry Miles

A Product Approach to Organizational Change and Business Transformation

1 month ago

Viji, a question regarding reviews of plans agents make. In the context of such reviews, do the agents themselves offer self-assessments, any explanations of why they chose the route they took? Viji, great post!

Krishna Gopal

Lead Consultant - Data & AI (Retail, CPG & Travel), Global | Consumer Business Group | Transformation Partner - Cloud, Data engg, AI/ML, Devops | Cloud & Data Solutions Architect

1 month ago

Very good perspective - Enterprise context is novel, and Reasoning & Reflection to be offloaded to human in the loop.

Very well-articulated. Looking forward to your next post on this subject.

Shyam Santhanam

Product Management | Alumni @StanfordGSB | Oracle Cloud

1 month ago

Practical insights for building agentic applications. Thanks Viji Krishnamurthy Ph.D.
