
Agentic AI evaluation framework

Ravishankar N

Senior Product Manager at Microsoft

Published: January 20, 2025

Disclaimer: This article represents my thoughts alone as an individual and not as a representative of my employer or their business direction. This guidance is provided without any implicit or explicit promises/guarantees. Please exercise caution while using the content in this article and use your own due diligence. This article was envisioned and created by me, with refinements made using Generative AI.

What is Agentic AI?

There is a lot of hype around Agentic AI from foundation model makers and software makers alike. Because there are many definitions and interpretations of Agentic AI, let's first define what it means for the purpose of this article: "AI-enabled applications that can take inputs in one or more forms and are built to achieve specific goals using a range of inputs and environmental variables."

In plain terms, Agentic AI apps are built to achieve their intended goals with minimal or no supervision (their own agency). They work across a large array of input and environmental variables by refining them, reasoning over them, generating the steps needed to reach the goal, and executing those steps.

Foundation model creators have widely touted this as the year of Agentic AI, and software makers are gearing up for a wave of customers who want to build Agentic AI to replace their apps.

Why should businesses build Agentic AI?

With the ability to reason over a given goal or objective, break it into smaller steps, execute those steps, and iteratively arrive at a solution, Agentic AI presents an exciting opportunity for companies to tackle areas such as web scraping, knowledge search, monitoring, report generation from data sources, and many more. These areas share a trait: the type and number of inputs can be large, and defining every combination of inputs and environmental conditions the traditional way becomes laborious and cumbersome. That makes them ideally suited for generative AI based agents.

But should you though?

Not every use case involves a large variety of input and environmental combinations that call for generative AI style processing. Many day-to-day tasks are deterministic, that is, they have a well-defined set of inputs, processes and outputs. For these tasks, Agentic AI would be a poor use of investment and resources, or simply wasteful.

While it is tempting to rip and replace all of today's deterministic (traditional) apps with Agentic AI, businesses need to stop and take stock of the following factors:

Cost of rebuilding
Time to rebuild
Effort to reintegrate with existing processes and retrain people
Integration with the existing ecosystem of apps and services
Life stage of the app
Return on Investment (ROI)
Upside to rebuilding - Does it bring value to my business and users?

Agentic AI evaluation framework

This framework uses the factors listed below to help determine whether you should replace an existing app with Agentic AI. It can and should be adapted and extended to specific business task or app requirements, and to the level of detail needed for the assessment.

Assessment dimensions - Each dimension in this framework is scored from 1 to 100, with 1 being the most suited for an Agentic AI rebuild and 100 the least suited.
Suggested weights - How much each assessment dimension contributes to the overall assessment.
Rating ranges - For each assessment dimension, I recommend recording the score in steps of 20 to capture the variance for a task or app.
Factor and indicator scores - We also consider a list of factors that influence the decision to rebuild an app. These factors are scored from 1 to 5.
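A minimal sketch of the two scoring scales, assuming dimension scores are whole numbers from 1 to 100 grouped into the 20-point bands used by the rubrics, and factor scores are whole numbers from 1 to 5. The helper names `dimension_band` and `check_factor` are hypothetical and not part of the framework.

```python
def dimension_band(score: int) -> tuple[int, int]:
    """Return the 20-point band (low, high) that a 1-100 dimension score falls in."""
    if not 1 <= score <= 100:
        raise ValueError(f"dimension score must be 1-100, got {score}")
    # Bands are 1-20, 21-40, 41-60, 61-80 and 81-100.
    index = (score - 1) // 20
    low = index * 20 + 1
    return (low, low + 19)


def check_factor(score: int) -> int:
    """Validate a factor or indicator score on the 1-5 scale."""
    if not 1 <= score <= 5:
        raise ValueError(f"factor score must be 1-5, got {score}")
    return score


print(dimension_band(37))   # (21, 40)
print(dimension_band(100))  # (81, 100)
```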

Core Assessment Dimensions

1. Input Variability (Suggested weight: 20%) (A)

Rate how variable or unpredictable the inputs to the system are:

--> 1-20: Highly variable inputs (unstructured text, complex user requests)

--> 21-40: Moderately variable inputs with some patterns

--> 41-60: Mix of structured and unstructured inputs

--> 61-80: Mostly structured inputs with occasional variations

--> 81-100: Completely structured, predictable inputs

2. Process Complexity (Suggested weight: 25%) (B)

Evaluate the complexity of decision-making and processing required:

--> 1-20: Complex reasoning with multiple decision paths

--> 21-40: Multiple interconnected processes with some ambiguity

--> 41-60: Mix of straightforward and complex processes

--> 61-80: Mostly straightforward processes with few variations

--> 81-100: Simple, linear processes with clear rules

3. Human Interaction Requirements (Suggested weight: 15%) (C)

Assess the level and complexity of human interaction needed:

--> 1-20: Continuous dialogue and complex interactions

--> 21-40: Regular interactions with context understanding

--> 41-60: Periodic interactions with clear objectives

--> 61-80: Minimal interactions with structured inputs

--> 81-100: No human interaction needed

4. External Dependencies (Suggested weight: 15%) (D)

Evaluate the system's reliance on external factors:

--> 1-20: Multiple dynamic external dependencies

--> 21-40: Several semi-stable external dependencies

--> 41-60: Mix of stable and dynamic dependencies

--> 61-80: Few, well-defined external dependencies

--> 81-100: No external dependencies

5. Error Tolerance (Suggested weight: 25%) (E)

Assess the system's tolerance for errors and variations:

--> 1-20: High tolerance, approximate results acceptable

--> 21-40: Moderate tolerance, minor variations acceptable

--> 41-60: Mixed requirements for precision

--> 61-80: Low tolerance, few errors acceptable

--> 81-100: Zero tolerance, exact results required

Implementation Considerations

Cost Factors Assessment

These factors capture the effort involved in converting the app or task, which translates into time, person-hours, expertise and resources (for example, GPUs) used in the process.

1. Development Complexity Score (1-5) (F): Scoring based on the rebuilding effort involved

--> 1: Simple conversion

--> 3: Moderate redesign

--> 5: Complete rebuild

2. Training Data Requirements (1-5) (G): Scoring based on the effort involved in training the Agentic AI

--> 1: Minimal data needed

--> 3: Moderate data collection required

--> 5: Extensive data collection needed

3. Integration Complexity (1-5) (H): Scoring based on the effort involved in integrating the Agentic AI with other systems

--> 1: Standalone system

--> 3: Moderate integration needs

--> 5: Complex integration requirements

ROI Indicators

These indicators score the return on investment (ROI) of rebuilding the app or task.

1. Automation Potential (1-5) (I):

--> 1: Minimal automation gains

--> 3: Moderate efficiency improvements

--> 5: Significant automation potential

2. Maintenance Requirements (1-5) (J):

--> 1: Low maintenance

--> 3: Moderate maintenance

--> 5: High maintenance needs

Scoring Formula

This final section combines the scores above to estimate the viability of converting the app or task.

Primary Score Calculation:

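Primary Score = (A * 0.20) + (B * 0.25) + (C * 0.15) + (D * 0.15) + (E * 0.25)

Because the weights add up to 1.00, the Primary Score stays on the same 1-100 scale as the individual dimensions.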
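A minimal sketch of the Primary Score calculation, assuming the five dimension scores have already been assigned on the 1-100 scale. The dictionary keys follow the letters A to E used above; the function name and the example scores are hypothetical.

```python
# Suggested weights for the five core dimensions (A-E); they sum to 1.0.
WEIGHTS = {
    "A": 0.20,  # Input variability
    "B": 0.25,  # Process complexity
    "C": 0.15,  # Human interaction requirements
    "D": 0.15,  # External dependencies
    "E": 0.25,  # Error tolerance
}


def primary_score(scores: dict[str, float]) -> float:
    """Weighted sum of the core dimension scores, each on a 1-100 scale."""
    if set(scores) != set(WEIGHTS):
        raise ValueError(f"expected exactly the dimensions {sorted(WEIGHTS)}")
    for name, value in scores.items():
        if not 1 <= value <= 100:
            raise ValueError(f"dimension {name} must be 1-100, got {value}")
    return sum(scores[name] * weight for name, weight in WEIGHTS.items())


# Hypothetical scores for an app with variable inputs and complex processing.
example = {"A": 10, "B": 20, "C": 30, "D": 40, "E": 20}
print(round(primary_score(example), 2))  # 22.5
```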

Implementation Feasibility Score:

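Implementation Score = (F + G + H) / 3

This averages the three cost factors, so it also ranges from 1 to 5, with higher values indicating more rebuilding effort.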
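A short sketch of the Implementation Score that also validates the 1-5 range of each cost factor; the function name `implementation_score` is hypothetical.

```python
def implementation_score(f: int, g: int, h: int) -> float:
    """Average of the three cost factors, each scored 1-5.

    F = development complexity, G = training data requirements,
    H = integration complexity.
    """
    for name, value in (("F", f), ("G", g), ("H", h)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be 1-5, got {value}")
    return (f + g + h) / 3


print(implementation_score(3, 1, 5))  # 3.0
```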

ROI Score:

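ROI Score = (I * 2 - J) / 3

With I and J each scored from 1 to 5, this score ranges from -1 (I = 1, J = 5) to 3 (I = 5, J = 1).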
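A sketch of the ROI Score, assuming I and J are whole numbers from 1 to 5 (the function name `roi_score` is hypothetical). Enumerating all 25 valid (I, J) pairs makes the attainable range of the formula explicit, which matters when comparing it against the thresholds in the interpretation guide.

```python
from itertools import product


def roi_score(i: int, j: int) -> float:
    """ROI Score = (I * 2 - J) / 3; I = automation potential, J = maintenance (both 1-5)."""
    for name, value in (("I", i), ("J", j)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be 1-5, got {value}")
    return (i * 2 - j) / 3


# Evaluate every valid (I, J) pair to find the smallest and largest possible score.
all_scores = [roi_score(i, j) for i, j in product(range(1, 6), repeat=2)]
print(min(all_scores), max(all_scores))  # -1.0 3.0
```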

Interpretation Guide

The recommendations below are based on the Primary Score and the ROI Score.

Implementation Recommendations:

1. If Primary Score < 40 and ROI Score > 3:

- Proceed with Agentic AI implementation

- Expected autonomy level: 70-90%

2. If Primary Score is 40-60 and ROI Score > 3:

- Consider a hybrid approach that combines an agent with the traditional app

- Expected autonomy level: 40-70%

3. If Primary Score > 60 or ROI Score < 3:

- Maintain the traditional application

- Expected autonomy level: < 40%
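A minimal sketch of the interpretation guide as a decision function, assuming the Primary Score is on its 1-100 scale and the ROI Score comes from the ROI formula in the Scoring Formula section. With I and J limited to 1-5, that formula reaches at most 3 (I = 5, J = 1), so the strict "> 3" condition in rules 1 and 2 is never met, and an ROI Score of exactly 3 with a Primary Score of 60 or less matches no rule. The sketch follows the stated rules literally, reports the uncovered case explicitly, and exposes the cutoff as a hypothetical `roi_threshold` parameter (not part of the framework) so it can be adapted to your own assessment.

```python
def recommend(primary: float, roi: float, roi_threshold: float = 3.0) -> str:
    """Map a Primary Score (1-100) and an ROI Score to a recommendation.

    roi_threshold is a hypothetical knob; the guide itself uses 3.
    """
    # Rule 3: a high primary score or a low ROI keeps the traditional app.
    if primary > 60 or roi < roi_threshold:
        return "Maintain traditional application (expected autonomy < 40%)"
    if roi > roi_threshold:
        # Rule 1: strong fit for an agent.
        if primary < 40:
            return "Proceed with Agentic AI (expected autonomy 70-90%)"
        # Rule 2: primary score between 40 and 60 inclusive.
        return "Hybrid agent + traditional app (expected autonomy 40-70%)"
    # ROI exactly equal to the threshold with primary <= 60 matches no rule.
    return "Not covered by the guide; review manually"


print(recommend(22.5, 3.0))                     # Not covered by the guide; review manually
print(recommend(22.5, 3.0, roi_threshold=2.5))  # Proceed with Agentic AI (expected autonomy 70-90%)
```

Checking rule 3 first mirrors its "or" condition: either a Primary Score above 60 or a low ROI Score is enough to keep the traditional app, regardless of the other value.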
