Part 1: The Complete Guide to Applying Generative Artificial Intelligence in Organizations, Key Performance Indicators (KPIs), and the TBRV Model
Generative Artificial Intelligence KPI (Key Performance Indicator) evaluation metrics for organizations; TBRV model © 2024 Dr. Michal G. Carmi

An Introduction to Accelerating the Adoption of Generative Artificial Intelligence Capabilities

- Part 1 -

This guide introduces a methodology for implementing generative artificial intelligence within organizations and enterprises. This first part details the unique Key Performance Indicator (KPI) metrics for tasks involving generative artificial intelligence and presents the TBRV model, the newest and most comprehensive framework yet for identifying all the organizational influence circles impacted by generative artificial intelligence, along with the associated KPIs in each circle.

The organizational generative-AI transformation is a planned initiative with an 18-month implementation period. The objective is to establish a strategic hybrid ecosystem that seamlessly integrates Generative Artificial Intelligence (GenAI) tools with existing human and technological infrastructures, thereby enhancing organizational leverage across all fronts.

Background

Generative AI represents an innovative subset of artificial intelligence, leveraging advanced machine learning techniques, particularly deep learning, to enable models to autonomously generate new and original content across all forms and media of human output. This field marks a new era in AI, characterized by its remarkable creative capabilities. Generative AI can produce content, analytics, and other outputs with human-like quality and on a vast scale, while also offering additional multimodal and technical functionalities. Generative intelligence facilitates the use of intelligent and autonomous agents at varying levels. These capabilities present both opportunities and challenges. Among the challenges are issues of credibility and 'hallucinations', along with operational dilemmas. Additionally, there is the important question of how to establish the necessary measures of responsibility required for its use and operation, as well as how to develop the key performance indicators. The spectrum of benefits and risks is enormous and will continue to expand as generative artificial intelligence becomes the elaborative and mediative layer in both individual and organizational interactions with the world. Moreover, as this technology evolves to be more effective and human-like, its impact is expected to become increasingly significant.

Leading and harnessing the full potential of generative AI, and integrating it into the organizational ecosystem, will be carried out according to the foundational principles of the TBRV model, which is presented here for the first time. This approach includes the entire spectrum of Generative KPIs, representing the broadest range of these indicators. It's important to emphasize that the application of these KPIs should be as versatile as composing building blocks, tailored to the unique nature of each task. The TBRV model not only guides organizations in implementing generative technology but also steers them through a learning journey that cultivates a culture of generative literacy. This journey entails the responsible adoption and usage of GenAI tools, recognizing their potential for leverage and innovation. It also involves navigating this significant paradigm shift with both enthusiasm and caution.

Generative Quality Metrics: KPIs for GenAI

The process of evaluating the success of specific activities that involve the use of GenAI tools within an organization entails monitoring and assessing them through performance indicators. However, the determination of these indicators must be grounded in a thorough understanding to accurately define the necessary criteria and considerations. To underscore this point - the creation of metrics should not be based solely on functionality but also on a comprehensive grasp of the immense transformative process underway. Such awareness is crucial for developing metrics that align with the unique characteristics of generative artificial intelligence. This will enable the creation of relevant metrics for each case, be they quantitative or qualitative, while guided by the overarching objective of incrementally leading the organization toward a hybrid future of human-machine intelligence.

The Principles

1. In tasks and pilots involving generative intelligence, performance can be measured using the organization's standard Key Performance Indicators (KPIs), supplemented with an emphasis on unique indicators specific to generative intelligence and hybridity. The special characteristics of generative AI necessitate customized KPIs to ensure efficiency and quality.

2. It should be emphasized that some indicators are comparative in nature; that is, the output of the generative agent must be quantified against that of the human agent. Given that the level of automation in the generative agency is not yet absolute, the output is essentially the product of a hybrid operation. There is a wide array of indicators and parameters for the generative agency, such as quantity, quality, time, cost, exposure, risk, reliability, performance over time, dependence on an external/internal party, leverage, contribution to decision-making, value creation, impact on human work, and more, as elaborated below.

3. Another aspect to consider is that these tools are based on learning and are capable of self-improvement or improvement through interaction with human agents. It is therefore necessary to establish a means of measuring the learning curve, both generative and human/organizational, and to act through iteration to optimize the integrated activity.

4. Since this tool can be both reflective and communicative, incorporating monitoring indicators as part of its task enables continuous and rapid adjustments, facilitating on-the-fly performance tuning. This approach will serve as a quality foundation for the next pilot, allowing for continuous improvement driven by real-time feedback and processing.

5. Developing metrics capable of identifying toxic data is essential. Generally, this involves monitoring beyond the standard endpoints of application interfaces to detect drift (a minimal drift-detection sketch follows this list).

6. Quality assessment based on comparative experiments in a generative environment: The experimental interface can, and preferably should, be based on past tasks. Conducting pilots in a virtual environment of tasks with a complete human process history, including their evaluation metrics, is crucial. Running these tasks under generative execution, supervised by an experimenter simulating human activity, will enhance the agility and insights of the learning process.

7. The generative agent as an organizational quality controller: Utilizing generative intelligence to assess the performance of all the organization's tasks.
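To make principle 5 concrete, here is a minimal drift-detection sketch in Python, using the Population Stability Index (PSI) over any monitored numeric signal (for example, per-output toxicity scores). The choice of signal, the bin count, and the thresholds quoted in the comment are illustrative assumptions, not prescriptions of the methodology.

```python
import numpy as np

def population_stability_index(reference, current, bins=10):
    """Population Stability Index (PSI) between a reference sample and a
    current sample of a monitored signal (e.g., per-output toxicity scores).
    Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25
    significant drift. Values outside the reference range are ignored."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Floor the proportions so empty bins do not produce log(0).
    ref_pct = np.clip(ref_pct, 1e-6, None)
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

# Example: compare this week's toxicity scores against a baseline week.
rng = np.random.default_rng(0)
baseline = rng.normal(0.10, 0.02, 5_000)
this_week = rng.normal(0.16, 0.03, 5_000)
print(f"PSI = {population_stability_index(baseline, this_week):.3f}")
```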

TBRV Model - The Four Circles of Influence for KPIs in Generative AI

The TBRV model provides an overarching perspective for understanding the full range of GenAI's organizational influence circles, offering a bird's-eye view of the terrain and the boundaries of the sector. This model should act as the strategic map, guiding the development of tactics for establishing specific indicators based on the nature of the project and on the relevance of the influence circle one intends, or is able, to evaluate.

[Figure: TBRV Model - The Four Circles of Influence for KPIs in Generative AI. Generative Artificial Intelligence KPI evaluation metrics for organizations/enterprises/businesses; TBRV model © 2024 Dr. Michal G. Carmi]

The GenAI Performance Indicators are based on the standard quality indicators, supplemented with unique metrics and adjustments for generative artificial intelligence, and address the following areas:

1. Task Execution

Task Space: The specific indicators for the task space include an assessment of the performance quality and the achievement of the concrete task objectives, for tasks that are carried out through generative artificial intelligence and involve human participation to varying extents.

2. Scalability, Consistency, and Continuity

The Scalability and Continuity Space: Indicators in this domain are designed to provide a broader perspective than a particular task's results, extending past the narrow measurement of individual tasks toward a wider understanding of generative AI tools' capabilities and execution. Their focus is on understanding the model's range of capabilities and its wider applicability, and on gaining insights into the spectrum of its potential uses as well as its limitations.

3. Exposure and Risk

Generative artificial intelligence carries distinct risks, which are potentially significant due to the advanced and autonomous nature of these systems. The indicators in this domain are specifically developed to gauge and measure these risks, providing a clearer understanding of the level of exposure. This involves assessing the potential for unintended consequences, the reliability of generated outputs, and the security implications inherent in deploying such AI systems. Through these indicators, stakeholders can better navigate the complexities and mitigate the risks associated with generative AI technologies.

4. Proactive Meaning Agency

The central value of GenAI should lie in enhancing the various means of an organization's path to success, including productivity, profitability, innovation, and organizational culture. Accordingly, the metrics must evaluate GenAI's impact on operational channels and human capital, determining the success in addressing challenges and gauging the effects of these successes or failures on both the organization and its employees.

The TBRV model features a range of indicators in each of its segments, as detailed below. While these indicators primarily reflect the underlying thought process and approach, they are designed to be adaptable. Customizing these indicators to fit the specific context and requirements of a project is essential for an accurate and relevant assessment. Therefore, when selecting specific indicators for a project, it may be necessary to adjust or even develop bespoke indicators suited to the project's content domain. The TBRV model should be utilized as a flexible guiding framework, not as a rigid checklist, to effectively address the unique challenges and opportunities of each project. This flexibility is crucial in ensuring the model's efficacy across different content domains and project types.
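As a rough illustration of the 'building blocks' idea, the sketch below represents the four circles as a small catalogue from which a project-specific KPI set is composed. Every indicator name here is an invented placeholder, and naming the fourth (Meaning) circle VALUE simply mirrors the V in TBRV; neither detail is part of the model itself.

```python
from enum import Enum

class Circle(Enum):
    TASK = "Task execution"
    BEYOND = "Scalability, consistency, and continuity"
    RISK = "Exposure and risk"
    VALUE = "Proactive meaning agency"

# Illustrative catalogue only; every indicator name is a placeholder.
CATALOGUE = {
    Circle.TASK: ["completion_rate", "latency_p95", "f1_score", "conversion_rate"],
    Circle.BEYOND: ["consistency", "scalability", "prompting_index"],
    Circle.RISK: ["bias_gap", "dependency", "cyber_resilience"],
    Circle.VALUE: ["decision_impact", "meaningfulness", "adoption_rate"],
}

def select_kpis(wanted):
    """Compose a project-specific KPI set from the catalogue, circle by circle."""
    return {circle: [k for k in kpis if k in wanted]
            for circle, kpis in CATALOGUE.items()}

# A content-generation pilot might pick a narrow, task-heavy mix:
print(select_kpis({"completion_rate", "conversion_rate", "bias_gap"}))
```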

I. Task - Performance in the Task Space

1. Comparability: This involves the effective utilization of Task Performance Indicators (TPIs) for assessing tasks, along with the measurement and evaluation of hybrid output in relation to human output, facilitating a comprehensive comparison and analysis between hybrid and human-driven results.

2. Degree of Human Agency Involvement: The integration not only involves a combination of absolute and comparative metrics for hybrid activities (GenAI agency + human agency); it also requires a set of indicators pertaining to the human dimension, covering its scope, its contribution, and the extent and type of its required future involvement.

3. ROI and Cost/Benefit Analysis: This process entails not only understanding resource consumption but also determining the total costs associated with work time. It should include accounting for the task team's time, evaluating any reduction in human labor hours, ascertaining whether time is being reallocated elsewhere as a result, and identifying any impact on additional work time in potentially affected areas, such as programming, legal, and other departments. (A measurement sketch covering this and several of the indicators below follows this list.)

4. Completion Rate (CR): Measures the proportion of successfully completed activities compared to the total number of initiated activities. It forms part of the fundamental triad of quantity, time, and value, and is specifically quantified as the percentage of tasks that are actually completed.

5. Latency: The latency indicator refers to the response time, scope, parallel processing capabilities, distribution infrastructure, and availability. It includes the duration required to produce a response, as well as the time from data entry to the production of the final output, such as a comprehensive sales or financial report.

6. Spectrum of Hallucinations: This entails characterizing the hallucinations or errors and classifying them into various measurement categories, along with monitoring their distinct characteristics and frequency. It involves tracking processes that facilitate their reduction and ensuring consistent control over the accuracy range.

7. Sensitivity and Textuality Indices: These include metrics like the F1 Score, the harmonic mean of the precision and recall of the model's outputs in comparison to the true results. This evaluation encompasses various aspects, such as the accuracy of data extraction. Additionally, other metrics can be used to assess the model's abilities in summarization, lexical diversity, and fluency, as well as its linguistic and conceptual coherence and understanding of context.

8. Cost of Defective Output: The 'sorting and disposal of waste' cost, i.e., the cumulative cost of dealing with irrelevant outputs.

9. CSAT and User Engagement: Feedback from end customers outside and within the organization (value and user experience), plus CSAT (Customer Satisfaction Score) regarding usefulness and relevance, NPS (Net Promoter Score), and user engagement.

10. Conversion Rate: Measuring the percentage of users who perform the desired actions after interacting with the output.
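As a minimal sketch of how several of the task-space indicators above (items 3, 4, 5, 7, 9, and 10) might be computed from a simple task log, consider the following; the `TaskRecord` fields and evaluation inputs are illustrative assumptions rather than definitions prescribed by the TBRV model.

```python
from dataclasses import dataclass
from typing import Optional
import statistics

@dataclass
class TaskRecord:
    completed: bool            # did the GenAI-assisted task finish successfully?
    latency_s: float           # seconds from data entry to final output
    user_converted: bool       # did the user take the desired follow-up action?
    nps_score: Optional[int]   # 0-10 survey answer, None if not surveyed

def completion_rate(records):          # indicator 4
    return sum(r.completed for r in records) / len(records)

def latency_p95(records):              # indicator 5 (95th-percentile response time)
    return statistics.quantiles([r.latency_s for r in records], n=20)[-1]

def conversion_rate(records):          # indicator 10
    return sum(r.user_converted for r in records) / len(records)

def net_promoter_score(records):       # part of indicator 9
    scores = [r.nps_score for r in records if r.nps_score is not None]
    promoters = sum(s >= 9 for s in scores)
    detractors = sum(s <= 6 for s in scores)
    return 100 * (promoters - detractors) / len(scores)

def f1_score(tp, fp, fn):              # indicator 7, from labelled evaluation counts
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roi(total_benefit, total_cost):    # indicator 3, e.g. value of hours saved vs. tool costs
    return (total_benefit - total_cost) / total_cost
```

In practice, each of these functions would be fed from the organization's own logging and survey pipelines.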

II. Beyond - Performance in the Scalability and Continuity Space

1. Consistency: Is there consistency in knowledge extraction and performance? What is the degree of persistence or variability in the observed level and pace of performance? (A measurement sketch follows this list.)

2. Limitations: Identifying the tool's limitations and usability gaps throughout the product/task lifecycle, and monitoring its changes and development.

3. Tool Quality: The quality of the LLM used in the task: the level of information/conclusions/summaries/analysis, the decision-making process, the degree of information filtering, and user-friendliness in terms of source reflection. Measuring the accuracy and quality of outputs, such as assessing the grammar, coherence, and relevance of the text, the number of tokens generated, the range of possibilities presented in the analysis, and more.

4. Mechanism for Pseudo-Human Improvement: Language models are continuously learning. In the case of analysts, for instance, the goal is to transform the generative agent into an 'analyst on steroids'. For this to occur, however, human analysts must effectively imprint themselves onto the model. Ultimately, the expectation is to produce output that is as 'human' as possible, empowered by the computational and phenomenal processing capabilities of the generative agency. It is therefore necessary to measure whether this is happening during the project, and whether the human input of those involved is expressed in a way that allows the model to improve through 'internalization', understanding, and the generative intelligent agent's identification with the human model that operates and guides it.

5. Scalability: It is also necessary to assess the model's scalability, i.e., its ability to process and evaluate systems of various types and scopes while incorporating real-time data from various sources.

6. Flexibility: The ability of this integrated artificial-generative and human intelligence system to cope with varying scopes of demand and to adapt to different types of tasks or inputs, as well as its capability to respond to evolving task requirements and changing field conditions.

7. Model Prompting Index: This measures the frequency and duration of conversations, the number of interactions within each conversation, instances of repetition, abandonment rates, and user satisfaction (also covered in the sketch after this list).

8. User Prompting Index: This involves monitoring the quality of prompts from the users' perspective. It assesses not only the quality of conversations generated by the model but also the effectiveness of the prompts provided by the users.

9. Initiative, Creativity, and Generative Innovation Index: This index measures the model's self-initiated contribution to innovative outputs, problem identification, and solutions, etc. For tasks requiring creative output, such as graphic design or content creation, the indicator could be the level of innovation or uniqueness in the results produced.

10. Integration: To ensure optimization, an assessment must be conducted of how well the hybrid model integrates with existing technologies, workflows, and internal processes.

11. Multimodality: Does the model feature multimodality, and is the human agent making effective and appropriate use of it? Concurrently, it can be examined whether there are additional domains and activities within the enterprise where these modalities can be applied.
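As one possible way to operationalize items 1 and 7 above, the sketch below assumes benchmark prompts that are re-run several times and scored by some quality rubric (human or automated; the scoring itself is out of scope here), plus a simple conversation log whose field names are invented for the example.

```python
import statistics
from collections import defaultdict

def consistency_report(scored_runs):
    """Item 1 (Consistency): re-run each benchmark prompt several times and
    report the mean and spread of a quality score per prompt.
    `scored_runs` is a list of (prompt_id, quality_score) pairs."""
    by_prompt = defaultdict(list)
    for prompt_id, score in scored_runs:
        by_prompt[prompt_id].append(score)
    return {
        pid: {"mean": statistics.mean(s),
              "stdev": statistics.stdev(s) if len(s) > 1 else 0.0,
              "runs": len(s)}
        for pid, s in by_prompt.items()
    }

def prompting_index(conversations):
    """Item 7 (Model Prompting Index): conversation-level interaction stats.
    Each conversation is assumed to look like {"turns": 7, "abandoned": False}."""
    turns = [c["turns"] for c in conversations]
    return {
        "conversations": len(conversations),
        "mean_turns": statistics.mean(turns),
        "abandonment_rate": sum(c["abandoned"] for c in conversations) / len(conversations),
    }
```

A low standard deviation across repeated runs of the same prompt is one concrete signal of the persistence this circle asks about.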

III. Risk - Performance in the Exposure and Risk Space

1. Risk and Exposure Circles: Exposure/Risk/Safety. It is necessary to develop indicators for direct and indirect circles of risk and damage. Additionally, where compliance with government standards or other ethical or legal requirements is mandated, the outputs' compliance with these directives must be measured.

2. Real-Time Response: The development and management of the task should be under continuous monitoring. The goal, of course, is ongoing improvement, but monitoring should also serve as a mechanism to minimize damage and manage risk, allowing for immediate response in the event of failure, collapse, hostility, or any other type of disruption.

3. Human-Oriented Reliability Indicators: While it challenges the traditional view of man-machine differentiation, it's essential to approach GenAI based on its functionality rather than its composition and internal processing. Consequently, this tool should be evaluated with a mindset akin to assessing job candidates and employees. This comparison, though seemingly radical, is somewhat analogous to the criteria used by the Secret Service in hiring agents, focusing on levels of secrecy and ethics. Therefore, the evaluation should highlight the strengths and weaknesses of the language model used, its susceptibility to toxic prompts, and its openness to unethical requests.

4. Dependency Indicator: This involves assessing the extent of dependency on both external and internal factors, considering their availability and reliability. If system integration is required, it is essential to identify the specific dependencies involved, determine who or what the system relies on and how these dependencies function, and understand the potential implications. Such a comprehensive approach aims to provide a nuanced understanding of dependencies within a system or process.

5. Bias Indicator: Models often contain inherent biases that can affect the accuracy of their outputs. These biases might be based on factors like gender, skin color, or other demographic and sociocultural elements. It's important to identify and address these biases to ensure fair and accurate results (a minimal measurement sketch follows this list).

6. Data Contamination: An indicator to assess the quality of base data is also essential. This is crucial for monitoring issues such as data drift, contamination, sudden changes, or toxicity in the dataset.

7. Cyber Resilience: It is also essential to have an indicator that assesses the model's resilience in the face of hostile cyber-attacks.
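One simple, widely used way to turn item 5 into a number is a demographic parity gap: the spread in favourable-outcome rates across groups. The sketch below assumes outputs have already been labelled with a group attribute and a task-specific notion of a 'favourable' outcome; both labels are assumptions for illustration.

```python
from collections import defaultdict

def demographic_parity_gap(labelled_outputs):
    """Bias indicator (item 5): the largest difference in favourable-outcome
    rate across groups. `labelled_outputs` is a list of (group, favourable)
    pairs; a gap near 0 suggests parity on this particular axis."""
    totals = defaultdict(int)
    favourable = defaultdict(int)
    for group, ok in labelled_outputs:
        totals[group] += 1
        favourable[group] += ok
    rates = {g: favourable[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values()), rates

# Toy example with two labelled groups:
gap, rates = demographic_parity_gap([
    ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", False), ("group_b", False),
])
print(f"gap = {gap:.2f}, per-group rates = {rates}")
```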

IV. Meaning - Performance in the Proactive Meaning Agency Space

1. Creating Value for the Organization:

1.1 The impact on 'end-of-path' outcomes: When employing GenAI for organizational business intelligence (BI), value creation for the organization should manifest in the impact on 'end-of-path' outcomes, namely the influence on decision-making. The contribution should enhance decision-making based on a foundation of knowledge, analysis, insights, innovation, speed, and efficiency, significantly surpassing what is achievable with only a human base.

1.2 Access to Untapped Knowledge: Utilizing generative AI to 'melt the knowledge iceberg' involves addressing the concealed, inaccessible, and unprocessed knowledge within every organization. GenAI has the necessary capabilities to manage this knowledge effectively, owing to its proficiency in processing vast data sets and conducting sophisticated operations. Measuring the achievements of these tools is crucial to monitor the extent of 'lost' knowledge that GenAI recovers and processes. In a similar vein, GenAI can be employed to 'Uncover the Iceberg of Ignorance', providing senior executives with an accurate, synthesized view of the fundamental challenges confronting the organization. The insights should be presented in a user-friendly manner and continuously updated by the generative AI.

1.3 Generative KPI Agent: Using GenAI as a KPI tool for task assessment involves establishing metrics that the GenAI tools themselves apply. This generative agent would function as an organizational performance-quality overseer, steering the control processes for ongoing tasks. With the potential to offer unbiased and objective performance evaluations, the generative agent's key advantage is its ability to conduct real-time quality control, facilitating immediate assessments that pinpoint issues and guide the way to their resolution and improvement.

1.4 Generative Workflow Streams: The objective is to create an organization where generative intelligence is deeply integrated, enriching all its channels of activity and operations. Therefore, it is essential to measure how extensively generative practices are adopted within the organization's workflows (see the sketch following this list).

2. Creating Value for Individual Employees and the Workforce as a Whole:

2.1 The Meaningfulness Indicator: This indicator is designed to assess and track the effectiveness of generative agency work as a source of meaningful impact on employees, covering both the contribution to meaningful experiences and the actual change in the scope and quality of tasks, as well as in the job description itself. It focuses on the extent to which the GenAI contribution empowers employees and leverages their skills for more complex and challenging tasks, thereby enhancing their overall contribution to the system. Such an outcome not only fosters a stronger sense of connection and commitment but also cultivates a feeling of significance and achievement among the workforce.

2.2 Dispersion: Alongside its contributions, the generative agency also poses certain downsides, particularly regarding human capital, and these must be acknowledged and measured. GenAI introduces management challenges: employees may become unfocused and dispersed within the model; employees may feel threatened by the possibility of the intelligent generative agent 'taking over' their roles; and there is the concern of a 'Columbia Effect', where reliance on technological formats erodes critical thinking and responsiveness, as the PowerPoint format contributed to overlooking crucial warnings in the Columbia space shuttle disaster. These are only some of the human-capital challenges that need to be monitored and addressed.

3. Creating Value for Organizational Generativity (Implementing the Intelligent Transformation):

3.1 Transitioning from a Digital to an Intelligent Organization: The macro-level indicators should not only address the outcomes of individual projects and the degree of success in tasks but also measure the extent and pace at which the organization is transforming its system. This transformation is from one largely based on digital and artificial intelligence to one rooted in reliable and responsible generative intelligence. The goal is to develop an organization whose system operates with partial autonomy and balanced hybridity alongside human agency. As an analogy, one can draw parallels to the transition to paperless processes in the 1980s, where the pace of progress was measured by the internal organizational shift from physical components and processes to digital ones.

3.2 Organizational Learning: Monitoring the progress of organizational generative learning and the informed, responsible implementation of generative agents is essential. This process includes tracking the performance, strengths, and weaknesses of each major language model or neural network used. The goal is to identify the most suitable and effective models that align with the organizational DNA and optimize performance.

4. Reflection: Integrating metrics into the task definitions and the ongoing output of the project means that one of the generative agent's responsibilities is to proactively engage in continuous reflection and report on it. Given that generative intelligence is communicative in nature, reflection can be incorporated throughout the entire project lifecycle, including routine operations, accompanied by continuous optimization.

5. Agility and Iterativity: As a general rule, managing projects that involve GenAI should embrace an agile and iterative approach. This is essential due to the real-time dynamics introduced by the generative agency. Accordingly, there's a need for metrics that align the characteristics and pace of generative intelligence with those of human management. A continuous iterative approach is crucial, where ongoing reflection and feedback contribute to fine-tuning each iteration throughout the project's lifespan. This methodology is particularly significant for ongoing or future projects within the organization. In this context, it's also vital to establish reference points for performance assessment and necessary adjustments during the project's lifecycle.

6. Strategic Advantage: Alongside other macro-indicators, a holistic assessment is essential. This assessment should gauge the contribution to long-term strategic objectives and the creation of a competitive edge, a goal also pursued by rivals seeking to leverage these powerful tools. It should evaluate innovations in tasks or areas previously less accessible with older technology and assess the potential for significant value creation.
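To illustrate how the adoption measurement called for in item 1.4 might be tracked, here is a minimal sketch over an assumed workflow inventory; the field names and the inventory itself are invented for the example.

```python
def genai_adoption_rate(workflows):
    """Share of active workflows with at least one GenAI-assisted step."""
    return sum(w["genai_steps"] > 0 for w in workflows) / len(workflows)

def genai_step_share(workflows):
    """Finer-grained view: fraction of all workflow steps that are GenAI-assisted."""
    total_steps = sum(w["total_steps"] for w in workflows)
    return sum(w["genai_steps"] for w in workflows) / total_steps

inventory = [
    {"name": "monthly_sales_report", "genai_steps": 2, "total_steps": 6},
    {"name": "contract_review",      "genai_steps": 0, "total_steps": 4},
    {"name": "support_triage",       "genai_steps": 3, "total_steps": 5},
]
print(genai_adoption_rate(inventory))   # 0.67 of workflows touched by GenAI
print(genai_step_share(inventory))      # 0.33 of all steps are GenAI-assisted
```

Tracked quarter over quarter, these two ratios give a crude pace measure for the transformation described in section 3.1's paperless analogy.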
