The State of Generative AI in the Enterprise: Experimentation, Adoption, and the ROI Question


Artificial intelligence in the enterprise is undergoing a fundamental transformation. The rise of generative AI and large language models (LLMs) marks a significant shift from traditional, rule-based systems to probabilistic ones. This change presents both unprecedented opportunities and complex challenges for enterprises seeking to implement AI solutions at scale.


The Paradigm Shift: From Deterministic to Probabilistic Systems

Historically, enterprise software and applications have relied on deterministic systems with hard-coded rules guiding their behavior. These systems offered predictability and consistency but lacked flexibility and the ability to handle novel situations.

A prime example of a deterministic system is the traditional spam filter in email applications. These filters typically use a set of predefined rules to categorize incoming messages as spam or legitimate. For instance, an email might be flagged as spam if it contains certain keywords, comes from a blacklisted sender, or has a suspicious subject line. While effective for known patterns, this approach struggles with new, sophisticated spam tactics.
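
To make this concrete, a minimal rule-based filter might look like the sketch below. The keyword list, sender blocklist, and subject-line heuristic are illustrative assumptions, not rules from any real email product.

```python
# A minimal, illustrative rule-based spam filter. Every rule and threshold here
# is a hypothetical example, not drawn from any real email client.
SPAM_KEYWORDS = {"free money", "act now", "claim your prize"}
BLOCKED_SENDERS = {"promo@spammydomain.example"}

def is_spam_rule_based(sender: str, subject: str, body: str) -> bool:
    text = f"{subject} {body}".lower()
    if sender.lower() in BLOCKED_SENDERS:                    # hard rule: blocklisted sender
        return True
    if any(keyword in text for keyword in SPAM_KEYWORDS):    # hard rule: banned phrases
        return True
    if subject.isupper() and len(subject) > 10:              # hard rule: all-caps subject line
        return True
    return False                                             # otherwise treat as legitimate

print(is_spam_rule_based("promo@spammydomain.example", "FREE MONEY INSIDE", "Act now"))   # True
print(is_spam_rule_based("colleague@company.example", "Meeting notes", "See attached"))   # False
```

Every decision is fully predictable, but the filter only catches what its authors anticipated.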

The advent of probabilistic AI systems marked a significant shift in how applications function. In the case of email spam filtering, modern AI-powered systems (still "traditional" machine learning rather than generative AI) learn patterns from large datasets of emails. Instead of relying solely on fixed rules, these systems calculate the probability that an email is spam based on numerous factors, adapting to new patterns over time.
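
By contrast, a probabilistic filter learns from labeled examples and outputs a spam probability rather than a hard rule match. A minimal sketch using scikit-learn's Naive Bayes classifier, trained on a tiny made-up dataset, might look like this:

```python
# Illustrative probabilistic spam filter: a Naive Bayes classifier trained on a
# tiny, made-up dataset. Real systems learn from millions of labeled emails.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "Win free money now, act fast",          # spam
    "Limited offer, claim your prize today", # spam
    "Meeting moved to 3pm, agenda attached", # legitimate
    "Quarterly report ready for review",     # legitimate
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = legitimate

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

# The output is a probability, not a yes/no rule match.
spam_probability = model.predict_proba(["Claim your free prize now"])[0][1]
print(f"Estimated spam probability: {spam_probability:.2f}")
```

The same pipeline can be retrained as new spam tactics appear, which is exactly the adaptability the rule-based version lacks.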

This shift allows for more nuanced and adaptive spam detection, offering improved flexibility in the face of evolving spam tactics. However, it also changes the nature of the system's decision-making process. While traditional rule-based filters could make consistent decisions based on predefined criteria, AI-powered filters make probabilistic judgments that can vary based on learned patterns. This means an AI-powered spam filter might occasionally categorize emails differently than a human would expect, either flagging a legitimate email or allowing a well-crafted spam message through.

This illustrates a key difference between deterministic and probabilistic systems: the trade-off between rigid, predefined behavior and flexible, adaptive decision-making. Neither approach is infallible, but they present different strengths and challenges in managing complex, evolving problems like spam detection.


Accessibility and Its Implications

The proliferation and accessibility of Generative AI models and tools (which are probabilistic systems) have led to widespread experimentation with the technology across industries and enterprises. Organizations can quickly generate seemingly coherent and useful outputs with relatively low effort through various means:

  1. Model providers such as OpenAI, Anthropic, and Google Cloud offer easy-to-use APIs (a minimal sketch of this route appears after this list).
  2. Open-source models on platforms like Hugging Face can be fine-tuned or deployed with minimal setup.
  3. No-code and low-code platforms allow non-technical users to interact with and implement AI models.
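
As a rough illustration of the first route, a few lines against a hosted API are enough to start experimenting. The sketch below uses the OpenAI Python SDK; the model name and prompt are placeholder assumptions, and an API key is expected in the environment.

```python
# Minimal experimentation sketch against a hosted model API (OpenAI Python SDK).
# Assumes OPENAI_API_KEY is set in the environment; the model name is an example.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model choice
    messages=[
        {"role": "system", "content": "You are a concise assistant for customer support."},
        {"role": "user", "content": "Summarize our refund policy in two sentences."},
    ],
)
print(response.choices[0].message.content)
```

This low barrier to a first working demo is precisely what the adoption figures below reflect, and what makes the production gap that follows so stark.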

This ease of accessibility has spurred enterprise GenAI adoption. According to a recent McKinsey survey, 65% of respondents report that their organizations regularly use generative AI, nearly double the percentage from just ten months prior in 2023. This rapid increase highlights the excitement around these technologies' transformative potential.

However, this ease of initial implementation belies the complexities of deploying these systems in production environments.

According to recently reported data:

  • Only 14% of enterprises have moved beyond the proof-of-concept stage to development. (O'Reilly, 2023)
  • Less than 10% of companies are true AI-first "innovators" with integrated strategies for cost reduction and for unlocking generative AI's potential through GenAI-enabled product design and development. (McKinsey, 2024)
  • A mere 2% of generative AI proofs-of-concept successfully transition to production. (Benhamou Global Ventures internal benchmarks, 2024)

These figures highlight the significant gap between experimentation and production-ready AI solutions. The core challenge lies not in generating model responses or getting started with generative AI experimentation, but in ensuring that model outputs are consistently reliable, relevant, useful, and safe in real-world environments.


Managing Scale: Production Challenges

As AI solutions mature and adoption grows, managing workloads at scale becomes increasingly complex. Enterprises must balance the need for high performance and accuracy with practical demands for cost-efficiency, while ensuring consistent and reliable outputs. This challenge is particularly acute in mission-critical applications, where errors can carry severe financial and reputational consequences.

The probabilistic nature of generative AI introduces several key challenges when scaling to production:

  1. Output Variability: Unlike deterministic systems, probabilistic AI may produce different outputs for the same or similar inputs across multiple runs (see the sketch after this list). This variability can be problematic for industries and applications requiring strict consistency, such as legal document generation or financial reporting.
  2. Reliability: It is crucial to ensure the system avoids hallucinations and consistently produces accurate and trustworthy results. As the scale of operations increases, maintaining high reliability becomes more challenging, especially when dealing with edge cases or unexpected inputs.
  3. Safety: Preventing the generation of harmful, biased, or inappropriate content is a critical concern. As the volume of interactions increases, the risk of encountering edge cases that might trigger unsafe outputs also grows.
  4. Explainability: Understanding and explaining how the model arrives at its outputs becomes more complex at scale. This challenge is a significant issue for regulatory compliance, user trust, and system improvement.
  5. Performance at Scale: Maintaining low latency and high throughput as the number of requests increases is a significant technical challenge. This often requires sophisticated infrastructure and optimization techniques.
  6. Cost Management: Balancing the computational resources required for high-quality outputs with budget constraints becomes more challenging as usage grows. This is especially true for models that require significant computing power.
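
To make the first challenge, output variability, concrete: the sketch below sends the same prompt several times. With a non-zero sampling temperature the outputs typically differ across runs, and even temperature 0 with a fixed seed only reduces, rather than guarantees, variation. The OpenAI SDK, model name, and parameter values are illustrative assumptions.

```python
# Illustrative demonstration of output variability: the same prompt, sampled
# repeatedly at two temperatures. Model name and values are example assumptions.
from openai import OpenAI

client = OpenAI()
prompt = "Write a one-sentence product description for a reusable water bottle."

for temperature in (1.0, 0.0):
    outputs = set()
    for _ in range(3):
        response = client.chat.completions.create(
            model="gpt-4o-mini",   # placeholder model
            temperature=temperature,
            seed=42,               # best-effort determinism, not a guarantee
            messages=[{"role": "user", "content": prompt}],
        )
        outputs.add(response.choices[0].message.content)
    print(f"temperature={temperature}: {len(outputs)} distinct outputs out of 3 runs")
```

Provider-side features such as fixed seeds narrow the variance, but they do not turn a probabilistic model into a deterministic system.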

These challenges are exacerbated in complex, multi-step workflows and agentic systems, where errors can compound and lead to unexpected outcomes. As organizations move from proof-of-concept to large-scale deployment, addressing these challenges becomes crucial for the successful implementation of AI systems in production environments.


Orchestration: Managing System Complexity

To address these challenges, many enterprises are turning to orchestration layers. These act as routing mechanisms, coordinating various AI components and ensuring they work together seamlessly. Effective orchestration is crucial for managing the complexity of AI workflows, especially in agentic systems that involve chaining, multi-step task execution, and reasoning.

Recent research has proposed innovative approaches and frameworks to enhance the capabilities and efficiency of LLMs through sophisticated orchestration, including:

  1. RouteLLM: This framework addresses the challenge of balancing cost and performance when deploying LLMs. A routing system decides which LLM should handle each query based on cost and capability, and the routers are trained on preference data so the system learns which queries a less powerful model can handle. The authors report cost reductions of up to 85% on some benchmarks while maintaining 95% of GPT-4's performance. (A toy sketch of this routing pattern follows this list.)
  2. Mixture of Agents (MoA): This approach proposes a collaborative way to leverage the strengths of multiple LLMs: a layered architecture in which each layer comprises multiple LLM agents, and an iterative refinement process in which agents in each layer use the outputs of the previous layer to generate improved responses. At each layer, models are classified into "proposers" (which generate initial responses) and "aggregators" (which synthesize them). MoA achieved superior performance to state-of-the-art models on benchmarks such as AlpacaEval 2.0, surpassing GPT-4 Omni, while remaining more cost-effective.
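
The sketch below illustrates the routing idea in its simplest form: a scorer estimates query difficulty and sends easy queries to a cheaper model and hard ones to a stronger model. This is a toy approximation only; RouteLLM trains its router on preference data, whereas the difficulty heuristic, threshold, and model names here are illustrative assumptions.

```python
# Toy cost-aware router: easy queries go to a cheap model, hard ones to a strong
# model. A learned router (as in RouteLLM) would replace estimate_difficulty.
from openai import OpenAI

client = OpenAI()
CHEAP_MODEL = "gpt-4o-mini"   # placeholder "weaker but cheaper" model
STRONG_MODEL = "gpt-4o"       # placeholder "stronger but pricier" model

def estimate_difficulty(query: str) -> float:
    """Crude stand-in for a learned router: longer, more technical queries score higher."""
    technical_terms = ("prove", "derive", "optimize", "regulation", "contract")
    score = min(len(query) / 500, 1.0)
    score += 0.3 * sum(term in query.lower() for term in technical_terms)
    return min(score, 1.0)

def route_and_answer(query: str, threshold: float = 0.5) -> str:
    model = STRONG_MODEL if estimate_difficulty(query) >= threshold else CHEAP_MODEL
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": query}],
    )
    return f"[{model}] {response.choices[0].message.content}"

print(route_and_answer("What is the capital of France?"))
print(route_and_answer("Derive the optimal hedging strategy implied by this contract clause."))
```

Replacing the heuristic with a router trained on preference data is where the reported cost savings come from: most queries never need the expensive model.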


Diagram from the RouteLLM study

These research directions highlight the potential for orchestration layers to not only manage complexity but also to optimize performance and cost-efficiency in AI systems. However, effective orchestration is just one piece of the puzzle. To truly harness the capability of Generative AI in enterprise applications, organizations need to consider not just how they coordinate various components of their AI systems, but also how they structure the overall decision-making process of their AI systems.


Cognitive Architectures: Architecting Intelligence

While orchestration focuses on coordinating the various components of an AI system, the cognitive architecture, that is, the system architecture of an LLM-based business application, determines how the application processes information and makes decisions. As LangChain CEO Harrison Chase puts it, "Cognitive architecture is how your system thinks — in other words, the flow of code/prompts/LLM calls that takes user input and performs actions or generates a response."

Understanding and designing appropriate cognitive architectures for different applications and software is crucial for successful AI implementation, as it allows enterprises and teams to tailor their AI systems to specific business needs and use cases. Chase outlines several levels of cognitive architectures, each with increasing complexity and flexibility:

  1. Single LLM call: GenAI applications with basic chatbot functionality often fall into this category.
  2. Chain of LLM calls: More complex systems that break a problem into steps, such as advanced RAG (Retrieval-Augmented Generation) pipelines (a minimal sketch follows the figure below).
  3. Routers: Introduce decision-making about which actions and steps to take and under what circumstances, adding unpredictability.
  4. State machine: Combines routing with loops, allowing for more complex behaviors.
  5. Autonomous agent: The most flexible but also the most unpredictable and error-prone level, in which one or more models work toward a defined objective, deciding at each step which actions to execute, which instructions to follow to retrieve the information needed for a high-quality response, and how to recover when a response cannot be generated.


Harrison Chase's Levels of Application Autonomy
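
As a minimal sketch of the second level, a chain of LLM calls in a simple retrieval-augmented generation flow might look like the following. The in-memory document list, the keyword-overlap retrieval, and the model name are toy assumptions; a production pipeline would use embeddings, a vector store, and evaluation around each step.

```python
# Minimal "chain of LLM calls" sketch: retrieve context, then generate an answer
# grounded in it. The document store and retrieval heuristic are toy assumptions.
from openai import OpenAI

client = OpenAI()

DOCUMENTS = [
    "Refunds are available within 30 days of purchase with a valid receipt.",
    "Support hours are Monday to Friday, 9am to 5pm CET.",
    "Enterprise plans include a dedicated account manager.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Naive keyword-overlap retrieval standing in for an embedding-based search."""
    query_terms = set(query.lower().split())
    ranked = sorted(DOCUMENTS, key=lambda doc: -len(query_terms & set(doc.lower().split())))
    return ranked[:k]

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content

print(answer("How long do I have to request a refund?"))
```

Each higher level (routers, state machines, autonomous agents) adds branching and looping around this same basic pattern, which is exactly where both the flexibility and the unpredictability come from.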


Choosing the appropriate cognitive architecture for a specific business use case is critical to successful AI implementation. It involves balancing the need for flexibility and creativity with the requirement for predictability and control. As organizations move from simpler to more complex architectures, they gain in capability but also face increased challenges in ensuring reliability and maintaining oversight.

The interplay between orchestration and cognitive architecture is critical. While orchestration manages the coordination of AI components, the chosen cognitive architecture determines how these components interact and make decisions. Together, they form the backbone of an enterprise AI application or system, enabling it to handle complex tasks while remaining aligned with business objectives and constraints.


AI ROI: The Trillion-Dollar Question


The probabilistic nature of AI models has also given rise to growing concerns about the potential return on investment (ROI) of the significant capital flowing into every layer of the AI stack. David Cahn's Sequoia Capital article "AI's $600B Question" estimates that over $1 trillion will be spent on AI capex in the coming years, requiring roughly $600 billion in annual revenue to justify these investments.

This $600 billion breaks down as:

  • $100 billion: Expected new AI-related revenue from major tech companies such as Microsoft, Google, Amazon, etc.
  • $500 billion: The gap or "hole" that needs to be filled by additional AI-related revenue from the broader AI ecosystem.

Meanwhile, a recently released 31-page Goldman Sachs report titled "Gen AI: Too Much Spend, Too Little Benefit?" presents distinctly contrasting views on AI's potential benefits and economic impact:

  • Joseph Briggs projects a 9.2% increase in US productivity (total factor productivity, TFP) and a 6.1% boost in GDP growth over the next decade due to the pervasive use and widespread adoption of Generative AI.
  • MIT professor Daron Acemoglu forecasts only a 0.53% increase in TFP, translating to roughly a 0.9% GDP impact over the next ten years.

This stark contrast in projections underscores the uncertainty surrounding AI's economic impact and the challenges in accurately predicting its future value.

The nondeterministic, probabilistic nature of AI models is a key factor driving this narrative of potentially delayed or diminished returns among investors and enterprises. The risk of models behaving unpredictably in production environments, potentially producing harmful or unwanted outputs when interacting with users and customers, cannot be eliminated entirely today. This uncertainty, coupled with the high training and operational costs of state-of-the-art (SOTA) models, makes the economics of AI implementation particularly challenging.

However, it's crucial to view these challenges in the context of historical technological revolutions. The adoption of general-purpose technologies often follows a pattern of initial excitement, followed by an often lengthy period of adjustment, adaptation, and refinement before widespread productive use. Historical examples illustrate how long these adoption curves can run:

  1. Steam Engine: It took approximately 100-150 years from when James Watt patented his improved steam engine design in 1769 to widespread industrial adoption of steam power in the mid-to-late 19th century.
  2. Electricity: From Benjamin Franklin's experiments with lightning and electricity in 1752 to widespread household adoption in the 1940s-1950s, the timeline spanned roughly 200 years, with about 80-100 years from the invention of the practical light bulb to ubiquity in US households.
  3. Internet: Even this more recent transformative technology required about 30-40 years from the establishment of ARPANET in 1969 to become truly common in homes and businesses in the early 2000s.

It's important to note the sheer length of these adoption periods. While Generative AI is likely to be adopted more quickly due to our interconnected global economy and rapid pace of information dissemination and consumption, historical trends suggest that a period of at least a decade is a reasonable expectation for the true adoption and scaling of such a revolutionary technology.

We are merely two years into the generative AI revolution spurred by the launch of ChatGPT in November 2022. While immediate ROI may not yet be apparent, the lessons from current experiments and proofs-of-concept will likely have compounding benefits in the future. The industry needs time to collectively innovate, experiment, and develop widely accepted cognitive architectures, system optimizations, and best practices for GenAI to truly start yielding returns of the magnitude that previous revolutionary technologies have delivered.


Conclusion: Harnessing Potential, Mitigating Risks

The shift to probabilistic AI systems, particularly generative models, represents a fundamental change in how enterprises approach application development and deployment. While these systems offer unprecedented capabilities in terms of intelligence, creativity, and reasoning, they also introduce new challenges in reliability, consistency, and control.

Most advancements and innovations in AI are now focused on harnessing the inherent creativity and intelligence of foundation models while delivering their capabilities in a more reliable, controlled fashion. This balance is crucial for moving AI applications from experimental pilot phases into production environments where they can deliver real business value.

Success in this new paradigm requires:

  1. A deep understanding of both the potential and limitations of probabilistic AI systems
  2. Investment in robust orchestration and cognitive architectures
  3. Continuous innovation in deployment and operational strategies
  4. Careful consideration of new user experiences and interaction patterns
  5. Realistic expectations about ROI and economic impact

As we collectively navigate this complex landscape, it's important to maintain a long-term perspective. The adoption of transformative technologies often takes longer than initially expected, but their impact can be profound and far-reaching. The hyper-scale companies financing the AI/GenAI revolution have the requisite fortress balance sheets and cash flows to sustain a few years of subpar returns, betting on the long-term benefits of increased productivity and efficiency.

For enterprises embarking on AI initiatives, the focus should be on building foundational capabilities, gathering insights from current experiments, applying lessons learned, iterating rapidly, and preparing for the eventual maturation of the technology. While the challenges are significant, the potential for transformation is immense.

The AI revolution is still in its early stages, and like the general-purpose technologies that came before it, its full impact will likely unfold over decades rather than years. Patience, perseverance, and continuous innovation will be key as we work towards realizing the full potential of this transformative technology.

