Phased Approach | Why most Gen AI experiments never hit production
Richard Skinner
CEO @ PhasedAI | Helping Enterprise Transform Operations with Generative AI
Welcome to another edition of Phased Approach where we try to break down the challenges of introducing working AI models and techniques into your business.
Let's go!
FACING THE CHALLENGES
Why do most Generative AI experiments stay in the "friend zone"?
Reading some recent reports from the likes of Deloitte and PwC, we see that many companies are making massive investments in generative AI projects.
And yet you might be surprised by...
A recent survey reveals a startling statistic: 68% of organisations have moved 30% or fewer of their GenAI experiments into production. That’s right—more than two-thirds of companies are stuck in the proof-of-concept stage. So, what’s stopping them from moving forward?
So this week we will look at the barriers preventing these experiments from going live, and at the techniques and tools your organisation needs to clear them.
Managing Risk in a non-deterministic world
Getting promising results when testing Generative AI products can be very gratifying. Many companies are now building internal tools that they have employees use, almost like an internal beta, in the hope of eventually rolling them out to external users.
Why though do only a third of projects make it to production?
Well, we work with a lot of companies in the same boat. Let's discuss some of the issues and how we might begin to solve them.
Quality Assurance and Testing:
The problem: In software development we are used to deterministic output. In other words, whether you run the same database query once or a hundred times, you get the same result. That can be verified with a simple automated test in your CI/CD pipeline.
What you can do: Hopefully you caught our article a couple of weeks ago on How to Evaluate AI Output; if not, please go and check it out. In short, you need a mixture of evals, human-generated benchmarks, synthetic data and humans in the loop, and you need to add these processes to your CI/CD pipeline. This is a new type of QA that is not always deterministic. You have to be comfortable with that and manage the risk, because it is a major reason stakeholders are afraid to push the button on projects going live.
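As a minimal sketch of what a non-deterministic quality gate in CI might look like: instead of asserting one exact output, run the same prompt many times and fail the pipeline if the pass rate drops below a threshold. The `generate` function here is a stand-in for your real model call, and the 90% threshold is an illustrative tolerance, not a recommendation.

```python
import random

def generate(prompt: str) -> str:
    # Stand-in for a real model call; non-deterministic by design.
    return random.choice(["4", "4", "4", "four"])

def passes(output: str) -> bool:
    # A simple eval: accept either the numeric or spelled-out answer.
    return output.strip().lower() in {"4", "four"}

def eval_pass_rate(prompt: str, runs: int = 20) -> float:
    # Run the same prompt many times and measure how often it passes,
    # rather than asserting a single deterministic output.
    hits = sum(passes(generate(prompt)) for _ in range(runs))
    return hits / runs

def test_addition_prompt():
    # The CI gate: fail the build if quality drops below the threshold.
    assert eval_pass_rate("What is 2 + 2?") >= 0.9
```

The same pattern extends naturally to human-generated benchmarks: each benchmark case becomes one test function, and the pass-rate threshold is how you "manage the risk" of non-determinism explicitly.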
Regulatory and Data Governance
Despite the growing adoption of GenAI, a mere 23% of organisations feel equipped to manage the associated risks. A recent VentureBeat article highlights this concern, noting that with the introduction of regulations like the EU AI Act, many companies find themselves in uncharted waters. As one executive aptly put it, “You need a strategy to handle risks, and most companies just aren’t there yet.”
The issue is finding an expert who can categorise your AI tool or application and then keep it compliant with the EU AI Act and the other regulations that will follow.
What is needed is a tool that will automatically check for compliance at the CI/CD pipeline layer as well as live monitoring for things like personally identifiable information (PII) and other restricted data.
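A very small sketch of what the PII-checking part of such a tool could look like. The regex patterns below are illustrative stand-ins; a production system would rely on a dedicated PII-detection library or service rather than hand-rolled patterns.

```python
import re

# Hypothetical patterns for a few common PII categories.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def find_pii(text: str) -> list[str]:
    """Return the names of PII categories detected in a piece of text."""
    return [name for name, pat in PII_PATTERNS.items() if pat.search(text)]

def gate_output(text: str) -> str:
    # Block (or redact) before the response leaves your environment.
    found = find_pii(text)
    if found:
        raise ValueError(f"PII detected: {found}")
    return text
```

The same `find_pii` check can run in two places: as a CI/CD test over a corpus of recorded model outputs, and as a live gate in front of every response.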
Model testing and Management
Models keep changing and improving. In most applications we want to use the cheapest, fastest model that will get the work done, such as Claude Haiku or GPT-4o mini. Sometimes we need the big guns like Claude 3.5 Sonnet, or a custom Llama model for certain tasks. We need to be able to test these models, test our existing prompts against new ones, and measure output quality against benchmarks.
Tools like Anthropic's Workbench help evaluate models and prompts, but unfortunately they do not work with models from other providers. New evaluation tools will be needed to tackle this problem long term.
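A provider-neutral comparison harness can be sketched in a few lines. Here `call_model` is a stand-in for your provider SDKs (Anthropic, OpenAI, a self-hosted Llama, and so on), the model names are invented, and the containment metric is a toy; real benchmarks would use richer evals.

```python
def call_model(model: str, prompt: str) -> str:
    # Stand-in for real SDK calls; canned answers keep the sketch runnable.
    canned = {
        "small-model": "Paris",
        "large-model": "The capital of France is Paris.",
    }
    return canned[model]

def score(output: str, expected: str) -> float:
    # Toy containment metric; swap in your own benchmark scoring.
    return 1.0 if expected.lower() in output.lower() else 0.0

def compare(models: list[str], prompt: str, expected: str) -> dict[str, float]:
    """Run one benchmark case against several models and collect scores."""
    return {m: score(call_model(m, prompt), expected) for m in models}

results = compare(["small-model", "large-model"],
                  "What is the capital of France?", "Paris")
# Pick the cheapest model that clears the quality bar.
good_enough = [m for m, s in results.items() if s >= 1.0]
```

The point of the design is that models and prompts are just parameters, so when a new model ships, you re-run the same benchmark cases and let the scores decide whether to switch.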
Monitoring and auditing
We need audit logs of all output generated by models so that, should any safety or legal question arise, we can see how the output was generated: the model, the prompt version, the user and so on. Currently this has to be done with custom implementations that provide this kind of audit trail.
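A minimal sketch of what one such audit record might capture, assuming an append-only JSON Lines log. Hashing the prompt and output (rather than storing them in plain text) is one possible design choice for logs that must prove integrity without exposing sensitive content; the full texts can live separately under access control.

```python
import hashlib
import json
import time

def audit_record(model: str, prompt_version: str, user: str,
                 prompt: str, output: str) -> dict:
    """Build one append-only audit entry for a generation event."""
    return {
        "timestamp": time.time(),
        "model": model,
        "prompt_version": prompt_version,
        "user": user,
        # Hashes let the log prove what was generated without
        # storing sensitive content in plain form.
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "output_sha256": hashlib.sha256(output.encode()).hexdigest(),
    }

def append_log(path: str, record: dict) -> None:
    # JSON Lines: one record per line, easy to grep and replay.
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```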
Live monitoring is possible for metrics like toxicity, bias and hallucination using frameworks like DeepEval, but these still need a lot of human customisation. Here is an interesting article that covers some of these.
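To make the shape of live monitoring concrete, here is a deliberately simplified sketch: the word-list scorer is a stand-in for a real classifier (such as one wrapped by an eval framework), and the 0.1 threshold is illustrative, not a recommendation.

```python
def toxicity_score(text: str) -> float:
    # Stand-in for a real toxicity classifier; returns a score in [0, 1].
    banned = {"idiot", "stupid"}
    words = text.lower().split()
    return sum(w in banned for w in words) / max(len(words), 1)

THRESHOLD = 0.1  # Illustrative; tune per use case.

def monitor(text: str) -> bool:
    """Return True if the output is safe to serve; flag it otherwise."""
    return toxicity_score(text) < THRESHOLD
```

In production the same check would typically run asynchronously on a sample of traffic, with flagged outputs routed to a human reviewer, which is exactly the customisation work these frameworks still leave to you.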
Data management
Data is another critical hurdle. GenAI relies on vast amounts of data, but data quality, privacy, and security issues are causing 55% of organisations to avoid certain use cases altogether. The fear of mishandling sensitive data is real, and it’s preventing companies from fully leveraging their AI capabilities.
Where do we begin?
Avoiding High-Risk Tools and Use Cases
When it comes to GenAI, one of the first steps many organisations are taking is to carefully select the tools and use cases they engage with. To avoid potential regulatory headaches, it’s wise to steer clear of use cases that might attract additional scrutiny. Some companies are even going a step further, restricting access to specific GenAI tools for their staff to mitigate risk altogether.
Minimising Data Exposure
For organisations that heavily depend on their intellectual property, caution is key when it comes to using GenAI models. The risks of exposing sensitive data are significant, so it's crucial to establish strict guidelines to ensure that staff don’t inadvertently input organisational data into public large language models (LLMs). This approach helps protect proprietary information while still leveraging the benefits of GenAI.
Leveraging Technology for Control
Another effective strategy is to invest in custom solutions within the company’s technology stack. This allows for greater control over how GenAI is deployed and used. Additionally, building ‘walled gardens’ in private cloud environments, equipped with safeguards, can prevent data from leaking into the public cloud, thereby maintaining a secure operational perimeter.
Building Robust Frameworks
Developing strong, comprehensive frameworks that integrate compliance, risk management, and privacy considerations is essential. These frameworks serve as the backbone for safely scaling GenAI within an organisation, ensuring that risks are proactively managed and that the company stays compliant with all relevant regulations.
Managing Regulatory Uncertainty
Given the rapidly evolving regulatory landscape, it's important to collaborate with partners to create ecosystem solutions that ensure compliance across the board. Organisations must also prepare to address multiple regulations simultaneously, creating a flexible strategy that can adapt as new laws and guidelines come into play.
The Need for Tools That Do More
Given these challenges, it’s clear that organisations need better tools—tools that can manage the entire lifecycle of a GenAI project. From rigorous testing and validation to comprehensive risk management and regulatory compliance, these platforms need to cover all the bases.
One major sticking point is measuring value. Without clear metrics and KPIs, it’s tough to justify continued investment in GenAI projects. According to PwC, 41% of organisations are struggling to define and measure the impact of their GenAI initiatives. This lack of clarity can lead to a loss of momentum and, ultimately, stalled projects.
But it’s not all bad news. The technology to manage these challenges is out there and is becoming more usable. We’re seeing the emergence of more sophisticated tools that help organisations bridge the gap between potential and performance. These platforms offer centralised governance, robust data management, and integrated risk assessment—everything you need to take your GenAI projects from the lab to the real world.
Moving from Potential to Performance
The potential of GenAI is enormous, but realising that potential requires more than just enthusiasm. It requires a strategic approach, the right tools, and a willingness to tackle the complexities of scaling AI across an enterprise.
Organisations that succeed with GenAI will be those that not only embrace the technology but also take a proactive stance on managing its risks. This means moving beyond the proof-of-concept phase and investing in the tools and frameworks that will help us realise the full potential of GenAI.
ANNOUNCEMENTS
Are you looking to move an experiment out of the POC zone?
Join us for a free web event on September 27th for companies that want to learn about the tools, techniques and frameworks needed to move Generative AI experiments into working live applications safely.
Places are limited, reserve yours now!