Phased Approach | Why most Gen AI experiments never hit production
Richard Skinner
CEO @ PhasedAI | Helping Enterprise Transform Operations with Generative AI
Welcome to another edition of Phased Approach where we try to break down the challenges of introducing working AI models and techniques into your business.
Let's go!
FACING THE CHALLENGES
Why do most Generative AI experiments stay in the "friend zone"?
Reading some recent reports from the likes of Deloitte and PwC, we see that many companies are making massive investments in generative AI projects.
And yet you might be surprised by...
A recent survey reveals a startling statistic: 68% of organisations have moved 30% or fewer of their GenAI experiments into production. That’s right—more than two-thirds of companies are stuck in the proof-of-concept stage. So, what’s stopping them from moving forward?
So this week we will look at the barriers preventing these experiments from going live, and at the techniques and tools your organisation needs to clear them.
Managing Risk in a non-deterministic world
Getting promising results when testing Generative AI products can be very gratifying. Many companies are now building internal tools that they have employees use, almost like an internal beta, in the hope of eventually rolling them out to external users.
Why though do only a third of projects make it to production?
Well, we work with a lot of companies in the same boat. Let's discuss some of the issues and how we might begin to solve them.
Quality Assurance and Testing:
The problem: In software development we are used to deterministic output. In other words, whether you run the same database query once or a hundred times, you get the same result. That can be verified with a simple automated test in your CI/CD pipeline.
What you can do: Hopefully you caught our article a couple of weeks ago on How to Evaluate AI Output; if not, please go and check it out. In short, you need a mixture of evals, human-generated benchmarks, synthetic data and humans in the loop, and you need to add these processes to your CI/CD pipeline. This is a new type of QA that is not always deterministic. You have to be comfortable with that and manage the risk, because it is a major reason stakeholders are afraid to push the button on projects going live.
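As a minimal sketch of what a non-deterministic quality gate in CI might look like: instead of asserting one exact output, run the same prompt many times and fail the pipeline if the pass rate drops below a threshold. The `generate` function here is a stand-in for your real model call, and the 90% threshold is an illustrative tolerance, not a recommendation.

```python
import random

def generate(prompt: str) -> str:
    # Stand-in for a real model call; non-deterministic by design.
    return random.choice(["4", "4", "4", "four"])

def passes(output: str) -> bool:
    # A simple eval: accept either the numeric or spelled-out answer.
    return output.strip().lower() in {"4", "four"}

def eval_pass_rate(prompt: str, runs: int = 20) -> float:
    # Run the same prompt many times and measure how often it passes,
    # rather than asserting a single deterministic output.
    hits = sum(passes(generate(prompt)) for _ in range(runs))
    return hits / runs

def test_addition_prompt():
    # The CI gate: fail the build if quality drops below the threshold.
    assert eval_pass_rate("What is 2 + 2?") >= 0.9
```

The same pattern extends naturally to human-generated benchmarks: each benchmark case becomes one test function, and the pass-rate threshold is how you "manage the risk" of non-determinism explicitly.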
Regulatory and Data Governance
Despite the growing adoption of GenAI, a mere 23% of organisations feel equipped to manage the associated risks. A recent VentureBeat article highlights this concern, noting that with the introduction of regulations like the EU AI Act, many companies find themselves in uncharted waters. As one executive aptly put it, “You need a strategy to handle risks, and most companies just aren’t there yet.”
The issue is finding an expert who can categorise your AI tool or application and then keep it compliant with the EU AI Act and the other regulations that will follow.
What is needed is a tool that will automatically check for compliance at the CI/CD pipeline layer as well as live monitoring for things like personally identifiable information (PII) and other restricted data.
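A very small sketch of what the PII-checking part of such a tool could look like. The regex patterns below are illustrative stand-ins; a production system would rely on a dedicated PII-detection library or service rather than hand-rolled patterns.

```python
import re

# Hypothetical patterns for a few common PII categories.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def find_pii(text: str) -> list[str]:
    """Return the names of PII categories detected in a piece of text."""
    return [name for name, pat in PII_PATTERNS.items() if pat.search(text)]

def gate_output(text: str) -> str:
    # Block (or redact) before the response leaves your environment.
    found = find_pii(text)
    if found:
        raise ValueError(f"PII detected: {found}")
    return text
```

The same `find_pii` check can run in two places: as a CI/CD test over a corpus of recorded model outputs, and as a live gate in front of every response.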
Model testing and Management
Models keep changing and improving. In most applications we want to use the cheapest, fastest model that will get the work done, such as Claude Haiku or GPT-4o mini. Sometimes we need the big guns like Claude 3.5 Sonnet, or a custom Llama model for certain tasks. We need to be able to test these models, test our existing prompts against new ones, and measure output quality against benchmarks.
Tools like Anthropic's Workbench help evaluate models and prompts, but unfortunately they do not work with models from other providers. New evaluation tools will be needed to tackle this problem long term.
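A provider-neutral comparison harness can be sketched in a few lines. Here `call_model` is a stand-in for your provider SDKs (Anthropic, OpenAI, a self-hosted Llama, and so on), the model names are invented, and the containment metric is a toy; real benchmarks would use richer evals.

```python
def call_model(model: str, prompt: str) -> str:
    # Stand-in for real SDK calls; canned answers keep the sketch runnable.
    canned = {
        "small-model": "Paris",
        "large-model": "The capital of France is Paris.",
    }
    return canned[model]

def score(output: str, expected: str) -> float:
    # Toy containment metric; swap in your own benchmark scoring.
    return 1.0 if expected.lower() in output.lower() else 0.0

def compare(models: list[str], prompt: str, expected: str) -> dict[str, float]:
    """Run one benchmark case against several models and collect scores."""
    return {m: score(call_model(m, prompt), expected) for m in models}

results = compare(["small-model", "large-model"],
                  "What is the capital of France?", "Paris")
# Pick the cheapest model that clears the quality bar.
good_enough = [m for m, s in results.items() if s >= 1.0]
```

The point of the design is that models and prompts are just parameters, so when a new model ships, you re-run the same benchmark cases and let the scores decide whether to switch.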
Monitoring and auditing
We need audit logs of all output generated by models so that, should any safety or legal question arise, we can see how the output was generated: the model, the prompt version, the user and so on. Currently this has to be done with custom implementations that provide this kind of audit trail.
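A minimal sketch of what one such audit record might capture, assuming an append-only JSON Lines log. Hashing the prompt and output (rather than storing them in plain text) is one possible design choice for logs that must prove integrity without exposing sensitive content; the full texts can live separately under access control.

```python
import hashlib
import json
import time

def audit_record(model: str, prompt_version: str, user: str,
                 prompt: str, output: str) -> dict:
    """Build one append-only audit entry for a generation event."""
    return {
        "timestamp": time.time(),
        "model": model,
        "prompt_version": prompt_version,
        "user": user,
        # Hashes let the log prove what was generated without
        # storing sensitive content in plain form.
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "output_sha256": hashlib.sha256(output.encode()).hexdigest(),
    }

def append_log(path: str, record: dict) -> None:
    # JSON Lines: one record per line, easy to grep and replay.
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```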
Live monitoring is possible for metrics like toxicity, bias and hallucination using frameworks like DeepEval, but these still need a lot of human customisation. Here is an interesting article that covers some of these.
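To make the shape of live monitoring concrete, here is a deliberately simplified sketch: the word-list scorer is a stand-in for a real classifier (such as one wrapped by an eval framework), and the 0.1 threshold is illustrative, not a recommendation.

```python
def toxicity_score(text: str) -> float:
    # Stand-in for a real toxicity classifier; returns a score in [0, 1].
    banned = {"idiot", "stupid"}
    words = text.lower().split()
    return sum(w in banned for w in words) / max(len(words), 1)

THRESHOLD = 0.1  # Illustrative; tune per use case.

def monitor(text: str) -> bool:
    """Return True if the output is safe to serve; flag it otherwise."""
    return toxicity_score(text) < THRESHOLD
```

In production the same check would typically run asynchronously on a sample of traffic, with flagged outputs routed to a human reviewer, which is exactly the customisation work these frameworks still leave to you.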
Data management
Data is another critical hurdle. GenAI relies on vast amounts of data, but data quality, privacy, and security issues are causing 55% of organisations to avoid certain use cases altogether. The fear of mishandling sensitive data is real, and it’s preventing companies from fully leveraging their AI capabilities.
Where do we begin?
Avoiding High-Risk Tools and Use Cases
When it comes to GenAI, one of the first steps many organisations are taking is to carefully select the tools and use cases they engage with. To avoid potential regulatory headaches, it’s wise to steer clear of use cases that might attract additional scrutiny. Some companies are even going a step further, restricting access to specific GenAI tools for their staff to mitigate risk altogether.
Minimising Data Exposure
For organisations that heavily depend on their intellectual property, caution is key when it comes to using GenAI models. The risks of exposing sensitive data are significant, so it's crucial to establish strict guidelines to ensure that staff don’t inadvertently input organisational data into public large language models (LLMs). This approach helps protect proprietary information while still leveraging the benefits of GenAI.
Leveraging Technology for Control
Another effective strategy is to invest in custom solutions within the company’s technology stack. This allows for greater control over how GenAI is deployed and used. Additionally, building ‘walled gardens’ in private cloud environments, equipped with safeguards, can prevent data from leaking into the public cloud, thereby maintaining a secure operational perimeter.
Building Robust Frameworks
Developing strong, comprehensive frameworks that integrate compliance, risk management, and privacy considerations is essential. These frameworks serve as the backbone for safely scaling GenAI within an organisation, ensuring that risks are proactively managed and that the company stays compliant with all relevant regulations.
Managing Regulatory Uncertainty
Given the rapidly evolving regulatory landscape, it's important to collaborate with partners to create ecosystem solutions that ensure compliance across the board. Organisations must also prepare to address multiple regulations simultaneously, creating a flexible strategy that can adapt as new laws and guidelines come into play.
The Need for Tools That Do More
Given these challenges, it’s clear that organisations need better tools—tools that can manage the entire lifecycle of a GenAI project. From rigorous testing and validation to comprehensive risk management and regulatory compliance, these platforms need to cover all the bases.
One major sticking point is measuring value. Without clear metrics and KPIs, it’s tough to justify continued investment in GenAI projects. According to PwC, 41% of organisations are struggling to define and measure the impact of their GenAI initiatives. This lack of clarity can lead to a loss of momentum and, ultimately, stalled projects.
But it’s not all bad news. The technology to manage these challenges is out there and is becoming more usable. We’re seeing the emergence of more sophisticated tools that help organisations bridge the gap between potential and performance. These platforms offer centralised governance, robust data management, and integrated risk assessment—everything you need to take your GenAI projects from the lab to the real world.
Moving from Potential to Performance
The potential of GenAI is enormous, but realising that potential requires more than just enthusiasm. It requires a strategic approach, the right tools, and a willingness to tackle the complexities of scaling AI across an enterprise.
Organisations that succeed with GenAI will be those that not only embrace the technology but also take a proactive stance on managing its risks. This means moving beyond the proof-of-concept phase and investing in the tools and frameworks that will help us realise the full potential of GenAI.
ANNOUNCEMENTS
Are you looking to move an experiment out of the POC zone?
Join us for a free web event on September 27th for companies that want to learn about the tools, techniques and frameworks needed to move Generative AI experiments into working live applications safely.
Places are limited, reserve yours now!