Generative AI Deployment Strategies
Without the right strategy, you may become obsolete. Photo by Author David E. Sweenor

A Strategic Guide for CIOs and CTOs

Introduction

As organizations continue to tinker with generative AI, they face a critical question: how can they best deploy the technology to achieve scale and return on investment (ROI) as quickly as possible? Given the sheer pace of innovation, CIOs and CTOs need to think carefully about the many available options. Key questions include: Do you build or buy? What are the costs and benefits of each deployment approach? How long will it take? Before beginning any AI project, two critical items must be addressed: 1) make sure that your organization is crystal clear on which enterprise workflows will be supported by the technology, and 2) have an AI governance plan in place. See my article, The Generative AI Hammer. Is Everything a Nail?, for additional insights on selecting the right use cases.[1]

Assuming that the use cases and AI governance framework are defined, an organization can consider several deployment approaches. These approaches include:

  1. Use off-the-shelf products and services
  2. Use APIs to integrate with existing enterprise applications
  3. Implement retrieval augmented generation (RAG)
  4. Fine-tune an existing foundation model (FM)
  5. Do-it-yourself (DIY) and build a custom FM from scratch

Three of these were covered in my previous article, Generative AI’s Force Multiplier: Your Data.[2] This article extends the initial three and adds a decision framework.

To select the right approach, technology leaders must first understand and catalog the differences among the deployment approaches so that they aren't locked into the approaches advised by their vendors. CIOs and CTOs should also analyze the strengths and cautions of each approach, align use cases with these approaches, and understand how they fit with the organization’s existing enterprise architecture to make objective decisions on a use-case-by-use-case basis.

Deployment Strategies for Generative AI

Organizations should create a decision framework for choosing the right deployment strategy when considering the various deployment options. Factors include, but are not limited to, the use cases, costs, organizational aptitude, security and privacy, governance (e.g., control of model output), and implementation simplicity.

Off-the-shelf Products and Services

The simplest, most straightforward strategy is to buy a commercial solution with generative AI baked in. A ready-made solution is easy to deploy and has low or no fixed costs, allowing organizations to experiment with generative AI with minimal investment. However, this approach sacrifices flexibility. When you buy a pre-packaged solution, you “get what you get” out of the box. Although it’s possible to customize some of these solutions, doing so increases the complexity and effort required to maintain them. McKinsey estimates that this approach costs between $0.5M and $2.0M, with a recurring annual cost of $0.5M.[3] With off-the-shelf solutions, you also have less control over security and data privacy risks and must accept the application provider's security and data protection controls. Many providers offer indemnification clauses in their contracts, but do they really have your back if something goes wrong?

  • Strengths: Easy to deploy, low/no fixed costs, seamless integration with existing workflows
  • Cautions: Limited flexibility, less control over security and data privacy

Use APIs to Integrate with Existing Enterprise Applications

Integrating pre-built generative AI models with existing enterprise applications via APIs is relatively easy to implement. It has lower fixed costs since organizations pay only for model usage and not for training. Note that providers like OpenAI charge for both input and output tokens. With this approach, many companies embed one-shot or few-shot examples within the prompt, which improves the model's accuracy for the task at hand. However, you may run into context window limits, although these limits are generally increasing.

  • Strengths: Lower costs, faster time-to-market, clever prompt engineering can improve accuracy, the model can be used across a variety of different use cases.
  • Cautions: Context window limitations, need to support each enterprise app integration
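
To make the few-shot and cost points concrete, here is a minimal sketch in Python. It assembles a prompt from a handful of labeled examples, estimates what a call might cost when input and output tokens are billed separately, and checks the result against a context window. Everything named here (`Example`, `build_few_shot_prompt`, the four-characters-per-token heuristic, and any prices you pass in) is an assumption for illustration, not any provider's actual API or pricing.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One labeled demonstration embedded in the prompt (hypothetical structure)."""
    text: str
    label: str


def build_few_shot_prompt(instruction: str, examples: list[Example], query: str) -> str:
    """Join an instruction, the labeled examples, and the new input into one prompt."""
    parts = [instruction.strip(), ""]
    for ex in examples:
        parts.append(f"Input: {ex.text}")
        parts.append(f"Label: {ex.label}")
        parts.append("")
    parts.append(f"Input: {query}")
    parts.append("Label:")
    return "\n".join(parts)


def estimate_tokens(text: str) -> int:
    """Rough heuristic of about 4 characters per token; real tokenizers differ."""
    return max(1, len(text) // 4)


def estimate_cost(prompt_tokens: int, output_tokens: int,
                  input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Cost of one call when inputs and outputs are billed at separate rates."""
    return (prompt_tokens / 1000 * input_price_per_1k
            + output_tokens / 1000 * output_price_per_1k)


def fits_context(prompt_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """The prompt and the requested output must fit in the window together."""
    return prompt_tokens + max_output_tokens <= context_window
```

Each example you add makes the prompt longer, so it raises the input cost of every call and eats into the context window. That is the trade-off to watch when tuning prompts this way.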

Extending Generative AI Models via Data Retrieval

RAG enables organizations to add their proprietary data to model responses without fine-tuning or training an FM. This method improves the accuracy and quality of responses by augmenting the model's knowledge with your organization's data, without requiring the creation of custom models. However, similar to the API approach, a RAG approach is limited by the context window of the generative model, which constrains the amount of retrieved information that can be sent to the model. For real-time use cases, the extra retrieval step needed to augment the prompt increases latency, which can limit RAG's viability.

  • Strengths: Ability to incorporate organizational context and domain-specific data, improved accuracy, and reduced hallucinations
  • Cautions: Increased complexity, additional technology like vector databases, potential latency issues, and the need to ensure sensitive organizational data isn’t leaked, as Samsung discovered.[4]
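
The retrieve-then-augment flow described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: it scores documents with bag-of-words cosine similarity instead of the embeddings and vector database a production system would typically use, and the function names and the character-based context budget are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k documents that share vocabulary with the query, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored = [(s, d) for s, d in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]


def build_rag_prompt(query: str, documents: list[str],
                     max_context_chars: int = 500, top_k: int = 2) -> str:
    """Add retrieved passages to the prompt until the context budget is used up."""
    context: list[str] = []
    used = 0
    for doc in retrieve(query, documents, top_k):
        if used + len(doc) > max_context_chars:
            break  # the context window limit caps how much can be sent
        context.append(doc)
        used += len(doc)
    return ("Answer using only the context below.\n\nContext:\n"
            + "\n".join(f"- {c}" for c in context)
            + f"\n\nQuestion: {query}\nAnswer:")
```

The budget check is where the context window constraint shows up in practice: passages that do not fit are dropped, however relevant they are. The retrieval call is also the extra step that adds latency to every request.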

Fine-tuning Existing FMs

The fourth approach involves fine-tuning an FM to incorporate additional domain knowledge or improve performance on specific tasks. This method results in custom models dedicated to the organization, improving performance and reducing hallucinations. However, the cost of using a fine-tuned model (inference cost) can be significant. McKinsey estimates this would cost between $2M and $10M, with a $0.5M to $1M recurring annual maintenance budget. Also, fine-tuning on top of a given foundation model may compromise safety.[5]

  • Strengths: Improved performance and lower data requirements compared to building from scratch
  • Cautions: Potential cost and flexibility trade-offs, safety concerns

DIY, Building Custom Foundation Models from Scratch

The last approach involves organizations building their own FMs from scratch, fully customizing them to their data and business domains. This method offers the potential for the highest accuracy, complete control over training datasets and model parameters, and competitive differentiation. However, training and maintaining a large generative AI model is insanely expensive and can only be afforded by the largest organizations. McKinsey calculates that this costs somewhere between $5M and $200M for an initial build, with a $1M to $5M recurring fee. The models will also need to be updated regularly, which further increases costs.

  • Strengths: Highest potential for accuracy and control, competitive differentiation
  • Cautions: Exorbitant costs, access to AI talent, specialized infrastructure, and innovations that may render your model and approach obsolete.
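
To put the McKinsey ranges quoted in this article side by side, the short Python sketch below adds the one-time cost to the recurring cost over a chosen horizon. The dollar figures come from the sections above. The three-year horizon, the class and variable names, and the reading of the DIY "recurring fee" as an annual figure are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostRange:
    """One-time and recurring cost ranges, in millions of US dollars."""
    build_low: float
    build_high: float
    annual_low: float
    annual_high: float

    def total(self, years: int) -> tuple[float, float]:
        """Low and high total cost over the given number of years."""
        return (self.build_low + years * self.annual_low,
                self.build_high + years * self.annual_high)


# Ranges as quoted in this article; DIY recurring cost is assumed to be annual.
ESTIMATES = {
    "off-the-shelf": CostRange(0.5, 2.0, 0.5, 0.5),
    "fine-tuning": CostRange(2.0, 10.0, 0.5, 1.0),
    "build from scratch": CostRange(5.0, 200.0, 1.0, 5.0),
}

if __name__ == "__main__":
    horizon_years = 3  # illustrative planning horizon, not from the source
    for name, estimate in ESTIMATES.items():
        low, high = estimate.total(horizon_years)
        print(f"{name}: ${low:.1f}M to ${high:.1f}M over {horizon_years} years")
```

Over three years, this arithmetic gives roughly $2.0M to $3.5M for off-the-shelf, $3.5M to $13.0M for fine-tuning, and $8.0M to $215.0M for building from scratch.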

In the end, with the exception of the DIY approach, most companies will combine these approaches across their organization.

Deployment Decision Framework

To choose the right deployment approach, organizations should consider factors such as use case, costs, organizational knowledge, security and privacy, governance, and implementation simplicity.

  1. Costs: Off-the-shelf products and API integrations have no training costs but incur usage/inference costs. Prompt engineering incurs only inference costs, since it modifies prompts rather than the model. Data retrieval increases inference costs by adding more data to the prompts. Fine-tuning costs can vary widely based on the size of the model. Building a model from scratch is the most expensive approach.
  2. Organizational Context (Data): Using APIs to integrate with existing enterprise apps, RAG, fine-tuning, and DIY offer the best ways to inject organizational or domain-specific knowledge. Domain-trained models and SaaS applications can also bring general-purpose models closer to organizational needs.
  3. Ability to Control Security and Privacy: DIY and fine-tuning provide stronger ownership of assets and more flexibility in implementing security and privacy controls. Understand the shared responsibility model with providers and have an audit process to evaluate their security. See my article Future-Proof Your IT: The CIO’s Guide to Generative AI Vendor Selection for more details.[6]
  4. Governance and Model Output Control: Consuming "as-is" models may not be suitable for highly regulated environments due to the risk of hallucinations and leaks of sensitive information. RAG, fine-tuning, and DIY offer more control over model output quality and accuracy.
  5. Implementation Simplicity: API integrations have advantages in terms of simplicity and time-to-market. DIY, fine-tuning, and RAG require new technology components and offer more flexibility, but they can be expensive. As you move from off-the-shelf to DIY, the number of skills and resources needed increases dramatically.
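
One way to turn the five factors above into something a team can argue over is a weighted scoring matrix. The Python sketch below is a minimal version. The factor names mirror the list above, but every score and weight is a made-up placeholder that an organization would replace with its own assessment for each use case.

```python
def rank_approaches(scores: dict[str, dict[str, float]],
                    weights: dict[str, float]) -> list[tuple[str, float]]:
    """Rank approaches by weighted average score (higher is better)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    ranked = []
    for approach, by_factor in scores.items():
        weighted = sum(weights[f] * by_factor[f] for f in weights) / total_weight
        ranked.append((approach, round(weighted, 2)))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


if __name__ == "__main__":
    # Hypothetical weights for one use case (e.g., a regulated workflow).
    weights = {"cost": 2, "organizational_context": 3, "security_privacy": 3,
               "output_control": 2, "simplicity": 1}
    # Hypothetical 1-5 scores; 5 is most favorable on that factor.
    scores = {
        "off-the-shelf": {"cost": 5, "organizational_context": 1,
                          "security_privacy": 2, "output_control": 1, "simplicity": 5},
        "API integration": {"cost": 4, "organizational_context": 2,
                            "security_privacy": 2, "output_control": 2, "simplicity": 4},
        "RAG": {"cost": 3, "organizational_context": 4,
                "security_privacy": 3, "output_control": 4, "simplicity": 3},
        "fine-tuning": {"cost": 2, "organizational_context": 4,
                        "security_privacy": 4, "output_control": 4, "simplicity": 2},
        "DIY": {"cost": 1, "organizational_context": 5,
                "security_privacy": 5, "output_control": 5, "simplicity": 1},
    }
    for approach, score in rank_approaches(scores, weights):
        print(f"{approach}: {score}")
```

Running the matrix separately for each use case, with weights that reflect that use case's priorities, keeps the decision on a use-case-by-use-case basis rather than defaulting to one approach for the whole organization.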

Now that we have examined the deployment strategies and decision framework, what can you do?

Practical Advice and Next Steps

  • Assess your organization's use cases and AI governance framework before beginning any AI project. Make sure your organization is clear on which enterprise workflows will be supported by the technology and has an AI governance plan in place.
  • Analyze the strengths and cautions of each deployment approach, align use cases with these approaches, and understand how they fit with the organization's existing enterprise architecture to make objective decisions on a use-case-by-use-case basis.
  • Create a decision framework for choosing the right deployment strategy when considering the various deployment options. Factors include, but are not limited to, the use cases, costs, organizational aptitude, security and privacy, governance, and implementation simplicity.

Summary

  • Multiple deployment approaches: Organizations have several deployment approaches to consider when implementing generative AI, including off-the-shelf products and services, APIs to integrate with existing enterprise applications, RAG, fine-tuning, and DIY.
  • Document different approaches: To select the right approach, technology leaders must first understand and document the differences among the deployment approaches so that they aren't locked into the approaches dictated by their vendors. They should also analyze the strengths and cautions of each approach, align use cases with these approaches, and understand how they fit with the organization's existing enterprise architecture to make objective decisions on a use-case-by-use-case basis.
  • Deployment Strategy Framework: Factors to consider when choosing a deployment strategy include use cases, costs, organizational aptitude, security and privacy, governance, and implementation simplicity. Organizations should create a decision framework to guide their selection process and ensure they are making informed decisions that align with their specific needs and goals.



[1] Sweenor, David. 2024. “The Generative AI Hammer. Is Everything a Nail?” Medium. April 15, 2024. https://medium.com/@davidsweenor/the-generative-ai-hammer-is-everything-a-nail-274c9a4a042d.

[2] Sweenor, David. 2023. “Generative AI’s Force Multiplier: Your Data.” Medium. September 20, 2023. https://medium.com/@davidsweenor/generative-ais-force-multiplier-your-data-3763e8ed59df.

[3] Baig, Aamer, Sven Blumberg, Eva Li, Douglas Merrill, Adi Pradhan, Megha Sinha, Alexander Sukharevsky, and Stephen Xu. 2023. “A CIO and CTO Technology Guide to Generative AI | McKinsey.” McKinsey & Company. July 11, 2023. https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/technologys-generational-moment-with-generative-ai-a-cio-and-cto-guide.

[4] Park, Kate. 2023. “Samsung Bans Use of Generative AI Tools like ChatGPT after April Internal Data Leak.” TechCrunch. May 2, 2023. https://techcrunch.com/2023/05/02/samsung-bans-use-of-generative-ai-tools-like-chatgpt-after-april-internal-data-leak/.

[5] Qi, Xiangyu, Yi Zeng, Tinghao Xie, Pin-Yu Chen, Ruoxi Jia, Prateek Mittal, and Peter Henderson. 2023. “Fine-Tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!” ArXiv.org. October 5, 2023. https://doi.org/10.48550/arXiv.2310.03693.

[6] Sweenor, David. 2024a. “Future-Proof Your IT: The CIO’s Guide to Generative AI Vendor Selection.” Medium. January 20, 2024. https://medium.com/@davidsweenor/future-proof-your-it-the-cios-guide-to-generative-ai-vendor-selection-9bd9cc5c55b6.
