Generative AI Deployment Strategies
David Sweenor
A Strategic Guide for CIOs and CTOs
Introduction
As organizations continue to tinker with generative AI, they face a critical question: how can they best deploy the technology to achieve scale and return on investment (ROI) as quickly as possible? Given the sheer pace of innovation, CIOs and CTOs need to think carefully about the many available options. Key questions include: do you build or buy? What are the costs and benefits of each deployment approach? How long will it take? Before beginning any AI project, two critical items must be addressed: 1) make sure that your organization is crystal clear on which enterprise workflows the technology will support, and 2) have an AI governance plan in place. See my article, “The Generative AI Hammer. Is Everything a Nail?” for additional insights on selecting the right use cases.[1]
Assuming that the use cases and AI governance framework are defined, an organization can consider several deployment approaches. These approaches include:

1. Buying off-the-shelf products and services with generative AI built in
2. Using APIs to integrate pre-built generative AI models with existing enterprise applications
3. Extending generative AI models via data retrieval (retrieval-augmented generation, or RAG)
4. Fine-tuning existing foundation models (FMs)
5. Building custom foundation models from scratch (DIY)
Three of these were covered in my previous article, Generative AI’s Force Multiplier: Your Data.[2] This article extends the initial three and adds a decision framework.
To select the right approach, technology leaders must first understand and catalog the differences between the deployment approaches so they aren't locked into whatever their vendors advise. CIOs and CTOs should also analyze the strengths and cautions of each approach, align use cases with those approaches, and understand how each fits the organization’s existing enterprise architecture so that decisions can be made objectively on a use-case-by-use-case basis.
Deployment Strategies for Generative AI
When weighing the various deployment options, organizations should create a decision framework for choosing the right strategy. Factors include, but are not limited to, the use cases, costs, organizational aptitude, security and privacy, governance (e.g., control of model output), and implementation simplicity.
Off-the-shelf Products and Services
The simplest, most straightforward strategy is to buy a commercial solution with generative AI baked in. A ready-made solution is easier to deploy and has low or no fixed costs, allowing organizations to experiment with generative AI with minimal investment. However, this approach sacrifices flexibility: when you buy a pre-packaged solution, you “get what you get” out of the box. Although some of these solutions can be customized, doing so increases the complexity and the effort required to maintain them. McKinsey estimates that this approach costs between $0.5M and $2.0M, with a recurring annual cost of $0.5M.[3] With off-the-shelf, prepackaged solutions, you also have less control over security and data privacy risks and must accept the application provider's security and data protection controls. Many providers even offer indemnification clauses within their contracts, but do they really have your back if something goes awry?
Use APIs to Integrate with Existing Enterprise Applications
Integrating pre-built generative AI models with existing enterprise applications via APIs is also relatively easy to implement. It has lower fixed costs since organizations pay only for model usage, not for training. Note that providers like OpenAI charge for both inputs and outputs. With this approach, many companies embed one-shot or few-shot examples within the prompt, improving accuracy for the task at hand. However, you may run into context window limits, although these limits are generally increasing.
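As a concrete illustration, the few-shot pattern amounts to prepending worked examples to the message payload sent over the API. The sketch below uses the common chat-completion message convention; the extraction task, the example invoices, and the message schema are all invented for illustration and are not tied to any specific provider.

```python
# Sketch: embedding few-shot examples in a chat-style API payload.
# Each example is a user/assistant turn placed before the real query.

FEW_SHOT_EXAMPLES = [
    ("Invoice INV-1042 for $1,200 due 2024-05-01", "due_date: 2024-05-01"),
    ("Invoice INV-1043 for $860 due 2024-06-15", "due_date: 2024-06-15"),
]

def build_messages(task_input: str) -> list:
    """Assemble a chat payload with few-shot examples ahead of the query."""
    messages = [{"role": "system",
                 "content": "Extract the due date from the invoice text."}]
    for example_in, example_out in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": example_in})
        messages.append({"role": "assistant", "content": example_out})
    messages.append({"role": "user", "content": task_input})
    return messages

# system message + 2 examples (2 turns each) + the query = 6 messages.
payload = build_messages("Invoice INV-1099 for $430 due 2024-07-31")
```

Because providers bill for both input and output tokens, every example you embed raises the per-call cost, which is the trade-off to watch with this pattern.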
Extending Generative AI Models via Data Retrieval
Retrieval-augmented generation (RAG) enables organizations to add their proprietary organizational data to model responses without fine-tuning or training an FM. This method improves the accuracy and quality of responses by augmenting the model's knowledge with your organization's data, without requiring the creation of custom models. However, similar to the API approach, RAG is limited by the generative model's context window, which constrains the amount of retrieved information that can be sent to the model. For real-time use cases, the extra retrieval step to augment the prompt increases latency, limiting its viability.
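A minimal sketch of the retrieval step follows. It uses naive keyword overlap in place of the vector search and embedding index a production RAG system would use, and the documents, character budget (a crude stand-in for the model's context window), and prompt template are all illustrative assumptions.

```python
# Sketch: retrieve the most relevant documents that fit a context budget,
# then build an augmented prompt for the generative model.

DOCS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Shipping is free for orders over $50 in the continental US.",
    "Support hours are 9am to 5pm Eastern, Monday through Friday.",
]

def retrieve(query: str, docs: list, budget_chars: int = 80) -> list:
    """Rank docs by word overlap with the query; keep only what fits the budget."""
    q_words = {w for w in query.lower().split() if len(w) > 3}  # drop short stopwords
    ranked = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    picked, used = [], 0
    for doc in ranked:
        if used + len(doc) > budget_chars:  # stand-in for the context-window limit
            break
        picked.append(doc)
        used += len(doc)
    return picked

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query, DOCS))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("What is the refund policy for returns?")
```

The budget check is where the context-window constraint mentioned above bites: no matter how much relevant material retrieval finds, only what fits the window can be sent to the model.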
Fine-tuning Existing FMs
The fourth approach involves fine-tuning an FM to incorporate additional domain knowledge or improve performance on specific tasks. This method results in custom models dedicated to the organization, improving performance and reducing hallucinations. However, the cost of using a fine-tuned model (inference cost) can be significant. McKinsey estimates this approach costs between $2M and $10M, with a $0.5M to $1M recurring annual maintenance budget. Fine-tuning on top of a given foundation model may also compromise its safety alignment.[5]
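Whatever the provider, a fine-tuning run starts by assembling the domain knowledge into a training file. The sketch below uses the common prompt/completion JSON Lines convention; the exact schema varies by provider, and the example pairs here are invented.

```python
# Sketch: serializing supervised fine-tuning data as JSON Lines,
# one {"prompt": ..., "completion": ...} object per line.
import json

raw_pairs = [
    ("Summarize: Q3 revenue rose 12% on cloud demand.",
     "Cloud demand drove a 12% revenue increase in Q3."),
    ("Summarize: Churn fell to 4% after the loyalty launch.",
     "The loyalty program launch cut churn to 4%."),
]

def to_jsonl(pairs) -> str:
    """Serialize (prompt, completion) pairs, one JSON object per line."""
    return "\n".join(
        json.dumps({"prompt": p, "completion": c}) for p, c in pairs
    )

dataset = to_jsonl(raw_pairs)
```

Curating and cleaning these pairs is usually where most of the effort (and cost) of a fine-tuning project goes, well before any GPU time is spent.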
DIY, Building Custom Foundation Models from Scratch
The last approach involves organizations building their own FMs from scratch, fully customizing them to their data and business domains. This method offers the potential for the highest accuracy, complete control over training datasets and model parameters, and competitive differentiation. However, training and maintaining a large generative AI model is prohibitively expensive for all but the largest organizations. McKinsey calculates that this costs between $5M and $200M for an initial build, with a $1M to $5M recurring fee. The models will also need to be regularly updated, which further increases costs.
In the end, with the exception of the DIY approach, most companies will combine these approaches across their organization.
Deployment Decision Framework
As noted above, choosing the right deployment approach means weighing factors such as use case, costs, organizational knowledge, security and privacy, governance, and implementation simplicity against one another for each initiative.
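One way to make these trade-offs explicit is a weighted scoring matrix across the five approaches. The criteria weights and 1-to-5 scores below are purely illustrative placeholders, not recommendations; each organization should calibrate them to its own situation.

```python
# Sketch: a weighted scoring matrix for the deployment decision.
# Higher is better on every criterion (e.g., low cost scores high).

WEIGHTS = {"cost": 0.25, "control": 0.20, "security": 0.20,
           "time_to_value": 0.20, "skills_required": 0.15}

OPTIONS = {
    "off_the_shelf": {"cost": 5, "control": 1, "security": 2,
                      "time_to_value": 5, "skills_required": 5},
    "api":           {"cost": 4, "control": 2, "security": 3,
                      "time_to_value": 4, "skills_required": 4},
    "rag":           {"cost": 3, "control": 4, "security": 4,
                      "time_to_value": 3, "skills_required": 3},
    "fine_tune":     {"cost": 2, "control": 4, "security": 4,
                      "time_to_value": 2, "skills_required": 2},
    "diy":           {"cost": 1, "control": 5, "security": 5,
                      "time_to_value": 1, "skills_required": 1},
}

def score(option: dict) -> float:
    """Weighted sum of criterion scores, rounded for readability."""
    return round(sum(WEIGHTS[c] * option[c] for c in WEIGHTS), 2)

ranking = sorted(OPTIONS, key=lambda o: score(OPTIONS[o]), reverse=True)
```

The value of the exercise is less the final ranking than the conversation it forces: writing down the weights makes disagreements about priorities (cost versus control, speed versus security) explicit before any vendor is chosen.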
Now that we have examined the deployment strategies and decision framework, what can you do?
Practical Advice and Next Steps
Summary
If you enjoyed this article, please like it, highlight interesting sections, and share comments. Consider following me on Medium and LinkedIn.
Please consider purchasing my latest TinyTechGuide.
If you’re interested in this topic, check out TinyTechGuides’ other books, Mastering the Modern Data Stack or Artificial Intelligence: An Executive Guide to Make AI Work for Your Business.
[1] Sweenor, David. 2024. “The Generative AI Hammer. Is Everything a Nail?” Medium. April 15, 2024. https://medium.com/@davidsweenor/the-generative-ai-hammer-is-everything-a-nail-274c9a4a042d.
[2] Sweenor, David. 2023. “Generative AI’s Force Multiplier: Your Data.” Medium. September 20, 2023. https://medium.com/@davidsweenor/generative-ais-force-multiplier-your-data-3763e8ed59df.
[3] Baig, Aamer, Sven Blumberg, Eva Li, Douglass Merrill, Adi Pradhan, Megha Sinha, Alex Sukharevsky, and Stephen Xu. 2023. “A CIO and CTO Technology Guide to Generative AI | McKinsey.” www.mckinsey.com. July 11, 2023. https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/technologys-generational-moment-with-generative-ai-a-cio-and-cto-guide.
[4] Park, Kate. 2023. “Samsung Bans Use of Generative AI Tools like ChatGPT after April Internal Data Leak.” TechCrunch. May 2, 2023. https://techcrunch.com/2023/05/02/samsung-bans-use-of-generative-ai-tools-like-chatgpt-after-april-internal-data-leak/.
[5] Qi, Xiangyu, Yi Zeng, Tinghao Xie, Pin-Yu Chen, Ruoxi Jia, Prateek Mittal, and Peter Henderson. 2023. “Fine-Tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!” ArXiv.org. October 5, 2023. https://doi.org/10.48550/arXiv.2310.03693.
[6] Sweenor, David. 2024. “Future-Proof Your IT: The CIO’s Guide to Generative AI Vendor Selection.” Medium. January 20, 2024. https://medium.com/@davidsweenor/future-proof-your-it-the-cios-guide-to-generative-ai-vendor-selection-9bd9cc5c55b6.