Deploying a Minimalist ChatGPT in Azure

Pick any AI and ask it how to deploy AI (pun intended). Of course, you will receive an answer that reflects the complexity of your prompt. But how do you get started with the absolute lowest level of complexity and time/cost investment?

Deploy Azure OpenAI while retaining access control over your prompts and data

Based on my recent experience, here is my simplified roadmap to the most basic version of ChatGPT deployed on the Azure platform. Please see Baseline OpenAI end-to-end chat reference architecture - Azure Reference Architectures | Microsoft Learn for architectural information.

  1. I used Microsoft's Sample Chat App with AOAI (Azure OpenAI). Using this particular GitHub repository provided credible source code and a low-effort path to implementation. When you review the repository, note that the first implementation described is "the basic" ChatGPT app. The repository also contains guidance and code to further extend the AI's functions so it can access your own data, or even ad hoc data that you upload. In this simplified guide I am only deploying the most basic implementation, just enough to have a working ChatGPT app.
  2. In Azure, I created a new Resource Group and Azure OpenAI resource. The S0 pricing tier is the only option, and I encourage you to review the pricing for the model(s) you deploy in detail, as well as understand how tokens are calculated, at Azure OpenAI Service - Pricing | Microsoft Azure. Basically, one token generally corresponds to ~4 characters of common English text (more information at What are tokens and how to count them? | OpenAI Help Center). There is a cost per token for both the prompt input and the response output, and the cost per token depends directly on which AI model is used. I know, it sounds like a potential sticker-shock moment, but there is a way to throttle token use. My deployment was tested with a community of nearly 1,000 people and cost under $300/month in US dollars.
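To make the token math above concrete, here is a small back-of-the-envelope sketch using the ~4 characters-per-token heuristic. The per-1K-token prices below are placeholders, not the actual Azure rates; always check the current Azure OpenAI pricing page for the model you deploy.

```python
# Rough cost estimate for one chat exchange, using the
# ~4 characters-per-token heuristic mentioned above.
# The prices are ASSUMED placeholders, not real Azure rates.
PROMPT_PRICE_PER_1K = 0.0015   # placeholder USD per 1K input tokens
OUTPUT_PRICE_PER_1K = 0.0020   # placeholder USD per 1K output tokens

def estimate_tokens(text: str) -> int:
    """Approximate token count: one token per ~4 characters of English text."""
    return max(1, len(text) // 4)

def estimate_cost(prompt: str, response: str) -> float:
    """Estimated USD cost of a single prompt/response exchange."""
    prompt_cost = estimate_tokens(prompt) / 1000 * PROMPT_PRICE_PER_1K
    output_cost = estimate_tokens(response) / 1000 * OUTPUT_PRICE_PER_1K
    return prompt_cost + output_cost
```

Multiplying a per-exchange estimate like this by your expected daily chat volume is a quick way to sanity-check the monthly bill before you deploy.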
  3. In the new Azure OpenAI resource, I deployed a simple chat model. At this point I had satisfied the prerequisites specified in the repository. Remember that throttling token use is possible! In the image below, notice I have disabled dynamic token quotas and configured a 5K tokens-per-minute rate limit.

Model gpt-35-turbo deployment configuration
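One practical consequence of a tight tokens-per-minute limit is that a busy app will occasionally receive HTTP 429 ("Too Many Requests") from the service. The sample app repository handles retries its own way; the sketch below only illustrates the general exponential-backoff idea, with a generic exception standing in for the SDK's rate-limit error.

```python
import time

def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Exponential backoff delays in seconds: base, 2*base, 4*base, ... capped."""
    return [min(cap, base * (2 ** i)) for i in range(retries)]

def call_with_retries(call, retries: int = 5):
    """Invoke call() and retry with backoff when it raises.

    `call` is any zero-argument function; RuntimeError here is a
    stand-in for the SDK's rate-limit (HTTP 429) exception.
    """
    for delay in backoff_delays(retries):
        try:
            return call()
        except RuntimeError:
            time.sleep(delay)
    return call()  # final attempt; let any exception propagate
```

With a per-minute quota, capping the delay around 60 seconds means the worst-case retry waits roughly one quota window before trying again.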

  4. Back at the repository, I selected Deploy to Azure. The deployment template contains a default App Service Plan size of B3, which I changed to F1 (the free tier) for my development. If you require a custom DNS name for the web app, select at least size B1; in my experience, B1 supported nearly 1,000 users while staying under 60% CPU. The repository also documents the web app variables for customizing the web app UI without editing the repository code.
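Once the model deployment and web app are up, it can be handy to smoke-test the Azure OpenAI deployment directly, outside the web app. Below is a sketch using the `openai` Python package (v1+); the environment variable names and the deployment name are assumptions standing in for your own values.

```python
# Smoke test for the Azure OpenAI chat deployment created above.
# AZURE_OPENAI_ENDPOINT / AZURE_OPENAI_API_KEY are assumed env vars;
# DEPLOYMENT_NAME must match the deployment name you chose in step 3.
import os

DEPLOYMENT_NAME = "gpt-35-turbo"  # assumed deployment name

def build_messages(user_prompt: str) -> list[dict]:
    """Chat payload: a system message plus the user's prompt."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_prompt},
    ]

if __name__ == "__main__":
    from openai import AzureOpenAI  # pip install openai

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-02-01",
    )
    reply = client.chat.completions.create(
        model=DEPLOYMENT_NAME,
        messages=build_messages("Say hello!"),
    )
    print(reply.choices[0].message.content)
```

If this round-trips successfully, any problems you see in the web app are in the App Service layer rather than the model deployment.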

Custom ChatGPT web app

This is the most basic implementation, just enough to have a working ChatGPT app. Even though this custom ChatGPT web app had humble beginnings, it steadily supported nearly 1,000 users, and the monthly cost for tokens and the required Azure infrastructure remained near $300/month. Of course, this basic deployment can be extended further using the same repository's web app variables, and even more customization is possible by forking or cloning the repository.

Thanks for reading this post!

Woodley B. Preucil, CFA

Senior Managing Director
