Generative AI with LLM : application challenges on deployment
Deploying any generative AI application, which is using Large Language Model (LLM) need some special consideration. In this article, I will try to introduce some issues, which need to consider during deployment of generative AI issue and give some hints (references) of frameworks and libraries which will be candidate of deployment architecture of that solution.
Important abbreviations
LLM: Large Language Model
RAG: Retrieval Augmentation Generation
DL: Deep Learning
BNLP: Bangla Natural Language Processing
Basic deployment architecture of ML project
To work for generative AI, to serve some extraordinary features to your customer, you have done all hard works by preparing the LLM. You did lot research work to select one, prepare lot data and did data analysis works to train the selected LLM for fine-tuning. You saw exciting results, which is satisfying all of your investors. Now everybody excited to see how it is going to effect on business. Perhaps you already planning to deploy for beta version.
Therefore, you deployed it for alfa testing with following architecture. Which is typical application architecture with additional layer as ML access layer with LLM. You must plan to keep log and other best practices of solution architecture. You also need to collect user updates and retrain the model. In this diagram, I ignored these parts. (In our DL program for BNLP, we followed this structure)
Issues on this architecture?
However, one of your team found some serious issues.
The LLM is
(a) Not giving correct results from latest updates on NEWS (i.e. an LLM cannot give answer to question, “Who own the football world cup in 2022?” Not only that, it is not giving any latest NEWS. How to get information from latest NEWS?)
(b) Cannot give answer to reasoning problems.
(c) Giving close result but not giving exact result of simple math.
In team discussion on this issue, one of your team member noticed that, LLM would never give answer to this (at least as of current understanding). Because, the main task of LLM is to predict related words (completion) against given words (prompt). Therefore, if we need to solve above problems, we need to work more.
领英推荐
Another issue you need to notice. Your LLM will never stop giving answer. It is because, it always predicts next token (another word), and hence always it will have some answer. In practice, it may no answer at al or may some funny words. This is hallucination.
Modified architecture
For this, you need to retrieve related data from external resources to augment with LLM to generate desired output. This is Retrieval Augmentation Generation (RAG) framework. This framework suggest to generate a query and retrieve related data from external resources (any document, API data or can be internally stored data) and concatenate with prompt before sending to LLM. Then the LLM generate the completion. There are different version of RAG and different implementation. You need to find specific one as of your need (in this article, I am nor focusing on RAG). There are different libraries available to support RAG and some other helping tools and frameworks like LangChain, PromtChainer etc. These libraries also solve other problems which directly not solvable by LLM (like reasoning). You can find lot more available both free and paid. You will select one library, which will serve your purpose. So the updated deploy can be look like following diagram.
Conclusion:
Generative AI using LLM to generate completion. But to get applicable output, it requires some extra efforts depending on requirement. It is doing so many things that were almost impossible before. In addition, some vendor offering regular works (like coding of applications) that is becoming thread for some jobs. It also opening door to new opportunities. You will see the libraries may need wrapper to fit, collecting data for further training, troubleshooting of issues and so on. Devils people are not out of race. Therefore, new cyber thread will appear. Hackers will find new ways of cyber-crime on Generative AI related applications like prompt injection. Therefore, new opportunities will come for cyber security professional. It is time for organizations to be prepared for these before starting mass level of deployment of Generative AI application.
References
?
?
?
?
?