Generative AI provides a broad range of capabilities, and a key part of designing and building a generative AI solution in government departments will be getting it to behave accurately and reliably.
The following presents the key foundational concepts needed to understand, design and build generative AI solutions that meet departmental needs.
- Prompts are the primary input provided to an LLM. In the simplest case, a prompt may consist only of the user-prompt. In production systems, a prompt will have additional parts, such as meta-prompts, the chat history, and reference data to support explainability. A sketch of how these parts are composed appears after this list.
- Prompt engineering describes the process of adjusting LLM input to improve performance and accuracy. In its simplest form it may involve testing different user-prompt formulations. In production systems, it will include adjustments such as adding meta-prompts, providing examples and data sources, and sometimes parameter tuning.
- User-prompts are whatever you type into, for example, a chat box. They are generally in everyday natural language, e.g. ‘Write a summary of generative AI and LLMs’.
- Meta-prompts/System prompts are higher-level instructions that help direct an LLM to respond in a specific way. They can be used to instruct the model on how to generate responses to user-prompts, provide feedback, or handle certain types of content.
- Embedding is the process of transforming information such as words or images into numerical values and relationships that computer algorithms can understand and manipulate. Embeddings are typically stored in vector databases (see below). A sketch of how embeddings are compared appears after this list.
- Retrieval augmented generation (RAG) is a technique which uses reference data stored in vector databases (i.e. the embeddings) to ground a model’s answers to a user’s prompt. One can specify that the model cites its sources when returning information. A sketch of the retrieval step appears after this list.
- Vector databases index and store data such as text (as embeddings) in a format that models can search efficiently. The ability to store and efficiently retrieve information has been a key enabler in the progress of generative AI technology.
- Grounding is the process of linking the representations learned by AI models to real-world entities or concepts. It is essential for making AI models understand and relate their learned information to real-world concepts. In the context of LLMs, grounding is often achieved by a combination of prompt engineering, parameter tuning, and retrieval augmented generation.
- Chat history is a collection of prompts and responses. It is limited to a session. Different models may allow different session sizes. For example, Bing search sessions allow up to 30 user-prompts. The chat history is the memory of LLMs. Outside of the chat history, LLMs are ‘stateless’: the model itself does not store chat history. If you wanted to permanently add information to a model you would need to fine-tune an existing model (or train one from scratch). A sketch of how the chat history is re-sent with each turn appears after this list.
- Parameter tuning is the process of optimising the performance of the AI model for a specific task or data set by adjusting configuration settings. A sketch of typical settings appears after this list.
- Model fine-tuning is the process of limited re-training of a model on new data. It can be done to enforce a desired behaviour. It also allows us to add data sets to a model permanently. Typically, fine-tuning will adjust only some layers of the model’s neural network. Depending on the information or behaviour to be trained, fine-tuning may be more expensive and complicated than prompt engineering. Experience with model fine-tuning in government is currently limited; more is expected in future iterations. A sketch of the data preparation step for fine-tuning appears after this list.
- Open-source models are publicly accessible, and their source code, architecture, and parameters are available for examination and modification by the broader community.
- Closed models, on the other hand, are proprietary and not openly accessible to the public. The inner workings and details of these models are kept confidential and are not shared openly.
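
To make the relationship between user-prompts, meta-prompts and the wider prompt more concrete, here is a minimal sketch in Python. It assumes the chat-style message format used by many LLM APIs; `call_model()` is a hypothetical placeholder rather than a real SDK call, and the meta-prompt wording is illustrative only.

```python
# A composed prompt: meta-prompt (system prompt) + user-prompt.
# call_model() is a hypothetical placeholder for an LLM API call.

META_PROMPT = (
    "You are an assistant for a government department. "
    "Answer concisely, in plain English, and say you do not know "
    "if the answer is not in the reference data provided."
)

user_prompt = "Write a summary of generative AI and LLMs."

messages = [
    {"role": "system", "content": META_PROMPT},  # meta-prompt / system prompt
    {"role": "user", "content": user_prompt},    # user-prompt
]

# response = call_model(messages)  # hypothetical call to an LLM
```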
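
The following sketch illustrates embeddings and similarity. The vectors are made up for illustration (real embeddings come from an embedding model and have hundreds or thousands of dimensions); cosine similarity is one common way to compare them.

```python
import numpy as np

# Illustrative only: real embeddings are produced by an embedding model and
# typically have hundreds or thousands of dimensions. These vectors are made up.
passport_question = np.array([0.8, 0.1, 0.3])  # "How do I renew my passport?"
passport_guidance = np.array([0.7, 0.2, 0.4])  # "Passport renewal guidance"
unrelated_text    = np.array([0.1, 0.9, 0.0])  # "School term dates"

def cosine_similarity(a, b):
    """Similarity of two embeddings: closer to 1.0 means closer in meaning."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(passport_question, passport_guidance))  # high (~0.98)
print(cosine_similarity(passport_question, unrelated_text))     # lower (~0.22)
```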
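
The next sketch shows the retrieval augmented generation pattern. A vector database is stood in for by a plain in-memory list, and `embed()` and `call_model()` are hypothetical placeholders for an embedding model and an LLM; a production system would use a proper vector database and its query API.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(question_embedding, vector_store, top_k=2):
    """Return the top_k reference passages most similar to the question."""
    scored = [
        (cosine_similarity(question_embedding, doc["embedding"]), doc)
        for doc in vector_store
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

def build_grounded_prompt(question, passages):
    """Combine the user-prompt with retrieved reference data and ask for citations."""
    context = "\n".join(f"[{doc['source']}] {doc['text']}" for doc in passages)
    return (
        "Answer the question using only the reference data below, "
        "and cite the source of each fact.\n\n"
        f"Reference data:\n{context}\n\n"
        f"Question: {question}"
    )

# question_embedding = embed(question)                  # hypothetical embedding call
# passages = retrieve(question_embedding, vector_store)
# answer = call_model(build_grounded_prompt(question, passages))  # hypothetical LLM call
```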
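
This sketch illustrates chat history and statelessness: because the model does not remember earlier turns, the whole history is sent with every request. `call_model()` is again a hypothetical placeholder, here returning a canned response.

```python
def call_model(messages):
    """Hypothetical placeholder: a real system would call an LLM API here."""
    return "(model response)"

chat_history = [
    {"role": "system", "content": "You are a helpful assistant."},
]

def ask(user_prompt):
    """Append the user-prompt, send the whole history, and store the response."""
    chat_history.append({"role": "user", "content": user_prompt})
    response = call_model(chat_history)  # the model sees the full history every time
    chat_history.append({"role": "assistant", "content": response})
    return response

ask("What is an LLM?")
ask("Summarise that in one sentence.")  # only works because the history was re-sent
```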
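
For parameter tuning, the sketch below shows the kind of configuration settings that are commonly adjusted at inference time. The names and ranges (temperature, top_p, max_tokens) are typical examples but vary between models and APIs, and `call_model()` remains a hypothetical placeholder.

```python
# Two illustrative configurations for the same model.
factual_settings = {
    "temperature": 0.1,  # low temperature: more deterministic, repeatable answers
    "top_p": 0.9,
    "max_tokens": 500,
}

creative_settings = {
    "temperature": 0.9,  # high temperature: more varied, creative answers
    "top_p": 1.0,
    "max_tokens": 500,
}

# answer = call_model(messages, **factual_settings)  # hypothetical call
```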
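
Finally, a sketch of the data preparation step for fine-tuning: pairs of example prompts and desired responses that demonstrate the behaviour to be trained. The file format and the training process itself vary between model providers, so the structure below is illustrative only.

```python
import json

# Hypothetical prompt/response pairs demonstrating a desired behaviour.
training_examples = [
    {
        "prompt": "Summarise the attached correspondence for a minister.",
        "completion": "A two-paragraph plain-English summary in the department's house style...",
    },
    {
        "prompt": "Draft a reply declining a request that is outside our remit.",
        "completion": "A polite refusal that signposts the correct department...",
    },
]

# Many fine-tuning workflows expect one JSON record per line (JSONL).
with open("fine_tuning_data.jsonl", "w") as f:
    for example in training_examples:
        f.write(json.dumps(example) + "\n")
```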