Learn how to put Large Language Model-based applications into production safely and efficiently.
- LLMs are exciting because they work within the same framework (language) as humans
- Society has been built on language, so effective language models have limitless applications such as chatbots, programming assistants, video games, and AI assistants.
- LLMs are excellent at many tasks and can even pass demanding medical and law exams
- LLMs are wrecking balls, not hammers: avoid them for simple problems, problems that require low latency, and problems with high risks.
- Reasons to buy include:
  - Quickly getting up and running to conduct research and prototype use cases
  - Easy access to highly optimised production models
  - Access to the vendor's technical support and systems
- Reasons to build include:
  - Getting a competitive edge for your business use case
  - Keeping costs low and transparent
  - Ensuring reliability of the model
  - Keeping your data safe
  - Controlling model output on sensitive or private topics
- There is no technical moat preventing you from competing with larger companies: open source frameworks and models provide the building blocks to pave your own path.
- The five components of linguistics are phonetics, syntax, semantics, pragmatics, and morphology:
  - Phonetics can be added through a multimodal model that processes audio files and is likely to improve LLMs in the future, but current datasets are too small.
  - Syntax is what current models are good at.
  - Semantics is added through the embedding layer.
  - Pragmatics can be added through engineering efforts.
  - Morphology is added in the tokenisation layer.
- Language does not necessarily correlate with reality. Understanding the process that people use to create meaning outside of reality is useful to training meaningful (to people) models.
- Proper tokenisation can be a major hurdle: too many `<UNK>` tokens degrade performance, especially on specialised domains like code or maths.
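A minimal sketch of the `<UNK>` problem, using a toy whitespace tokeniser with a hypothetical vocabulary (real subword tokenisers are more forgiving, but the failure mode on out-of-domain text is the same):

```python
def tokenize(text, vocab, unk="<UNK>"):
    # whitespace tokeniser that maps any out-of-vocabulary word to <UNK>
    return [w if w in vocab else unk for w in text.split()]

# illustrative vocabulary built from general-domain text
vocab = {"the", "model", "returns", "a", "value"}

general = tokenize("the model returns a value", vocab)
code = tokenize("def forward(self, x): return self.proj(x)", vocab)

# specialised text collapses almost entirely into <UNK>, destroying signal
unk_rate = code.count("<UNK>") / len(code)
```

The general-domain sentence survives intact, while every token of the code snippet becomes `<UNK>`, which is why domain-matched tokenisation matters.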
- Multilingual processing has consistently outperformed monolingual processing, even on monolingual tasks; this held even before modern language models.
- Each language model type, taken in sequence, shows the natural and organic growth of the LLM field as more and more linguistic concepts are added and improve the models.
- Language modelling has seen an exponential increase in efficacy, correlating with how linguistics-focused the modelling has been.
- Attention is a mathematical shortcut for processing larger context windows faster and is the backbone of modern architectures: encoders, decoders, and Transformers.
  - Encoders improve the semantic approximations in embeddings.
  - Decoders are best at text generation.
  - Transformers combine the two.
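The "mathematical shortcut" is scaled dot-product attention. A minimal NumPy sketch (single head, no masking, illustrative shapes):

```python
import numpy as np

def attention(Q, K, V):
    # scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))   # 3 query positions, head dimension 8
K = rng.normal(size=(5, 8))   # 5 key positions
V = rng.normal(size=(5, 8))   # one value vector per key
out, weights = attention(Q, K, V)
```

Each output row is a weighted mix of the value vectors, with weights summing to 1, which is what lets the model attend over the whole context in one matrix multiply rather than step by step.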
- Larger models demonstrate emergent behaviour: suddenly being able to accomplish tasks they couldn’t before.
- LLMs are difficult to work with mostly because they are big, which means longer times to download, load into memory, and deploy, forcing us to use expensive resources.
- LLMs are also hard to deal with because they deal with natural language and all its complexities including hallucinations, bias, ethics, and security.
- Regardless of whether you build or buy, LLMs are expensive, and managing the costs and risks associated with them will be crucial to the success of any project utilising them.
- Compressing models as much as we can makes them easier to work with; quantisation, pruning, and knowledge distillation are particularly useful for this.
- Quantisation is popular because it is easy and can be done after training without any fine-tuning.
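To make the "easy, post-training" point concrete, here is a minimal sketch of symmetric int8 per-tensor quantisation in NumPy (production schemes add per-channel scales and calibration, but the core idea is just this):

```python
import numpy as np

def quantize_int8(x):
    # symmetric quantisation: map floats onto the int8 range [-127, 127]
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # recover approximate float weights for use at inference time
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)  # stand-in weight matrix

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
error = np.abs(w - w_hat).max()   # bounded by half a quantisation step
```

The quantised tensor takes a quarter of the float32 memory, and no fine-tuning was needed to produce it.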
- Low Rank Approximation is an effective way of shrinking a model and has been used heavily for adaptation thanks to LoRA.
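A minimal sketch of the LoRA idea in NumPy, with illustrative dimensions: the pretrained weight `W` stays frozen and only a low-rank delta `A @ B` is trained, so the number of trainable parameters collapses:

```python
import numpy as np

d, k, r = 512, 512, 8   # full weight is d x k; adapter rank r << min(d, k)
rng = np.random.default_rng(0)

W = rng.normal(size=(d, k))           # frozen pretrained weight
A = rng.normal(size=(d, r)) * 0.01    # trainable down-projection
B = np.zeros((r, k))                  # trainable up-projection, zero-initialised
                                      # so the adapted model starts identical to the base

x = rng.normal(size=(1, d))
y = x @ (W + A @ B)   # adapted forward pass; equals x @ W before any training

full_params = d * k          # parameters updated by full fine-tuning
lora_params = d * r + r * k  # parameters updated by LoRA
```

With rank 8 on a 512x512 layer, LoRA trains 32x fewer parameters than full fine-tuning, which is why adapters are so cheap to store and swap.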
- There are three core directions we use to parallelise LLM workflows: Data, Tensor, and Pipeline. DP helps us increase throughput, TP helps us increase speed, and PP makes it all possible to run in the first place.
- Combining the parallelism methods together, we get 3D parallelism (Data+Tensor+Pipeline), where the techniques synergise, covering each other's weaknesses and helping us get more utilisation.
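A toy sketch of the tensor-parallel idea, with NumPy arrays standing in for devices: the weight matrix is split column-wise, each "device" computes a partial output, and the shards are concatenated to recover the serial result (real systems like Megatron-LM also split row-wise and fuse communication, which this sketch omits):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 64))     # activations for a small batch
W = rng.normal(size=(64, 128))   # full weight matrix

# tensor parallelism: split W column-wise across two "devices"
W0, W1 = np.split(W, 2, axis=1)
y0 = x @ W0   # computed on device 0
y1 = x @ W1   # computed on device 1

# an all-gather across devices reassembles the full output
y_parallel = np.concatenate([y0, y1], axis=1)
y_serial = x @ W
```

Because the maths is exact, each device only needs to hold half the weights, which is what makes layers too large for one accelerator runnable at all.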
- The infrastructure for LLMOps is similar to MLOps, but don’t let that fool you since there are many caveats where “good enough” no longer works.
- Many tools are offering new features specifically for LLM support.
- Vector Databases in particular are interesting as a new piece of the infrastructure puzzle needed for LLMs that allow quick search and retrievals of embeddings.
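Under the hood, the core operation of a vector database is nearest-neighbour search over embeddings. A brute-force sketch with cosine similarity (production systems use approximate indexes such as HNSW to scale, but the interface is the same; all names here are illustrative):

```python
import numpy as np

def cosine_top_k(query, index, k=3):
    # normalise both sides so dot products equal cosine similarity
    q = query / np.linalg.norm(query)
    idx = index / np.linalg.norm(index, axis=1, keepdims=True)
    scores = idx @ q
    top = np.argsort(-scores)[:k]   # indices of the k most similar vectors
    return top, scores[top]

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 64))              # stand-in stored embeddings
query = embeddings[42] + 0.01 * rng.normal(size=64)   # near-duplicate of item 42

top, scores = cosine_top_k(query, embeddings, k=3)
```

The near-duplicate query retrieves item 42 first with a similarity close to 1.0, which is exactly the retrieval step behind RAG-style pipelines.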
- Data Engineers have unique datasets to acquire and manage for LLMs, like model weights, evaluation datasets, and embeddings.
- No matter your task, there is a wide array of open source models you can acquire and fine-tune into your own model.
- Text-based tasks are harder to evaluate than the simple equality metrics you’d find in traditional ML tasks, but there are many industry benchmarks to help you get started.
- Evaluating LLMs for more than just performance, such as bias and potential harm, is your responsibility.
- You can use the Evaluate library to build your own evaluation metrics.
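Custom metrics in the Evaluate library boil down to a function from predictions and references to a score. As an illustration of the kind of text metric you might implement (plain Python, not the library's API), here is token-overlap F1, a common way to score generated text beyond exact equality:

```python
from collections import Counter

def token_f1(prediction, reference):
    # token-overlap F1, SQuAD-style: partial credit for shared tokens
    pred, ref = prediction.split(), reference.split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

score = token_f1("the cat sat on the mat", "a cat sat on a mat")
```

Exact equality would score this pair 0, while token F1 gives 2/3, reflecting that most of the content was right: the kind of nuance text evaluation needs.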
- There are many large open source datasets, but most of them come from scraping the web and require cleaning.
- Instruct schemas and annotating your data can be effective ways to clean and analyse your data.
- Fine-tuning a model on a dataset that has an appropriate distribution of speech acts for the task you want your model to perform will help your model generate context appropriate content.
- Building your own subword tokeniser to match your data can greatly improve your model’s performance.
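The standard algorithm behind subword tokenisers is Byte-Pair Encoding: start from characters and repeatedly merge the most frequent adjacent pair. A minimal training sketch on a toy corpus (frequencies and words are illustrative; real tokenisers like Hugging Face Tokenizers add byte fallback and much larger corpora):

```python
from collections import Counter

def get_pair_counts(vocab):
    # vocab maps a space-separated symbol sequence to its corpus frequency
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    # replace every occurrence of the chosen pair with its merged symbol
    a, b = pair
    merged = {}
    for word, freq in vocab.items():
        symbols = word.split()
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[" ".join(out)] = freq
    return merged

# toy corpus: each word pre-split into characters, with a frequency
vocab = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}
merges = []
for _ in range(3):
    best = max(get_pair_counts(vocab), key=get_pair_counts(vocab).get)
    vocab = merge_pair(best, vocab)
    merges.append(best)
```

Training on your own data means the learned merges reflect your domain's frequent substrings, so domain terms stop fragmenting into many rare pieces.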
- Many problems teams are trying to use LLMs for can be solved by simply using embeddings from your model instead.
#LLMs #Production #Building #Product #Modelling #Apis #Code