- As you may well know, C-3PO, the robotic diplomat and translator from the Star Wars universe, can communicate fluently in over six million forms of communication.
- While he might not have the terabytes of data and billions of parameters of modern Large Language Models (LLMs), C-3PO's fluency definitely earns him a spot in the linguistic hall of fame. He's the vintage, handcrafted, artisanal version of what we now call a Large Language Model, a true linguistic multitasker from a galaxy far, far away.
- Large Language Models (LLMs) are a subset of deep learning.
- LLMs also intersect with Generative AI.
To learn more about Generative AI, please refer to my previous article.
So… what are Large Language Models (LLMs)?
- LLMs are large, general-purpose language models that can be pre-trained and then fine-tuned for specific purposes.
- LLMs are sophisticated AI systems that use massive neural networks to understand and produce human-like text.
- LLMs are trained to solve common language problems such as text classification, question answering, document summarization, and text generation.
- These models can then be tailored to solve specific problems in fields such as retail, finance, and entertainment, using relatively small field-specific datasets.
Breaking down LLMs into their major features!
- Enormous training datasets, sometimes at petabyte scale.
- A large number of parameters. In ML, parameters are the weights a model learns during training; they are, in effect, the memories and knowledge the machine picked up from the training data (not to be confused with hyperparameters, which are settings chosen before training). Parameters define a model's skill at problems such as predicting text; see the sketch after this list for a concrete way to count them.
- General purpose simply means that these models are capable of solving common language problems on their own.
- The reason is the commonality of human language, regardless of the specific task.
- Only certain organizations, such as Google, have the capability to train on such large datasets with such high numbers of parameters, so the foundational LLMs they create can be used by other organizations.
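To make "number of parameters" concrete, here is a minimal sketch in PyTorch (my choice of library for illustration; the layer sizes are toy values) that counts the learned weights of a small language-model-shaped network:

```python
import torch.nn as nn

# A toy model -- real LLMs stack hundreds of layers like these.
model = nn.Sequential(
    nn.Embedding(num_embeddings=50_000, embedding_dim=512),  # token embeddings
    nn.Linear(512, 2048),
    nn.ReLU(),
    nn.Linear(2048, 50_000),  # projects back to the vocabulary for next-word prediction
)

# Parameters are the learned weights; counting them gives a model's "size".
total = sum(p.numel() for p in model.parameters())
print(f"{total:,} parameters")  # ~129 million here; PaLM has 540 billion
```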
Pre-trained and fine-tuned:
- Training a model on large datasets for general purposes is called pre-training. For example, imagine teaching a dog the basic commands: sit, come, down, stay.
- Training a model on task-specific datasets so that a customized problem gets solved is called fine-tuning. For example, if we want a specially trained dog, such as a police dog or a hunting dog, we add special training on top of the basic commands.
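Here is a minimal sketch of the fine-tuning step using the Hugging Face transformers library; the model name, labels, and two-example dataset are illustrative assumptions, not a recipe from any particular paper:

```python
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# "Pre-training" already happened: we download a model that learned general
# language from a huge corpus (the dog already knows sit, come, down, stay).
model_name = "bert-base-uncased"  # illustrative choice of pre-trained model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# A tiny stand-in for a "relatively small field-specific dataset".
texts, labels = ["great product", "terrible service"], [1, 0]
enc = tokenizer(texts, truncation=True, padding=True)

class TinyDataset(Dataset):
    def __len__(self):
        return len(labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in enc.items()}
        item["labels"] = torch.tensor(labels[i])
        return item

# "Fine-tuning": continue training on the task-specific data
# (the add-on police-dog training on top of the basic commands).
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=1),
    train_dataset=TinyDataset(),
)
trainer.train()
```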
- In April 2022, Google released PaLM, short for Pathways Language Model.
- It has 540 billion parameters.
- Pathways is a new AI architecture designed to handle many tasks at once and learn new tasks quickly.
- PaLM is a transformer model. In a transformer, the encoder encodes the input sequence and passes it to the decoder, which then learns how to decode the representations for the relevant task. (PaLM itself uses a decoder-only variant of this architecture.)
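A tiny PyTorch sketch of the encoder-decoder transformer just described; all sizes here are toy values, nowhere near PaLM's scale:

```python
import torch
import torch.nn as nn

# A bare-bones encoder-decoder transformer: the encoder turns the input
# sequence into representations, and the decoder learns to turn those
# representations into the output sequence.
d_model = 64       # embedding size (tiny, for illustration)
vocab_size = 1000  # toy vocabulary

embed = nn.Embedding(vocab_size, d_model)
transformer = nn.Transformer(d_model=d_model, nhead=4,
                             num_encoder_layers=2, num_decoder_layers=2,
                             batch_first=True)
to_vocab = nn.Linear(d_model, vocab_size)  # scores over the vocabulary

src = torch.randint(0, vocab_size, (1, 10))  # input sequence of 10 tokens
tgt = torch.randint(0, vocab_size, (1, 8))   # output sequence so far

out = transformer(embed(src), embed(tgt))  # encoder output feeds the decoder
logits = to_vocab(out)                     # next-token scores at each position
print(logits.shape)                        # torch.Size([1, 8, 1000])
```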
LLM development vs. traditional ML development!
- LLM development uses pre-trained APIs, which require no ML expertise, no training examples, and no model training; the focus is mainly on prompt design. A prompt is the input to the model, and the quality of the input defines the quality of the output. Prompt design is an important part of Natural Language Processing (NLP).
- On the other hand, traditional ML development requires training data, ML expertise, model training, and hardware setup.
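To see the contrast, here is roughly what LLM development looks like: one prompt, no dataset, no training loop, no GPUs. This sketch assumes the google.generativeai client for the PaLM API mentioned later in this article; the model name and fields reflect the 2023 API and may have changed since:

```python
import google.generativeai as palm

# LLM development in a nutshell: send a well-designed prompt
# to an already pre-trained model.
palm.configure(api_key="YOUR_API_KEY")  # placeholder key

prompt = (
    "Summarize the following support ticket in one sentence:\n"
    "Customer reports that the app crashes whenever they upload a photo "
    "larger than 10 MB, and asks for a fix or a workaround."
)

# Model name is illustrative; it followed this form in the 2023 PaLM API.
response = palm.generate_text(model="models/text-bison-001", prompt=prompt)
print(response.result)  # the model's summary
```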
What is QA in Natural Language Processing?
- Question Answering (QA) is a subfield of natural language processing that deals with the task of automatically answering questions posed in natural language.
- Here are some example questions given to Bard, a conversational AI from Google.
- The desired response is obtained thanks to prompt design; the questions we ask are called prompts.
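A small illustration of prompt design using plain Python strings; the wording of both prompts is my own example, not taken from Bard:

```python
question = "Who wrote The Lord of the Rings?"

# A bare prompt leaves the format of the answer entirely up to the model.
bare_prompt = question

# A designed prompt steers the model toward the desired response:
# same question, but with explicit instructions and structure.
designed_prompt = (
    "Answer the question with only the person's full name, nothing else.\n"
    f"Question: {question}\n"
    "Answer:"
)
```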
- There are 3 main kinds of LLMs, and each needs prompting in a different way:
1. Generic (or raw) language models predict the next word (technically called a token) based on the language in the training data. For example: next-word suggestions in Google Search and on our mobile Google keyboards.
2. Instruction-tuned models are trained to predict a response to the instructions given in the input, for example, "Summarize the following text" or "Classify this review as positive or negative."
3. Dialog-tuned models are trained to have a dialog by predicting the next response, for example, a chatbot that carries on a back-and-forth conversation. The sketch below shows how prompting differs across the three kinds.
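Here is how the same intent might be phrased for each of the three kinds; the message-list format in the third example follows common chat-API conventions and is an assumption, not a universal standard:

```python
# 1. Generic (raw) language model: give it text, and it continues it.
generic_prompt = "The best thing about Paris is"

# 2. Instruction-tuned model: state the task explicitly.
instruction_prompt = "Write one sentence describing the best thing about Paris."

# 3. Dialog-tuned model: frame the request as a turn in a conversation.
dialog_prompt = [
    {"role": "user", "content": "Hi! What's the best thing about Paris?"},
]
```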
- As we discussed earlier, fine-tuning on custom data for a specific task requires more effort, costs more, and is often not realistic.
More efficient methods of tuning!
Parameter-Efficient Tuning Methods (PETM) leave the base model unaltered. Instead, we adjust and swap a few extra add-on layers for specific needs, like swapping toppings on a pizza. This is considered one of the easiest tuning approaches.
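A miniature PyTorch sketch of the idea, assuming a single linear adapter layer as the "topping"; real PETM methods (adapters, LoRA, prompt tuning) are more elaborate:

```python
import torch
import torch.nn as nn

# A stand-in for a pre-trained base model (in reality, a huge LLM).
base_model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))
for p in base_model.parameters():
    p.requires_grad = False  # the base model is never altered

# The swappable "topping": a small add-on layer trained for one specific need.
adapter = nn.Linear(512, 512)

# Only the adapter's parameters receive gradient updates during tuning.
optimizer = torch.optim.Adam(adapter.parameters(), lr=1e-4)

x = torch.randn(4, 512)          # a batch of inputs
output = adapter(base_model(x))  # frozen base, trainable topping
```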
- The PaLM API and MakerSuite by Google give us access to a user interface with model-training, model-deployment, and model-monitoring tools.
- You can create Generative AI apps using Google's Generative AI App Builder without writing any code! If you want to build an app, check out https://cloud.google.com/generative-ai-app-builder
- Google released PaLM 2 on May 10th, 2023. PaLM 2 is the basis for Bard.
- ChatGPT is built upon the GPT (Generative Pre-trained Transformer) architecture, which is a type of Large Language Model (LLM).
- Generative Pre-trained Transformer (GPT) is a type of advanced artificial intelligence model designed for natural language processing tasks. It uses a transformer architecture (which we discussed above), a type of neural network specifically suited to handling sequential data like text.
- ChatGPT is fine-tuned for conversations, making it great for chatbots and dialogue.
- It uses LLM principles to create a conversational AI system.
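A short sketch of that conversational interface, assuming the openai Python package (v1 or later); the model name is illustrative:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")  # placeholder key

# ChatGPT is dialog-tuned: the input is a conversation,
# and the output is the model's next turn in it.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain LLMs in one sentence."},
    ],
)
print(response.choices[0].message.content)
```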
- Large Language Models (LLMs) represent a groundbreaking advancement in AI, transforming the landscape of language understanding and generation.
- With their ability to learn from extensive datasets, LLMs have demonstrated remarkable proficiency in various tasks, from translation to text creation.
- LLMs hold substantial potential to drive innovation across industries, streamlining processes, enhancing communication, and enabling more efficient data analysis.
- As LLM technology evolves, its impact on human-computer interactions, content creation, and information dissemination is expected to be far-reaching, shaping the future of AI-driven communication.