The Evolving Landscape of AI Models: A Quick Guide
Sangeeta Gautam
GenAI | Data & Analytics | Innovator | Patent Holder | Speaker | Author | Complex Program Management
The AI industry is changing at a fascinating pace, and I am loving it! If you are thinking, "Let it settle down for a bit and then I will start with the best model," let me share what I am seeing: large language models are swapping the leader spot in various categories as quickly as every few weeks. ChatGPT, Granite, Mistral, Gemini and Claude are all responding to the need of the moment with rapid updates.
What you call "best" really depends on which capabilities matter for what you are trying to achieve. That is why it is important to understand the different types of AI models. Once you know the types and what each is good at, it becomes much easier to choose the one that aligns best with what you are trying to achieve in your business.
You have probably heard of Foundation Models (FMs), Large Language Models (LLMs), Small Language Models (SLMs), Mixture of Experts (MoE) and so on. These are different types of AI models, each best suited to a certain type of work. Let’s explore each one in further detail.
Foundation Models
Foundation models are large pre-trained models. They are trained on large and diverse datasets such as books, articles and internet content. Because of this breadth, they are exposed to a wide range of language patterns, nuances and context. You can think of a foundation model as a multi-purpose tool, like a Swiss Army Knife. Just as one Swiss Army Knife has a blade, a screwdriver, a can opener and more, allowing you to use it in many different ways, you can use a foundation model for a wide variety of tasks such as answering questions, writing poems, generating code or analyzing images. These models can be adapted to a particular task by fine-tuning them for it. This makes foundation models a good starting point for many AI applications: you don’t have to train a model from scratch. Instead, you tweak the pre-trained model to your needs and get started.
Example: BERT (Bidirectional Encoder Representations from Transformers) by Google
Use Case: Text Classification e.g., spam detection in emails
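To make the "start from a pre-trained model and fine-tune it" idea concrete, here is a minimal sketch using the spam detection use case. Everything in it is an assumption for illustration: the tiny PRETRAINED table of hand-written word vectors stands in for what a real foundation model such as BERT learns from huge amounts of text, and the function names are hypothetical. The point it shows is that the pre-trained part stays frozen while only a small classifier on top is trained on a handful of labeled emails.

```python
import math

# Hypothetical "pre-trained" word features. In a real foundation model these
# would be learned from huge text corpora; here they are hand-written and frozen.
PRETRAINED = {
    "free": (0.9, 0.1), "winner": (0.8, 0.0), "prize": (0.9, 0.2),
    "click": (0.7, 0.1), "meeting": (0.1, 0.9), "report": (0.0, 0.8),
    "lunch": (0.1, 0.7), "agenda": (0.2, 0.9),
}

# A tiny labeled dataset for the task at hand (1 = spam, 0 = not spam).
TRAINING_EXAMPLES = [
    ("free prize winner", 1),
    ("click free prize", 1),
    ("meeting agenda report", 0),
    ("lunch meeting", 0),
]

def embed(text):
    """Average the frozen pre-trained vectors of the known words in text."""
    vecs = [PRETRAINED[w] for w in text.lower().split() if w in PRETRAINED]
    if not vecs:
        return (0.0, 0.0)
    n = len(vecs)
    return (sum(v[0] for v in vecs) / n, sum(v[1] for v in vecs) / n)

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def fine_tune(examples, epochs=200, lr=0.5):
    """Train only a small logistic-regression head on top of the frozen features."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = embed(text)
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - label
            w[0] -= lr * err * x[0]
            w[1] -= lr * err * x[1]
            b -= lr * err
    return w, b

def predict_spam(text, head):
    """Return True if the fine-tuned head classifies the text as spam."""
    w, b = head
    x = embed(text)
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b) > 0.5

head = fine_tune(TRAINING_EXAMPLES)
print(predict_spam("winner click", head))
print(predict_spam("report agenda lunch", head))
```

With a real foundation model the frozen part would be far richer, but the division of labor is the same: reuse what was already learned and train only what the task needs.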
Large Language Models
Large Language Models (LLMs) are a subset of Foundation Models. They are known for understanding human language, and they are just as good at generating it. The word “large” in Large Language Models signifies the huge amount of text data they are trained on. In addition to the huge amount of training data, they have millions or billions of parameters. Parameters are the numeric weights the model adjusts as it learns during training; the more parameters a model has, the more patterns it can capture. Once training is done the weights are fixed, and what you change is the input: the same model gives different output for different prompts. An easy-to-relate example from the wider generative AI family is the photo filter that makes you look younger or older, which is typically produced by a generative image model rather than by anyone adjusting parameters by hand. Many of my friends have been posting their younger or older looking selves on Facebook. Did you try it yet?
LLMs are also a great choice for answering questions, writing poems or articles (even books), summarizing text or translating languages. OpenAI’s ChatGPT is a popular example of an LLM, along with many others. Next time you are there, use it to interview yourself or someone else. It’s pretty cool!
Like everything else, LLMs have pros and cons. The art and science of LLMs have not been perfected yet, so while they usually produce good output, they still hallucinate, meaning they can state things that are false with full confidence. Don’t let the output pass through your hands without a good read. Yes, be the human-in-the-loop.
Example: GPT-4 (the model behind ChatGPT), Claude, Gemini, Granite etc.
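To put a number on what "parameters" means, here is a minimal sketch that counts the weights and biases in a plain fully connected network. This is an assumption made for simplicity: real LLMs use the Transformer architecture, which has more kinds of layers, but the counting idea is the same. The function name count_parameters and the layer sizes are hypothetical.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of units per layer, input first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per input-to-output connection
        total += n_out         # one bias per output unit
    return total

tiny = count_parameters([4, 8, 2])
wider = count_parameters([512, 2048, 512])
print(tiny)   # 58
print(wider)  # 2099712
```

Even this small two-layer example passes two million parameters, which gives a feel for how stacking many wide layers gets a model into the billions.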
Pros: Very adaptable, can handle complex tasks
Cons: Very resource intensive for training, potential for bias if not designed well, may not be a good fit for real-time applications
Use Case: Content creation e.g. writing articles, drafting emails etc.
Small Language Models
Small Language Models (SLMs), as the name suggests, are smaller than LLMs. They are trained on smaller datasets and have fewer parameters. Think of it this way: if you are in the healthcare industry and healthcare data is what matters most to you, then you train the model on healthcare data. Keeping the model focused on a specific task makes it act like a specialist in that field. The benefit of SLMs is that, being small, they need fewer computational resources and can run on smaller devices. And yes, that means a smaller carbon footprint. An example of an SLM would be a language translation app you may be using on your smartphone.
Example: DistilBERT, a smaller version of BERT that is faster and designed to be more efficient
Pros: lower resource requirement, efficient and faster responses, can be deployed on edge devices
Cons: less powerful than LLMs and may not handle complex tasks well
Use Case: Chatbots used to handle customer service queries
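A quick back-of-the-envelope calculation shows why fewer parameters translates into smaller devices. The sketch below estimates the memory needed just to hold a model's weights. The parameter counts are approximate public figures (about 66 million for DistilBERT, about 110 million for BERT base), the 7-billion-parameter model is a hypothetical stand-in for an LLM, and the function name is made up for illustration. It ignores everything else a running model needs, so treat the results as a lower bound.

```python
def model_memory_mb(num_parameters, bytes_per_parameter=4):
    """Approximate memory needed to hold the weights alone, in megabytes.

    4 bytes per parameter assumes 32-bit floats; 2 bytes assumes 16-bit.
    """
    return num_parameters * bytes_per_parameter / 1_000_000

distilbert_mb = model_memory_mb(66_000_000)           # about 66M parameters
bert_base_mb = model_memory_mb(110_000_000)           # about 110M parameters
hypothetical_llm_mb = model_memory_mb(7_000_000_000, bytes_per_parameter=2)

print(distilbert_mb)        # 264.0
print(bert_base_mb)         # 440.0
print(hypothetical_llm_mb)  # 14000.0
```

A few hundred megabytes fits comfortably on a phone; fourteen gigabytes generally does not, which is the practical difference between an SLM and an LLM at deployment time.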
Mixture of Experts
Think of Mixture of Experts (MoE) as a group of experts, in other words a group of smaller specialized models working as one. (In most real MoE systems the experts are sub-networks inside a single model rather than separate standalone models.) Depending upon the type of request, the most relevant expert is called upon to fulfill it. You have experienced this when watching a panel discussion, where each question goes to the panelist best placed to answer it. Depending upon your business need, this group would be a combination of small models that fulfills that need.
A small network, often called a gating network or router, decides which expert from the group gets to respond to each request. Because only the relevant experts are activated for a given request, this collaborative approach makes MoE scalable and flexible. Of course, always take into account that the more complex you make the MoE, the more complicated it is to manage, and performance may also degrade.
An example of MoE would be image recognition. An MoE for this could consist of one expert in facial recognition, another in object recognition, another in surrounding/environment recognition, and so on.
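The routing step can be sketched in a few lines. This is a simplified illustration under stated assumptions, not how any particular system is implemented: the expert names follow the image recognition example, the gating weights are hand-set (in a real MoE they are learned during training), and the input is a made-up three-number feature vector. The gate scores every expert, turns the scores into probabilities with a softmax, and sends the request to the top-scoring expert.

```python
import math

EXPERT_NAMES = ["faces", "objects", "scenery"]

# Hypothetical gating weights: one row per expert, one column per input feature.
# In a real MoE these are learned, not written by hand.
GATE_WEIGHTS = [
    [2.0, 0.1, 0.0],
    [0.1, 2.0, 0.2],
    [0.0, 0.3, 2.0],
]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(features, top_k=1):
    """Return the top_k (expert name, probability) pairs for an input."""
    scores = [sum(w * f for w, f in zip(row, features)) for row in GATE_WEIGHTS]
    probs = softmax(scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(EXPERT_NAMES[i], probs[i]) for i in ranked[:top_k]]

print(route([0.9, 0.1, 0.0]))  # an input dominated by the first feature
print(route([0.0, 0.2, 0.9]))  # an input dominated by the third feature
```

Setting top_k=1 sends each input to a single expert; raising it lets several experts share the work, at the cost of more computation per request.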
Example: Switch Transformer by Google
Pros: provides a platform that combines multiple specialized systems, balances efficiency and performance, scalable
Cons: more complex to design and train; if not designed well (for example, with too many experts), may run into performance and scalability issues
Use Case: Adaptive Learning Systems that personalize education and adjust content based on the learner's progress
Other Important AI Model Types
While language models are popular, other crucial AI model types include:
• Computer Vision Models: Process and analyze visual data
• Reinforcement Learning Models: Learn by interacting with the environment
• Generative Models: Create new content based on training data
Now the million-dollar question: which one is best for me?
With a good understanding of the types of AI models, let us explore the factors that help you choose the model best suited to your business need.
Below are some key considerations. Think through each of these in your context and then prioritize. Next, find the model that aligns best with your prioritized needs.
1. Data Sensitivity: If you are working with sensitive data such as healthcare data or any personally identifiable information (PII), consider an SLM, as it can run locally instead of sending your data to the cloud where an LLM may reside.
2. Resources: If you need fast and efficient responses, as with language apps on a smartphone, then an SLM is your best bet. SLMs work well with limited computational resources. MoE is also efficient, since only the relevant experts are used for each request. On the other hand, LLMs require a lot of computational power and are more resource intensive.
3. Complexity: Consider the complexity of the task at hand. For tasks that are focused and specific, you guessed it right, SLMs are the best choice. If the task is more creative in nature, like writing a book, or requires a deep understanding of complex language, then an LLM may be a better choice. For complex tasks that require fast responses, expert knowledge or scaling, consider MoE, as it lets you distribute the work among specialized models.
4. Efficiency: If the task at hand requires diverse inputs to be handled efficiently, then MoE may be the best choice, providing a good balance between resource use and efficient response.
5. Budget: LLMs have a huge cost of training, maintenance and deployment. On the other hand, SLM and MoE are less resource intensive with lower associated cost.
6. Customization: If you are looking to get started quickly without spending millions (yes, millions of dollars) building a model from scratch, you can use an FM and fine-tune it.
7. Real-time Responses: If your work requires real-time responses, then consider an SLM or an MoE.
8. Ethical Considerations: Evaluate the ethical safeguards and guardrails needed, as not all models are created equal. It is crucial to ensure safeguards are in place so that you are not exposed to risks of misuse. Another important factor is to ensure the model is not a black box: it should support transparency, fairness and accountability in your AI deployment.
9. Regulatory Compliance: Depending on your business region and needs, evaluate the model against the data protection and governance requirements of the location where you will use it.
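One way to turn "prioritize, then pick" into something you can try is a simple weighted scoring sheet. The sketch below is only an illustration: the 1 to 5 ratings are my rough, hypothetical reading of the considerations above, not measured data, and the factor names and function name are made up. You supply a weight for each factor that matters to you, and the model types are ranked by their weighted totals.

```python
# Illustrative 1-5 ratings (higher is better) per model type and factor.
# These numbers are assumptions for the sake of the example; replace them
# with your own assessment before relying on the ranking.
RATINGS = {
    "LLM": {"data_sensitivity": 2, "resources": 1, "complexity": 5,
            "budget": 1, "real_time": 2},
    "SLM": {"data_sensitivity": 5, "resources": 5, "complexity": 2,
            "budget": 5, "real_time": 5},
    "MoE": {"data_sensitivity": 3, "resources": 4, "complexity": 4,
            "budget": 4, "real_time": 4},
}

def rank_models(priorities):
    """Rank model types by weighted score, best first.

    priorities maps a factor name to how much it matters to you.
    """
    scores = {
        model: sum(weight * ratings[factor]
                   for factor, weight in priorities.items())
        for model, ratings in RATINGS.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# A team handling PII that needs fast responses.
privacy_first = rank_models({"data_sensitivity": 3, "real_time": 2, "complexity": 1})
print(privacy_first)  # [('SLM', 27), ('MoE', 21), ('LLM', 15)]

# A team whose main concern is handling complex, creative work.
complexity_first = rank_models({"complexity": 5, "budget": 1})
print(complexity_first)  # [('LLM', 26), ('MoE', 24), ('SLM', 15)]
```

The ranking is only as good as the ratings and weights you feed it, and factors like ethics and regulatory compliance are better treated as pass or fail checks than as scores.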
It is impossible to cover all combinations of implementation architecture here, but this should give you a good idea of the things to keep in mind.
By carefully considering these factors, you can make a smart decision that maximizes the benefits of AI for your specific use case.
The field of AI is evolving rapidly. What's cutting-edge today might be old news tomorrow. Stay curious, keep learning, and don't hesitate to experiment with different approaches.
**The above article is my perspective and does not represent any person or organization’s opinions or strategies.
Chief Revenue Officer (CRO)
3 months ago: Very well written, Sangeeta.
CIO | Founder | Investor | Board Member | Bestselling Author | Speaker
3 months ago: Excellent explanation, Sangeeta! Thank you for sharing.
Client Delivery Executive, Sr Director at NTT DATA Services
3 months ago: Very informative article, Sangeeta.