Exploring SLMs - Getting started with Phi-3

SLMs and the Phi-3 Family of Models

Small Language Models (SLMs) are gaining prominence in the field of Artificial Intelligence (AI) for several compelling reasons. They offer a more sustainable and accessible approach to AI, particularly in scenarios where resources are limited or where specific, domain-focused applications are needed. Unlike their larger counterparts, SLMs require significantly less computational power and data to operate, which not only makes them more environmentally friendly but also allows for their deployment in a wider range of settings, including those with restricted technological infrastructure.

The importance of SLMs lies in their ability to provide AI enablement without the extensive resource demands typically associated with Large Language Models (LLMs). This is particularly beneficial for smaller organizations or educational institutions that may not have the capacity to support the heavy computational load of LLMs. By utilizing SLMs, these entities can still leverage the power of AI for tasks like natural language processing, data analysis, and automated decision-making, but with a smaller carbon footprint and lower operational costs.

While numerous players have built their own SLMs, Microsoft recently introduced its family of models named Phi-3. It's a type of SLM (Small Language Model) designed to deliver great performance while remaining lightweight enough to run on resource-constrained devices like smartphones. Within the Phi-3 family, Phi-3-mini does better than models twice its size, and Phi-3-small and Phi-3-medium outperform much larger models, including GPT-3.5 Turbo.

Benchmark Reference

Not only do they outperform their counterparts, but they are also cost-effective. While deploying most LLMs poses complex challenges, let's have a look at how easy it is to deploy Phi-3-mini, the smallest of the Phi-3 family.

Deployment Options

The deployment of Phi-3-mini showcases the versatility and adaptability of modern AI frameworks, catering to a wide range of platforms and preferences. The Semantic Kernel/LangChain approach within the Copilot application exemplifies the seamless integration with Azure OpenAI Service and OpenAI models, while also extending support to open-source models from Hugging Face and local models. This flexibility is crucial for developers who require a robust framework that can accommodate various AI models.
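As a quick illustration of that flexibility, here's a minimal LangChain sketch that points at a locally served Phi-3 model through Ollama. This assumes you have the langchain-community pip package installed and an Ollama server running locally (which we set up later in this article):

from langchain_community.llms import Ollama

# Point LangChain at a Phi-3 model served locally by Ollama
llm = Ollama(model="phi3")

# The surrounding application code stays the same if you later swap in an
# Azure OpenAI or Hugging Face backed LLM object; only this construction changes.
print(llm.invoke("Explain small language models in one sentence."))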

The preference for quantized models, as seen with Ollama/LlamaEdge, underscores the growing demand for efficient local model execution. The ability for users to invoke different quantized models through Ollama/LM Studio demonstrates a shift towards more personalized and immediate AI interactions. The provision to run

ollama run phi3        

or configure it offline further enhances this user-centric approach, allowing for greater control and customization.

Optimization for ONNX Runtime indicates a commitment to performance and compatibility, with Phi-3-mini's support for Windows DirectML and cross-platform functionality across GPU, CPU, and mobile hardware. This optimization ensures that the model can perform efficiently in diverse environments, which is essential for developers looking to deploy AI solutions in varied settings.
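To make that concrete, below is a rough sketch of local generation with the onnxruntime-genai Python package. Treat it as an assumption-laden outline: the model directory name is a placeholder for wherever you've downloaded the Phi-3-mini ONNX files, and the exact API surface may differ across package versions.

import onnxruntime_genai as og

# Load a locally downloaded Phi-3-mini ONNX model
# ("phi3-mini-4k-instruct-onnx" is a placeholder directory name)
model = og.Model("phi3-mini-4k-instruct-onnx")
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=200)
params.input_ids = tokenizer.encode("What makes SLMs useful?")

# Token-by-token generation loop
generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))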

The availability of Phi-3 Mini as an NVIDIA NIM at ai.nvidia.com with a 128K context window reflects the collaboration between AI service providers and hardware manufacturers. Packaging it as a microservice with a standard API that can be deployed anywhere provides developers with a scalable and accessible AI tool.
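Because the NIM exposes an OpenAI-compatible endpoint, you can call the hosted Phi-3 Mini with the standard openai Python client. A minimal sketch follows; the base URL and model name mirror NVIDIA's catalog at the time of writing and may change, and the API key is a placeholder for your own:

from openai import OpenAI

# NVIDIA's NIM endpoints speak the OpenAI chat-completions protocol
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # NIM gateway (assumed)
    api_key="YOUR_NVIDIA_API_KEY",                   # placeholder
)

completion = client.chat.completions.create(
    model="microsoft/phi-3-mini-128k-instruct",  # catalog name, may change
    messages=[{"role": "user", "content": "What does a 128K context window enable?"}],
)
print(completion.choices[0].message.content)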

Last but not least, the option for local deployment of Phi-3 Mini on laptops or within mobile applications speaks to the growing trend of on-the-go AI. Support for platforms like Microsoft Azure AI Studio, Hugging Face, and Ollama ensures that developers have the necessary tools to integrate AI into a range of applications, further pushing the boundaries of what's possible with AI technology today. Each of these deployment methods highlights the dynamic nature of AI development and the continuous evolution of deployment strategies to meet the needs of a diverse developer community.

Let's try using Ollama

Let's deploy Phi-3-mini using Ollama on our local machine and test it out.

First, you need to download Ollama for Windows (currently in preview): Ollama.

Once downloaded and installed, you'll see that the Ollama service starts by itself, so now it's time to play with Ollama and install Phi-3-mini. In case your Ollama service is not running, you can use the commands below to check and start it.

# Check port bindings
netstat -a

# Start the Ollama service
ollama serve
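If you'd rather verify the service from Python, a tiny check like this works (assuming Ollama's default port of 11434; the root endpoint simply replies "Ollama is running"):

import requests

try:
    # Ollama listens on http://localhost:11434 by default
    response = requests.get("http://localhost:11434", timeout=5)
    print(response.text)  # expected: "Ollama is running"
except requests.exceptions.ConnectionError:
    print("Ollama is not running - start it with 'ollama serve'")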

Once that's done, it's time to pull the Phi-3 model onto our local machine so that we can use it.

ollama pull phi3        
Ollama pull process for Phi-3

And now it's time to use it. You can use another basic command to play with it.

ollama run phi3        

While this is just the start of what's possible with this magical little SLM, it's also super easy to play with it in Python. You can install the Ollama pip package and consume the Phi-3 model within your Python program.

pip install ollama        

Import and consume

import ollama

# Define the messages
messages = [
    {
        'role': 'user',
        'content': 'Hey! ',
    },
]

# Use the chat function with streaming
stream = ollama.chat(model='phi3', messages=messages, stream=True)

# Print each part of the stream
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)        
Response for the above chat message
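Beyond chat, the same pip package also exposes an embeddings call, which comes in handy for indexing your own documents locally. Here's a small sketch, assuming your installed ollama package version provides ollama.embeddings (note that Phi-3 is a generative model, so for production-grade embeddings you may prefer a dedicated embedding model pulled through Ollama):

import ollama

# Generate an embedding vector for a piece of text
result = ollama.embeddings(model="phi3", prompt="Small language models run locally.")
print(len(result["embedding"]))  # dimensionality of the returned vector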

Exciting, no? It brings great possibilities: you can try summarizing your documents, creating embeddings, and doing magic with your available data without worrying about data residency or high costs. Another good thing is that its training data runs into 2023, which provides better context for general conversations. Looking forward to seeing what you folks are going to build with it.

