Exploring SLMs - Getting started with Phi-3

SLMs and the Phi-3 Family of Models

Small Language Models (SLMs) are gaining prominence in the field of Artificial Intelligence (AI) for several compelling reasons. They offer a more sustainable and accessible approach to AI, particularly in scenarios where resources are limited or where specific, domain-focused applications are needed. Unlike their larger counterparts, SLMs require significantly less computational power and data to operate, which not only makes them more environmentally friendly but also allows for their deployment in a wider range of settings, including those with restricted technological infrastructure.

The importance of SLMs lies in their ability to provide AI enablement without the extensive resource demands typically associated with Large Language Models (LLMs). This is particularly beneficial for smaller organizations or educational institutions that may not have the capacity to support the heavy computational load of LLMs. By utilizing SLMs, these entities can still leverage the power of AI for tasks like natural language processing, data analysis, and automated decision-making, but with a smaller carbon footprint and lower operational costs.

While numerous players have built their own SLMs, Microsoft recently introduced its family of models named Phi-3. It's a type of SLM (Small Language Model) designed to deliver great performance while remaining lightweight enough to run on resource-constrained devices like smartphones. Within the Phi-3 family, Phi-3-mini does better than models twice its size, and Phi-3-small and Phi-3-medium outperform much larger models, including GPT-3.5 Turbo.

Benchmark Reference

Not only do they outperform their counterparts, but they are also cost-effective. While deploying most LLMs poses complex challenges, let's have a look at how easy it is to deploy Phi-3-mini, the smallest of the Phi-3 family.

Deployment Options

The deployment of Phi-3-mini showcases the versatility and adaptability of modern AI frameworks, catering to a wide range of platforms and preferences. The Semantic Kernel/LangChain approach within the Copilot application exemplifies the seamless integration with Azure OpenAI Service and OpenAI models, while also extending support to open-source models from Hugging Face and local models. This flexibility is crucial for developers who require a robust framework that can accommodate various AI models.
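As a quick illustration of that flexibility, here's a minimal LangChain sketch that points at a locally served Phi-3 model through Ollama. This assumes you have the langchain-community pip package installed and an Ollama server running locally (which we set up later in this article):

from langchain_community.llms import Ollama

# Point LangChain at a Phi-3 model served locally by Ollama
llm = Ollama(model="phi3")

# The surrounding application code stays the same if you later swap in an
# Azure OpenAI or Hugging Face backed LLM object; only this construction changes.
print(llm.invoke("Explain small language models in one sentence."))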

The preference for quantized models, as seen with Ollama/LlamaEdge, underscores the growing demand for efficient local model execution. The ability for users to invoke different quantized models through Ollama/LM Studio demonstrates a shift towards more personalized and immediate AI interactions. The provision to run

ollama run phi3        

or configure it offline further enhances this user-centric approach, allowing for greater control and customization.

Optimization for ONNX Runtime indicates a commitment to performance and compatibility, with Phi-3-mini's support for Windows DirectML and cross-platform functionality across GPU, CPU, and mobile hardware. This optimization ensures that the model can perform efficiently in diverse environments, which is essential for developers looking to deploy AI solutions in varied settings.
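To make that concrete, below is a rough sketch of local generation with the onnxruntime-genai Python package. Treat it as an assumption-laden outline: the model directory name is a placeholder for wherever you've downloaded the Phi-3-mini ONNX files, and the exact API surface may differ across package versions.

import onnxruntime_genai as og

# Load a locally downloaded Phi-3-mini ONNX model
# ("phi3-mini-4k-instruct-onnx" is a placeholder directory name)
model = og.Model("phi3-mini-4k-instruct-onnx")
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=200)
params.input_ids = tokenizer.encode("What makes SLMs useful?")

# Token-by-token generation loop
generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))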

The availability of Phi-3 Mini as an NVIDIA NIM at ai.nvidia.com with a 128K context window reflects the collaboration between AI service providers and hardware manufacturers. Packaging it as a microservice with a standard API that can be deployed anywhere provides developers with a scalable and accessible AI tool.
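Because the NIM exposes an OpenAI-compatible endpoint, you can call the hosted Phi-3 Mini with the standard openai Python client. A minimal sketch follows; the base URL and model name mirror NVIDIA's catalog at the time of writing and may change, and the API key is a placeholder for your own:

from openai import OpenAI

# NVIDIA's NIM endpoints speak the OpenAI chat-completions protocol
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # NIM gateway (assumed)
    api_key="YOUR_NVIDIA_API_KEY",                   # placeholder
)

completion = client.chat.completions.create(
    model="microsoft/phi-3-mini-128k-instruct",  # catalog name, may change
    messages=[{"role": "user", "content": "What does a 128K context window enable?"}],
)
print(completion.choices[0].message.content)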

Last but not least, the option for local deployment of Phi-3 Mini on laptops or within mobile applications speaks to the growing trend of on-the-go AI. Support for platforms like Microsoft Azure AI Studio, Hugging Face, and Ollama ensures that developers have the necessary tools to integrate AI into a range of applications, further pushing the boundaries of what's possible with AI technology today. Each of these deployment methods highlights the dynamic nature of AI development and the continuous evolution of deployment strategies to meet the needs of a diverse developer community.

Let's try using Ollama

Let's deploy Phi-3-mini using Ollama on our local machine and test it out.

First, you need to download Ollama for Windows (currently in preview): Ollama.

Once downloaded and installed, you'll see that the Ollama service starts by itself, so now it's time to play with Ollama and install Phi-3-mini. In case your Ollama service is not running, you can use the commands below to check and start it.

# Check port bindings
netstat -a

# Start the Ollama service
ollama serve
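If you'd rather verify the service from Python, a tiny check like this works (assuming Ollama's default port of 11434; the root endpoint simply replies "Ollama is running"):

import requests

try:
    # Ollama listens on http://localhost:11434 by default
    response = requests.get("http://localhost:11434", timeout=5)
    print(response.text)  # expected: "Ollama is running"
except requests.exceptions.ConnectionError:
    print("Ollama is not running - start it with 'ollama serve'")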

Once that's done, it's time to pull the Phi-3 model onto our local machine so that we can use it.

ollama pull phi3        
Ollama pull process for Phi-3

And now it's time to use it. You can use another basic command to play with it.

ollama run phi3        

While this is just the start of what's possible with this magical little SLM, it's also super easy to play with it in Python. You can install the Ollama pip package and consume the Phi-3 model within your Python program.

pip install ollama        

Import and consume

import ollama

# Define the messages
messages = [
    {
        'role': 'user',
        'content': 'Hey! ',
    },
]

# Use the chat function with streaming
stream = ollama.chat(model='phi3', messages=messages, stream=True)

# Print each part of the stream
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)        
Response for the above chat message
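Beyond chat, the same pip package also exposes an embeddings call, which comes in handy for indexing your own documents locally. Here's a small sketch, assuming your installed ollama package version provides ollama.embeddings (note that Phi-3 is a generative model, so for production-grade embeddings you may prefer a dedicated embedding model pulled through Ollama):

import ollama

# Generate an embedding vector for a piece of text
result = ollama.embeddings(model="phi3", prompt="Small language models run locally.")
print(len(result["embedding"]))  # dimensionality of the returned vector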

Exciting, no? It brings great possibilities: you can try summarizing your documents, creating embeddings, and doing magic with your available data without worrying about data residency or high costs. Another good thing is that its training data runs into 2023, which provides better context for general conversations. Looking forward to seeing what you folks are going to build with it.

