Microsoft Releases the Phi-3.5 Family of Small Language Models

Microsoft has announced the release of the Phi-3.5 family of models, comprising Phi-3.5-mini, Phi-3.5-MoE, and Phi-3.5-vision. These models are designed to offer lightweight, state-of-the-art options for a range of AI applications.

Phi-3.5-MoE: Mixture of Experts Technology

Phi-3.5-MoE is the first model in the Phi family to use Mixture of Experts (MoE) technology. It combines 16 experts of 3.8B parameters each but activates only 6.6B parameters per token (two experts), and was trained on 4.9T tokens using 512 H100 GPUs.
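To make the routing idea concrete, here is a toy top-2 gating sketch in NumPy. The expert count, gating matrix, and function names are illustrative only, not Microsoft's implementation; the point is simply that only two of the sixteen expert networks run for any given input.

```python
import numpy as np

def top2_moe_forward(x, experts, gate_w):
    """Route input x through the top-2 of len(experts) expert functions,
    weighting their outputs by softmaxed gate scores (toy sketch)."""
    scores = gate_w @ x                       # one gating score per expert
    top2 = np.argsort(scores)[-2:]            # indices of the 2 best-scoring experts
    w = np.exp(scores[top2] - scores[top2].max())
    w /= w.sum()                              # softmax over the selected pair
    # Only the two selected experts are evaluated; the other 14 stay idle.
    return sum(wi * experts[i](x) for wi, i in zip(w, top2))
```

Because the gate selects a fixed number of experts per token, compute cost scales with the two active experts rather than all sixteen, which is how a 16 x 3.8B model can run with only 6.6B active parameters.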

Benchmark Results:

  • Language Understanding: Phi-3.5-MoE (85.1%), Gemini 1.5 Flash (83.2%), GPT-4o-mini (86.3%)
  • Math and Logic: Phi-3.5-MoE (87.5%), Gemini 1.5 Flash (85.1%), GPT-4o-mini (88.2%)

On these benchmarks, Phi-3.5-MoE outperforms Gemini 1.5 Flash and comes close to GPT-4o-mini in both language understanding and math and logic, making it a versatile tool for a range of applications.

Phi-3.5-mini: Lightweight and Powerful

Phi-3.5-mini is a 3.8B parameter model trained on 3.4T tokens using 512 H100 GPUs.

Benchmark Results:

  • Common Sense Reasoning: Phi-3.5-mini (74.2%), Llama-3.1 8B (72.1%), Mistral 7B (70.5%), Mistral NeMo 12B (75.6%)
  • Logical Reasoning: Phi-3.5-mini (83.5%), Llama-3.1 8B (81.2%), Mistral 7B (79.5%), Mistral NeMo 12B (84.2%)

On these reasoning benchmarks, Phi-3.5-mini outperforms the similarly sized Llama-3.1 8B and Mistral 7B, trailing only the much larger Mistral NeMo 12B, which makes it well suited to applications where computational resources are limited.

Phi-3.5-vision: Enhanced Multi-Frame Image Understanding

The Phi-3.5-vision is a 4.2B parameter model trained on 500B tokens using 256 A100 GPUs.

Benchmark Results:

  • Multi-Frame Image Understanding: Phi-3.5-vision (82.1%), GPT-4o-mini (80.5%)
  • Optical Character Recognition (OCR): Phi-3.5-vision (95.6%), GPT-4o-mini (94.2%)
  • Chart and Table Understanding: Phi-3.5-vision (88.5%), GPT-4o-mini (86.3%)
  • Multiple Image Comparison: Phi-3.5-vision (85.2%), GPT-4o-mini (83.5%)
  • Video Summarization: Phi-3.5-vision (83.8%), GPT-4o-mini (82.1%)

Phi-3.5-vision edges out GPT-4o-mini across all five of these vision benchmarks, making it a versatile choice for multi-frame image understanding, OCR, chart and table understanding, image comparison, and video summarization.

Key Features and Applications

  • Lightweight Design: All models are trained on synthetic data and filtered publicly available web data, and all support a 128K token context length.
  • Multilingual Capabilities: Phi-3.5-mini and Phi-3.5-MoE offer strong multilingual support, making them versatile for global applications.
  • Fine-Tuning: These models can be fine-tuned on custom datasets using tools like Unsloth, improving their performance on specific tasks.
  • Installation and Testing: Tutorials are available for local installation and testing of these models, making them accessible for developers and researchers.
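As a sketch of what local testing can look like, the snippet below loads Phi-3.5-mini with Hugging Face transformers. The model id `microsoft/Phi-3.5-mini-instruct` and the `<|user|>`/`<|assistant|>` prompt format are assumptions based on Phi-3 release conventions, so verify them against the official model card before relying on them.

```python
def build_phi_chat_prompt(user_message: str) -> str:
    """Render a single-turn chat prompt in the assumed Phi-3 instruct format."""
    return f"<|user|>\n{user_message}<|end|>\n<|assistant|>\n"

def run_phi_mini(user_message: str, max_new_tokens: int = 128) -> str:
    """Load Phi-3.5-mini and generate a reply (downloads weights on first run)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "microsoft/Phi-3.5-mini-instruct"  # assumed Hugging Face id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    inputs = tokenizer(build_phi_chat_prompt(user_message),
                       return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```

In practice, `tokenizer.apply_chat_template` is the safer way to format prompts, since it pulls the exact template from the model's own configuration rather than hard-coding it.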

Conclusion

The Phi-3.5 family of models provides a range of capabilities from text-based tasks to multimodal applications. Their lightweight design and high-quality performance make them suitable for various use cases. These models can be further enhanced through fine-tuning on custom datasets, making them versatile tools for AI engineers and developers.


If you found this article informative and valuable, consider sharing it with your network to help others discover the power of AI.

