AI This Week: Microsoft’s Phi-3 Models, Copilot+ PCs, and Cutting-Edge AI Trends

TOP NEWS

Industry

Microsoft unveils new Phi-3 models, GitHub Copilot updates, AI PCs, and more

Let's go through each of the releases!

Small Language Models

Microsoft aims to lead the Small Language Model race with new additions to the Phi-3 family:

  • Phi-3-vision: The first multimodal model in the Phi family, at 4.2B parameters.
  • Phi-3-small: A 7B-parameter language model that competes with models twice its size.
  • Phi-3-medium: A 14B-parameter model trained on 4.8T tokens, reaching an MMLU score of 78, comparable to Llama 3 70B.
  • Phi-Silica: Included in every Copilot+ PC and designed for NPUs, with prompt processing at 650 tokens/sec.

Microsoft Copilots and GitHub Copilot

New updates and features for the Copilots family:

  • GitHub Copilot Extensions: Customize the Copilot experience by bringing services like Azure, Docker, and Sentry into GitHub Copilot Chat.
  • Team Copilot: Expands Copilot from a personal AI assistant to a collaborative team member, integrating with Microsoft 365 apps like Teams, Loop, and Planner.
  • Copilot Studio Agents: New capabilities for building proactive Copilots that can manage complex business processes independently.

New Copilot+ PCs

Introducing a new category of Windows PCs designed for AI:

  • Windows Copilot Library and Phi-Silica included in every Copilot+ PC.
  • PyTorch and the new Web Neural Network API (WebNN) now run natively on Windows.
  • Powerful new silicon: Capable of 40+ TOPS.
  • AI tools like Recall, Cocreator, Live Captions, and more.
  • Up to 20x more powerful and up to 100x as efficient as previous-generation Windows PCs for running AI workloads.

Other Announcements

  • Real-Time Intelligence in Microsoft Fabric, Azure's AI-powered analytics platform, is now in preview.
  • New partnerships with AMD, Khan Academy, and Cognition AI.
  • Access: Phi-3-vision is available on the Azure AI platform and Hugging Face (a loading sketch follows this list); GitHub Copilot Extensions are available on GitHub.com, in Visual Studio, and in VS Code.
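For the Hugging Face route, loading Phi-3-vision should look roughly like the sketch below. The model ID, prompt template, and trust_remote_code requirement are assumptions based on Microsoft's usual release conventions; check the model card for the canonical snippet.

```python
# Rough sketch of loading Phi-3-vision from Hugging Face with transformers.
# The model ID and chat template below are assumptions; Phi-3 checkpoints
# ship custom modeling code, so trust_remote_code is typically required.
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"  # assumed Hub ID
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Text-only smoke test; the processor also accepts PIL images for an
# <|image_1|> placeholder in multimodal prompts.
prompt = "<|user|>\nWhat is a small language model?<|end|>\n<|assistant|>\n"
inputs = processor(prompt, images=None, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```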

TRENDING SIGNALS

TOP REPOS

Crowdsourcing

Fabric

Fabric is an open-source framework for augmenting humans using AI. It takes a modular approach to solving specific problems with a crowdsourced set of AI prompts, called patterns, that can be used anywhere.
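The core idea is small enough to sketch. Below is an illustrative Python rendering of the pattern concept, not Fabric's actual CLI or API; the patterns/<name>/system.md layout is borrowed from how the repo organizes its prompts, and the OpenAI client is just one possible backend.

```python
# Illustrative sketch of Fabric's core idea, not its actual CLI or API:
# a "pattern" is a reusable, crowdsourced system prompt applied to any input.
from pathlib import Path
from openai import OpenAI

def run_pattern(pattern_name: str, input_text: str) -> str:
    # Assumed layout: one markdown file of instructions per pattern.
    system_prompt = Path(f"patterns/{pattern_name}/system.md").read_text()
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": input_text},
        ],
    )
    return resp.choices[0].message.content

# e.g., pipe an article through a crowdsourced summarization pattern
print(run_pattern("summarize", Path("article.txt").read_text()))
```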

Multimodal Models

Lumina-T2X

Lumina-T2X is a series of text-conditioned Diffusion Transformers capable of transforming textual descriptions into images, videos, detailed multi-view 3D images, and synthesized speech. The family can generate output in any modality, resolution, and duration within a single framework, while requiring relatively modest training resources.

RAG

Verba

Verba is Weaviate's open-source application designed to offer an end-to-end, streamlined, and user-friendly interface for Retrieval-Augmented Generation (RAG) out of the box. You can run Verba either locally with Hugging Face and Ollama or through LLM providers such as OpenAI, Cohere, and Google. It also supports customizable RAG frameworks, data types, chunking, and retrieval techniques.
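For context, the loop Verba streamlines can be sketched generically. This is a minimal illustration of the RAG pattern itself, not Verba's API; the embedding model choice and the `llm` callable are assumptions.

```python
# Minimal, generic RAG loop of the kind Verba packages end to end.
# Illustrative only, not Verba's API; the embedding model and the `llm`
# callable (any text-in/text-out completion function) are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def chunk(text: str, size: int = 500) -> list[str]:
    # Naive fixed-size chunking; production systems add overlap/semantic splits.
    return [text[i:i + size] for i in range(0, len(text), size)]

def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    # Embed chunks and query, rank by cosine similarity, keep the top k.
    docs = embedder.encode(chunks, normalize_embeddings=True)
    q = embedder.encode([query], normalize_embeddings=True)[0]
    top = np.argsort(docs @ q)[::-1][:k]
    return [chunks[i] for i in top]

def answer(query: str, document: str, llm) -> str:
    context = "\n\n".join(retrieve(query, chunk(document)))
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {query}")
```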

Language Models

Llama 3 Implemented From Scratch in Python

A new repository offers a detailed implementation of Llama 3, covering:

  • Tokenization Process: Uses the tiktoken BPE library.
  • Attention Mechanism: A detailed walkthrough of the attention weights and how query vectors are split across heads.
  • Multi-Head Attention Operations: Matrix multiplications and causal masking of future tokens (sketched after this list).
  • Feedforward Network: Uses SwiGLU to process embeddings (also sketched below).
  • Visualizations and Practical Examples: Heatmaps of attention scores and more.
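Two of those pieces are compact enough to sketch. Below is a minimal PyTorch rendering of a causal mask and a Llama-style SwiGLU feedforward; the dimensions are illustrative, not Llama 3's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def causal_mask(seq_len: int) -> torch.Tensor:
    # Position i may attend only to positions <= i: the upper triangle
    # (future tokens) is set to -inf before the softmax.
    return torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)

class SwiGLU(nn.Module):
    # Llama-style feedforward: down(silu(gate(x)) * up(x)), with no biases.
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))

scores = torch.randn(8, 8) + causal_mask(8)  # masked attention logits
attn = torch.softmax(scores, dim=-1)         # each row sums over past tokens only
y = SwiGLU(dim=64, hidden=256)(torch.randn(2, 8, 64))
```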

TOP PAPERS

Text-to-Image

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Chameleon is a family of early-fusion, token-based mixed-modal models that can understand and generate images and text in any sequence. Its fully token-based architecture allows for seamless information integration across modalities, achieving state-of-the-art performance on image captioning and visual QA benchmarks.
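The early-fusion idea can be sketched in a few lines: both modalities are mapped into one discrete vocabulary, so a single transformer sees one interleaved token stream. Everything below is a conceptual stand-in (Chameleon uses a learned BPE text tokenizer and a learned VQ image tokenizer), not the paper's code.

```python
# Conceptual sketch of early fusion, not Chameleon's actual code: text and
# image tokens share one vocabulary, so a single decoder-only transformer
# models the interleaved sequence autoregressively.
TEXT_VOCAB = 32_000   # illustrative text vocabulary size
IMAGE_VOCAB = 8_192   # illustrative VQ codebook size

def tokenize_text(s: str) -> list[int]:
    # Stand-in for a real BPE tokenizer; IDs land in [0, TEXT_VOCAB).
    return [hash(w) % TEXT_VOCAB for w in s.split()]

def tokenize_image(patches: list[int]) -> list[int]:
    # Stand-in for a learned VQ image tokenizer: each patch maps to a
    # codebook ID, offset past the text range so vocabularies don't collide.
    return [TEXT_VOCAB + (p % IMAGE_VOCAB) for p in patches]

# One mixed-modal example: caption tokens, image tokens, follow-up text,
# all in a single stream that one transformer consumes and generates.
sequence = (
    tokenize_text("A photo of")
    + tokenize_image(list(range(16)))  # placeholder 16-patch "image"
    + tokenize_text("taken at dusk")
)
```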

3D Generation

CAT3D: Create Anything in 3D with Multi-View Diffusion Models

CAT3D uses a multi-view diffusion model to generate multiple output views given one or more input views and camera poses. It outperforms prior work on few-view and single-image 3D reconstruction benchmarks and can generate high-quality 3D content in as little as one minute.

Language Models

LoRA Learns Less and Forgets Less

This study compares LoRA and full finetuning on programming and mathematics tasks: LoRA underperforms full finetuning in the target domain, but it better preserves the base model's original capabilities and provides stronger regularization.
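As a refresher on what LoRA actually trains, here is a minimal PyTorch sketch of the low-rank update added to a frozen linear layer; the rank and scaling values are illustrative defaults, not the paper's settings.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Frozen weight plus a trainable low-rank update: y = Wx + (alpha/r) * B(Ax).
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # the full weights never move
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: update starts at 0
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
# Only A and B train: 2 * 8 * 512 params vs 512 * 512 for full finetuning,
# which is why the update is constrained and the base model drifts less.
```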


Subscribe to Newsletter: https://lnkd.in/guxfrUSM
