llamafile enables on-device large language models without the cloud
Mozilla has released an open source project called llamafile that lets anyone who wants to experiment with AI and LLMs take a large language model file and turn it into an executable binary that runs locally. This makes it much easier to distribute and run language models, with no installation and no reliance on cloud platforms.
Llamafile combines llama.cpp, Georgi Gerganov's language model framework, with Cosmopolitan Libc, Justine Tunney's project for creating portable C programs. Through llamafile, users can take a roughly 4GB neural network file in the standard GGUF format and transform it into a single program that runs on six operating systems without installation.
This addresses two problems: distributing sizeable language model files, and keeping models usable even as formats change over time. The llamafile project is a collaboration between Mozilla's innovation group and developer Justine Tunney.
Llamafile is released under the Apache 2.0 open source license to encourage contributions from the community. Mozilla hopes it will be a useful tool for running language models locally rather than through cloud platforms.
To demonstrate this, we start by downloading a model from Hugging Face (from Justine Tunney's jartine repo). This is the LLaVA 1.5 7B model, 4-bit quantized and packaged as a server llamafile:
wget -N https://huggingface.co/jartine/llava-v1.5-7B-GGUF/resolve/main/llava-v1.5-7b-q4-server.llamafile
Once the download finishes, the steps are straightforward. First, give the downloaded file executable permissions:
chmod 755 ./llava-v1.5-7b-q4-server.llamafile
and then run it:
./llava-v1.5-7b-q4-server.llamafile
At this point it automatically starts a web server and opens your browser so you can interact with the model.
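Under the hood, the binary embeds the llama.cpp server, so you can also drive the model from scripts. A minimal sketch, assuming the server is listening on its default port of 8080 and using llama.cpp's /completion endpoint:

curl http://localhost:8080/completion -H "Content-Type: application/json" -d '{"prompt": "Why is the sky blue?", "n_predict": 64}'

The response is JSON containing the generated text, which makes the model just as easy to use from the command line as from the browser.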
If you download this file on Windows, you can instead simply add the .exe extension to the end of the downloaded file name and double-click it (although you may need to click through Windows SmartScreen). It exhibits the same behavior: it runs the model, auto-starts a web server, and opens a browser page where you can interact with the model through prompts.
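For example, in a Command Prompt, the standard ren command handles the rename (using the file downloaded above):

ren llava-v1.5-7b-q4-server.llamafile llava-v1.5-7b-q4-server.exe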
The LLaVA model is multimodal, so you can also interact with images.
The llamafile repository has detailed instructions laying out how to package a model file as an executable in the llamafile format.
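In outline, you embed the GGUF weights and a .args file of default flags into a copy of the stock llamafile binary as a zip payload. A rough sketch based on the repository's documented approach, assuming a hypothetical mymodel.gguf and the zipalign tool that ships with llamafile releases (not the Android tool of the same name):

# write the default command-line arguments, one per line
cat <<EOF > .args
-m
mymodel.gguf
EOF
# start from the stock llamafile launcher binary from the releases page
cp llamafile mymodel.llamafile
# embed the weights and the .args file into the executable's zip payload
zipalign -j0 mymodel.llamafile mymodel.gguf .args
# the result is a self-contained, runnable model
./mymodel.llamafile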
Because llamafiles are self-contained and can, for example, even run off a USB drive, models can be easily shared and transported, even in air-gapped environments.
By packaging models together with a simple browser-based UI, llamafiles could make deploying localized AI within companies much easier and more approachable for non-experts. Users don't need to know Python or call APIs to interact with the models.
Also, with the responsive web UI, changes to prompts and parameters can be tested quickly without new deployments, which benefits model development and tuning. And because llamafiles include everything needed in one package and run via web browsers, they work across Windows, macOS, Linux, and the BSDs, vastly reducing the platform dependencies required to run the models.
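Sampling parameters such as temperature can be adjusted live in the web UI, while server-level options can be passed as ordinary flags at launch. A small sketch, assuming the standard llama.cpp server flags that llamafile passes through:

./llava-v1.5-7b-q4-server.llamafile --host 0.0.0.0 --port 8081 -ngl 35

Here --host and --port control where the web server listens (useful for sharing the UI on a LAN), and -ngl offloads model layers to a GPU when one is available.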
By bundling models and interfaces together, llamafiles could reduce, in some use cases, the need for expensive model hosting and serving infrastructure: the models simply run locally.
You can find more models by following the links in the llamafile README.