llamafile enables on-device large language models without the cloud
Mozilla has released an open source project called llamafile that lets anyone who wants to experiment with AI and LLMs take a large language model file and turn it into an executable binary that runs locally. This makes it much easier to distribute and run language models, with no installation and no reliance on cloud platforms.
Llamafile combines llama.cpp, Georgi Gerganov's language model framework, with Cosmopolitan Libc, Justine Tunney's project for creating portable C programs. Through llamafile, users can take a roughly 4GB neural network file in the standard GGUF format and transform it into a single program that runs on six operating systems without installation.
This addresses two problems: distributing sizeable language model files, and keeping models usable even as formats change over time. The llamafile project is a collaboration between Mozilla's innovation group and developer Justine Tunney.
Llamafile is released under the Apache 2.0 open source license to encourage contributions from the community. Mozilla hopes it will be a useful tool for running language models locally rather than through cloud platforms.
To demonstrate this, we start by downloading a model from Hugging Face (from Justine Tunney's jartine repo). This is the LLaVA 1.5 7B model, 4-bit quantized and packaged as a server llamafile:
wget -N https://huggingface.co/jartine/llava-v1.5-7B-GGUF/resolve/main/llava-v1.5-7b-q4-server.llamafile
Once the download finishes, the steps are straightforward. First, give the downloaded file executable permissions:
chmod 755 ./llava-v1.5-7b-q4-server.llamafile
and then run it:
./llava-v1.5-7b-q4-server.llamafile
At this point it automatically starts a web server and opens your browser so you can interact with the model.
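Under the hood, the binary embeds the llama.cpp server, so you can also drive the model from scripts. A minimal sketch, assuming the server is listening on its default port of 8080 and using llama.cpp's /completion endpoint:

curl http://localhost:8080/completion -H "Content-Type: application/json" -d '{"prompt": "Why is the sky blue?", "n_predict": 64}'

The response is JSON containing the generated text, which makes the model just as easy to use from the command line as from the browser.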
If you download this file on Windows, you can instead simply add the .exe extension to the end of the downloaded file name and double-click it (although you may need to click through Windows SmartScreen). It exhibits the same behavior: it runs the model, auto-starts a web server, and opens a browser page where you can interact with the model through prompts.
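For example, in a Command Prompt, the standard ren command handles the rename (using the file downloaded above):

ren llava-v1.5-7b-q4-server.llamafile llava-v1.5-7b-q4-server.exe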
The LLaVA model is multimodal, so you can also interact with images.
The llamafile repository has detailed instructions laying out how to package a model file as an executable in the llamafile format.
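In outline, you embed the GGUF weights and a .args file of default flags into a copy of the stock llamafile binary as a zip payload. A rough sketch based on the repository's documented approach, assuming a hypothetical mymodel.gguf and the zipalign tool that ships with llamafile releases (not the Android tool of the same name):

# write the default command-line arguments, one per line
cat <<EOF > .args
-m
mymodel.gguf
EOF
# start from the stock llamafile launcher binary from the releases page
cp llamafile mymodel.llamafile
# embed the weights and the .args file into the executable's zip payload
zipalign -j0 mymodel.llamafile mymodel.gguf .args
# the result is a self-contained, runnable model
./mymodel.llamafile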
Because llamafiles are self-contained and can, for example, even run off a USB drive, models can be easily shared and transported, even in air-gapped environments.
By packaging models together with a simple browser-based UI, llamafiles could make deploying localized AI within companies much easier and more approachable for non-experts. Users don't need to know Python or call APIs to interact with the models.
Also, with the responsive web UI, changes to prompts and parameters can be tested quickly without new deployments, which benefits model development and tuning. And because llamafiles include everything needed in one package and run via web browsers, they work across Windows, macOS, Linux, and the BSDs, vastly reducing the platform dependencies required to run the models.
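Sampling parameters such as temperature can be adjusted live in the web UI, while server-level options can be passed as ordinary flags at launch. A small sketch, assuming the standard llama.cpp server flags that llamafile passes through:

./llava-v1.5-7b-q4-server.llamafile --host 0.0.0.0 --port 8081 -ngl 35

Here --host and --port control where the web server listens (useful for sharing the UI on a LAN), and -ngl offloads model layers to a GPU when one is available.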
By bundling models and interfaces together, llamafiles could reduce, in some use cases, the need for expensive model hosting and serving infrastructure: the models simply run locally.
You can find more models by following the links in the llamafile README.