Ollama with LangChain for Local LLMs Applications
Rany ElHousieny, PhD
SENIOR SOFTWARE ENGINEERING MANAGER (EX-Microsoft) | Generative AI / LLM / ML / AI Engineering Manager | AWS SOLUTIONS ARCHITECT CERTIFIED | LLM and Machine Learning Engineer | AI Architect
Ollama is a versatile platform for running and interacting with large language models (LLMs) like Llama, Gemma, Phi, Zephyr, Code Llama, and many more. It allows users to pull, run, and create models easily on local machines, ensuring privacy and control over data. With compatibility for macOS, Linux, and Windows, Ollama also supports API functionalities that align with OpenAI standards, allowing for seamless integration and usage of various tools and applications locally.
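Because the local server exposes an OpenAI-compatible endpoint, existing OpenAI client code can usually be pointed at Ollama with nothing more than a base-URL change. The sketch below assumes Ollama is already installed and running locally (installation steps follow) and that the gemma model has been pulled; the api_key value is a placeholder because the local server does not check it.

from openai import OpenAI

# Point the standard OpenAI client at the local Ollama server.
client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

response = client.chat.completions.create(
    model='gemma',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
)
print(response.choices[0].message.content)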
For Python developers, Ollama provides a library that can be installed with a simple pip command. The library enables straightforward interaction with models for tasks like chat completions, text generation, and even handling multimodal inputs such as images. This flexibility is showcased in the ability to handle streaming data, use different models for specific tasks, and create custom models tailored to unique requirements.
!pip install ollama
Ollama also integrates with LangChain, allowing developers to build complex applications such as retrieval augmented generation (RAG). These applications leverage embedding models to create vector embeddings from texts, which can be used to retrieve and generate relevant responses based on the input queries. This capability is particularly useful for building sophisticated AI-powered search and response systems.
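As a minimal sketch of that idea, the snippet below embeds a few short texts with a local embedding model and retrieves the closest match for a query using cosine similarity. It assumes an embedding model such as nomic-embed-text has already been pulled (ollama pull nomic-embed-text); any Ollama embedding model would work the same way, and a full RAG pipeline would typically add a vector store and a generation step on top of this retrieval.

from langchain_community.embeddings import OllamaEmbeddings
import numpy as np

embeddings = OllamaEmbeddings(model='nomic-embed-text')

docs = [
    'Ollama runs large language models on your local machine.',
    'LangChain provides building blocks for LLM applications.',
    'Rayleigh scattering explains why the sky is blue.',
]

# Embed the documents and the query into the same vector space.
doc_vectors = np.array(embeddings.embed_documents(docs))
query_vector = np.array(embeddings.embed_query('How can I run an LLM on my laptop?'))

# Cosine similarity between the query and each document; print the best match.
scores = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)
print(docs[int(scores.argmax())])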
Ollama Installation
Go to https://ollama.com/ and follow the steps below:
Unzip the downloaded file and click on Ollama
Open a terminal and run the command
ollama run gemma
pulling manifest
pulling ef311de6af9d... 10% ▕█ ▏ 525 MB/5.0 GB 37 MB/s 1m58s
It will pull the model first and then give you a prompt where you can chat with the LLM.
ollama run gemma
pulling manifest
pulling ef311de6af9d... 100% ▕███████████████████▏ 5.0 GB
pulling 097a36493f71... 100% ▕███████████████████▏ 8.4 KB
pulling 109037bec39c... 100% ▕███████████████████▏ 136 B
pulling 65bb16cf5983... 100% ▕███████████████████▏ 109 B
pulling 0c2a5137eb3c... 100% ▕███████████████████▏ 483 B
verifying sha256 digest
writing manifest
removing any unused layers
success
>>> Send a message (/? for help)
You can get the list of downloaded models with the command ollama list
ollama list
NAME ID SIZE MODIFIED
gemma:latest a72c7f4d0a15 5.0 GB 52 minutes ago
llama2:latest 78e26419b446 3.8 GB 25 hours ago
Python Example:
!pip install ollama
import ollama

response = ollama.chat(model='gemma', messages=[
    {
        'role': 'user',
        'content': 'Why is the sky blue?',
    },
])
print(response['message']['content'])
The sky is blue due to a phenomenon called **Rayleigh scattering**.
* Sunlight is composed of all the colors of the rainbow, each with a specific wavelength.
* When sunlight interacts with molecules in the atmosphere, like nitrogen and oxygen, the molecules scatter the light in all directions.
* Different wavelengths of light are scattered differently.
**How it works:**
- Shorter wavelengths of light (like blue light) scatter more efficiently than longer wavelengths (like red light).
- Since the molecules in the atmosphere are much smaller than the wavelengths of visible light, they preferentially scatter the shorter wavelengths in all directions.
- The scattered blue light is dispersed in all directions, but our eyes are primarily facing upwards, so we see more blue light from the sky than any other color.
**Additional factors:**
- The amount of sunlight that reaches the Earth's surface also affects the color of the sky. The closer you are to the equator, the more direct sunlight you receive, resulting in a slightly whiter sky.
- Clouds and dust in the atmosphere can also scatter light and change the color of the sky.
**Result:**
- The combination of Rayleigh scattering and other factors results in the sky appearing predominantly blue during clear weather conditions.
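The library also supports streaming, as mentioned earlier. Here is a minimal sketch of the same chat with streaming enabled, printing tokens as they arrive:

import ollama

# stream=True returns an iterator of partial chunks instead of one final message.
stream = ollama.chat(
    model='gemma',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)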
Ollama with LangChain
!pip install langchain-community
from langchain_community.llms import Ollama
llm = Ollama(model='gemma')
llm.invoke('tell me a joke?')
'What did the ocean say to the beach?\n\nNothing, it just waved!'
You can also wrap the call in print() to display the response directly. Let's load Llama2 and see if it has better jokes:
llm = Ollama(model='llama2')
print(llm.invoke('tell me a joke?'))
Sure, here's one:
Why don't scientists trust atoms?
Because they make up everything!
I hope that brought a smile to your face! Do you want to hear another one?
You can check all the available models at ollama.com by clicking Models.
Let's try Mixtral
From the command terminal, write:
Rany ~ > ollama run mixtral
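Once the pull finishes, Mixtral can be used from LangChain exactly like the models above; a minimal sketch, assuming the default mixtral tag downloaded successfully:

from langchain_community.llms import Ollama

llm = Ollama(model='mixtral')
print(llm.invoke('tell me a joke?'))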
Llama 3.1
Meta AI has unveiled Llama 3.1, their most capable AI model to date. This new release includes the flagship Llama 3.1 405B, an open-source model that rivals top proprietary models in general knowledge, tool use, and multilingual translation.
Download Llama 3.1 405B:
ollama pull llama3.1:405b
After that, you can invoke it like any other model.
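For example, here is a minimal sketch with LangChain, assuming the pull completed. On consumer hardware, a smaller tag such as llama3.1:8b works the same way and is far more practical, as the comments at the end of this article explain.

from langchain_community.llms import Ollama

llm = Ollama(model='llama3.1:405b')  # swap in 'llama3.1:8b' on smaller machines
print(llm.invoke('Summarize in one sentence why the sky is blue.'))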
Comments

AI and GenAI, Data Science, Machine Learning, and Data Engineering: Teach, Train, Write and Learn. (1 month ago)
For most users, especially those without access to enterprise-level hardware, leveraging cloud-based solutions or using smaller, more manageable models like Llama 3.1 70B is recommended. These smaller models offer a balance between performance and resource requirements, making them suitable for a wider range of applications while still delivering impressive capabilities.
AI and GenAI, Data Science, Machine Learning, and Data Engineering: Teach, Train, Write and Learn. (1 month ago)
However, quantized versions of the model can reduce the hardware burden. For instance, quantizing to lower precision (like 4-bit) significantly decreases memory and compute requirements, making it more accessible on less powerful hardware. Even then, running a quantized version of Llama 3.1 405B on a Mac M1 Pro is still impractical, but smaller models like Llama 3.1 8B or 70B in a quantized form might be viable options for consumer-grade devices.
AI and GenAI, Data Science, Machine Learning, and Data Engineering: Teach, Train, Write and Learn. (1 month ago)
Hello, thanks for the informative article, including Meta's latest Llama 3.1 405B model. But one thing we need to add is the hardware requirements to run such huge models; in fact, even a 4-bit quantized version cannot be run on powerful Macs like M1/M2 Pros. Here are some details about the hardware requirements to run the 405B model.
Running Meta's Llama 3.1 405B model requires substantial hardware resources due to its massive size and complexity. Here are the key specifications:
1. Storage: Approximately 820 GB of storage space is needed.
2. RAM: At least 1 TB of RAM is required to load the model into memory.
3. GPU: Multiple high-end GPUs, preferably NVIDIA A100 or H100 series, are necessary.
4. VRAM: A total of at least 640 GB of VRAM across all GPUs is essential for handling the model efficiently.
Given these requirements, running Llama 3.1 405B on consumer-grade hardware, such as a Mac M1 Pro laptop, is not feasible. The hardware limitations of consumer devices, particularly in terms of RAM and GPU capacity, make them unsuitable for such a large-scale model.