Ollama with LangChain for Local LLMs Applications

Ollama is a versatile platform for running and interacting with large language models (LLMs) such as Llama, Gemma, Phi, Zephyr, Code Llama, and many more. It allows users to pull, run, and create models easily on local machines, ensuring privacy and control over data. Available on macOS, Linux, and Windows, Ollama also exposes an API that is compatible with the OpenAI standard, so tools and applications written for OpenAI can run against local models.
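
As a minimal sketch of that OpenAI compatibility (assuming the openai Python package is installed, Ollama is serving on its default port 11434, and the gemma model has already been pulled):

from openai import OpenAI

# Point the standard OpenAI client at the local Ollama server
client = OpenAI(
    base_url='http://localhost:11434/v1',  # Ollama's OpenAI-compatible endpoint
    api_key='ollama',                      # required by the client, but not checked by Ollama
)

response = client.chat.completions.create(
    model='gemma',
    messages=[{'role': 'user', 'content': 'Say hello in one sentence.'}],
)
print(response.choices[0].message.content)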

For Python developers, Ollama provides a library that can be installed with a simple pip command. The library enables straightforward interaction with models for tasks like chat completions, text generation, and even handling multimodal inputs such as images. This flexibility is showcased in the ability to handle streaming data, use different models for specific tasks, and create custom models tailored to unique requirements.

!pip install ollama        
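
The snippet below is a minimal sketch of the multimodal case mentioned above, assuming you have pulled a vision-capable model such as llava and have a local file named photo.jpg (both names are illustrative, not requirements of the library):

import ollama

# Ask a vision-capable model to describe a local image
response = ollama.chat(
    model='llava',  # assumes `ollama run llava` or `ollama pull llava` has been run
    messages=[
        {
            'role': 'user',
            'content': 'Describe this image in one sentence.',
            'images': ['photo.jpg'],  # paths (or raw bytes) of images to attach
        },
    ],
)
print(response['message']['content'])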


Ollama also integrates with LangChain, allowing developers to build more complex applications such as retrieval-augmented generation (RAG). These applications use embedding models to turn texts into vector embeddings, which are then used to retrieve the passages most relevant to a query and generate responses grounded in them. This capability is particularly useful for building sophisticated AI-powered search and response systems.
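
A rough sketch of that embedding-and-retrieval idea is shown below, assuming an embedding model such as nomic-embed-text has been pulled (ollama pull nomic-embed-text); a real RAG application would store the vectors in a vector store rather than comparing them by hand:

from math import sqrt
from langchain_community.embeddings import OllamaEmbeddings

# Embed a few documents and a query with a local embedding model
embeddings = OllamaEmbeddings(model='nomic-embed-text')  # assumes this model has been pulled

docs = [
    'Ollama runs large language models on your local machine.',
    'LangChain is a framework for building LLM applications.',
    'The sky appears blue because of Rayleigh scattering.',
]
doc_vectors = embeddings.embed_documents(docs)
query_vector = embeddings.embed_query('How can I run an LLM locally?')

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# In a full RAG pipeline the best-matching text would be passed to the LLM as context
scores = [cosine(v, query_vector) for v in doc_vectors]
print(docs[scores.index(max(scores))])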


Ollama Installation

Go to https://ollama.com/ and follow the steps below:

1. Download Ollama for your operating system (macOS, Linux, or Windows).
2. Unzip the downloaded file and click on Ollama to install it.
3. Open a terminal and run the command:

ollama run gemma        
pulling manifest 
pulling ef311de6af9d...  10% ▕█                  ▏ 525 MB/5.0 GB   37 MB/s   1m58s        

Ollama will pull the model first and then drop you into a prompt where you can chat with the LLM:

ollama run gemma 
pulling manifest 
pulling ef311de6af9d... 100% ▕███████████████████▏ 5.0 GB                         
pulling 097a36493f71... 100% ▕███████████████████▏ 8.4 KB                         
pulling 109037bec39c... 100% ▕███████████████████▏  136 B                         
pulling 65bb16cf5983... 100% ▕███████████████████▏  109 B                         
pulling 0c2a5137eb3c... 100% ▕███████████████████▏  483 B                         
verifying sha256 digest 
writing manifest 
removing any unused layers 
success 
>>> Send a message (/? for help)        

You can get the list of downloaded models with the command ollama list

ollama list
NAME         	ID          	SIZE  	MODIFIED       
gemma:latest 	a72c7f4d0a15	5.0 GB	52 minutes ago	
llama2:latest	78e26419b446	3.8 GB	25 hours ago          


Python Example:

!pip install ollama
        
import ollama
response = ollama.chat(model='gemma', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response['message']['content'])        
The sky is blue due to a phenomenon called **Rayleigh scattering**. 

* Sunlight is composed of all the colors of the rainbow, each with a specific wavelength. 
* When sunlight interacts with molecules in the atmosphere, like nitrogen and oxygen, the molecules scatter the light in all directions. 
* Different wavelengths of light are scattered differently. 

**How it works:**

- Shorter wavelengths of light (like blue light) scatter more efficiently than longer wavelengths (like red light).
- Since the molecules in the atmosphere are much smaller than the wavelengths of visible light, they preferentially scatter the shorter wavelengths in all directions.
- The scattered blue light is dispersed in all directions, but our eyes are primarily facing upwards, so we see more blue light from the sky than any other color.

**Additional factors:**

- The amount of sunlight that reaches the Earth's surface also affects the color of the sky. The closer you are to the equator, the more direct sunlight you receive, resulting in a slightly whiter sky.
- Clouds and dust in the atmosphere can also scatter light and change the color of the sky.

**Result:**

- The combination of Rayleigh scattering and other factors results in the sky appearing predominantly blue during clear weather conditions.        
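
As mentioned earlier, the library can also stream the response token by token instead of returning it all at once; a minimal sketch using the same question:

import ollama

# Passing stream=True yields chunks as they are generated
stream = ollama.chat(
    model='gemma',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)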

Ollama with LangChain

!pip install langchain-community        


from langchain_community.llms import Ollama

llm = Ollama(model='gemma')        
llm.invoke('tell me a joke?')        
'What did the ocean say to the beach?\n\nNothing, it just waved!'        

You can also wrap the call in print() to display the response, as shown in the next example.

Let's load Llama 2 and see if it has better jokes:

llm = Ollama(model='llama2')        
print(llm.invoke('tell me a joke?'))        
Sure, here's one:

Why don't scientists trust atoms?
Because they make up everything!

I hope that brought a smile to your face! Do you want to hear another one?        
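
Beyond single invoke() calls, the same Ollama wrapper composes with LangChain's prompt templates and chains. A minimal sketch (the template text and topic are just illustrative; langchain-core is installed alongside langchain-community):

from langchain_community.llms import Ollama
from langchain_core.prompts import PromptTemplate

# Compose a prompt template and the local model into a simple chain
prompt = PromptTemplate.from_template('Tell me a short joke about {topic}.')
llm = Ollama(model='gemma')
chain = prompt | llm

print(chain.invoke({'topic': 'the ocean'}))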

You can check all the available models at ollama.com, then click Models.

Let's try Mixtral


From the command terminal, write:

ollama run mixtral        


Llama 3.1

Meta AI has unveiled Llama 3.1, their most capable AI model to date. This new release includes the flagship Llama 3.1 405B, an open-source model that rivals top proprietary models in general knowledge, tool use, and multilingual translation.

Download Llama 3.1 405B


You can find it in the Ollama model library at https://ollama.com/library.


ollama pull llama3.1:405b        

After that, you can invoke it like any other model.
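
For example, here is a minimal sketch using a smaller Llama 3.1 variant through LangChain (the default llama3.1 tag is the 8B model, which is far more practical on a laptop than the 405B; substitute whichever tag you actually pulled):

from langchain_community.llms import Ollama

# Assumes `ollama pull llama3.1` (the default 8B tag) has completed
llm = Ollama(model='llama3.1')
print(llm.invoke('Summarize what Rayleigh scattering is in two sentences.'))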


Comments

KV Subbaiah Setty: Hello, thanks for the informative article, including Meta's latest Llama 3.1 405B model. One thing worth adding is the hardware required to run such huge models; even a 4-bit quantized version cannot be run on powerful Macs like the M1/M2 Pro. Running Meta's Llama 3.1 405B model requires substantial hardware resources due to its massive size and complexity. Here are the key specifications:

1. Storage: approximately 820 GB of storage space is needed.
2. RAM: at least 1 TB of RAM is required to load the model into memory.
3. GPU: multiple high-end GPUs, preferably NVIDIA A100 or H100 series, are necessary.
4. VRAM: a total of at least 640 GB of VRAM across all GPUs is essential for handling the model efficiently.

Given these requirements, running Llama 3.1 405B on consumer-grade hardware, such as a Mac M1 Pro laptop, is not feasible. The hardware limitations of consumer devices, particularly in terms of RAM and GPU capacity, make them unsuitable for such a large-scale model.

KV Subbaiah Setty: However, quantized versions of the model can reduce the hardware burden. For instance, quantizing to lower precision (like 4-bit) significantly decreases memory and compute requirements, making the model more accessible on less powerful hardware. Even then, running a quantized version of Llama 3.1 405B on a Mac M1 Pro is still impractical, but smaller models like Llama 3.1 8B or 70B in quantized form might be viable options for consumer-grade devices.

KV Subbaiah Setty: For most users, especially those without access to enterprise-level hardware, leveraging cloud-based solutions or using smaller, more manageable models like Llama 3.1 70B is recommended. These smaller models offer a balance between performance and resource requirements, making them suitable for a wider range of applications while still delivering impressive capabilities.
