Image Analysis using LLava 7B (running on a laptop)
Illustration generated by ChatGPT

Image Analysis using LLava 7B (running on a laptop)

The ability to analyze and describe images is increasingly important for businesses and developers. Some of the latest local LLMs, which you can run on a Mac or a PC with an NVIDIA GPU, now support image analysis.

While local image analysis with a 7B model isn't as powerful or accurate as high-end models, it's nevertheless impressive — it's running quite well on my 3-year-old laptop (or is it 4?).

Setting Up Ollama for Image Analysis

1. Download and install Ollama:

Download Ollama from ollama.com . After installing, run the following command in the command line to set up the LLava model:

ollama run llava:7b        

2. Create a Python virtual environment:

With Python installed, set up a virtual environment with Ollama support:

python -m venv vision
cd vision
Scripts/activate
pip install ollama        

3. Write the script vision.py:

Create a Python script vision.py with the following content:

import ollama
res = ollama.chat(
	model="llava:7b",
	messages=[
		{
			'role': 'user',
			'content': 'Describe this image:', # or any prompt regarding the image
			'images': ['./image.jpg']
		}
	]
)
print(res['message']['content'])        

4. Run the script:

To execute the script, simply run:

python vision.py        

Test Image 1: A Mandrill

https://www.bristol.ac.uk/media-library/sites/news/2023/september/mandrill-article.jpg

The image shows a monkey standing on what appears to be a rocky or concrete surface. The monkey has a striking blue and pink face with a large, prominent nose. Its fur is predominantly gray, with some white markings around its head and limbs. The animal's posture suggests it is comfortable in its environment. In the background, there is an enclosure structure made of concrete or stone, which indicates that the monkey is likely in captivity, such as a zoo or wildlife sanctuary. There are no texts visible on the image.


Test Image 2: A Family Gathering

https://stock.adobe.com/ca/images/multi-generation-multiethnic-family-at-home-together-with-grandparents/423742605

This is a color photograph of a group of people, who appear to be a family celebrating an occasion. The central figure seems to be an older gentleman in a wheelchair, surrounded by children and adults, all waving and smiling at the camera. The atmosphere suggests warmth and joy. They are gathered in what looks like a home setting with a white wall in the background and a wooden floor. The lighting is bright, creating a cheerful ambiance.


Test Image 3: Abstract Illustration

https://preview.redd.it/purple-reverie-v0-3zavpofbt8mc1.png?auto=webp&s=e17962923319644ca83ee78c7e290c3da8a8cb71

The image depicts a vibrant and serene digital illustration. At the center, there is a large, transparent sphere floating in space. This sphere contains an array of colorful flowers with blue and purple hues, which are reflected on its surface. These flowers create a mesmerizing pattern within the sphere.

The background consists of a cosmic setting with stars, nebulae, and clouds scattered throughout. The colors in the background transition from shades of purple, pink, and orange to deeper blues, creating a sense of depth and vastness.

Below the floating sphere, there is a field of flowers blooming on a grassy terrain. These flowers are in full bloom with green stems and white petals, complementing the colors seen within the sphere. The ground appears lush and well-maintained, enhancing the overall natural aesthetic of the image.

The style of the image is digital art, with a focus on color, texture, and an otherworldly, fantasy-like atmosphere.


Test Image 4: Robert Doisneau's "Le Baiser de l'H?tel de Ville"

"Le Baiser de l'H?tel de Ville" - Robert Doisneau

The image is a composite or collage, featuring various elements that might suggest historical significance. In the center, there is a black and white photograph of two individuals embracing, with one person's face visible and both in what appears to be formal attire, including ties. Surrounding this central photo are several smaller images capturing different moments or scenes:

1. The top left corner shows a section of a cityscape with buildings that have a European architectural style, possibly indicating a location like Paris given the presence of a Haussmannian building in the background.

2. Below the central image, on the right side, there is another black and white photo of two individuals shaking hands, both appearing to be wearing military-style uniforms with insignias that might suggest a World War II era, given their style and the emblematic elements such as helmets and epaulettes.

3. The bottom left corner features a person walking on a street lined with lampposts, with various individuals dressed in more contemporary clothing.

4. Lastly, the lower right corner shows another black and white photograph of two people standing close to each other, with both wearing ties, suggesting a formal occasion or event.

The overall composition gives a sense of historical narrative, possibly telling a story that involves different moments captured over time. The collage could be used as a visual aid for a history lesson or as part of an art installation meant to evoke past events or periods.


Wrap-Up

Using Ollama's LLava 7B model for image analysis provides a practical and cost-effective solution for a variety of applications. Although not on par with larger models like GPT-4V, LLava 7B offers surprisingly good results, making it ideal for prototyping, non-critical tasks, or generic image analysis. With easy installation and implementation (doesn't get much easier than this...), this approach offers an accessible entry point for developers and businesses seeking to integrate image recognition capabilities into their projects.

Mohammed Alzahrani

Interested in research, monitoring, and investigation of everything related to the Earth, the Earth’s atmosphere, and the links with the universe, the hourglass

5 个月

Nice

回复

Impressive to see the LLava 7B in action on a laptop – the potential for on-the-go image analysis could be a game-changer for many industries!

要查看或添加评论,请登录

Pedro Velez的更多文章

  • Gerador de conteúdos com Llama3 (a correr num laptop).

    Gerador de conteúdos com Llama3 (a correr num laptop).

    Estou de volta aos meus artigos sobre IA, desta vez para mostrar um exemplo de como usar um llm local para gerar…

    3 条评论
  • The Role of YOLO in Brand Audit Challenges: Enhancing Analysis with OpenAI's API

    The Role of YOLO in Brand Audit Challenges: Enhancing Analysis with OpenAI's API

    Retail is brutally dynamic and competitive. For some brands, audits are essential for understanding product placement…

    1 条评论
  • The Mad Machines Disease

    The Mad Machines Disease

    In the 1980s, England was the stage for an invisible tragedy: cows falling victim to bovine spongiform encephalopathy…

  • The Strategic Power of Email Sentiment Analysis

    The Strategic Power of Email Sentiment Analysis

    Emails: the de facto communication tool for business. But beyond the structured data — the numbers, the dates, the…

  • O Poder da Linguagem

    O Poder da Linguagem

    O Poder da Linguagem Fernando Pessoa disse uma vez que "Há um tempo em que é preciso abandonar as roupas usadas, que já…

  • Will Machines Dream?

    Will Machines Dream?

    Introduction Your personal data increasingly feels beyond your control. Corporate servers and government databases…

社区洞察

其他会员也浏览了