Image Analysis using LLava 7B (running on a laptop)
Pedro Velez
AI Advocate | Driving Innovation and Efficiency in Businesses through AI
The ability to analyze and describe images is increasingly important for businesses and developers. Some of the latest local LLMs, which you can run on a Mac or a PC with an NVIDIA GPU, now support image analysis.
While local image analysis with a 7B model isn't as powerful or accurate as high-end models, it's nevertheless impressive — it's running quite well on my 3-year-old laptop (or is it 4?).
Setting Up Ollama for Image Analysis
1. Download and install Ollama:
Download Ollama from ollama.com . After installing, run the following command in the command line to set up the LLava model:
ollama run llava:7b
2. Create a Python virtual environment:
With Python installed, set up a virtual environment with Ollama support:
python -m venv vision
cd vision
Scripts/activate
pip install ollama
3. Write the script vision.py:
Create a Python script vision.py with the following content:
import ollama
res = ollama.chat(
model="llava:7b",
messages=[
{
'role': 'user',
'content': 'Describe this image:', # or any prompt regarding the image
'images': ['./image.jpg']
}
]
)
print(res['message']['content'])
4. Run the script:
To execute the script, simply run:
python vision.py
Test Image 1: A Mandrill
The image shows a monkey standing on what appears to be a rocky or concrete surface. The monkey has a striking blue and pink face with a large, prominent nose. Its fur is predominantly gray, with some white markings around its head and limbs. The animal's posture suggests it is comfortable in its environment. In the background, there is an enclosure structure made of concrete or stone, which indicates that the monkey is likely in captivity, such as a zoo or wildlife sanctuary. There are no texts visible on the image.
Test Image 2: A Family Gathering
领英推荐
This is a color photograph of a group of people, who appear to be a family celebrating an occasion. The central figure seems to be an older gentleman in a wheelchair, surrounded by children and adults, all waving and smiling at the camera. The atmosphere suggests warmth and joy. They are gathered in what looks like a home setting with a white wall in the background and a wooden floor. The lighting is bright, creating a cheerful ambiance.
Test Image 3: Abstract Illustration
The image depicts a vibrant and serene digital illustration. At the center, there is a large, transparent sphere floating in space. This sphere contains an array of colorful flowers with blue and purple hues, which are reflected on its surface. These flowers create a mesmerizing pattern within the sphere.
The background consists of a cosmic setting with stars, nebulae, and clouds scattered throughout. The colors in the background transition from shades of purple, pink, and orange to deeper blues, creating a sense of depth and vastness.
Below the floating sphere, there is a field of flowers blooming on a grassy terrain. These flowers are in full bloom with green stems and white petals, complementing the colors seen within the sphere. The ground appears lush and well-maintained, enhancing the overall natural aesthetic of the image.
The style of the image is digital art, with a focus on color, texture, and an otherworldly, fantasy-like atmosphere.
Test Image 4: Robert Doisneau's "Le Baiser de l'H?tel de Ville"
The image is a composite or collage, featuring various elements that might suggest historical significance. In the center, there is a black and white photograph of two individuals embracing, with one person's face visible and both in what appears to be formal attire, including ties. Surrounding this central photo are several smaller images capturing different moments or scenes:
1. The top left corner shows a section of a cityscape with buildings that have a European architectural style, possibly indicating a location like Paris given the presence of a Haussmannian building in the background.
2. Below the central image, on the right side, there is another black and white photo of two individuals shaking hands, both appearing to be wearing military-style uniforms with insignias that might suggest a World War II era, given their style and the emblematic elements such as helmets and epaulettes.
3. The bottom left corner features a person walking on a street lined with lampposts, with various individuals dressed in more contemporary clothing.
4. Lastly, the lower right corner shows another black and white photograph of two people standing close to each other, with both wearing ties, suggesting a formal occasion or event.
The overall composition gives a sense of historical narrative, possibly telling a story that involves different moments captured over time. The collage could be used as a visual aid for a history lesson or as part of an art installation meant to evoke past events or periods.
Wrap-Up
Using Ollama's LLava 7B model for image analysis provides a practical and cost-effective solution for a variety of applications. Although not on par with larger models like GPT-4V, LLava 7B offers surprisingly good results, making it ideal for prototyping, non-critical tasks, or generic image analysis. With easy installation and implementation (doesn't get much easier than this...), this approach offers an accessible entry point for developers and businesses seeking to integrate image recognition capabilities into their projects.
Interested in research, monitoring, and investigation of everything related to the Earth, the Earth’s atmosphere, and the links with the universe, the hourglass
5 个月Nice
Impressive to see the LLava 7B in action on a laptop – the potential for on-the-go image analysis could be a game-changer for many industries!