Image to Text using the Multimodal Model LLaVA in GGUF Format
Satish Srinivasan
Cloud Architect | Cloud Security Analyst | Specialist - AWS & Azure Cloud | AWS Community Builder | AWS APN Ambassador
We will be using an Amazon EC2 Linux instance to deploy the model and the requisite libraries. For inference we will use an m5.8xlarge instance, which is a CPU-based instance type. The sample code for this demo is available in the git repo.
Create an EC2 instance.
Press "Launch instance" and then log in to the instance that is created. The security group associated with the EC2 instance should have the required ports open in its inbound rules: port 22 for SSH access and port 8501 for the application we serve later in this demo.
For security purposes, we restrict inbound access to the user's public IP address.
Log in to the EC2 instance and create a virtual environment.
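On Amazon Linux, a virtual environment can typically be created and activated as follows (the environment name demo-env is illustrative):

python3 -m venv demo-env
source demo-env/bin/activate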
Install the huggingface-cli to download the model.
pip install -U "huggingface_hub[cli]"
We can use either "llava-v1.5-7B-GGUF" or "liuhaotian_llava-v1.5-13b-GGUF". For this demo I will be using the files "llava-v1.5-7b-Q4_K.gguf" and "llava-v1.5-7b-mmproj-f16.gguf" from the "jartine/llava-v1.5-7B-GGUF" repository.
Steps to download the model.
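A minimal sketch of the download using huggingface-cli, assuming the two files named above:

huggingface-cli download jartine/llava-v1.5-7B-GGUF llava-v1.5-7b-Q4_K.gguf --local-dir .
huggingface-cli download jartine/llava-v1.5-7B-GGUF llava-v1.5-7b-mmproj-f16.gguf --local-dir .

The --local-dir flag places the files in the current directory so the app can reference them by name.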
Install the required libraries. We will be using "llama_cpp" for loading the model.
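"llama_cpp" is the import name of the llama-cpp-python package, which compiles llama.cpp from source during installation, so C/C++ build tools are usually needed first. A sketch assuming Amazon Linux 2023 (package names may differ on other distributions):

sudo dnf install gcc gcc-c++ cmake python3-devel
pip install llama-cpp-python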
If it prompts to confirm the installation, press "Y".
Other packages required to run this sample are given below.
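The package list is not reproduced here; judging from the rest of the demo (a web UI served on port 8501, which is Streamlit's default, and .png-to-.jpeg conversion, which Pillow handles), the likely installs are:

pip install streamlit pillow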
Create a folder named image inside the demo folder. We use this folder for the image conversion step. This part of the code needs refinement to support generic image format conversion; currently it only converts .png to .jpeg, which is the default image format used in the demo.
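A minimal sketch of such a conversion step using Pillow (the function name and folder layout are illustrative):

from PIL import Image
import os

def convert_to_jpeg(src_path, out_dir="image"):
    # JPEG has no alpha channel, so convert to RGB before saving.
    img = Image.open(src_path).convert("RGB")
    base = os.path.splitext(os.path.basename(src_path))[0]
    out_path = os.path.join(out_dir, base + ".jpeg")
    img.save(out_path, "JPEG")
    return out_path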
Let us go back to the demo folder.
Create a file app.py and copy the code into this file.
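The full code lives in the git repo mentioned earlier; since the listing is not reproduced here, below is a minimal sketch of what app.py might look like, assuming a Streamlit UI and llama-cpp-python's LLaVA chat handler. Model paths, prompts, and widget labels are illustrative.

import base64
import streamlit as st
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

MODEL_PATH = "llava-v1.5-7b-Q4_K.gguf"          # language model weights
MMPROJ_PATH = "llava-v1.5-7b-mmproj-f16.gguf"   # CLIP projector for the vision side

@st.cache_resource
def load_model():
    # The chat handler wires the mmproj vision encoder into the LLaVA model.
    chat_handler = Llava15ChatHandler(clip_model_path=MMPROJ_PATH)
    return Llama(model_path=MODEL_PATH, chat_handler=chat_handler, n_ctx=2048)

def to_data_uri(image_bytes, mime):
    # llama-cpp-python accepts images as base64 data URIs in image_url content parts.
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{b64}"

st.title("Image to Text with LLaVA (GGUF)")
uploaded = st.file_uploader("Upload an image", type=["jpeg", "jpg", "png"])

if uploaded is not None:
    st.image(uploaded)
    llm = load_model()
    response = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are an assistant that describes images."},
            {
                "role": "user",
                "content": [
                    {"type": "image_url",
                     "image_url": {"url": to_data_uri(uploaded.getvalue(), uploaded.type)}},
                    {"type": "text", "text": "Describe this image in detail."},
                ],
            },
        ]
    )
    st.write(response["choices"][0]["message"]["content"])

This sketch passes the uploaded bytes directly to the model as a data URI; the version in the repo additionally routes uploads through the image conversion folder described earlier.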
Let us run the code, as shown below.
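Assuming the Streamlit sketch above, the app is started with:

streamlit run app.py

Streamlit listens on port 8501 by default, which matches the inbound rule we opened earlier.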
We will use the Google Chrome browser to test. The URL is http://3.90.36.218:8501, where 3.90.36.218 is the public IP address of the EC2 instance and 8501 is the application port.
The code sample can be refined further, and we can experiment with other use cases using the LLaVA model.