Unveiling AI's Potential: Building a Visual Question Answering App with Gradio and Transformers
Venugopal Adep
AI Leader | General Manager at Reliance Jio | LLM & GenAI Pioneer | AI Evangelist
In my latest project, I've ventured into the fascinating realm of AI, exploring how vision and language models can work together to answer questions about images. I'll take you through my journey of building a Visual Question Answering application using Gradio and Hugging Face's Transformers. This article details the process, from installing Gradio to deploying a user-friendly web interface, demonstrating the ease and efficiency of creating powerful AI tools. Join me in discovering how these cutting-edge technologies are making AI more accessible and interactive than ever before.
Install and import libraries
This code snippet quietly installs a specific version (4.5.0) of Gradio, a Python library for building machine learning web apps (the -q flag suppresses pip's output). It then imports Gradio, the ViLT (Vision-and-Language Transformer) processor and model for question answering from Hugging Face's transformers, and the Python Imaging Library (PIL) for image handling.
!pip install gradio==4.5.0 -q
import gradio as gr
from transformers import ViltProcessor, ViltForQuestionAnswering
from PIL import Image
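As a quick sanity check that PIL is working, you can build a small in-memory image to use as a stand-in input while experimenting (a minimal sketch; the image size and color here are arbitrary, not required by the model):

```python
from PIL import Image

# Create a small solid-gray RGB image entirely in memory,
# so no image file is needed while testing the pipeline.
img = Image.new("RGB", (224, 224), color=(128, 128, 128))
print(img.size, img.mode)  # -> (224, 224) RGB
```

In the deployed app, Gradio hands your function a PIL image directly (because the input component uses type="pil"), so anything you test with an image like this carries over unchanged.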
Load the model
This code initializes the ViLT processor and model fine-tuned for visual question answering, loading them once from the pre-trained checkpoint. Loading is done outside the function so the weights are not reloaded on every call, which keeps inference fast.
# Load the processor and model outside of the function to avoid reloading them each time the function is called
processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
Answer Generation
The answer_question function takes an image and a question, encodes them with the pre-trained ViLT processor, and runs the ViLT model to predict an answer. It selects the answer ID with the highest logit, maps that ID to a human-readable label via the model's id2label configuration, and returns the label.
def answer_question(image, question):
    # Process the image and question into model-ready tensors
    inputs = processor(images=image, text=question, return_tensors="pt", padding=True)
    # Perform the inference
    outputs = model(**inputs)
    # Extract the predicted answer
    logits = outputs.logits
    answer_id = logits.argmax(-1).item()
    answer = model.config.id2label[answer_id]
    return answer
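The argmax-then-lookup step at the end is easy to see in isolation. Here is a toy illustration with a made-up score vector and a hypothetical three-answer label map (the real model has thousands of candidate answers); it mimics logits.argmax(-1) and model.config.id2label in plain Python:

```python
# Hypothetical scores for three candidate answers (stand-ins for model logits)
logits = [0.1, 2.7, 0.3]
# Hypothetical analogue of model.config.id2label
id2label = {0: "no", 1: "yes", 2: "maybe"}

# Pick the index of the highest score, then map it to its label
answer_id = max(range(len(logits)), key=lambda i: logits[i])
print(id2label[answer_id])  # -> yes
```

This is why the function returns a short phrase rather than free-form text: ViT-style VQA models like this one classify over a fixed answer vocabulary.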
Create Gradio Interface
The code defines a Gradio web interface, iface, for the visual question answering application: answer_question is the function to call, an image and a question are the inputs, and a textbox is the output. Launching the interface serves a user-friendly web app where users can upload an image, ask a question about it, and receive the model's answer.
# Define the Gradio interface
iface = gr.Interface(
    fn=answer_question,
    inputs=[gr.Image(type="pil"), gr.Textbox(label="Question")],
    outputs=gr.Textbox(label="Answer"),
    title="Visual Question Answering",
    description="Upload an image and ask a question related to the image. The AI will try to answer it."
)
# Launch the interface
iface.launch()