
Building Object Detection Models for Visually Impaired People

Arpitha S.

AI Performance Optimization Expert | Specializing in LLM models (Llama 3.1, ResNet, Stable Diffusion) | TensorRT, vLLM, CUDA, ROCm | MLCommons MLPerf Benchmarks on NVIDIA, AMD & Intel chips.

Published: March 13, 2024

I recently completed the DeepLearning.AI course on open source #huggingface models. I found the object detection model implementation very useful for one use case: harnessing the power of open source models to build meaningful applications for society.

STEP 1 : Install the required pip libraries: transformers, gradio, timm, inflect and phonemizer

!pip install transformers

!pip install gradio

!pip install timm

!pip install inflect

!pip install phonemizer

STEP 2 : Install espeak-ng, the speech backend that phonemizer needs, so the image description can later be converted to voice to help a visually impaired person.

!sudo apt-get update

!sudo apt-get install espeak-ng

!pip install py-espeak-ng
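Before moving on, it can help to confirm that the espeak-ng binary actually landed on the PATH. The snippet below is a minimal sketch using only the standard library; the function name `find_speech_backend` and the fallback name `espeak` are assumptions made for illustration, not part of the course code.

```python
import shutil

def find_speech_backend(candidates=("espeak-ng", "espeak")):
    """Return the path of the first candidate executable found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None

backend = find_speech_backend()
print("speech backend:", backend or "not found; rerun the apt-get install step")
```

If this prints "not found", the text-to-speech step later on is likely to fail when it tries to phonemize the description.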

STEP 3 : Build the object detection pipeline using the facebook/detr-resnet-50 model from Hugging Face.

from transformers import pipeline
od_pipe = pipeline("object-detection", "facebook/detr-resnet-50")
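The object detection pipeline returns a list of dictionaries, each with a `score`, a `label` and a `box` holding `xmin`, `ymin`, `xmax` and `ymax`. For narration it may be worth dropping low-confidence detections first. The sketch below assumes that output shape; `keep_confident`, the 0.9 cutoff and the sample values are illustrative and do not come from the article.

```python
def keep_confident(detections, min_score=0.9):
    """Keep detections whose score is at least min_score, highest score first."""
    kept = [d for d in detections if d["score"] >= min_score]
    return sorted(kept, key=lambda d: d["score"], reverse=True)

# Illustrative values only, shaped like the pipeline's output.
sample_output = [
    {"score": 0.62, "label": "chair",
     "box": {"xmin": 10, "ymin": 40, "xmax": 120, "ymax": 200}},
    {"score": 0.98, "label": "person",
     "box": {"xmin": 150, "ymin": 20, "xmax": 300, "ymax": 400}},
    {"score": 0.95, "label": "dog",
     "box": {"xmin": 310, "ymin": 250, "xmax": 420, "ymax": 390}},
]

for det in keep_confident(sample_output):
    print(det["label"], round(det["score"], 2), det["box"])
```

With a real image, the same function can be applied to the list that `od_pipe` returns.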

STEP 4 : Load an image and pass it to the pipeline


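from PIL import Image
# helper.py comes with the course materials
from helper import render_results_in_image

raw_image = Image.open('/content/image.jpeg')
# resize() returns a new image, so keep the result
raw_image = raw_image.resize((569, 491))
pipeline_output = od_pipe(raw_image)

# Draw the detected boxes and labels on the image
processed_image = render_results_in_image(
   raw_image,
   pipeline_output)

# In a notebook, each of these displays its value
processed_image

pipeline_output

od_pipe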

STEP 5 : Use summarize_predictions_natural_language to turn the object detection pipeline output into a natural language description

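from helper import summarize_predictions_natural_language

raw_image = Image.open('/content/home.jpeg')
raw_image = raw_image.resize((284, 245))
pipeline_output = od_pipe(raw_image)
text = summarize_predictions_natural_language(pipeline_output)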
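For readers without the course's helper.py, the idea behind the summary step can be sketched in plain Python: count the detected labels and join them into one sentence. This is a hypothetical stand-in, not the course implementation. The names `pluralize` and `summarize_detections` are made up here, and the plural rule is deliberately rough (the inflect package installed in STEP 1 would handle plurals more robustly).

```python
from collections import Counter

def pluralize(word, count):
    """Very rough English plural, good enough for a sketch."""
    if count == 1:
        return word
    if word.endswith(("s", "x", "ch", "sh")):
        return word + "es"
    return word + "s"

def summarize_detections(detections):
    """Turn a list of detection dicts into one sentence listing object counts."""
    counts = Counter(d["label"] for d in detections)
    if not counts:
        return "No objects were detected in this image."
    parts = [f"{n} {pluralize(label, n)}" for label, n in counts.items()]
    if len(parts) == 1:
        listing = parts[0]
    else:
        listing = ", ".join(parts[:-1]) + " and " + parts[-1]
    return f"In this image, there are {listing}."

# Illustrative detections; only the "label" key is used here.
example = [{"label": "person"}, {"label": "dog"}, {"label": "person"}]
print(summarize_detections(example))
```

The resulting string plays the same role as `text` in the next step: it is what gets handed to the text-to-speech pipeline.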

STEP 6 : Generate audio narration from the image description

from IPython.display import Audio as IPythonAudio

# text is the image description produced in STEP 5
tts_pipe = pipeline("text-to-speech",
                   model="kakao-enterprise/vits-ljs")
narrated_text = tts_pipe(text)
# Play the first waveform at the model's sampling rate
IPythonAudio(narrated_text["audio"][0],
            rate=narrated_text["sampling_rate"])
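Outside a notebook there is no inline audio player, so saving the narration to a file can be handy. The sketch below writes mono float samples to a 16-bit WAV file using only the standard library. It assumes the samples lie in the range -1.0 to 1.0; `write_wav`, the output file name and the 440 Hz test tone are illustrative stand-ins rather than anything from the course.

```python
import math
import os
import struct
import tempfile
import wave

def write_wav(path, samples, sampling_rate):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # guard against overshoot
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16-bit samples
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(bytes(frames))

# Stand-in for narrated_text["audio"][0]: one second of a 440 Hz tone.
rate = 16000
tone = [0.3 * math.sin(2 * math.pi * 440 * t / rate) for t in range(rate)]
out_path = os.path.join(tempfile.gettempdir(), "narration.wav")
write_wav(out_path, tone, rate)
```

With the real pipeline output, the call would presumably be `write_wav(out_path, narrated_text["audio"][0], narrated_text["sampling_rate"])`.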

Code credit: DeepLearning.AI and Hugging Face

Object detection combined with text-to-speech can describe a scene aloud, giving visually impaired people another way to understand and navigate the world around them. A tool built this way could open up new avenues of independence and help people with visual impairments tackle everyday tasks with more confidence and freedom.

#genAI #objectdetection #AI #ML

TOMEK

1 year ago

Impressive work on completing the course and applying object detection models for such a noble cause—your dedication to leveraging technology for social good is truly inspiring!
