Building Object Detection Models for Visually Impaired People
Arpitha S.
AI Performance Optimization Expert | Specializing in LLM models (Llama 3.1, ResNet, Stable Diffusion) | TensorRT, vLLM, CUDA, ROCm | MLCommons MLPerf Benchmarks on NVIDIA, AMD & Intel chips.
Recently I completed Deeplearning.AI course on open source #huggingface models. Found object detection model implementation very useful for a use case “Harnessing the power of open source models to build meaningful applications for society.
STEP 1 : Install all the required pip libraries such as Transformers, gradio, timm, inflect and phonemizer
!pip install transformers
!pip install gradio
!pip install timm
!pip install inflect
!pip install phonemizer
STEP 2 : Using gradio to convert the image description to voice to help visually impaired person.
!sudo apt-get update
!sudo apt-get install espeak-ng
!pip install py-espeak-ng
STEP 3 : Building object detection pipeline using facebook/detr-resnet-50 model from hugging face.
od_pipe = pipeline("object-detection", "facebook/detr-resnet-50"
STEP 4 : Import images to the pipeline
from PIL import Image
raw_image = Image.open('/content/image.jpeg')
raw_image.resize((569, 491))
pipeline_output = od_pipe(raw_image)
processed_image = render_results_in_image(
raw_image,
pipeline_output)
processed_image
pipeline_output
od_pipe
STEP 5 :? Using summarize predictions natural language to summarize the text generated from the object detection pipeline
raw_image = Image.open('/content/home.jpeg')
raw_image.resize((284, 245))
from helper import summarize_predictions_natural_language
STEP 6 : Generating audio narration from the image description generated
tts_pipe = pipeline("text-to-speech",
model="kakao-enterprise/vits-ljs")
narrated_text = tts_pipe(text)
from IPython.display import Audio as IPythonAudio
IPythonAudio(narrated_text["audio"][0],
rate=narrated_text["sampling_rate"])
Credit code : DeepLearning+HuggingFace
By harnessing the power of cutting-edge object detection technology, we have the potential to revolutionize the way visually impaired individuals experience and navigate the world around them. This innovative solution promises to open up new avenues of independence, empowering those with visual impairments to tackle everyday tasks with renewed confidence and freedom.
#genAI, #objectdetection #AI #ML
Impressive work on completing the course and applying object detection models for such a noble cause—your dedication to leveraging technology for social good is truly inspiring!