Building Experiences Using AI - 6: "Nayan" - AI Powered Vision for the Visually Challenged

I attended the GAAD (Global Accessibility Awareness Day) event on 11th May at L.V. Prasad Eye Institute, Hyderabad, and it really got me thinking about the everyday challenges visually challenged people face while moving around. It's easy for those of us with sight to take our vision for granted. We effortlessly navigate through crowded streets, bustling malls, and even our own homes without giving it a second thought. But for someone who is visually challenged, every step is a potential hazard, and every journey, no matter how short, can be daunting.


The Problem Statement

There are now several assistive technology apps that try to solve these problems. The majority rely on GPS navigation, talking maps, and electronic travel aids to give blind users information about their surroundings and directions to their destination, facilitating independent mobility.

But one interesting example is "Be My Eyes", an app where AI is used to recognize objects in the user's environment, with the camera image and an AI backend service in play. This is interesting because, unlike other apps, the developer does not need to pre-record audio guides for objects, directions, and so on. It also means the app can work indoors.

Inspired by this app, I decided to explore whether one can build a simple solution that calculates the distance between the user and nearby objects and describes them, using existing AI-based APIs, to make life a bit easier for visually challenged individuals. I'm calling this solution "Nayan", which means "eye" in Hindi. The idea is for Nayan to work as an additional eye, guiding the user safely and efficiently through their surroundings.


The Experiment

The basic hypothesis behind Nayan was simple: it should identify the objects surrounding the user, calculate their distance in terms of the user's footsteps, and announce this over a voiceover, so that the user can move around confidently with its help. So, imagine walking through a park. Nayan would tell you something like, "In 10 steps, there is a bench to your right," or "In 5 steps, you will reach a curb." This way, the user gets a real-time audio guide, helping them navigate more confidently and independently.


Essentially, the following steps will be part of the experience (a small sketch illustrating steps 4 and 5 follows the list).

  • STEP 1 - Capturing the Environment: First, a camera will capture the environment in front of the user. This could be through a smartphone, a wearable device, or any portable camera.
  • STEP 2 - Identifying Objects: Using an AI Vision API, the captured image is processed to identify objects in the environment. This is where the magic of AI comes in – recognizing everything from a tree to a parked car, to a curb, or even a stray dog.
  • STEP 3 - Mapping to Physical Space: The next step involves mapping this information onto physical space. The Vision API helps calculate the approximate distance of these objects from the user in feet.
  • STEP 4 - Converting Distance to Steps: To make this information practical, the distance is then converted into steps. Based on an average value where one step equals a specific number of feet, this translation makes the data actionable for the user.
  • STEP 5 - Audio Feedback: Finally, using Text-to-Speech (TTS) technology, this information is converted into audio. The user will hear directions in the form of steps, guiding them safely through their environment.
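
To make steps 4 and 5 concrete, here is a minimal JavaScript sketch of the feet-to-steps conversion and the audio feedback, using the browser's built-in Web Speech API. The detections array, its field names, and the feetToSteps/announce helpers are illustrative assumptions, not the actual Nayan code.

    // Minimal sketch of steps 4 and 5 (illustrative helpers, not the actual Nayan code).
    const STEP_LENGTH_FEET = 1.5; // assumed average stride length, as used later in the experiment

    function feetToSteps(feet) {
        // Round to the nearest whole step so the guidance is easy to follow.
        return Math.max(1, Math.round(feet / STEP_LENGTH_FEET));
    }

    function announce(detections) {
        const sentences = detections.map(function (d) {
            return 'In ' + feetToSteps(d.distanceFeet) + ' steps, there is a ' + d.label + ' to your ' + d.direction + '.';
        });
        // Web Speech API: supported by most modern mobile browsers.
        window.speechSynthesis.speak(new SpeechSynthesisUtterance(sentences.join(' ')));
    }

    // Example usage with made-up detections:
    announce([
        { label: 'bench', direction: 'right', distanceFeet: 15 },
        { label: 'curb', direction: 'front', distanceFeet: 7.5 }
    ]);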


The application would calculate the distance (in terms of steps) between the user and the objects in front of them.

So the initial step was to build the framework that lets the user capture a photo with their mobile and send it to the backend for processing. Using the HTML5 canvas, sending base64 image data of a video element showing the camera stream was straightforward.

The setup of the 'Nayan' experiment


The following form sets up the UI and a hidden placeholder to hold the base64 data that is sent as part of the form.

  <form method="POST" action="storeImage.php">
      <div onClick="take_snapshot()" id="my_camera"></div>
      <input type="hidden" name="image" class="image-tag">
      <div id="results">Your captured image will appear here...</div>
      <button id="submitBt" class="btn btn-success">Submit</button>
  </form>
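
Before a snapshot can be taken, the live camera stream has to be attached to the placeholder div. The snippet relies on the WebcamJS library (as the Webcam.snap call below shows); a minimal initialization sketch, with illustrative resolution and quality values, would look like this:

    // Attach the live camera stream to the placeholder div using WebcamJS.
    // The resolution and JPEG quality below are illustrative values, not necessarily the ones used in Nayan.
    Webcam.set({
        width: 320,
        height: 240,
        image_format: 'jpeg',
        jpeg_quality: 90
    });
    Webcam.attach('#my_camera');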

To make sure the user does not struggle to find a trigger point while running the app, tapping anywhere on the screen triggers a function that grabs a frame from the live stream and sends it to the backend as base64 image data for processing.

    function take_snapshot() {
        // Grab the current frame from the live camera stream as a base64 data URI.
        Webcam.snap(function (data_uri) {
            // Store the data URI in the hidden form field and show a preview.
            $(".image-tag").val(data_uri);
            document.getElementById('results').innerHTML = '<img src="' + data_uri + '"/>';
            // Send the form (including the base64 image) to the backend for processing.
            submit_form();
        });
    }
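
The submit_form() helper is not shown above; a minimal sketch, assuming a jQuery AJAX post to the storeImage.php endpoint used by the form, could look like this:

    // Hypothetical submit_form() helper: posts the hidden base64 image to the backend.
    // The real implementation may simply submit the form natively instead.
    function submit_form() {
        $.post('storeImage.php', { image: $('.image-tag').val() })
            .done(function (response) {
                // The backend's scene description comes back here and can be read out to the user.
                console.log(response);
            })
            .fail(function () {
                console.error('Failed to send the captured image to the backend.');
            });
    }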

The next step is to send this base64 image to an LLM for processing. I used GPT-3.5 for this, with instructions to use a pre-configured value of 1 step equal to 1.5 feet.

// Multimodal prompt content: the instruction text plus the captured image as a base64 data URL.
$prompt = array(
  array(
    "type" => "text",
    "text" => "$prompt_instruction_set",
  ),
  array(
    "type" => "image_url",
    "image_url" => array(
      "url" => "$file_base64",
    ),
  ),
);


So I get results like the following:

From your viewpoint:
1. Wooden bench - Approximately 2 steps (5 feet)
2. Wall-mounted mirror - Slightly above 2 steps (around 5 feet)
3. Soap dispenser - Just over 2 steps (around 5 to 6 feet)
4. Wall-mounted sink - Just over 2 steps (around 5 to 6 feet)
5. Hand towel dispenser - Just over 2 steps (around 5 to 6 feet)
6. Waste bin - Approximately 3 steps (7.5 feet)
7. Wall-mounted toilet with a grab bar - Approximately 3 steps (7.5 feet)
8. Toilet brush - Around 3 steps (7.5 feet)
9. Toilet paper holder - Approx. 3 to 3.5 steps (7.5 to 8.75 feet)
Everything is placed within a short walking distance, allowing for ease of use.




Following is the demo of an indoor scenario. It was able to detect the objects and their distances from the user's mobile camera, using 1.5 ft as 1 step of the user.

Another run of Nayan is shown below:

Conclusion / Going Forward

There are multiple things that can improve this prototype and the experience for the user, including but not limited to the following:

  1. Using a WebSocket for realtime communication (a rough sketch follows this list)
  2. Including instructions for the LLM to prioritize objects that are important for the user and skip the rest of the objects in view
  3. Providing the ability for the user to ask for specific information about the environment they are in
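
As a rough illustration of the first point, a WebSocket channel could stream captured frames and push back descriptions without repeated form posts. The endpoint URL and the message format below are purely hypothetical:

    // Rough sketch of item 1: a WebSocket channel for realtime communication.
    // The endpoint URL and the message format are hypothetical.
    const socket = new WebSocket('wss://example.com/nayan');

    socket.addEventListener('open', function () {
        // Send a captured frame as base64 whenever the user taps the screen.
        document.body.addEventListener('click', function () {
            Webcam.snap(function (data_uri) {
                socket.send(JSON.stringify({ type: 'frame', image: data_uri }));
            });
        });
    });

    socket.addEventListener('message', function (event) {
        // The backend pushes back the scene description, which is spoken aloud.
        const description = JSON.parse(event.data).description;
        window.speechSynthesis.speak(new SpeechSynthesisUtterance(description));
    });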


Stay tuned for more updates as I continue to work on different aspects of how AI can be used to improve experiences. If you have any thoughts, feedback, or ideas, I'd love to hear them.
