Building Experiences Using AI - 6: "Nayan" - AI Powered Vision for the Visually Challenged

I attended the GAAD (Global Accessibility Awareness Day) event on 11th May at L.V. Prasad Eye Institute, Hyderabad, and it really got me thinking about the everyday challenges visually challenged people face while moving around. It's easy for those of us with sight to take our vision for granted. We effortlessly navigate through crowded streets, bustling malls, and even our own homes without giving it a second thought. But for someone who is visually challenged, every step is a potential hazard, and every journey, no matter how short, can be daunting.


The Problem Statement

There are now several assistive technology apps that try to solve these problems. The majority rely on GPS navigation, talking maps, and electronic travel aids to give blind users information about their surroundings and directions to their destination, facilitating independent mobility.

But one interesting example is "Be My Eyes", an app where AI is used to recognize objects in the user's environment, with the camera image and an AI backend service in play. This is interesting because, unlike other apps, the developer does not need to pre-record audio guides for objects, directions, and so on. It also means the app can work indoors.

Inspired by this app, I decided to explore whether one can build a simple solution that calculates the distance between the user and nearby objects and describes them, using existing AI-based APIs, to make life a bit easier for visually challenged individuals. I'm calling this solution "Nayan", which means "eye" in Hindi. The idea is for Nayan to work as an additional eye, guiding the user safely and efficiently through their surroundings.


The Experiment

The basic hypothesis behind Nayan was simple: it should identify the objects surrounding the user, calculate their distance in terms of the user's footsteps, and announce this over a voiceover, so that the user can move around confidently with its help. So, imagine walking through a park. Nayan would tell you something like, "In 10 steps, there is a bench to your right," or "In 5 steps, you will reach a curb." This way, the user gets a real-time audio guide, helping them navigate more confidently and independently.


Essentially, the following steps will be part of the experience (a small sketch illustrating steps 4 and 5 follows the list).

  • STEP 1 - Capturing the Environment: First, a camera will capture the environment in front of the user. This could be through a smartphone, a wearable device, or any portable camera.
  • STEP 2 - Identifying Objects: Using an AI Vision API, the captured image is processed to identify objects in the environment. This is where the magic of AI comes in – recognizing everything from a tree to a parked car, to a curb, or even a stray dog.
  • STEP 3 - Mapping to Physical Space: The next step involves mapping this information onto physical space. The Vision API helps calculate the approximate distance of these objects from the user in feet.
  • STEP 4 - Converting Distance to Steps: To make this information practical, the distance is then converted into steps. Based on an average value where one step equals a specific number of feet, this translation makes the data actionable for the user.
  • STEP 5 - Audio Feedback: Finally, using Text-to-Speech (TTS) technology, this information is converted into audio. The user will hear directions in the form of steps, guiding them safely through their environment.
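
To make steps 4 and 5 concrete, here is a minimal JavaScript sketch of the feet-to-steps conversion and the audio feedback, using the browser's built-in Web Speech API. The detections array, its field names, and the feetToSteps/announce helpers are illustrative assumptions, not the actual Nayan code.

    // Minimal sketch of steps 4 and 5 (illustrative helpers, not the actual Nayan code).
    const STEP_LENGTH_FEET = 1.5; // assumed average stride length, as used later in the experiment

    function feetToSteps(feet) {
        // Round to the nearest whole step so the guidance is easy to follow.
        return Math.max(1, Math.round(feet / STEP_LENGTH_FEET));
    }

    function announce(detections) {
        const sentences = detections.map(function (d) {
            return 'In ' + feetToSteps(d.distanceFeet) + ' steps, there is a ' + d.label + ' to your ' + d.direction + '.';
        });
        // Web Speech API: supported by most modern mobile browsers.
        window.speechSynthesis.speak(new SpeechSynthesisUtterance(sentences.join(' ')));
    }

    // Example usage with made-up detections:
    announce([
        { label: 'bench', direction: 'right', distanceFeet: 15 },
        { label: 'curb', direction: 'front', distanceFeet: 7.5 }
    ]);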


The application would calculate the distance (in terms of steps) between the user and the objects in front of them.

So the initial step was to build the framework that lets the user capture a photo with their mobile and send it to the backend for processing. Using the HTML5 canvas, sending base64 image data of a video element showing the camera stream was straightforward.

The setup of the 'Nayan' experiment


The following form sets up the UI and a hidden placeholder to hold the base64 data that is sent as part of the form.

  <form method="POST" action="storeImage.php">
      <div onClick="take_snapshot()" id="my_camera"></div>
      <input type="hidden" name="image" class="image-tag">
      <div id="results">Your captured image will appear here...</div>
      <button id="submitBt" class="btn btn-success">Submit</button>
  </form>
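
Before a snapshot can be taken, the live camera stream has to be attached to the placeholder div. The snippet relies on the WebcamJS library (as the Webcam.snap call below shows); a minimal initialization sketch, with illustrative resolution and quality values, would look like this:

    // Attach the live camera stream to the placeholder div using WebcamJS.
    // The resolution and JPEG quality below are illustrative values, not necessarily the ones used in Nayan.
    Webcam.set({
        width: 320,
        height: 240,
        image_format: 'jpeg',
        jpeg_quality: 90
    });
    Webcam.attach('#my_camera');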

To make sure the user does not struggle to find a trigger point while running the app, tapping anywhere on the screen triggers a function that grabs a frame from the live stream and sends it to the backend as base64 image data for processing.

    function take_snapshot() {
        // Grab the current frame from the live camera stream as a base64 data URI.
        Webcam.snap(function (data_uri) {
            // Store the data URI in the hidden form field and show a preview.
            $(".image-tag").val(data_uri);
            document.getElementById('results').innerHTML = '<img src="' + data_uri + '"/>';
            // Send the form (including the base64 image) to the backend for processing.
            submit_form();
        });
    }
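
The submit_form() helper is not shown above; a minimal sketch, assuming a jQuery AJAX post to the storeImage.php endpoint used by the form, could look like this:

    // Hypothetical submit_form() helper: posts the hidden base64 image to the backend.
    // The real implementation may simply submit the form natively instead.
    function submit_form() {
        $.post('storeImage.php', { image: $('.image-tag').val() })
            .done(function (response) {
                // The backend's scene description comes back here and can be read out to the user.
                console.log(response);
            })
            .fail(function () {
                console.error('Failed to send the captured image to the backend.');
            });
    }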

The next step is to send this base64 image to an LLM for processing. I used GPT-3.5 for this, with instructions to use a pre-configured value of 1 step equal to 1.5 feet.

// Multimodal prompt content: the instruction text plus the captured image as a base64 data URL.
$prompt = array(
  array(
    "type" => "text",
    "text" => "$prompt_instruction_set",
  ),
  array(
    "type" => "image_url",
    "image_url" => array(
      "url" => "$file_base64",
    ),
  ),
);


So I get results like the following:

From your viewpoint:
1. Wooden bench - Approximately 2 steps (5 feet)
2. Wall-mounted mirror - Slightly above 2 steps (around 5 feet)
3. Soap dispenser - Just over 2 steps (around 5 to 6 feet)
4. Wall-mounted sink - Just over 2 steps (around 5 to 6 feet)
5. Hand towel dispenser - Just over 2 steps (around 5 to 6 feet)
6. Waste bin - Approximately 3 steps (7.5 feet)
7. Wall-mounted toilet with a grab bar - Approximately 3 steps (7.5 feet)
8. Toilet brush - Around 3 steps (7.5 feet)
9. Toilet paper holder - Approx. 3 to 3.5 steps (7.5 to 8.75 feet)
Everything is placed within a short walking distance, allowing for ease of use.




Following is the demo of an indoor scenario. It was able to detect the objects and their distances from the user's mobile camera, using 1.5 ft as 1 step of the user.

Another run of Nayan is shown below:

Conclusion / Going Forward

There are multiple things that can improve this prototype and the experience for the user, including but not limited to the following:

  1. Using a WebSocket for realtime communication (a rough sketch follows this list)
  2. Including instructions for the LLM to prioritize objects that are important for the user and skip the rest of the objects in view
  3. Providing the ability for the user to ask for specific information about the environment they are in
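
As a rough illustration of the first point, a WebSocket channel could stream captured frames and push back descriptions without repeated form posts. The endpoint URL and the message format below are purely hypothetical:

    // Rough sketch of item 1: a WebSocket channel for realtime communication.
    // The endpoint URL and the message format are hypothetical.
    const socket = new WebSocket('wss://example.com/nayan');

    socket.addEventListener('open', function () {
        // Send a captured frame as base64 whenever the user taps the screen.
        document.body.addEventListener('click', function () {
            Webcam.snap(function (data_uri) {
                socket.send(JSON.stringify({ type: 'frame', image: data_uri }));
            });
        });
    });

    socket.addEventListener('message', function (event) {
        // The backend pushes back the scene description, which is spoken aloud.
        const description = JSON.parse(event.data).description;
        window.speechSynthesis.speak(new SpeechSynthesisUtterance(description));
    });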


Stay tuned for more updates as I continue to work on different aspects of how AI can be used to improve experiences. If you have any thoughts, feedback, or ideas, I'd love to hear them.
