Building Experiences Using AI - 6: "Nayan" - AI Powered Vision for the Visually Challenged
The GAAD (Global Accessibility Awareness Day) event that I attended on 11th May at L.V. Prasad Eye Institute, Hyderabad, really got me thinking about the everyday challenges visually challenged people face while moving around. It’s easy for those of us with sight to take our vision for granted. We effortlessly navigate through crowded streets, bustling malls, and even our own homes without giving it a second thought. But for someone who is visually challenged, every step is a potential hazard, and every journey, no matter how short, can be daunting.
The Problem Statement
There are now several assistive technology apps that try to solve these problems. Most of them, such as GPS navigation apps, talking maps, and electronic travel aids, provide blind people with information about their surroundings and directions to their destination, facilitating independent mobility.
One interesting example is "Be My Eyes", an app where AI is used to recognize the objects in the user's environment from a camera image, with an AI backend service doing the processing. This is interesting because, unlike the other apps, the developer does not need to pre-record audio guides for objects, directions, etc. It also means the app can work for the user indoors.
Inspired by this app, I decided to explore whether one can build a simple solution that calculates the distance between the user and the objects around them and describes those objects, using existing AI-based APIs, to make life a bit easier for visually challenged individuals. I’m calling this solution "Nayan", which means "eye" in Hindi. The idea is for Nayan to work as an additional eye, guiding the user safely and efficiently through their surroundings.
The Experiment
The basic hypothesis behind Nayan was simple: it should identify the objects around the user, calculate the distance to them in terms of the user's footsteps, and read that out over a voiceover, so that the user can move around confidently with its help. So, imagine walking through a park. Nayan would tell you something like, “In 10 steps, there is a bench to your right,” or “In 5 steps, you will reach a curb.” This way, the user gets a real-time audio guide, helping them navigate more confidently and independently.
Essentially, the steps described below make up the experience.
The initial step was to build the framework that lets the user capture a photo with their mobile and send it to the backend to be processed. Using an HTML5 canvas, sending the base64 image data of a frame grabbed from the video element showing the camera stream was straightforward.
The following form sets up the UI and the hidden placeholder field that holds the base64 data sent as part of the form.
<form method="POST" action="storeImage.php">
<div onClick="take_snapshot()" id="my_camera"></div>
<input type="hidden" name="image" class="image-tag">
<div id="results">Your captured image will appear here...</div>
<button id="submitBt" class="btn btn-success">Submit</button>
</form>
To make sure the user does not struggle to find a trigger point while running the app, tapping anywhere on the screen triggers a function that grabs a frame from the live stream and sends it to the backend as base64 image data for processing.
// Grab a frame from the live camera stream (Webcam.js) and send it to the backend
function take_snapshot() {
    Webcam.snap( function(data_uri) {
        // Put the base64 data URI into the hidden form field
        $(".image-tag").val(data_uri);
        // Show a preview of the captured frame
        document.getElementById('results').innerHTML = '<img src="' + data_uri + '"/>';
        // Post the form (and the image data) to storeImage.php
        submit_form();
    } );
}
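The original post does not show the backend handler itself, but a minimal sketch of what storeImage.php might do, assuming the captured frame arrives as a data URI in the posted image field, could look like this:
<?php
// storeImage.php - minimal sketch of the endpoint the form above posts to
// (assumption: the frame arrives as a data URI in $_POST['image'])
if (empty($_POST['image'])) {
    http_response_code(400);
    exit('No image received');
}

// The hidden field holds something like "data:image/jpeg;base64,/9j/4AAQ..."
$data_uri = $_POST['image'];

// Keep only the base64 payload and rebuild the data URI in a known format;
// $file_base64 is what gets embedded in the prompt in the next step
$base64      = substr($data_uri, strpos($data_uri, ',') + 1);
$file_base64 = 'data:image/jpeg;base64,' . $base64;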
The next step is to send this base64 image to an LLM for processing. I used GPT-3.5 for this, with instructions to use a pre-configured value of 1 step being equal to 1.5 feet.
// The user message content: an instruction text part plus the captured image
$prompt = array(
    array(
        "type" => "text",
        "text" => "$prompt_instruction_set",
    ),
    array(
        "type"      => "image_url",
        "image_url" => array(
            "url" => "$file_base64",
        ),
    ),
);
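For completeness, here is a rough sketch of how this content array could be posted to the chat completions endpoint using cURL. The instruction text and the model name below are illustrative assumptions (the post mentions GPT-3.5; any vision-capable model would take the same request shape), and the API key is assumed to come from the environment:
<?php
// Sketch of the backend call; $prompt and $file_base64 come from the code above.
// $prompt_instruction_set would be defined before building $prompt, for example:
// "List each object visible in the image and its approximate distance from the
//  camera, expressed in the user's steps, assuming 1 step = 1.5 feet."
$apiKey = getenv('OPENAI_API_KEY');

$payload = array(
    "model"    => "gpt-4o-mini",   // assumption: any vision-capable model
    "messages" => array(
        array("role" => "user", "content" => $prompt),
    ),
);

$ch = curl_init("https://api.openai.com/v1/chat/completions");
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST           => true,
    CURLOPT_HTTPHEADER     => array(
        "Content-Type: application/json",
        "Authorization: Bearer " . $apiKey,
    ),
    CURLOPT_POSTFIELDS     => json_encode($payload),
));

$response = json_decode(curl_exec($ch), true);
curl_close($ch);

// The textual description that Nayan reads out to the user
$description = $response['choices'][0]['message']['content'] ?? '';
echo $description;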
With this, I get results like the following:
From your viewpoint:
1. Wooden bench - Approximately 2 steps (5 feet)
2. Wall-mounted mirror - Slightly above 2 steps (around 5 feet)
3. Soap dispenser - Just over 2 steps (around 5 to 6 feet)
4. Wall-mounted sink - Just over 2 steps (around 5 to 6 feet)
5. Hand towel dispenser - Just over 2 steps (around 5 to 6 feet)
6. Waste bin - Approximately 3 steps (7.5 feet)
7. Wall-mounted toilet with a grab bar - Approximately 3 steps (7.5 feet)
8. Toilet brush - Around 3 steps (7.5 feet)
9. Toilet paper holder - Approx. 3 to 3.5 steps (7.5 to 8.75 feet)
Everything is placed within a short walking distance, allowing for ease of use.
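The post does not show how the voiceover itself is produced. One possible approach, assuming the backend converts the description to audio using OpenAI's text-to-speech endpoint (the model and voice names below are just illustrative defaults), is sketched here:
<?php
// Sketch: turn the textual description into audio the app can play back.
// Assumptions: OpenAI's /v1/audio/speech endpoint is used; $description and
// $apiKey come from the code above.
$payload = array(
    "model" => "tts-1",    // illustrative default
    "voice" => "alloy",    // illustrative default
    "input" => $description,
);

$ch = curl_init("https://api.openai.com/v1/audio/speech");
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST           => true,
    CURLOPT_HTTPHEADER     => array(
        "Content-Type: application/json",
        "Authorization: Bearer " . $apiKey,
    ),
    CURLOPT_POSTFIELDS     => json_encode($payload),
));

// The response body is binary audio (MP3 by default); stream it to the client
$audio = curl_exec($ch);
curl_close($ch);

header('Content-Type: audio/mpeg');
echo $audio;
Equally, the app could simply hand the returned text to the phone's built-in screen reader; the sketch above is just one option.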
Following is a demo of an indoor scenario: Nayan was able to detect the objects and their distances from the user's mobile camera, using 1.5 ft as 1 step of the user.
Another run of Nayan is shown below:
Conclusion / Going Forward
There are multiple things that could be done to improve this prototype and make the experience better for the user.
Stay tuned for more updates as I continue to work on different aspects of how AI can be used to improve experiences. If you have any thoughts, feedback, or ideas, I’d love to hear them.