Part 1 - Applied Computer Vision: Developmental Training for Autistic Children
Digital applications may help children on the autism spectrum acquire social and behavioral skills


tl;dr > New hobby project! My first one with a team, and they are total strangers. We are trying to build something that can be used to help autistic children train and develop their visual cognition. But we're not quite there yet. Help us?

When I was in junior high, our school was structured so that the regular students shared the same building with a wing dedicated to autistic children. The entrances to the autistic wing were often locked for obvious reasons, but from time to time, some of our classes were held in rooms located there so we would visit that part of the building. We also occasionally (though very rarely) got a chance to interact directly with the autistic children on joint school outing trips, for example.

Because I was attending a special "media emphasis" class, on one of these trips we were tasked with filming a short documentary on the topic of autism. We researched the topic for the narration, which my classmate Joonatan recorded, and to this day I remember his opening line for our video word for word:

Autism is a neurobiological developmental disorder of the central nervous system

These children, who were socially and emotionally extremely introverted even on a Finnish scale (and you know Finns can be VERY introverted), fascinated me and stayed with me long after I graduated on to high school, and beyond.


A few years back, I learned that one of our neighbors here in China has an autistic child. A diner I used to frequent in downtown Shanghai has an owner who also volunteers at events organized by and for autistic children. Even a Chinese anime series that I watched, set in the city of Chongqing, features a mysterious autistic child at a local children's welfare center...

These people are all around us, yet for the masses, they remain invisible, always a bit out of reach.

Background

Those who follow my writing here might have noticed it's been a year and a half since my previous post, and I never actually got to finish that project, either. Work and family have taken all my time and energy, and I couldn't focus on anything else, even on weekends.

But I do have a bit more time around summer to work on my own things. About a month ago I went to a randomish Mixlab meetup, where people from various disciplines get together and try to collaborate on stuff. This time the theme was chatbots and PaddlePaddle, a Chinese deep learning framework (it's Baidu's open source project, and Baidu also co-sponsored the event, along with the open source chatbot platform project Wechaty).

I have to admit I didn't have much interest in chatbots per se. But the event, which turned out to be the kick-off for a competition, strictly required participating projects to use the chatbot platform and build their applications on top of its user interface. So, at least for me, I'm just going to treat it as a user input/output mechanism.

Developmental Training for Autistic Children

At the event, a young woman was describing her UX research at university and how she wanted to turn it into a tool or game that the autistic children could play with and train to develop some basic visual recognition skills. The topic intrigued me so I decided to join the group.

We soon settled on having the user interact through the chatbot, as required by the contest. The chatbot would randomly select one of several basic shapes (round, square, triangle) and ask the user to take pictures of real-life objects resembling that shape and send them back. The app would then rate each picture based on the shape it could identify in it and give the user feedback.

This is the first time I've done a hobby project in a team, and with total strangers at that. Of course, during my years as a machine learning engineer at an AI startup, I've gotten used to working with other people. Heck, we might even publish our source code on GitHub after the demo is done!

Because of my previous doodling with computer vision (which later led to my current job), I was pretty confident we could build something akin to a shape recognizer. My theory was that we could make it work with OpenCV alone, without any need for deep learning. The basic workflow would be:

  1. Take a photo
  2. Run some processing (for example detecting the edges or contours)
  3. Compare the image with the shape templates, which have been processed the same way so they share the same edge/contour look
  4. Predict the template that most resembles the user input image

Template and Test Data Collection

To get things started, our team members collected a bunch of photos of everyday objects for each of the basic shape categories. We even collected some more ambiguous ones that couldn't readily be attributed to any single shape, because we were curious how the machine would classify them.


Here we have already loaded the photos in grayscale and run them through a normalization pipeline that center-crops them to the same aspect ratio and resizes them to a uniform resolution, then arranged them into a collage for easy previewing. We kept this same view when applying further processing to the images, to build an intuition for how the same operations behave on different kinds of data.


We processed the template images the same way, as mentioned earlier.


Structural Similarity

The most naive way to compare two images is to sum up the differences pixel by pixel. However, consider what happens if the second image is shifted sideways by just a couple of pixels: the pixel-to-pixel comparison fails spectacularly (nearly every pixel now differs, so the sum of all the differences is huge), even though to the human eye the two images still look almost exactly the same.
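To make the failure concrete, here is a deliberately contrived case (a synthetic stripe pattern, not one of our photos): vertical stripes two pixels wide, shifted sideways by two pixels with NumPy. The shift lines every white stripe up with where a black one used to be, so every single pixel flips and the naive sum of absolute differences jumps from 0 to the largest value it can possibly take. The helper name `sum_abs_diff` is illustrative.

```python
import numpy as np


def sum_abs_diff(a, b):
    """Naive comparison: total of the per-pixel absolute differences."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())


# Synthetic 64x64 image: vertical stripes, 2 white columns then 2 black columns, repeating.
img = np.zeros((64, 64), dtype=np.uint8)
for x in range(0, 64, 4):
    img[:, x:x + 2] = 255

shifted = np.roll(img, 2, axis=1)  # shift 2 pixels to the right, wrapping at the edge

print(sum_abs_diff(img, img))      # 0
print(sum_abs_diff(img, shifted))  # 1044480 == 64 * 64 * 255, every pixel differs
```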

In my plan, the primary method of comparing the templates to a user image was an algorithm called structural similarity (SSIM), borrowed from the scikit-image library. The idea is to abstract the images to some degree before comparing them: instead of matching raw pixel values, SSIM compares local statistics of the two images (mean brightness, contrast, and how the pixel values co-vary) within small sliding windows, so the score reflects shared structure more than exact pixel-level agreement.
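A minimal sketch of SSIM-based template matching, assuming a recent scikit-image (where the function lives at `skimage.metrics.structural_similarity`) and grayscale uint8 images already normalized to the same size; `classify_ssim` is a hypothetical helper name. SSIM returns 1.0 for identical images and lower values as local structure diverges, so the best template is simply the one with the highest score.

```python
import numpy as np
from skimage.metrics import structural_similarity


def classify_ssim(photo, templates):
    """Return (best_name, scores) where scores maps template name -> SSIM with the photo.
    Assumes grayscale uint8 arrays of identical shape, each side at least 7 pixels
    (the default SSIM window size)."""
    scores = {
        name: float(structural_similarity(photo, tmpl, data_range=255))  # uint8 value range
        for name, tmpl in templates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Usage with two synthetic 64x64 templates (a filled disc and a filled square).
yy, xx = np.mgrid[:64, :64]
disc = ((yy - 32) ** 2 + (xx - 32) ** 2 <= 20 ** 2).astype(np.uint8) * 255
square = np.zeros((64, 64), dtype=np.uint8)
square[12:52, 12:52] = 255
best, scores = classify_ssim(disc, {"round": disc, "square": square})
print(best)  # round
```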


However, as you can see from the above image, the recognition results from this structural comparison didn't immediately convince us; there still seems to be quite a lot of noise. We initially suspected this was because each template existed in only one orientation, which matters for shapes with corners (such as the square or triangle), but even after accounting for rotation, accuracy did not improve as we had hoped. We suspect size also matters, although making our comparison robust to scale seems much harder with OpenCV alone.

Next Steps

This was a fairly short and straightforward playtest sprint, but it has already generated some ideas for next steps.

Contrary to what I initially theorized, we could probably use deep learning after all, at least to generate vector embeddings for both the template and input images. Convolutional models pretrained on large image datasets already extract fairly robust features that tolerate shifts in position and, to a lesser degree, changes in rotation and scale. We could then do the similarity comparison in this abstract embedding space instead of in absolute pixel space. Even though we have converted the raw pixels into edge or contour maps, those maps are still too anchored to actual pixel coordinates.
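Once some model provides the embeddings, the comparison itself is simple. Below is a minimal NumPy sketch of nearest-template lookup by cosine similarity. Here `embed` is a hypothetical placeholder for whatever feature extractor we end up using (one option we haven't tried yet would be a pretrained CNN from torchvision with its final classification layer removed), and the helper names are illustrative.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def classify_by_embedding(photo, templates, embed):
    """Return (best_name, scores): the template closest to the photo in embedding space.
    `embed` is a hypothetical callable mapping an image to a 1-D feature vector."""
    query = embed(photo)
    scores = {name: cosine_similarity(query, embed(tmpl)) for name, tmpl in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy usage with hand-made 3-D "embeddings" and an identity embed function.
toy_templates = {"round": np.array([2.0, 0.0, 0.0]), "square": np.array([0.0, 1.0, 0.0])}
best, scores = classify_by_embedding(np.array([1.0, 0.1, 0.0]), toy_templates, embed=lambda v: v)
print(best)  # round
```

Since the templates don't change between requests, their embeddings would only need to be computed once and cached.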

Do you, dear reader, have any further ideas or suggestions that you would want to add to our list of things to try? Feel free to leave a comment and participate in the discussion!

Lisa Emily Petersen

Video Editor & Social Media Manager

3 years ago

This looks awesome!

Hua Jin

Founder and CEO @ Aixedu.com /AI for Education and Sustainability

3 years ago

very interesting project

Tianyi Pan

AI Generalist & LLM Whisperer | Multicultural Biz & Tech Professional | CMA

3 years ago

Joonatan Lintala, mentioned in passing! :D The original quote in Finnish was, of course: "Autismi on neurobiologinen keskushermoston kehityshäiriö".

