Part 2 (Chatbots!) - Applied Computer Vision: Developmental Training for Autistic Children
We interrupt this programming with a special word from our charity project team: please go to our GitHub project page and give the project a star to support us in the contest. Thank you, and now back to the regular program.
In the previous part of the article, we discussed the background and some of the core algorithms of the chatbot-based shape-recognizing application that I am building with a team for an online contest. After posting that part, I received many great suggestions on how to improve the computer vision algorithms, for which I am super grateful.
This time around, however, I am going to focus on an entirely different aspect of the project: putting it all together as an MVP, especially the chatbot and interaction part. The algorithm improvements will have to wait for a later stage.
As mentioned in the previous post, the competition we are taking part in is co-sponsored by Wechaty, which (quoting their GitHub page) is "a RPA (Robotic Process Automation) SDK for Chatbot Makers". I don't presume to fully understand what that even means, and to be honest, I was never particularly crazy about chatbots in general, but we did find a nice way of using Wechaty as the input/output interface for our app. Moreover, getting Wechaty to work at all was a tremendous challenge, so I want to share our experience in case someone else follows in our footsteps and runs into the same very thick walls.
Wechaty Operating Principle
First of all, as the name suggests, Wechaty is mostly designed to operate with the Chinese chat app WeChat, although it also supports other chat apps. WeChat is a proprietary chat platform by Tencent, and it has very limited API support for third parties. For this reason, the whole implementation of Wechaty is quite convoluted, as mentioned above, and it took me a very long time to figure it out (and I am not sure I have got it entirely even now). But here is my attempt:
The leftmost (green) part is what every user of the chat experiences, namely the chatroom. It is where all the messaging back and forth between different users happens. On WeChat, apart from your mobile phone, you can also access the chatrooms via the desktop or tablet apps. However, these "side-logins" are tied to the main phone app: you cannot log in to them independently, only through a single-use QR code scanned with the phone app.
Here is where the Puppet service (yellow) part comes in. It is an online service masquerading as an iPad device so that it can log in to the same WeChat account as your phone. After that, it has access to all of your chatrooms and private messages. These online Puppet services come from fairly random providers, so BE VERY MINDFUL OF YOUR PRIVACY when using them. I, for one, use a second WeChat account altogether, separate from my main account, that I dedicate to these kinds of little hacky projects. So essentially, the chatbot app runs on top of someone's real WeChat account instead of being spawned as a standalone bot that joins the chatroom.
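For the curious, pointing your own Wechaty program at one of these Puppet services boils down to a service token issued by the provider. Here is a minimal sketch in Python, going from the official python-wechaty docs rather than our own code (we ended up on the C# side, as explained below); the token and endpoint values are placeholders:

```python
import asyncio
import os

from wechaty import Wechaty

# Credentials issued by the Puppet service provider; values are placeholders.
os.environ['WECHATY_PUPPET_SERVICE_TOKEN'] = 'your-puppet-token-here'
# Optionally, point at a specific provider endpoint:
# os.environ['WECHATY_PUPPET_SERVICE_ENDPOINT'] = 'your.provider.example:9009'

async def main():
    bot = Wechaty()
    # Starting the bot connects to the Puppet service; on first login you
    # confirm the "iPad" session with a QR code scanned from your phone.
    await bot.start()

asyncio.run(main())
```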
The iPad Puppet service connects to the main Wechaty program (blue), which you can run on your local machine or deploy somewhere in the cloud. Through the Puppet, the program gets access to all the messages and attachments sent to the chatrooms, and naturally it can also send messages and attachments back. The main chat logic of the app lives in this Wechaty part: specific wake words, if-this-then-thats, basically all the rules of interaction (sketched below).
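To give a flavor of what those rules look like, here is a rough Python sketch of a wake-word handler. Our actual chat logic ended up in C# (more on that in a moment), so treat this as an illustration: the wake word, the replies, and the saved file name are all made up, and the handler names follow the python-wechaty docs:

```python
import asyncio

from wechaty import Wechaty, Message
from wechaty_puppet import MessageType

class ShapeBot(Wechaty):

    async def on_message(self, msg: Message) -> None:
        text = (msg.text() or '').strip().lower()
        # Wake word: '#shape' is illustrative, not the one we actually use.
        if text == '#shape':
            await msg.say('Send me a photo of your drawing and I will check it!')
        elif msg.type() == MessageType.MESSAGE_TYPE_IMAGE:
            # Grab the attached image so it can be passed on for recognition.
            file_box = await msg.to_file_box()
            await file_box.to_file('incoming.png')

asyncio.run(ShapeBot().start())
```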
Normally, one would have all elements of the main program running directly within the Wechaty code, including, for our specific project, the computer vision parts. However, our computer vision part was already implemented in Python, and the Python version of the Wechaty codebase is very unstable (several members of our team, and of other teams too, were not able to get it running properly). In the end, we decided to implement the Wechaty chatbot logic in C# (thanks to a last-minute backend developer addition to our team) and keep a separate Python codebase dedicated to the computer vision (purple) part. The CV part was turned into a mini service using the Flask framework, which the Wechaty part can call via a simple HTTP API to transfer data back and forth.
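To illustrate the split, here is a minimal sketch of what such a Flask mini service can look like. The endpoint name, the field names, and the placeholder verdict are made up for illustration; in the real service, the part-1 recognition code runs where the comment indicates:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/recognize', methods=['POST'])
def recognize():
    # The chatbot side POSTs the drawing it grabbed from the chatroom.
    image = request.files['image']
    image.save('/tmp/drawing.png')
    # In the real service, this is where the part-1 CV pipeline would run.
    verdict = {'shape': 'circle', 'match': True}  # placeholder result
    return jsonify(verdict)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```

On the other side, the C# Wechaty bot simply POSTs the image bytes to this endpoint and relays the JSON verdict back into the chatroom as a chat message.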
It's definitely not the most elegant implementation, but it gets the job done, which is all we needed for the first MVP demo and for submitting the project to the contest.
Aaand... Action!
What we see here is an early prototype with every piece put together, interacting live within the chatroom:
Admittedly, due to deficiencies in my computer vision algorithms (see part 1), the verdict is not always judged correctly, but for the sake of the demo and the contest, it already works well enough.
Next Steps
We are now gearing up to put everything together, write some documentation, and film an introductory video to submit our work to the contest. As mentioned in the opening, it would mean the world to us, and to me personally, to get your support in the form of a GitHub star to get our charity project for autistic children off to a great start in the contest. Thank you!
And I will be back to give an update on those CV algorithms for sure :)