Learning and Fun with AI, Bots, and Ninjas!
Guy Barker
Exploring how to make apps more accessible to more people, and sharing all I learn.
This article describes one approach to setting up a workshop introducing students to AI, and also describes steps for updating an Azure webchat bot to support token authentication and speech input/output.
Introduction
One of the highlights of my work is having the opportunity to meet students who are interested in learning about new technologies, and who can help me to consider things in new, creative ways.
For example, last year I met with students from the DO-IT program, who raised some fascinating ideas around how the keyboard experience with the demo Sa11ytaire game might be made more efficient. (I described those ideas at The Sa11ytaire Experiment – Reacting to Feedback Part 1: Keyboard efficiency.)
I also got to learn from the DigiGirlz earlier this year when they suggested ways of making that game more accessible. I was particularly impressed by their feedback on reducing the complexity of the game. For example, would some players prefer less detailed images shown on the picture cards, or a version of the game where the cards are represented by color rather than both color and suit? These were certainly interesting ideas which I’d not considered before. I also got feedback reminding me that if I add features to the game which involve animations or count-downs, players must have a way to turn those animations and count-downs off.
All in all, these were great discussions on creating an inclusive game.
Most recently, I was really pleased to have an opportunity to be involved with Ninja Camp 2019, where students could learn about, and experiment with, various technologies.
Specifically I got to deliver an “Introduction to AI” workshop. Some of the goals of the workshop were:
- It would be educational.
- It would be fun.
- It would be accessible to all the students, regardless of how they interact with their devices.
- The students would control their own AI, rather than only consume someone else’s AI.
This article describes the steps I took in the hope of achieving those goals. And I have to say, I learnt a lot while preparing for, and running, the workshop. Perhaps some of the details here might be of interest to you, should you be considering your own AI workshop.
And of course, a big thanks goes to the many others who worked on the workshop, particularly to the Azure team members, whose help was invaluable.
Also, a special thanks goes to Priyanshu Sugasani, who spent some time with our team over the summer of 2019. Priyanshu's contribution during the preparation and running of the Ninja Camp had an important impact in a number of ways on its success. Particularly valuable was his analysis of the accessibility of the various products which were used in the Camp.
What AI to use?
One important decision to make related to what AI would be the best match for the workshop’s goals. For example, Azure Notebooks is a very powerful tool, but perhaps that might be too advanced a tool for an introduction to AI. Azure Machine Learning Studio can be a great tool for providing an introduction into the various steps involved with AI, but I wondered if its use of a drag & drop interaction model might impact the experience for students using screen readers.
So we focused on Azure Cognitive Services, given that those services provide a way for people to create custom AI, without having to be experts in machine learning, (ML,) or even having to write any ML code. The hope was that the various Azure web sites for customizing the AI would be accessible to all students.
It was then a question of which Cognitive Services would be of most interest to the workshop. Some services, such as Custom Vision and the Speech services, can be powerful in enabling custom AI to be created, but would they be the most inclusive for the workshop? Remember, it was a goal to have all students be able to control their own AI, regardless of how they interact with their device. So for this workshop, we decided to go with the text-based Cognitive Services of QnA Maker and Language Understanding, (LUIS). The hope was that those services would be fully accessible for all students, including those using a screen reader, magnification, or speech input.
So the plan became:
- The students would interact with a webchat bot, and experience certain responses from their input.
- The students would then go to the QnA Maker and LUIS sites, and update their own QnA Maker knowledge bases and LUIS apps.
- The students would return to interact with the webchat bot again, and experience the results of their updated AI.
Setting up the AI and bots
This section describes the steps taken to set up the workshop. The summary is that each team of students, (with 2 or 3 students in a team,) got its own QnA Maker knowledge base (KB), LUIS app, and bot. The rationale for this approach was that changes to the AI made by the students would only impact the experience at their bot, not the experience at any other bot.
Important: The steps below are not a recommendation on how things should be set up, simply an example of one approach. While this approach worked satisfactorily at this workshop, it does have certain constraints, and you may feel it’s worth taking additional steps to avoid those constraints.
For example, for this workshop, all QnA Maker KBs and LUIS apps were associated with a single Azure account, which meant it was technically possible for students to navigate to other students’ KB or LUIS app. You might consider creating one account per student, where all the students’ accounts are tenants of the same subscription of a main account.
QnAMaker
In order for each team to be able to work independently, each team needs its own QnAMaker KB.
So first I set up a QnAMaker KB for myself. That QnAMaker KB was created with no chit-chat personality, in order to simplify the results shown during the workshop. Two Q&A pairs were then manually added to the KB, these being related to “hi” and “who are you”, including alternative phrases for the questions. The KB was then exported, for later importing into the students’ QnAMaker services.
Then QnAMaker KBs were manually created for all the students. The QnAMaker KB for each student was initialized by importing the KB exported from my own KB.
LUIS
I then set up a LUIS app for myself. The LUIS app had a single intent added, (in addition to the existing None intent,) in order for me to test the service later.
Then I manually created LUIS apps for each of the students, and edited them to have the same single intent that I had.
Bots
Having created the Cognitive Services, their related AI needed to be made accessible to the students through Azure bots. I’m not aware of a way today to generate a bot which is initialized to leverage both QnAMaker and LUIS, through either the Azure Portal or the *.ai sites. The bot creation steps that I’m aware of at the Azure Portal have a template for a LUIS-enabled app, and QnAMaker.ai provides a simple way to generate a bot which leverages QnAMaker.
As such, when I first published all the QnAMaker services created earlier, I chose to have a bot created by QnAMaker.ai, such that each bot would access a specific QnAMaker service. I then manually edited all the bots to leverage a specific LUIS app, (and also to add a welcome card). The response returned by the bot would be based on the relative confidences of the predictions made by the QnAMaker and LUIS services.
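That last point is worth a quick illustration. Below is a minimal sketch, not the workshop’s actual bot code, of how a bot built with the Bot Framework SDK’s QnAMaker and LuisRecognizer classes might choose between the two services based on the relative confidences of their predictions. The WorkshopBot class name and the reply text are placeholders, and the two recognizers are assumed to have been registered in Startup.cs from the KB and LUIS app settings.

using System.Threading;
using System.Threading.Tasks;
using Microsoft.Bot.Builder;
using Microsoft.Bot.Builder.AI.Luis;
using Microsoft.Bot.Builder.AI.QnA;
using Microsoft.Bot.Schema;

public class WorkshopBot : ActivityHandler
{
    private readonly QnAMaker _qnaMaker;
    private readonly LuisRecognizer _luisRecognizer;

    public WorkshopBot(QnAMaker qnaMaker, LuisRecognizer luisRecognizer)
    {
        _qnaMaker = qnaMaker;
        _luisRecognizer = luisRecognizer;
    }

    protected override async Task OnMessageActivityAsync(
        ITurnContext<IMessageActivity> turnContext, CancellationToken cancellationToken)
    {
        // Ask both services what they make of the incoming text.
        var qnaResults = await _qnaMaker.GetAnswersAsync(turnContext);
        var luisResult = await _luisRecognizer.RecognizeAsync(turnContext, cancellationToken);
        var (topIntent, luisScore) = luisResult.GetTopScoringIntent();

        var qnaScore = (qnaResults != null && qnaResults.Length > 0) ? qnaResults[0].Score : 0f;

        if (qnaScore >= luisScore && qnaScore > 0)
        {
            // QnA Maker is the more confident of the two, so reply with its answer.
            await turnContext.SendActivityAsync(qnaResults[0].Answer, cancellationToken: cancellationToken);
        }
        else if (luisScore > 0)
        {
            // LUIS is the more confident, so the bot would act on its top-scoring intent.
            await turnContext.SendActivityAsync($"I think your intent is '{topIntent}'.", cancellationToken: cancellationToken);
        }
        else
        {
            await turnContext.SendActivityAsync("Sorry, I'm not sure what you mean.", cancellationToken: cancellationToken);
        }
    }
}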
Webchat UI
The students needed a way to interact with their bots, and to experience the results of their changes at QnAMaker.ai and LUIS.ai. I believe the Azure webchat V4 is the most accessible Microsoft webchat UI available today, so I went with that.
Figure 1: Configuration of the workshop’s Cognitive Services, Bots and web chat UI.
At the time of the workshop, the webchat was accessed using a local HTML file on each of the students’ devices, with each HTML file containing an embedded Direct Line Channel secret for the specific bot that the team would be using.
Since the workshop, I’ve explored how to avoid the use of the local secret and instead use a token, (as that would definitely be preferred in most cases,) and so I’ve included details on how to achieve that at the end of this article. And while we’re on the subject of updating bots, I’ve also been exploring recently what it takes to add speech input and output to that webchat UI. After all, some customers might much prefer to speak to the bot rather than type a question. So I’ve included notes on that at the end of this article too.
A quick introduction to other AI
Early on in the workshop I did take the opportunity to demonstrate a few forms of AI in action, using the Sa11ytaire game. This was a helpful step, as it introduced the concepts of bots and the QnA Maker service, which the students would be later working with directly themselves.
The AI demonstrated included:
- Azure Custom Vision: I presented some physical playing cards to the game, and the game reacted accordingly.
- Azure Speech: I spoke to the game, and the app showed the recognized text.
- Azure Language Understanding: I spoke “Turn over the next cards”, “More cards”, “Continue”, and the app reacted by turning over the next cards.
- A Bot: I turned on a Bot in the game, and first demonstrated a Q&A exchange, and then a LUIS exchange, as follows:
- Say “hi”, app responds with “Hi!”
- Say “who are you”, app responds with “I’m Sally the bot”.
- Say “what can you do?”, app responds with a longer answer.
- Say “how do I win?”, app responds.
- Say “More cards please”. App turns over next cards, through use of language understanding.
- Say “What can I do?”, and the app responds with a suggestion. This was similar to the earlier “what can you do?” but the app knew the difference.
- Say “I think I’m stuck”, and the app knew that this had the same intent as the previous utterance.
Figure 2: The Sa11ytaire app showing recognized text of “I think I’m stuck”, and the app’s response of “Consider moving the 8 of spades”, with the app having been supplied a LUIS intent of “Help” by a bot.
And later in the workshop, to provide a break from the QnA Maker experimentation, I demo’d the steps I’d taken at CustomVision.ai to train the custom image recognition.
Figure 3: Tagging an image for object detection at CustomVision.ai.
The workshop itself
As it happens, mainly due to time constraints at the workshop, I skipped asking the students to use LUIS, and focused only on QnA Maker. This meant that there was more time for me to demo a bot with QnA Maker, leveraging Q&A suggestions from the students, and then the students themselves could experiment with their own AI.
A very important point
It’s crucial to always discuss where the AI is playing a part in the experience. Say a QnA Maker KB is set up with question and answer pairs, and a student asks exactly one of those questions, and the bot responds with one of those answers. I don’t consider that helpful in demonstrating AI. Instead, the AI really kicks in when a question is asked which the AI hasn’t seen before. So the AI then gets to make a prediction, with some degree of confidence, based on the data it has seen before.
For example, if a QnA Maker KB contains a question of “hi”, and the following are asked at the bot: “hi”, “hiya”, “hi there”, and “hiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii”, how might we think the bot would respond?
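One way to explore this for yourself, outside of the bot, is to send those variations directly to a published KB’s generateAnswer endpoint and compare the confidence scores that come back. Here’s a rough C# sketch of doing that; the host, knowledge base id, and endpoint key values are placeholders for the details shown on your own KB’s publish page.

using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class QnaScoreDemo
{
    // Placeholder values: use your own published KB's runtime host,
    // knowledge base id, and endpoint key.
    const string RuntimeHost = "https://your-qna-service.azurewebsites.net";
    const string KbId = "your-knowledge-base-id";
    const string EndpointKey = "your-endpoint-key";

    static async Task Main()
    {
        var client = new HttpClient();
        client.DefaultRequestHeaders.Add("Authorization", $"EndpointKey {EndpointKey}");

        // Ask the KB a few variations on "hi", most of which it has never seen
        // verbatim, and compare the confidence of each prediction.
        foreach (var question in new[] { "hi", "hiya", "hi there", "hiiiiiiiiii" })
        {
            var body = new StringContent(
                $"{{\"question\": \"{question}\"}}", Encoding.UTF8, "application/json");
            var response = await client.PostAsync(
                $"{RuntimeHost}/qnamaker/knowledgebases/{KbId}/generateAnswer", body);

            // The response JSON contains an "answers" array, each entry carrying
            // the predicted answer and its confidence score.
            Console.WriteLine($"{question}: {await response.Content.ReadAsStringAsync()}");
        }
    }
}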
And a very important learning for me
When the students first started interacting with the webchat UI, a few students were initially successful. While this was briefly very exciting for me, in no time multiple students reported that their bot was saying some phrase including “forbidden”. (That text appeared because in the event of an exception calling the service, I had the bot present the exception message.)
This unexpected problem was due to me not paying enough attention to which keys were being used by the bots to access the Cognitive Services. The LUIS apps I’d set up had been published with a Starter key, and that meant the service had the constraints associated with a free pricing tier. According to Cognitive Services pricing—Language Understanding (LUIS), that free usage supports up to 5 transactions per second, and that’s not sufficient for this kind of workshop.
The remedy for this was to update the bots to use a standard pricing tier. That supports up to 50 transactions per second, and so worked fine for the workshop.
During the demo period, the students suggested many great ideas for questions and answers, some of which led to interesting discussions on AI. For example, why does QnA Maker react the way it does? That is, for two apparently similar questions, what leads to the prediction confidence being different by a certain amount? Another question related to what happens if the same question can have two answers. And that led to an interesting discussion on QnAMaker’s “follow-up prompt” feature.
And I have to say, the enthusiasm and creativity of the students was so energizing for me. After learning about the follow-up prompt feature, one student created a longer sequence of follow-ups than I’ve seen before. And another set of students set to work updating their LUIS app, even though I’d only given a brief demo of it, without walking through the steps. With the tools at their disposal, these students were unstoppable!
Another interesting point that cropped up during these discussions related to whether a high-confidence prediction made by AI is “the truth”. When I asked for suggestions for Q&A pairs for my QnA Maker service, one student provided a question of “Does pineapple belong on a pizza?”. Most students felt the matching answer was Yes, but some felt No. I supplied the answer of Yes, and that was the answer provided by my bot later when demo’ing the AI. So this meant the AI provided an answer that some students felt was not correct. This was a reminder that AI responses are not necessarily based on truth, but rather they’re based on the data on which the AI was trained.
After demo’ing a few times the steps of (i) modifying Q&A data, (ii) training the AI based on the data, and (iii) publishing the data so that the bot can access it, I turned things over to the students to experiment with their own AI.
The image below shows a welcome card followed by a conversation in the webchat UI. The conversation is as follows:
- Bot says: “Hello Guy”
- I ask: “whats the airspeed velocity of an unladen swallow”
- Bot says: “What do you mean? African or European?”
- I say: “African”
- Bot says: “About 18 miles per hour.”
Figure 4: The bot showing a welcome card and QnA Maker responses, (including follow-up prompts,) to a question provided by a student.
A note on the accessibility of the various products used at the workshop
A lot of attention was paid in the lead-up to the camp to the accessibility of the products themselves. For example, how efficient is it to navigate the various tables shown on the sites, and the web chat UI, when using a screen reader? In some cases, the experience of interacting with a site could be significantly affected by which screen readers and which browsers were being used. Also, in my experience, I found I could only enter text in the products with Windows Speech Recognition if I was using the Chrome browser, and not Edge.
If you have feedback on your own experiences with the Azure products, please do let the product teams know. The teams are always working to improve the experiences for everyone.
Summary
The workshop gave the students an introduction into three fundamental aspects of AI:
- Data
- Predictions
- Confidence
And by using Azure Cognitive Services to update data, and train and publish models, the students could get first-hand experience of creating AI. A learning for me was that if the students have a mix of levels of previous experience with AI and technology, it’s good to have options for students to explore at a level that works best for them. For example, explore other Cognitive Services, or experiment with the bot code itself, or perhaps explore other tools such as Azure Notebooks.
It’s certainly a good idea to consider options and recommendations for the students to continue their explorations into AI after the workshop.
Overall, thanks to Azure and a fantastic group of Ninja Camp students, I’d say the goals of the workshop were met. It was an interesting learning experience for everyone involved, including myself, and importantly - the students got to control their own AI.
Hopefully everyone had fun too. I know I did!
Guy
P.S. Some additional technical notes on Azure bot webchat development
The notes below describe how to avoid the use of the local direct line channel secret and instead use a token, and also one approach for adding support for speech input and output to the bot. The steps assume a bot has already been created, and configured to support a direct line channel.
Using a token
In order to modify bots, I download the source code locally from the Azure Portal and edit/publish with Visual Studio, rather than editing the bot source directly at the online code editor accessible through the Azure Portal. (That’s just my personal preference on how I work with the code.)
The changes I made to the bot were based on the sample bot code at SimplifiedDLConnector. So first I copied in the token-related files from that sample project to my own bot’s solution.
The steps below describe how to do this.
- Add TokenController.cs to your solution’s Controllers folder.
- Copy in the entire Models folder to your solution. (Remember to change the namespace references in the above files from the sample’s namespace to your own bot’s namespace.)
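(For context, the following is only a conceptual sketch of the exchange that such a token controller performs; it is not the sample’s actual TokenController.cs. The Direct Line secret stays on the server, and each request from the web chat page swaps it for a short-lived token via the Direct Line tokens/generate endpoint. The TokenSketchController name is made up, and real code would read the secret from configuration.)

using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

[Route("directline/token")]
[ApiController]
public class TokenSketchController : ControllerBase
{
    // In real code this would come from configuration, not a literal.
    private const string DirectLineSecret = "<DIRECTLINE SECRET>";

    [HttpPost]
    public async Task<IActionResult> GenerateToken()
    {
        using (var client = new HttpClient())
        {
            // Authenticate to the Direct Line service with the secret...
            client.DefaultRequestHeaders.Authorization =
                new AuthenticationHeaderValue("Bearer", DirectLineSecret);

            // ...and ask it to generate a token. The response JSON includes the
            // token, a conversationId, and an expiry, and is returned to the page.
            var response = await client.PostAsync(
                "https://directline.botframework.com/v3/directline/tokens/generate", null);
            var json = await response.Content.ReadAsStringAsync();

            return Content(json, "application/json");
        }
    }
}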
- Open your bot’s appsettings.json file, and paste in the AllowedHosts and DirectLine sections from the sample’s appsettings.json file. (Note that in the speech-related section below, you may choose to remove the AllowedHosts details.)
- Replace "<DIRECTLINE SECRET>" in the file with your bot’s actual direct line channel secret.
- Open your bot’s Startup.cs and edit the ConfigureServices() method.
- Paste in the following code from the sample bot. (The “using” statements at the top of the file will need to be updated for the code to compile.)
services.Configure<DLSModel>(Configuration.GetSection("DirectLine"));
services.AddSingleton<ICredentialProvider, ConfigurationCredentialProvider>();
- I also modified the SendActivityAsync() message in my OnMessageActivityAsync(), in order to include some specific test text, as that would make the response different from the bot template’s response. This would reduce the chances of me testing a bot and later learning that it wasn’t the bot I thought I was accessing. (I created a lot of test bots while preparing for the workshop.)
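For example, a minimal sketch of that kind of change, assuming the standard echo bot template, might look like the following, where the “Token test bot” prefix is just arbitrary test text:

protected override async Task OnMessageActivityAsync(
    ITurnContext<IMessageActivity> turnContext, CancellationToken cancellationToken)
{
    // Include distinctive test text in the reply, so it's obvious which
    // bot is actually responding during testing.
    var replyText = $"Token test bot: you said '{turnContext.Activity.Text}'.";
    await turnContext.SendActivityAsync(MessageFactory.Text(replyText, replyText), cancellationToken);
}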
- The above changes are sufficient for the bot to use the token, so at this point I’d recommend rebuilding the project to verify it builds as expected.
- Create an HTML file for the web chat UI in the “wwwroot” folder of your solution.
- Paste in the following HTML to the file. (Rename the title to be something appropriate for your bot.)
<!DOCTYPE html>
<html lang="en-US">
<head>
    <title>Barker Bot Demo Web Chat</title>
    <script src="https://cdn.botframework.com/botframework-webchat/latest/webchat.js"></script>
    <style>
        html, body { height: 100% }
        body { margin: 0 }

        #webchat {
            border-right-style: solid;
            border-right-width: 1px;
            height: 100%;
            width: 600px;
            background-color: pink;
        }
    </style>
</head>
<body>
    <div id="webchat" role="main"></div>
    <script>
        (async function () {
            const res = await fetch('/directline/token', { method: 'POST' });
            const { token } = await res.json();

            window.WebChat.renderWebChat({
                directLine: window.WebChat.createDirectLine({ token })
            }, document.getElementById('webchat'));

            document.querySelector('#webchat > *').focus();
        })().catch(err => console.error(err));
    </script>
</body>
</html>
- Publish the bot, (and if this is a new bot being published from VS, you can import the profile for publishing from the PublishSettings file in your solution).
- Once the bot modified to use the token has been published, along with the web chat HTML used to access it, the bot can be tested by visiting “https://<Your bot site>/<Your web chat HTML file>”. And if you view the page’s source in the browser, no direct line channel secret is shown.
Adding speech input and output to the webchat bot
While researching options for speech, I learnt that there are multiple approaches that seem potentially of interest, and it might not be quite clear exactly which apply to web chat UI today. So first I’ll mention something that is not supported by the web chat UI at the time I write this.
Azure is currently building support for the Direct Line Speech channel, (as discussed at Use Direct Line Speech in your bot and Connect a bot to Direct Line Speech (Preview)). So at the Azure Portal you can add support for that type of channel to your bot, as well as other channels like the Direct Line channel. With the Direct Line Speech channel, while your bot still sends and receives text, there can be a conversion performed between text and audio within the communication channel itself. So your bot’s client can deal with audio, despite your bot only dealing with text. However, while a desktop client app today can use the Direct Line Speech channel, (with related sample code for that), as I understand things, today it’s not straightforward for HTML web chat UI to be updated to communicate with your bot through the Direct Line Speech channel.
So an alternative approach today is to have the client directly call the speech to text services to convert between text and audio. This means that both the channel and the bot only deal with text. This could be achieved by building the full client code for this, but personally, I’d rather avoid having to do the work myself to achieve that.
There is a sample at 06.d.speech-web-browser/index.html, which when hosted in Chrome, (not Edge,) communicates with the demo site, https://microsoft.github.io/BotFramework-WebChat/06.d.speech-web-browser/, and that sample supports speech input and output at the webchat UI. So my belief here is that there are three things of interest involved, (i) the client UI html, (ii) the referenced webchat.js which is doing all the work to perform speech to text and TTS through the browser, and (iii) the target bot, which doesn’t do anything specific for speech, (unless it wants to customize the speech output by passing a specific string to be spoken through SendActivityAsync().)
If you’re interested in experimenting with speech in your own Azure web chat UI, consider the steps below. The steps follow on from the details listed above related to setting up web chat to connect with a bot through a token, rather than an embedded direct line channel secret.
A couple of points of interest first:
- I’ve never got the web chat Speak button to appear when the UI is hosted in the Edge browser, so I’ve been using Chrome for my testing.
- I originally found that the Azure demo web chat seemed to work fine, (in Chrome,) yet my almost-identical html didn’t work at my own site. It turned out the web chat Speak button didn’t work because my bot site wasn’t secure. It seems that my use of "AllowedHosts": "*" in my appsettings.json file had left the site insecure and so it didn’t have access to the microphone. I’d copied in the AllowedHosts line along with the DirectLineSecret content from the SimplifiedDLConnector sample when I moved from using an embedded direct line channel secret to using a token. Once I’d removed the AllowedHosts line, my site could access the microphone, and the speech to text and TTS seemed to work as required with webchat and my bot.
So the steps of interest are:
- Remove the AllowedHosts line from your appsettings.json if you copied it in from the sample.
- Consider whether you want the bot’s response for speech interaction to be different from the response associated with text interaction. If the responses are to be different, change the call to SendActivityAsync() accordingly. For example:
await turnContext.SendActivityAsync(
    "This is the response for text output.",
    "This is the response for speech output.",
    "acceptingInput", // Hint
    cancellationToken);
(Otherwise, leave the call as it is.)
- Publish your bot.
- Create the web chat html based on the sample HTML below. The line of most interest for speech is the one that sets webSpeechPonyfillFactory, and the target site referenced in the fetch call, https://ai4abotdemo.azurewebsites.net, is what you’d want to replace with your own site.
- Visit the site hosting the web chat UI, using Chrome.
- Type something into the webchat, and verify the response appears as text as expected.
- Invoke the webchat’s Microphone button. If you’re asked by the browser whether the site should have access to the microphone, accept it, and verify that the site doesn’t display UI indicating the access to the microphone has been blocked.
- Speak something into the webchat, and verify the response appears as both text and audio.
The image below shows the resulting web chat UI in Chrome, including the Speak button. The chat history contains the following:
- The bot’s introductory text of “Hello and welcome!”
- The input I typed into the web chat UI, this being: “I typed this sentence.”
- The bot’s response of “This is the response for text output.”
- Recognized text for the speech I input at the bot, the recognized text being: “I spoke with sentence”.
- The bot’s response of “This is the response for speech output.”
Figure 5: The Azure webchat UI enabled for speech input and output.
The following is the HTML for the speech-enabled web chat UI.
<!DOCTYPE html>
<html lang="en-US">
<head>
    <title>AI4A Bot Speech Demo Web Chat</title>
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <script src="https://cdn.botframework.com/botframework-webchat/latest/webchat.js"></script>
    <style>
        html, body { height: 100% }
        body { margin: 0 }

        #webchat {
            border-right-style: solid;
            border-right-width: 1px;
            height: 100%;
            width: 600px;
        }
    </style>
</head>
<body>
    <div id="webchat" role="main"></div>
    <script>
        (async function () {
            const res = await fetch('https://ai4abotdemo.azurewebsites.net/directline/token', { method: 'POST' });
            const { token } = await res.json();

            // Pass a Web Speech ponyfill factory to renderWebChat. You can also use your own speech
            // engine, given it is compliant with the W3C Web Speech API, https://w3c.github.io/speech-api/.
            // For implementors, look at createBrowserWebSpeechPonyfill.js for details.
            window.WebChat.renderWebChat({
                directLine: window.WebChat.createDirectLine({ token }),
                webSpeechPonyfillFactory: window.WebChat.createBrowserWebSpeechPonyfillFactory()
            }, document.getElementById('webchat'));

            document.querySelector('#webchat > *').focus();
        })().catch(err => console.error(err));
    </script>
</body>
</html>