Can ChatGPT help improve accessibility? A quick proof of concept

This article was written between sessions at the SAP UI5 2023 conference in Germany. It's a first draft; I might come back to it later to edit or enrich it.

UI5 conf is a great event! I just got very enthusiastic after a discussion with Oliver Stoyanovski (SAPUI5 Framework Developer at SAP) and Nikolay Kolarov (Product Expert, UI5 Accessibility Lead at SAP) about the accessibility features in UI5. We chatted about the future of accessibility from a technical point of view and, of course, the topic of AI and ChatGPT came up.


Eventually, we thought we could run a quick proof of concept. The flow of thoughts was as follows:

  • Visually impaired users need the web content converted into an audible version instead of a visual one in order to understand it, navigate through it, and interact with it.
  • ARIA attributes and third-party tools (e.g. JAWS) are currently necessary for the computer to understand the structure of the content and enable interactions.
  • What would the interaction look like if we involved OpenAI?

Who is the user?

Take a visually impaired user. Let's call the user Sandra.


User's key activities

Sandra wants to do the following:

  • Get to know the content
  • Navigate through the content
  • Interact with the content

In our proof of concept, we wanted to check what OpenAI could mean for each of these.

Current way



Sandra uses a keyboard (mainly the Tab key) to navigate through the web page and a screen reader that reads the content aloud. That way, she knows what's on the page and how to interact with its components: clicking buttons and links, moving between pages, etc. Check the video below to get a better idea of how that works.


The screen reader needs to know what to read and which interactions are available. For this, the HTML elements need to have special ARIA attributes defined. There are standards that define these attributes and roles; see WAI-ARIA and WCAG.
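To make this concrete, here is a minimal sketch in TypeScript (with hypothetical markup, names, and ids, not the page from our test) of a table row whose delete button gets an accessible name via aria-label:

```typescript
// A minimal sketch, not the actual demo page: markup of the kind a screen
// reader relies on. The aria-label gives the delete button an accessible
// name with row context, so the reader announces more than just "button".
const rowMarkup = `
  <tr>
    <td>Dente</td>
    <td>
      <button id="delete-dente" aria-label="Delete person Dente">Delete</button>
    </td>
  </tr>
`;

// Append the row to an existing table body so assistive tools can pick it up.
document.querySelector("tbody")?.insertAdjacentHTML("beforeend", rowMarkup);
```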

Future way: with OpenAI

OpenAI's models are capable of "understanding" content and interpreting it in a nearly human way. We imagined a future where the AI takes the HTML code as input for reference and assists Sandra in understanding, navigating, and interacting. This might be done via an API at the browser level (e.g. a Chrome extension). In our quick proof of concept, we worked with the UI version of ChatGPT.
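As a rough sketch of what the browser-level variant could look like, assuming the OpenAI chat completions API and an API key (the helper name askPageAssistant and the model choice are our own assumptions, not part of the PoC):

```typescript
// Sketch only: send the current page's HTML plus Sandra's question to the
// OpenAI chat completions API and return the model's answer as text.
const OPENAI_API_KEY = "<your-api-key>"; // placeholder

async function askPageAssistant(question: string): Promise<string> {
  const pageHtml = document.documentElement.outerHTML;

  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-3.5-turbo",
      messages: [
        {
          role: "system",
          content:
            "You assist a visually impaired user. Answer questions about the HTML page provided by the user.",
        },
        { role: "user", content: `${pageHtml}\n\nQuestion: ${question}` },
      ],
    }),
  });

  const data = await response.json();
  // The answer text lives in the first choice's message.
  return data.choices[0].message.content as string;
}
```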

Below is the web page that we used. It contains one table. Every row has a delete button. Enough content to test our assumptions.

[Screenshot: the test web page with the table]

We then copied and pasted the HTML code into ChatGPT and asked the following questions that Sandra might want to ask:

  • What does the table contain?
  • How many people are in the table?
  • Can I interact with the items in the table?

See the snippets from the conversation with ChatGPT below.

[Screenshots: snippets of the ChatGPT conversation]
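The same three questions could also be sent programmatically through the hypothetical askPageAssistant helper sketched earlier, for example:

```typescript
// Usage sketch, reusing the hypothetical askPageAssistant helper from above.
const questions = [
  "What does the table contain?",
  "How many people are in the table?",
  "Can I interact with the items in the table?",
];

(async () => {
  for (const question of questions) {
    const answer = await askPageAssistant(question);
    console.log(`${question} -> ${answer}`);
  }
})();
```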

Conclusions

Looking back at the key activities, we concluded that OpenAI might assist as follows:

Get to know the content [succeeded]

OpenAI recognized the table and correctly described the columns and the data in the cells. It didn't, however, report the correct number of rows, but we think this is fixable with proper model training and better prompts.

OpenAI also correctly answered what data is in a specific row. We assume it can assist very well in navigating through the content. Let's say Sandra wants to know all the first-level headings in the content or the titles of the tables; OpenAI can respond right away, and Sandra doesn't have to traverse the whole page with the Tab key over and over.

Navigate through the content [failed]

Interact with the content [failed]

In both of the activities above, OpenAI failed, as we expected, because it doesn't have control over the code. It could output modified HTML to be rendered in the browser, but that doesn't sound like the best solution. We envisioned a combination of third-party tools, like a Chrome extension, working hand in hand with the AI. In that scenario, Sandra would ask to focus the delete button in the second row or, more specifically, the delete button in the row of the person named Dente. The extension would listen to OpenAI, which could, for example, output the ID of the element to focus. Sandra would then hit Enter to initiate the interaction.
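A minimal sketch of that interaction, assuming the model is asked to answer only with an element id and reusing the hypothetical askPageAssistant helper from earlier:

```typescript
// Sketch only: ask the model for the id of the element to focus, focus it,
// and let Sandra confirm the action with Enter.
async function focusRequestedElement(request: string): Promise<void> {
  const answer = await askPageAssistant(
    `${request}\nAnswer only with the id attribute of the element to focus.`
  );
  const target = document.getElementById(answer.trim());

  if (target instanceof HTMLElement) {
    target.focus(); // Sandra can now press Enter to activate the button
  }
}

// e.g. focusRequestedElement("Focus the delete button in the row for Dente");
```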

Further thoughts

Well-formed prompts

In our test, we realized how much it matters to formulate the prompts well (see the conversation snippets above). Poorly formed prompts lead to unwanted results. Again, this is also a matter of better model training.
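As an illustration (our own examples, not quotes from the actual conversation), compare a vague prompt with a more specific one:

```typescript
// Illustrative only: a vague prompt leaves the model guessing what to count.
const vaguePrompt = "How many are there?";

// A specific prompt names the structure and the unit of counting, which is
// more likely to produce a reliable answer.
const specificPrompt =
  "In the HTML table below, count the <tr> elements inside <tbody> and " +
  "tell me how many people are listed.";
```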

Bigger scenario

Some bigger thoughts arose. Instead of supplying OpenAI with just one web page, why not supply the code of the whole site? Could OpenAI understand the relationships between the pages? Sandra, after landing on the homepage and wanting to delete Dente from the table, could ask right away something like: "Hey, I need to delete Dente from the people's database, can you help?" The Chrome extension could then navigate to the delete button and inform Sandra that she can hit Enter.

This scenario might work for static content, but I'm not sure how it could work for dynamic content.
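For the static case, a rough sketch of the idea (made-up paths and markup, hypothetical request format) could look like this:

```typescript
// Rough sketch of the "whole site" idea: bundle several pages so the model
// can reason about the relationships between them. Paths and markup are made up.
const pages: Record<string, string> = {
  "/index.html":
    "<html><body><a href='/people.html'>People</a></body></html>",
  "/people.html":
    "<html><body><table><!-- rows of people with delete buttons --></table></body></html>",
};

const siteContext = Object.entries(pages)
  .map(([path, html]) => `Page ${path}:\n${html}`)
  .join("\n\n");

const request =
  `${siteContext}\n\n` +
  "Hey, I need to delete Dente from the people's database, can you help? " +
  "Answer with the page path and the id of the element to activate.";

// The request could then go through the same chat completions call sketched
// earlier; the extension would navigate and move focus accordingly.
```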

Will AI kill HTML?

HTML was invented to provide semantic structure, i.e. something that computers can understand and operate on. Do we still need HTML now that AI "understands" unstructured content?

That's it!

What are your thoughts?


Glossary

UI5 is the HTML5 framework for creating cross-platform, enterprise-grade web applications. If you want to know more about UI5, head over to the official UI5 site.
