UI Automation: An Incomplete Guide for UI builders – Part 1
Guy Barker
Exploring how to make apps more accessible to more people, and sharing all I learn.
This series of articles gives an introduction into some aspects of the Windows UI Automation (UIA) API. Part 1 in the series discusses how a number of UI frameworks take action on behalf of apps to represent app UI through the UIA API. Part 2 in the series describes some real-world investigations into how apps’ UIA hierarchy, properties, patterns and events were each impacting the customer experience.
The series of articles is aimed at UI builders, not builders of assistive technology. The articles don’t describe all the things that UI builders can do to influence the UIA representation of their UI. They also don’t describe how helpful it would be to customers if product designers designed the UIA representation of their UI at the same time as they’re designing the UI’s visual representation.
If the articles did discuss all that, then they wouldn’t be such an incomplete guide to UIA.
Introduction
UI Automation (UIA) is an API which assistive technologies such as screen readers can use to help customers interact with a wide range of technology. For example, if a customer who’s blind or has low vision is using a screen reader while navigating through an app, the screen reader might use UIA to access data relating to where the customer is in the app, or to programmatically control the app.
For many years, most teams building UI that ships on Windows have been shipping at least two versions of their UI. One of those versions is the visual representation, and traditionally gets a ton of attention while building the product. For example, how many apps do you know where adjacent buttons like OK and Cancel are vertically misaligned by a few pixels? Product teams would typically consider such misaligned buttons to be too low a quality design to ship.
The other version of the UI that teams have been shipping is the programmatic one. That’s the one that doesn’t have a visual representation; rather, it’s the interface that’s exposed to assistive technologies to access and convey to your customers in some form beyond purely the visual representation. For example, a screen reader might consume the programmatic representation, and generate related audio for your customers.
Many years ago, most product teams wouldn’t pay any serious attention to the programmatic representation of the product they shipped. Some team members wouldn’t even know they were shipping such an interface. It’s a natural tendency for a team member to focus on the interface that they themselves would consume as a customer of their product, and that would often be the visual one. This could lead to products being unusable to customers who relied on the team shipping a high-quality programmatic interface.
Things have changed since then, and more teams are now paying close attention to the programmatic representation of their product, as exposed through the UIA API.
I can’t emphasize enough how someone on a product team must know exactly what programmatic representation is being shipped, and what the resulting customer experience is. A team may feel that their programmatic interface is 90% accessible, and feel that sounds pretty good, but a single blocking issue could prevent a customer from completing a task that the customer needs to complete in order to do their job. And in addition, some business organizations and governments cannot or will not purchase software that they consider to be programmatically inaccessible.
All in all, product teams typically now feel it’s appropriate to take reasonable steps to ensure a high quality experience for all their customers, regardless of which of their products’ interfaces their customers leverage.
This article gives a little background on the UIA API, and discusses how UI frameworks do work on behalf of apps to enable customers to interact with the apps through the UIA API.
A brief history of UIA
Before UIA, an API called Microsoft Active Accessibility (MSAA) provided some support for programmatic accessibility on Windows. Through an interface called IAccessible, early screen readers could access information such as the names of buttons, and the values of sliders, and announce that data to your customers. Over time, it became apparent that MSAA was not sufficiently feature-rich nor performant for customers in practice, and so UIA was introduced to address those constraints.
Important: While MSAA still exists today, most app developers would have no reason to be involved with it, so by default, never get drawn into reading documentation about MSAA. Instead focus on UIA. In addition, UIA itself includes something called “LegacyIAccessible”. LegacyIAccessible will almost certainly never be something that will affect you or your customers, so don’t spend time on it.
The UIA API is made up of a client API and a provider API. The client API is called by an assistive technology that your customers are using as they work at an app. For example, the Narrator screen reader uses the client API to access the text in an edit control, and Windows Magnifier uses it to access the bounding rectangle of the element with keyboard focus, to ensure that the focused element lies inside the magnified view. The Windows On-Screen Keyboard also uses the client API.
The UIA provider API is implemented by an app which has data to provide to the assistive technology that’s using the UIA client API. But this raises an interesting question. Given that few app teams implement the UIA provider API themselves, how come their apps are still exposing a programmatic representation through UIA?
The answer is that in many cases, someone’s doing all the UIA provider work on the app’s behalf.
Figure 1 below shows two processes. On the left is a client process, such as a screen reader using the UIA client API, and on the right is a provider process, such as an app. In the app process, the UI framework itself has implemented the UIA provider API. The UI framework understands the semantics of the UI shown in the app, and when the app is queried for UIA data related to the UI, the UI framework can respond appropriately.
For example, say a screen reader wants to know the name and control type of the control where the customer is working, and the screen reader uses the UIA client API to access that data. In response, if the app of interest is a WPF app, then WPF itself would know that the control is (say) the “Save” button, and so would implement the UIA provider API in order to provide that data through UIA.
In order for this communication between client and provider to happen, there needs to be a secure channel of communication set up between the two processes, and UIA itself takes care of setting up that channel.
Figure 1: A secure channel is set up by UIA to communicate between the UIA provider process such as an app, and a UIA client process such as a screen reader. The UI framework in the app is implementing the UIA provider API on the app’s behalf.
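To make that division of labor concrete, here is a toy model in Python. It is not the UIA API: the names `Control`, `FrameworkProvider`, `ScreenReaderClient` and `get_property` are invented for illustration, and the cross-process channel that UIA sets up is reduced to a plain method call.

```python
from dataclasses import dataclass


@dataclass
class Control:
    """A control as the UI framework knows it (hypothetical model)."""
    name: str
    control_type: str


class FrameworkProvider:
    """Stands in for a UI framework answering queries on the app's behalf."""

    def __init__(self, focused: Control) -> None:
        self._focused = focused

    def get_property(self, property_name: str) -> str:
        # The framework knows the semantics of its own controls,
        # so the app author writes none of this lookup code.
        values = {
            "Name": self._focused.name,
            "ControlType": self._focused.control_type,
        }
        return values.get(property_name, "")


class ScreenReaderClient:
    """Stands in for an assistive technology on the client side."""

    def __init__(self, provider: FrameworkProvider) -> None:
        # In real UIA the two sides live in separate processes and UIA
        # sets up the channel; here it is just an object reference.
        self._provider = provider

    def announce_focus(self) -> str:
        name = self._provider.get_property("Name")
        control_type = self._provider.get_property("ControlType")
        return f"{name}, {control_type}"


reader = ScreenReaderClient(FrameworkProvider(Control("Save", "button")))
print(reader.announce_focus())
```

The point of the sketch is that the app itself appears nowhere in the exchange: the framework answers for it.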
While a number of UI frameworks on Windows do implement the UIA provider API, there are some cases where this does not happen. For example, if the provider process is using an older type of UI where the UIA provider API has not been implemented, yet the old MSAA data is available, UIA itself will convert the MSAA data available into the equivalent UIA data. This is done through UIA’s “MSAA Proxy”.
It’s worth noting here that the UIA API itself is available in two flavors: a Windows COM API, and a .NET managed API. Given that the question of which API might be most attractive in a given situation typically arises for UIA clients, and this article is aimed at UI builders, discussion of the differences between those APIs is out of scope for this article.
With each release of Windows, there are some enhancements introduced to UIA. Some of these enhancements relate to support for new scenarios, such as those involving Windows Defender Application Guard. Other enhancements include periodic updates to the API itself, such as the introduction of UiaRaiseNotificationEvent(), which enables apps under very targeted scenarios to request that a screen reader make some arbitrary announcement.
It’ll be interesting to watch UIA evolve in the future.
Who is your UIA provider?
Typically, an app doesn’t implement the UIA provider API itself, and if a team really wants to understand the UIA interface they’re shipping, it can be important to know exactly who is implementing the UIA provider API on the app’s behalf.
I periodically get questions around why an app’s UI is being represented in a particular way through UIA. I can’t begin such an investigation without knowing which UI framework is being used. The exact details of how the UI’s being represented through UIA may be different depending on whether UWP XAML, WPF, WinForms, Win32, Edge, or something else is involved. So I must learn what type of UI is involved first.
The product team can give me a head start with that, in that they can tell me that they’ve built (say) a UWP XAML app or a WinForms app. But what if some part of the app’s UI is hosting another type of UI, and it’s that hosted UI that’s really of interest? To be sure I know what’s really going on, I need to use a handy tool that lets me explore the UIA representation of an app. The tool is Accessibility Insights for Windows (AIWin).
Important: When I want to learn about the programmatic representation of UI, I never start by pointing a screen reader at the UI. If I did that, and the screen reader experience was not as I expect, how can I know whether the issue lies with the UI, or is related to some behavior of the specific screen reader I’m using? So instead, I always point AIWin to the UI, as that can give me a good understanding of the UIA representation of the UI. Once I feel confident that the UIA representation is a good semantic match for the product, I can then point a screen reader to the UI and get a feel for the customer experience.
Note: An earlier tool included in the Windows SDK, called Inspect, could also be used to access the UIA representation of UI. Despite having used Inspect myself for many years, I moved to use the AIWin tool instead last year due to the AIWin tool’s ease of use.
The AIWin tool can be extremely helpful when learning about the UIA hierarchy, properties, patterns and events associated with some UI. For a very quick introduction into those topics, visit the following:
- Introduction to UIA - At a Glance
- Part 1 Hierarchy of UIA Elements - At a Glance
- Part 2 UIA Element Properties - At a Glance
- Part 3 UIA Element Behaviors - At a Glance
- Part 4 UIA Change Notifications - At a Glance
There’s also a longer introduction into UIA at Introduction to UIA: Microsoft's Accessibility API. (That video was made before the AIWin tool was available.)
So when I need to learn about the UIA provider associated with some UI, I’ll point the AIWin tool at the UI of interest and check the UI’s UIA properties.
The first property I check is the FrameworkId, which is listed at Automation Element Property Identifiers. Like all the UIA properties shown in AIWin, the FrameworkId is provided by the UIA provider itself. For example, Figure 2 below shows the AIWin tool reporting that the FrameworkId property of a button showing the text “…” at the top right corner of the Microsoft Store app is “XAML”. That means the UWP XAML framework is implementing the UIA provider API for the UI in the case of that button.
Figure 2: The AIWin tool reporting that the UIA FrameworkId property for a button in the Microsoft Store app is “XAML”.
Figure 3 below shows the AIWin tool reporting that the FrameworkId property of a button showing the text “…” near the middle of the Microsoft Store app is “MicrosoftEdge”. That means the engine for classic Edge is implementing the UIA provider API for the UI in the case of that button.
Figure 3: The AIWin tool reporting that the UIA FrameworkId property for a button in the Microsoft Store app is “MicrosoftEdge”.
Hours of fun can be had by pointing AIWin to a variety of UI and learning what UIA provider is implementing the UIA provider API on behalf of the UI. The list below shows the various FrameworkIds reported as I pointed AIWin to a bunch of features on my own machine.
Important: Just because the elements I happened to point AIWin at in a feature happened to report the FrameworkId I listed below, doesn’t mean that that’s the only FrameworkId exposed across the entire feature.
- Windows Settings app: "XAML"
- Microsoft Blend for VS: "WPF"
- Barker’s Herbi HocusFocus: "WinForm"
- Windows Explorer: "Win32"
- Classic Edge: "MicrosoftEdge"
- Chromium Edge, with UIA flag enabled: "Chrome"
- Chromium Edge, with UIA flag disabled: "Chrome"
- Teams app: "Chrome"
- “Copy” button on Word ribbon: AIWin reports: “Property does not exist”
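If you keep notes on several apps, the observations in the list above can be captured in a small lookup. This is only a sketch in Python: the `FRAMEWORK_PROVIDERS` table and the `describe_framework` helper are invented here, and the table covers only the FrameworkId values I happened to see, not every value a provider might report.

```python
from typing import Optional

# FrameworkId values from the list above, mapped to the kind of UI
# that reported each one. Not an exhaustive list of possible values.
FRAMEWORK_PROVIDERS = {
    "XAML": "UWP XAML",
    "WPF": "WPF",
    "WinForm": "WinForms",
    "Win32": "Win32",
    "MicrosoftEdge": "classic Edge engine",
    "Chrome": "Chromium-based UI",
}


def describe_framework(framework_id: Optional[str]) -> str:
    """Return a short note for a FrameworkId read from an inspection tool."""
    if not framework_id:
        # Matches the Word ribbon case: no FrameworkId was returned.
        return "No FrameworkId exposed; check ProviderDescription instead"
    return FRAMEWORK_PROVIDERS.get(
        framework_id, f"Unrecognized FrameworkId: {framework_id}"
    )


for value in ("XAML", "WinForm", None):
    print(value, "->", describe_framework(value))
```

Note that “Chrome” alone doesn’t tell you whether Chromium Edge’s UIA flag is enabled, since both configurations reported the same FrameworkId.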
Once we know the FrameworkId, we can feel more confident that we know who’s implementing the UIA provider API on behalf of the UI, and we can consider the known constraints of that UIA provider. For example, say we’re investigating why a data grid in some app doesn’t support the UIA Table and Grid patterns that are expected on this type of UI. Once we know the data grid is a WinForms component, we should consider what version of .NET is being used, given that with recent versions of .NET Framework, the expected UIA patterns are supported on the DataGridView control. (For more information on that, visit WinForms: The expected UIA patterns are not available on a DataGridView.)
Very occasionally, we may want to dig deeper into exactly who’s implementing the UIA provider interface on behalf of some UI. Another UIA property, the ProviderDescription, can sometimes help with that. That property is a long text string, but it’s the stuff at the end of the string that can be particularly interesting.
Below are some thoughts on the ProviderDescriptions reported by the AIWin tool for the various types of UI that I looked at, and which had the different reported FrameworkIds.
Win32 FrameworkId reported
For this older type of UI, UIA might find that the UI supports the old MSAA API, and so will convert that MSAA data to UIA data. In that case the ProviderDescription contains “MSAA Proxy”. And there are some cases where UIA might interact with the target UI itself through its own control type-specific proxies. For example, when interacting with a Win32 TreeViewItem, the ProviderDescription would contain “TreeView Item Proxy”.
In both the above cases, given that UIA itself is doing the work, the component listed in the ProviderDescription is “UIAutomationCore.dll”.
WinForm FrameworkId reported
In some cases the ProviderDescription shown for WinForms UI is again UIA’s MSAA proxy or control type-specific proxy. That can happen when WinForms is wrapping a Win32 control, and UIA is interacting with that Win32 control.
However, in other cases WinForms has taken specific action to enhance the UIA representation, in order to improve the customer experience. In that case, the UIA ProviderDescription will include “System.Windows.Forms”, and possibly something similar to “System.Windows.Forms.InternalAccessibleObject”.
All the action taken around accessibility by WinForms is publicly viewable at Reference Source. For example, DataGridViewAccessibleObject.cs shows how WinForms uses UIA’s IAccessibleEx interface to add support for the UIA Grid pattern and Table pattern to the DataGridView.
XAML and WPF FrameworkId reported
Both UWP XAML and WPF are native UIA providers, meaning that they implement the UIA provider API themselves. For UWP XAML and WPF, the ProviderDescription contains “Windows.UI.Xaml.dll” and “PresentationCore” respectively.
Chromium Edge FrameworkId reported
With Chromium Edge’s UIA flag enabled, Chromium Edge behaves as a native UIA provider, and its ProviderDescription contains “msedge.dll”.
With the UIA flag disabled, UIA converts the IAccessible2 data provided by the Chromium engine into UIA data, before returning the data to the UIA client. It does this through its own proxy, and the ProviderDescription will contain “MSAA Proxy (IAccessible2)” in this case.
Important: The data returned by Edge to a UIA client like Narrator depends on whether the “Edge” in question is classic Edge, Chromium Edge with its UIA flag enabled, or Chromium Edge with its UIA flag disabled. So when investigating an issue, it’s important to be aware of exactly what form of Edge is being discussed.
FrameworkId “Property does not exist” reported
The UIA FrameworkId property is just a property returned by a UIA provider. If the provider decides not to return a property, then the AIWin tool reports this with “Property does not exist”. So if necessary in that case, the ProviderDescription property might provide more information.
In the case of the Word Ribbon’s Copy button, where a FrameworkId is not provided, the ProviderDescription contains “mso40uiwin32client.dll”, which is presumably some Microsoft Office component.
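The substrings mentioned in the sections above can be turned into a rough classifier for a ProviderDescription string copied out of an inspection tool. This is a sketch, not a parser for the real format: the `MARKERS` table and `classify_provider` helper are invented here, the sample strings are shortened and hypothetical, and a description that mentions more than one provider is reported only by its first match.

```python
# Ordered most-specific first: "MSAA Proxy (IAccessible2)" has to be
# tested before the plain "MSAA Proxy" marker that it contains.
MARKERS = [
    ("MSAA Proxy (IAccessible2)", "UIA proxy over IAccessible2"),
    ("System.Windows.Forms", "WinForms provider"),
    ("Windows.UI.Xaml.dll", "UWP XAML native provider"),
    ("PresentationCore", "WPF native provider"),
    ("msedge.dll", "Chromium Edge native provider"),
    ("mso40uiwin32client.dll", "Office component"),
    ("TreeView Item Proxy", "UIA control type-specific proxy"),
    ("MSAA Proxy", "UIA MSAA proxy"),
]


def classify_provider(description: str) -> str:
    """Guess who is providing UIA data from a ProviderDescription string."""
    lowered = description.lower()
    for marker, meaning in MARKERS:
        # Case-insensitive match, since casing may vary between tools.
        if marker.lower() in lowered:
            return meaning
    return "Unknown provider"


# Hypothetical, shortened descriptions used only to exercise the function.
samples = [
    "Main: MSAA Proxy (IAccessible2) (unmanaged:UIAutomationCore.dll)",
    "Main: MSAA Proxy (unmanaged:UIAutomationCore.dll)",
    "Main (managed: PresentationCore)",
]
for sample in samples:
    print(classify_provider(sample))
```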
Your customers are still depending on you to take action
While often the UI framework that’s being used by your app does a great deal of work to help your customers leverage all your great features through UIA, your customers are still depending on you as the app builder to take action to ensure your app’s UIA representation is high quality, or in some cases, usable at all.
For example, depending on how the UI is implemented, a button, combobox, or edit control might not expose a UIA Name property by default, and so your customer won’t be informed of the purpose of the control when they encounter it. However, in those situations, most UI frameworks make it very straightforward to address issues like this, which are so fundamentally important to your customers.
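A quick way to spot this class of problem is to scan a snapshot of elements for interactive controls with no Name. The sketch below assumes the snapshot is a list of dictionaries with "ControlType" and "Name" keys, which is a made-up format for illustration; the `find_unnamed` helper is likewise hypothetical and no real tool exports data in exactly this shape.

```python
from typing import Dict, List

# Control types the paragraph above calls out as needing a Name.
INTERACTIVE_TYPES = {"button", "combobox", "edit"}


def find_unnamed(elements: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return interactive elements whose Name is missing or blank."""
    return [
        element
        for element in elements
        if element.get("ControlType", "").lower() in INTERACTIVE_TYPES
        and not element.get("Name", "").strip()
    ]


# Hypothetical snapshot; real data would come from a UIA inspection tool.
snapshot = [
    {"ControlType": "button", "Name": "Save"},
    {"ControlType": "edit", "Name": ""},
    {"ControlType": "combobox"},
    {"ControlType": "text", "Name": ""},
]

for element in find_unnamed(snapshot):
    print(f"Missing Name on {element['ControlType']}")
```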
It’s out of scope of this article to discuss all the ways that different UI frameworks make it practical for app builders to influence the UIA representation of their UI, but the sections below include a few references to some relevant resources.
General Windows desktop
- The various items at Code samples for resolving common programmatic accessibility issues in Windows desktop apps were compiled following real-world investigations with apps’ accessibility. For example, if you want thoughts on how to prevent Narrator moving to some hidden WPF elements, visit WPF : Preventing a screen reader from encountering an element.
WPF
- AutomationProperties Class
- UI Automation of a WPF Custom Control
- AutomationPeer Class
- Reference source for WPF UIA providers, such as the DataGridAutomationPeer.
- Common approaches for enhancing the programmatic accessibility of your Win32, WinForms and WPF apps: Part 4 – WPF. (I need to update this article, but it may still be of some interest as it is.)
UWP XAML
WinForms
- AccessibleObject Class
- Reference source for WinForms UIA providers, such as the DataGridViewAccessibleObject.
- Common approaches for enhancing the programmatic accessibility of your Win32, WinForms and WPF apps: Part 3 – WinForms. (Yeah, I need to update this too, but it may still be of some interest.)
Win32
Web
- Core Accessibility API Mappings 1.2 and HTML Accessibility API Mappings 1.0 include details on how browsers are expected to map HTML and ARIA to UIA.
- WAI-ARIA Authoring Practices 1.1 and Accessible Rich Internet Applications (WAI-ARIA) 1.2 help UI builders follow expected practices around use of ARIA.
Summary
All your customers need to be able to efficiently access all your functionality, regardless of whether they do so through one or both of your visual or programmatic representations. To enable a screen reader to access your app’s data on behalf of your customer, UIA does all the work to set up and maintain a secure communication channel between your app and the screen reader.
In many cases, the UI framework you use will do a great deal of work for you by default, and implement the UIA provider API through which information about your UI can be accessed by UIA clients such as screen readers. The UI framework will do the work to generate the UIA representation of your app based on how you defined the UI. You’ll get a substantial UIA representation by default, and then you can take the additional steps that your customers are depending on you to take, to complete the customer experience.
In Part 2 of this series, I’ll discuss some real-world investigations into accessibility bugs in apps built in a variety of UI frameworks, where at least one of the apps’ UIA hierarchies, properties, patterns, or events, played some part in the investigations.
Guy