What is Surface Automation, really?
Last week, during my team's final prep for a presentation to a group of senior Operations officers at a major health insurance company, a teammate asked me:
- Hewitt, I saw you put Surface Automation and Frequency of Screen Changes as two separate drivers in our development effort assessment. Explain to me: why are these two mutually exclusive, and why do they carry the highest weights?
I put my technical thoughts into words without considering my teammate's business background:
- Surface Automation is an image-based approach to automation. Whatever keys (alphanumeric or cursor) you send to the screen with this approach are, in essence, sent to an image or a portion of the screen. Frequency of Screen Changes is a separate variable in estimating development effort because, even without Surface Automation, developers still spend more time when a process involves many screen changes.
My teammate looked even more confused:
- What's an image-based approach? Isn't that how most RPA tools typically work?
- No. Typically, programs communicate with each other at the API level.
- What is API?
- ....
I realized my teammate's lack of understanding of the technical fundamentals behind RPA tools was not unique. A year ago, I didn't know them either. Neither did many of the folks I've worked with so far.
That itch motivated me to dig deeper into the tool documentation and venture an answer that is more accessible to professionals learning about RPA technology.
So let's begin...
Most modern business applications contain buttons, fields, check boxes, etc. Imagine the surface of an application as a landscape made of those elements, arranged so that users can click, type and check for a business purpose. These are user interface elements, aka UI elements.
What's the API approach?
Using a convention such as Win32, Active Accessibility or HTML, Blue Prism (BP) can, to a certain extent, extract the "address" of the UI element being captured or spied, and suggest options for users to choose what types of actions BP can perform on it.
It's similar to how, in accounting or law, a professional refers to a standard or a statute by giving you a reference in the form of a notation. From the notation, you can quickly tell the legal domain, section, chapter, sub-chapter, paragraph, subparagraph and bullet point being referred to. Any trained professional can tell you how helpful that notation is in their research.
Similarly, these "addresses" give BP the ability to access the UI elements in a split second as the code runs. Once those "addresses" and the actions have been stored in BP, they are "remembered" or "trained".
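To make the idea concrete, here is a minimal sketch of attribute-based addressing using Selenium against a web page, purely as an analogy for the HTML convention mentioned above; it is not Blue Prism itself, and the URL and element ids are hypothetical:

```python
# Analogy only: addressing elements by stable attributes, not by pixels.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/claims")            # hypothetical claims screen

# The field is found by its "address" (here, an id), then acted upon.
member_id = driver.find_element(By.ID, "memberId")  # hypothetical id
member_id.send_keys("A123456")

driver.find_element(By.ID, "searchButton").click()  # hypothetical id
driver.quit()
```

The key point is that the element is located by a stable attribute, so it is found correctly no matter where it happens to sit on the screen.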
A potential downside of this approach is the inherent limitations of the conventions mentioned above. For example, Win32 is .NET- and C-based, so you can't expect it to perform well against SAP, software written in ABAP and Java. Active Accessibility is aging and receiving less support. HTML only applies to applications hosted in a web browser.
What's the image-based approach (Surface Automation, in BP terms)?
In situations where BP cannot use those "addresses" reliably with the available conventions, the image-based approach is the go-to alternative. In this approach, BP treats the landscape as a smooth, featureless surface.
In other words, BP treats the interface as an image. Fields, check boxes and buttons are instead identified by their position relative to the outer frame.
In case you didn't know, one (x, y) coordinate represents one pixel on the screen. Two (x, y) coordinates then define a rectangle by its upper-left and lower-right corners. One rectangle makes up a region, and on a region users can essentially perform two types of action: press keys and click. Put simply, BP has to be "trained" with the series of (x, y) coordinates and the types of actions to perform on those regions.
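For illustration, here is a rough sketch of the same mechanics outside BP, using Python's pyautogui library; all coordinates below are made up:

```python
# Coordinate-driven automation in miniature (not Blue Prism itself).
import pyautogui

# A region defined by its upper-left and lower-right corners, in pixels.
top_left = (200, 100)
bottom_right = (400, 130)

# Click the centre of the region, then send keys to whatever now has focus.
center_x = (top_left[0] + bottom_right[0]) // 2
center_y = (top_left[1] + bottom_right[1]) // 2
pyautogui.click(center_x, center_y)
pyautogui.write("A123456")   # alphanumeric keys
pyautogui.press("enter")     # a cursor/control key
```

Notice that nothing here knows what the field is; the automation only knows where to click and what to type.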
An analogy for this approach is a pilot tasked with dropping humanitarian aid on a target from high above. He is given the geographic coordinates of the target. Then, factoring in his speed and the wind's speed and direction, the computer tells him the coordinates and the time at which he must release the aid.
Why is Surface Automation sub-optimal?
Coming back to the humanitarian aid example, let's compare two delivery methods, representing Surface Automation and API automation respectively.
While Surface Automation is like dropping the aid from high above, the API-based approach is like having a worker deliver it on the ground, identifying the target by street address or a "remembered" path.
However, using coordinates to send clicks and keys rests on a few assumptions developers must be cautious about.
First of all, the position of the region trained as a UI element must not change between configuration and deployment. In the world of software development, user interfaces change continuously, so developers will have to spend time "re-training" the tool whenever that happens.
Second, screen resolutions may differ from environment to environment. A pixel noted as (200, 100) on a 1024 x 768 screen and on a 1920 x 1080 screen points to two different places, as the short sketch below illustrates.
Third, when developers have to use Surface Automation, it usually involves more elaborate timing of pauses between actions and knowledge of application shortcuts. More advanced skills are therefore required to deal with Surface Automation.
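To make the second point concrete, here is a small illustrative calculation (my own, not from any BP documentation) of where a naively rescaled coordinate ends up on a larger screen:

```python
# Illustrative only: where does a point trained at 1024x768 land at 1920x1080
# if we rescale it proportionally? Real layouts rarely scale this cleanly,
# which is exactly why hard-coded coordinates are fragile.
def scale_point(x, y, old_res=(1024, 768), new_res=(1920, 1080)):
    """Proportionally rescale a pixel coordinate between two resolutions."""
    return (round(x * new_res[0] / old_res[0]),
            round(y * new_res[1] / old_res[1]))

print(scale_point(200, 100))  # -> (375, 141): a different pixel entirely
```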
For those reasons, Frequency of Screen Changes and Surface Automation are mutually exclusive drivers and should top the list of factors when assessing development effort for a project.
Comments are welcome!
[Side Note]: Today, September 28th, 2017, Blue Prism announced the release of version 6.0. The release claims to include more Surface Automation features that increase performance and reduce complexity for users designing and building automations. As a developer, I'm more than ecstatic about this move >:)