Issue Detection Within Minutes?
In this article I would like to share my views on the automated testing and monitoring procedures available to operators in the various phases of their product cycle.
Each individual test procedure listed has known limitations and cannot cover all essential aspects, but the combination of Acctopus Degust with its ability to leverage existing third-party tools provides an ideal test solution for mobile and fixed network operators. It is good to know that Degust does not need to replace existing systems, but integrates with them.
There are many misunderstandings about which method helps in which area, and which method might be obsolete because another delivers the same results. I will try to explain why classical end-to-end testing is not enough, and I will also focus on the applicability of the approaches for development, type acceptance, deployment and operation.
For quick readers, I have created a table with my assessments. You may immediately rethink your current strategy, or you may first read the complete article to see why and how these conclusions were reached. In any case, please reach out to me so we can discuss how Acctopus can help you!
In fact, there is no single procedure that covers all required test cases. Each of the mentioned approaches works well if you set it in the correct context.
Testing and Monitoring
The reasons for testing and monitoring are obvious:
- Delivery of high-quality products to customers
- Secure revenues / avoid losses
- Increase profits and reduce costs
But as complexity rises and product versatility extends, manual testing has already reached its limits.
The power of automation
The greatest strengths of automation are depth of testing, repeatability and accuracy, because the human factor has been taken out of the equation.
Assume you want to test 8,000 situations regularly. You won't be able to run them manually. 8,000 tests? I am not kidding!
Operators can easily have 20, 50 or even more different services. Within these services, they offer hundreds of tariff options for different customer groups, varying by:
- QoS,
- volume granted,
- time of day,
- device type,
- radio access type dependencies,
- CS, VoLTE or VoWiFi calls,
- combinations with fixed broadband or media services,
- handover and fallback scenarios,
to name just a few.
If you now calculate the possible combinations, you arrive at a six-digit number of usage scenarios. So, 8,000 is only a reasonable fraction of it.
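The combinatorial explosion is easy to illustrate with a back-of-the-envelope calculation. The option counts per dimension below are purely hypothetical assumptions for illustration, not real tariff data:

```python
from math import prod

# Hypothetical option counts per tariff dimension (illustrative assumptions)
dimensions = {
    "QoS profile": 4,
    "volume granted": 10,
    "time of day": 3,
    "device type": 20,
    "radio access type": 4,
    "call type (CS/VoLTE/VoWiFi)": 3,
}

# Every combination of options is a distinct usage scenario
scenarios_per_service = prod(dimensions.values())
print(scenarios_per_service)   # 28800 scenarios for a single service

services = 25                  # assumed: an operator with 25 services
total = services * scenarios_per_service
print(total)                   # 720000 -> a six-digit scenario count
```

Even with these modest assumed counts, 8,000 regular tests cover only about one percent of the scenario space.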
Or think of a nightly service upgrade or activation, where you often need the status of your service within minutes to an hour. Think of 8,000 tests now. Scary, isn't it?
It is also clear that operators need to automate as much as possible in order to
- keep costs down,
- relieve personnel from manual, repetitive and protracted activities,
- eliminate human mistakes (yes, sounds charming!) and
- speed up the tests to shorten the time-to-market, which again raises revenues.
In this article, I won't go too deeply into third-party products, but I'll point out some pros and cons. Sure, as a provider of an automation service, I take the liberty of mentioning the benefits of our product Acctopus Degust. So, if you don't like my review of a particular product or approach, please send me a message.
There are several approaches to automated testing and service monitoring. They differ depending on the type of network, be it a mobile network, a broadband network or an IoT service. Even within these areas, testing differs for certain reasons, e.g.
- in case of xDSL, cable or FTTx, the network is built up with completely different systems, or
- in the evolution from LTE to 5G a complete change in signaling protocols has been defined.
Since the networks are different, the test approaches must also differ. Still, the challenge is to perform as many tests as possible in a single tool to limit the operator's internal effort for training, reporting, integration and interfaces. It also makes sense to have most of the tests in a single tool, as that is the only way to run cross-technology tests and monitoring.
On the other hand, 'one single tool at any price' is the wrong goal, because there are many highly specialized tools out there, or market leaders who already have a good, globally installed product with thousands of client systems but can do just that one thing. In this case, the test tool should be able to control or integrate such other tools.
Some tests even need only a part of the whole. For example, the transport network may be completely irrelevant for some of the approaches listed, as they only examine the user's QoE (Quality of Experience) from an APP point of view. I will deal with these dependencies and exceptions as much as possible.
The following list tries to arrange the testing procedures according to the user's point of view and then drills more and more into testing the user's sessions as experienced on the core elements of the network - where most errors may occur.
To get a fair overview of the pros and cons of the approaches, I recommend reading the complete list. The reason is that some procedures give a falsely good feeling despite complex graphs and tables, because they only scratch the surface. This is not because they are unable to find errors, but because they are simply caught by operators' policies to present "no error to the customer". In these cases, any End-2-End test approach is tricked into a completely false impression of the state of your environment.
Also, some approaches, like tracing packets or reading log files, are not considered "testing" at all, but to me they are necessary, as they complete the picture.
The major testing approaches
- APP or Web GUI testing
- End to End testing using real devices, involving the complete network
- Evaluation of gathered data
- Simulation of signaling traffic as a client
- Service simulation for signaling
- Gathering log files or traces from operator’s systems and probes
APP or Web GUI tests
The primary goal of this approach is to test whether an APP or Web GUI is usable and delivers the correct data in reasonable time. It is also interesting for testing back-end services and charging, as the device's use of resources will often interact with the operator's services.
The applications under test can be internal applications on the device or operator APPs, provided in the distribution channel by the device manufacturer or the operator. Examples are an application for booking tariff upgrades or just the internal phone APP used to place a call.
In this approach, client software runs on a device to control these applications. Wherever a locally connected PC is required to steer a phone, I recommend moving on to the next vendor, as this requirement will not scale economically at all. Even if you get this PC-based approach delivered as a service, I doubt that you will get a fair price.
The client software on the phone mimics user interaction to test a wide range of required functions: an application in development, a remote service accessed from that device, connections such as calls or data sessions, or the sending of messages.
During MWC2019, this seemed to be the most commonly presented testing approach. It is also the only solution for testing an application on a particular device.
When it comes to remote service testing, it seems obvious that it is not a 100% test solution. As mentioned earlier (‘no error to the customer’ - policy), it only gives an overview of the part of the scenario that is presented to the user. To get an overall picture of a service, these tests must be complemented by the results of other tests.
Pro:
- Confirms that a certain application runs correctly on that device
- Uses real devices to test a service
Con:
- No parallel testing, as devices are not shareable among different tests. This does not scale.
- No clear report on the source of an error, as it cannot be reliably determined from the user's perspective.
- Hard to be used in test beds: e.g. if RAN or Core is not fully running, tests are impossible.
End to End testing using real devices, involving the complete network
While it may look like the above approach, this approach focuses on the End-2-End testing of an operator’s service.
This approach must be split into several derivatives.
- Use of real devices that a user would use (e.g. mobile phones, broadband CPEs, VoIP phones...). To run the actual tests the device can run client software, as mentioned above, or it can run a background task to communicate directly with the core communication services on that device.
- Some implementations require a PC connected to the device to steer the tests. While it appears outdated for phones, it may be necessary for CPEs. This doesn’t scale much but may be the only alternative.
- Use of specially designed hardware to test specific functionality in the network.
When it comes to cellular networks or IoT, any of the above solutions may use SIM card distribution systems or eUICC/eSIM deployments to change the actual subscriber on a device for the test.
This approach is mainly used to test the operator's network and services and always has some challenges to keep in mind:
- The use of real devices that a user would use depends on the software and updates on that device. Future updates by the vendor can disrupt the tests.
- Specially designed hardware is not as vulnerable and only needs to be updated by the vendor when testing requires it. On the other hand, it can be more difficult to keep pace with the development of future technologies used by the operator.
- Using a computer to control a device is a relatively safe option if you turn off automatic updating on the computer. But still scaling is an issue.
- The ability to distribute SIM cards or support eSIM is a must have to switch subscribers on the device and test multiple different tariffs or services from a single device.
Cellular End-2-End testing, regardless of the above derivatives, always relies on non-shareable resources. To test connectivity, location services, roaming or even, e.g., booking platforms for a subscriber, you always need an available device and a SIM card or eSIM profile provided for the scenario under test. There is no parallelism for SIM cards and cellular modules, which makes bulk testing a pain.
In addition, distributing the SIM card to the device and registering it on the network consumes a large portion of the test cycle, reducing the number of tests an individual device can run over time.
The only way to overcome this disadvantage is to significantly increase the number of devices and SIM cards assigned to the tests, but there is no economical way to perform 8,000 tests in minutes using this procedure.
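To put a number on this, here is a rough fleet-size calculation. The per-test duration is an assumed value for illustration, not a measurement:

```python
import math

tests = 8000
seconds_per_test = 180   # assumed: SIM provisioning + network registration + test
window_seconds = 3600    # target: full result set within one hour

# Each device runs its tests sequentially, so the required fleet size is:
devices_needed = math.ceil(tests * seconds_per_test / window_seconds)
print(devices_needed)    # 400 devices, plus one SIM/eSIM profile per scenario
```

Even under this optimistic three-minute assumption, hundreds of devices and thousands of SIM profiles would have to sit idle most of the time just to absorb the nightly peak.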
Pro:
- Real devices to test a service
Con:
- No parallel testing, as devices are not shareable among different tests. This does not scale.
- No clear report on the source of an error, as it cannot be reliably determined from the user's perspective.
- Hard to be used in test beds: e.g. if RAN or Core is not fully running, tests are impossible.
Evaluation of gathered data
The goal of this approach is to detect deviations from the normal state in signaling and logging. For this, the operator collects as much data as possible from as many systems as possible and stores it for later analysis.
The available tools are usually in the area of business or revenue assurance. If they find a problem, they are considered very effective, because the problem has been uncovered from thousands or even millions of records.
Although the tools can be quick at storing and evaluating signaling records, results are usually expected after hours or days. Since this method does not deliver in real time and relies on large data volumes, it is not suitable for testing a single user session or tariff model.
However, the results are very well suited to feed other test or monitoring tools, so that special attention can be paid to particular events in the future.
Pro:
- Very effective in finding massive problems
Con:
- No real-time reports
- Every finding must be explained separately
- Not suitable for test beds
- Does not detect errors in single sessions
Simulation of signaling traffic as a client
The goal of this approach is to issue signaling traffic for a certain scenario and examine the server's response. For data connections, the client could assume the role of a packet gateway to simulate a complete user session including roaming status and usage. In this scenario, the simulated PGW expects replies from AAA, PCRF or OCS, which must follow defined rules for a correct session setup.
This is by far the fastest way to detect an error at the source, as the replies from the servers are predictable.
The approach has many advantages:
- It is real-time because it focuses on individual, self-created user sessions.
- It performs an instant review of server responses.
- It is very fast because it omits the entire overhead of radio or broadband connectivity and registration.
- It is affordable, as it does not require hundreds of UEs and does not incur heavy inter-operator tariff (IOT) charges for international tests.
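As a minimal sketch of the idea, a simulated client can check every server reply against the predictable expectation the moment it arrives. The field names below and the Diameter success code 2001 are simplified illustrations, not a real protocol stack:

```python
# Illustrative sketch: validate a PCRF-style answer against expectations.
# 2001 is the Diameter DIAMETER_SUCCESS result code; the dict layout
# is an assumption made for this example, not a wire format.
DIAMETER_SUCCESS = 2001

def validate_answer(answer: dict, expected_rules: set) -> list:
    """Return a list of findings; an empty list means the session is OK."""
    findings = []
    if answer.get("result_code") != DIAMETER_SUCCESS:
        findings.append(f"unexpected result code: {answer.get('result_code')}")
    installed = set(answer.get("charging_rules", []))
    for rule in sorted(expected_rules - installed):
        findings.append(f"missing charging rule: {rule}")
    return findings

# Example: the simulated PGW expects three rules for this tariff scenario
answer = {"result_code": 2001, "charging_rules": ["default", "video-boost"]}
print(validate_answer(answer, {"default", "video-boost", "roaming-cap"}))
# -> ['missing charging rule: roaming-cap']
```

Because the expected reply is fully known in advance, the verdict is available in the same instant the answer is received, which is what makes this approach so much faster than any end-to-end run.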
This approach is feasible for testing, service monitoring and revenue assurance system support with the same tools.
It is a more complex approach, but also one of the most efficient for testing and monitoring, as it performs functional testing of services and interfaces in both test beds and live environments.
It outperforms any E2E network test in speed and flexibility. You may realize that after some 3-4 real E2E tests to check the RAN parameters, you can be sure the RAN is fine for now. Additional tariff and service tests then no longer require any consideration of RAN or connectivity in this test suite. The remaining 7,996 tests in the current test suite can simulate all required tariff uses, e.g. with roaming network credentials, to speed up the tests.
Pro:
- Fast and results are available immediately during the test
- Unlimited sessions as only shareable resources are involved
- Focuses on service delivery and not on radio access network
- Applicable in test beds, even if not all test bed systems are up and running
Con:
- Complex in planning, as operators' staff are not used to digging into signaling
Another advantage is the ability to perform compliance tests even for destructive scenarios, e.g. to check how a new system deals with incorrectly formed request messages.
Service simulation for signaling
In this approach, software components simulate service functionalities. It is mainly used during the development, type approval or deployment phases, until real users are using the new service.
Think of activating a new core component in your live environment. You can't put real user traffic on this new component until you have fully approved the integration and configuration. But you can wrap the component with signaling simulation and service simulation until the integration tests are passed.
Or take 5G testing as another example. You have chosen a variety of vendors and need to synchronize interoperability tests. If, for example, the supplier of the UDM does not meet the planned delivery date, you cannot start testing at all. If you are able to use a UDM simulation in the meantime, all remaining suppliers can start integration and testing.
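To illustrate the idea of standing in for a late component: a real 5G UDM exposes HTTP-based SBI services, while the in-memory stand-in below is a deliberately simplified sketch whose data model and method names are assumptions for this example:

```python
# Minimal in-memory stand-in for a UDM-like subscriber-data service.
# A production UDM speaks REST over the 5G service-based interface;
# this stub only mimics the request/response logic for integration tests.
class UdmStub:
    def __init__(self, subscribers: dict):
        # supi -> access-and-mobility subscription data
        self._subscribers = subscribers

    def get_am_data(self, supi: str):
        """Return (status, body), mimicking a 200/404 REST response."""
        data = self._subscribers.get(supi)
        if data is None:
            return 404, {"cause": "USER_NOT_FOUND"}
        return 200, data

# The other vendors' network functions can be pointed at this stub
# until the real UDM is delivered, so integration testing can start.
udm = UdmStub({"imsi-001010000000001": {"subscribed-snssai": "eMBB"}})
print(udm.get_am_data("imsi-001010000000001"))  # (200, {'subscribed-snssai': 'eMBB'})
print(udm.get_am_data("imsi-999990000000000"))  # (404, {'cause': 'USER_NOT_FOUND'})
```

The point is not fidelity to the real UDM, but predictable, controllable answers so the remaining suppliers are not blocked by a missed delivery date.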
Another advantage is the ability to perform compliance tests even for destructive scenarios, e.g. to check how a new system deals with incorrectly formed response messages.
Pro:
- Most flexible solution to simulate services in case the real service is currently not available
- Fast and results are available immediately during the test
- Unlimited sessions, as no non-shareable resources are involved
- Focuses on service delivery and not on radio access network
Con:
- Complex in planning, as operators are not used to digging into signaling
Collecting log files or traces from operator’s systems and probes
This is not a real test procedure but it helps to analyze errors.
Operators are using probes in the network and each service or platform has its own monitoring, tracing, logging or configuration capabilities. Whenever an error occurs, an administrator will use these facilities to investigate and hopefully fix it.
A good automation system can do at least whatever the administrator is able or allowed to do, and thus can perform the same investigation. Not all systems provide APIs for these tasks, so the automation must also be able to mimic the administrator's investigation using a command line interface or a Web GUI.
In this approach, automated investigation has the advantage of real-time capability. Most systems have buffers for raw session data. These buffers are not that large, but they can last from a few minutes to a few hours. After that, the information disappears or is stored somewhere, often only in aggregated form. Automation starts the investigation immediately, so the information is usually still available in the buffers.
Therefore, the ability to retrieve them during or immediately after the test speeds up the tests and is required for automatic fixing.
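The timing argument can be sketched as follows; the buffer lifetime and the reaction delays are assumed values for illustration only:

```python
# Sketch: raw session data survives only for a limited buffer lifetime,
# so only an immediate, automated collection still finds the full trace.
BUFFER_TTL_SECONDS = 15 * 60   # assumed: probe keeps raw records ~15 minutes

def raw_data_still_available(seconds_since_event: float) -> bool:
    """True while the raw session records are still in the probe's buffer."""
    return seconds_since_event < BUFFER_TTL_SECONDS

# Automation reacts within seconds of a failed test ...
print(raw_data_still_available(30))        # True  -> full raw trace retrievable
# ... while a human picking up the ticket the next morning is too late:
print(raw_data_still_available(8 * 3600))  # False -> only aggregated data left
```

This is why collection has to be triggered by the test run itself rather than by a later, manual follow-up.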
Some additional requirements of modern Automation
... and sure, these are all available in Degust.
Modern automation systems take operators' project organizations into account. They must incorporate user management, test-release management (covering the systems under test) and resource management for all objects involved in tests.
Projects must support different user roles, so that some users can set up the tests while others can start test suites and review results.
While they must be easily accessible, ideally via a browser GUI, they must also fit into the operator's security policies, such as object security and two-factor authentication for user login.
The latter two requirements also save costs by leveraging offshore resources for the shareable tasks.
As operators tend to involve external staff or consulting service providers, the tool must also support these different competences.
I invite everybody to help to extend this list and - as mentioned above - please reach out to me for any requirements or discussions.
Stefan Auweiler, CEO of Acctopus GmbH