Are 4G Autonomous Networks getting closer?
Dmitry Sevostsyanchuk, PMP, Telecom
Project Director, ZTE Corporation
This article looks at the automation of legacy network operations from the perspective of already deployed infrastructure.
Introduction
I have always faced a fundamental question in managing network operations: how to be efficient. Despite purchasing various tools to address this, we repeatedly ran into the same issue: return on investment (ROI).
I think every middle manager is familiar with the C-level question: 'We are under pressure to reduce OPEX. I have bought you cutting-edge tools; please consider the possibility of staff reduction.'
The purchase of specialized tools significantly enhanced our data collection capabilities. However, the further value creation steps were assigned to scarce manpower. Faced with a myriad of tools, teams usually ended up generating reports based on either predefined or customized templates to quickly capture the low-hanging fruit.
I had TEMS Automatic equipped with a fleet of autonomous active modules, which generated a huge data flow. However, the analysis was almost exclusively narrowed down to benchmark reports. I managed network optimization projects based on passive probe data collection in both user and control planes. While these probes produced enormous amounts of data, the analysis capability did not keep up with the rate of data generation, resulting in mediocre final outcomes.
The main technological constraints were limited rule-based analysis and manual configuration processes. Relying on manual processes also meant overcoming significant resistance in the siloed working environment typical of telecom legacy systems. This is why, being in charge of end-user perception, I was particularly interested in cross-platform tools with automatic issue demarcation functions.
I managed to maintain high network performance, but I was also aware of the operational context. The network was designed to provide two main services for many years. Maintaining consistent configuration of the network elements (NE) and conducting numerous optimization iterations resolved most issues. Essentially, we traded off time and configuration inflexibility for limited analysis capabilities.
The upcoming global rollout of 5G will break this operational context. Communication Service Providers (CSPs) are moving beyond selling broadband to consumers and are now selling a connectivity platform to businesses. It's all about diversity and speed to market. The luxury of having months for service provisioning and quality tuning will no longer be available. It's a case of adapt or die.
Operator OPEX could double over the next five years without more automation across deployment and management & operations just to support the expected changes with MBB-driven use cases. (Ericsson)
The problem is clear, but what is the solution?
At the infrastructure level, telecom companies are transitioning to an Infrastructure-as-Code (IaC) framework, at least in the Core Network domain for now.
In network operations management, manual, static, rules-based automation is giving way to model-driven and knowledge-driven approaches.
Network operations management is shifting from the traditional "person + process" model to a new "human + machine" collaboration model.
The visualization below captures the operations development strategy.
As I have mentioned before, the siloed working environment represents a negative legacy when viewed from the perspective of the current fast-paced, dynamic environment. I would like to add that this environment was created by vertical proprietary solutions, which required extensive manual and specialized operations.
The network architecture is layered, and at least in the Core Network domain, we now view hardware as a common infrastructure layer (NFVI) under the control of a single team. However, the benefit is not fully realized because 'old style' approaches persist in the layers above.
In a real case, the Core Network's NFVI layer is managed by the IT department, but the traditional approach is still maintained at the virtualized network function (VNF) layer. Consider an NFVI task that involves offloading all services from the data center (DC), such as an OpenStack upgrade. How much time and how many different teams will it require? How many risks will be introduced due to potential errors or incomplete configurations? And how significant is the resistance to taking responsibility?
I believe we need to address the issue of the siloed working environment by establishing a new service layer on top of the VNF layer, based on Infrastructure-as-Code (IaC), where workflows are fully automated.
Human involvement should be required only at the highest level of abstraction possible, which is a key factor in achieving an Autonomous Network.
The necessity for a shift in the telecommunications operational paradigm is widely known and accepted. This development is ongoing under the governance of TM Forum and 3GPP, categorized under the umbrella terms 'Autonomous Networks' and 'Zero-touch Network'.
The high-level Autonomous Networks technical architecture is shown below.
The holistic method employed to reduce complexity involves decomposing the network into multiple self-governing domains. These domains interact with other layers, domains, or users exclusively through intent-driven interfaces, which may be APIs or human interactions.
The abstraction of Autonomous Domains by the Service Operation Layer has captured the entire spectrum of ideas, ranging from attempts to deploy cross-domain analytics to adjustments in the operation environment.
"The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise." (E.W. Dijkstra)
It enables automated service provisioning from Level 3 (described below), and this is becoming a reality today.
Having hands-on experience in either new service provisioning or complex issue debugging management allows one to fully appreciate the power of the vision to abstract and automate low-level operations.
From my experience, I used to spend over 50% of a project's time on low-level Network Element (NE) configuration tasks: planning, waiting for available manpower or timeslots, and debugging misconfigurations. I have always desired a system where I can design at a high level and have it automatically deployed, akin to the GitOps framework with its human-readable, declarative, and executable documentation.
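To make the "design at a high level, deploy automatically" idea concrete, here is a minimal sketch of the declarative pattern behind GitOps: the engineer declares the desired state, and a tool computes and applies the difference against what is actually configured. The parameter names (`tx_power_dbm`, `pci`, `legacy_flag`) are purely illustrative assumptions, not taken from any vendor's NE model.

```python
from typing import Dict, List, Tuple

# Hypothetical parameter names; real NE parameters depend on the vendor.
Config = Dict[str, object]
Step = Tuple[str, str, object]


def plan_changes(desired: Config, actual: Config) -> List[Step]:
    """Return the (action, parameter, value) steps needed to reach the desired state."""
    steps: List[Step] = []
    for key, value in sorted(desired.items()):
        if key not in actual:
            steps.append(("add", key, value))
        elif actual[key] != value:
            steps.append(("set", key, value))
    # Anything configured but not declared is treated as drift and removed.
    for key in sorted(actual.keys() - desired.keys()):
        steps.append(("remove", key, actual[key]))
    return steps


def apply_changes(actual: Config, steps: List[Step]) -> Config:
    """Apply the planned steps to a copy of the actual configuration."""
    result = dict(actual)
    for action, key, value in steps:
        if action == "remove":
            result.pop(key)
        else:
            result[key] = value
    return result


desired = {"cell_id": 101, "tx_power_dbm": 43, "pci": 12}
actual = {"cell_id": 101, "tx_power_dbm": 40, "legacy_flag": True}

steps = plan_changes(desired, actual)
for step in steps:
    print(step)
```

Running the plan a second time on the result yields no steps, which is the property that makes the declared state usable as executable documentation.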
In the short term, it's not possible to fully realize the benefits of Autonomous Networks due to the current limitations in the CSP network architecture across RAN, Transport, and Core Network domains.
Currently, CSPs are mostly in a transitional phase towards cloud-native architecture, primarily in the Core Network domain. In simple terms, only by fully transitioning to Platform-as-a-Service can you create the necessary abstraction layers to hide the complexity of the network and practically implement Autonomous Domains (AD).
The transition to Autonomous Networks is a long journey and should be tracked with the framework below for evaluating autonomous network levels.
The TM Forum Autonomous Networks architecture can be viewed as a guide for designing the network instrumentation strategy. The major mistake in the current initial phase is disaggregated automation confined to the small space of a specific network domain. You must have an overall vision of how to integrate these automation activities into network-wide automation to achieve significant synergy. The Autonomous Networks technical architecture helps with this.
What can you do now, and what are the ongoing global efforts for legacy networks?
Legacy networks, such as 3G and LTE, do not have native analytical entities in their architecture, unlike the Network Data Analytics Function (NWDAF) defined in 5G by 3GPP. All optimization and operational processes in these legacy networks are based on data from each Network Element (NE), including performance and fault management data, signaling tracing systems, log management, etc. Furthermore, legacy networks do not collect the comprehensive set of data necessary for in-depth debugging and issue resolution.
This problem is well known and the solutions are defined: deploy a specialized platform, put probes on network interfaces to mirror the control plane, and pull MR and CDR data from the RAN.
The latest developments in AI close the analytical gap and bring commercial justification for the deployment of probes and analytical platforms. Generative AI can generate executable objects that are sent as a request to a specific endpoint provided by the Operations and Maintenance (O&M) APIs.
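Because a generated object ends up as a real request against O&M, it makes sense to validate it before it leaves the analytical platform. The sketch below assumes the model emits a JSON command; the operation name, parameter ranges and endpoint URL are hypothetical placeholders, not a real O&M API. Nothing is actually sent.

```python
import json

# Hypothetical allow-list: operation name -> {parameter: (min, max)}.
ALLOWED_OPERATIONS = {
    "set_antenna_tilt": {"cell_id": (1, 65535), "tilt_deg": (0, 12)},
}

# Placeholder endpoint, not a real O&M API.
OAM_ENDPOINT = "https://oam.example.invalid/api/v1/commands"


def validate_command(raw: str) -> dict:
    """Parse a model-generated JSON command and reject anything off the allow-list."""
    command = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(command, dict):
        raise ValueError("command must be a JSON object")
    operation = command.get("operation")
    if not isinstance(operation, str) or operation not in ALLOWED_OPERATIONS:
        raise ValueError(f"operation not allowed: {operation!r}")
    schema = ALLOWED_OPERATIONS[operation]
    params = command.get("params")
    if not isinstance(params, dict) or set(params) != set(schema):
        raise ValueError("parameters do not match the expected set")
    for name, (low, high) in schema.items():
        value = params[name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
            raise ValueError(f"{name} out of range: {value!r}")
    return {"operation": operation, "params": params}


def build_request(command: dict) -> tuple:
    """Return the (url, body) pair a client would POST; nothing is sent here."""
    return OAM_ENDPOINT, json.dumps(command, sort_keys=True)


generated = '{"operation": "set_antenna_tilt", "params": {"cell_id": 101, "tilt_deg": 4}}'
url, body = build_request(validate_command(generated))
print(url)
print(body)
```

The allow-list keeps the generative part free to propose actions while the set of operations that can reach the network stays under engineering control.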
Generally, most ongoing projects are about instrumenting additional network domains to bring them closer to Autonomous Domains.
ZTE and AIS have successfully built an L3 autonomous network in Thailand using ZTE's uSmartNet solution. This solution integrates VMAX, AI Platform, and UME (O&M) technologies under the hood to enable auto-analysis, auto-diagnosis, and auto-optimization. Additionally, ZTE's uSmartNet solution has attained TM Forum’s Open API Certification.
The common pattern in telecom automation:
This Closed Loop Communication Service Assurance is applied to individual cases.
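A closed loop of this kind can be sketched as collect, analyze, decide, execute, repeated until the KPI is back in range or the loop runs out of safe actions. Everything below is a simulated assumption for illustration: the `Cell` model, the 80% load threshold, the handover offset parameter and the "each dB sheds 5% of load" effect do not come from any real network or product.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    name: str
    load_pct: float     # simulated resource utilisation
    offset_db: int = 0  # hypothetical load-balancing handover offset


def collect(cell: Cell) -> dict:
    """Data collection step: read the KPIs of the managed object."""
    return {"load_pct": cell.load_pct}


def analyze(kpis: dict, threshold: float = 80.0) -> bool:
    """Analysis step: is the service degraded?"""
    return kpis["load_pct"] > threshold


def decide(cell: Cell, max_offset: int = 6) -> int:
    """Decision step: propose the next offset, capped by a safety limit."""
    return min(cell.offset_db + 2, max_offset)


def execute(cell: Cell, new_offset: int) -> None:
    """Execution step; the effect is simulated as 5% load shed per dB."""
    cell.load_pct -= (new_offset - cell.offset_db) * 5.0
    cell.offset_db = new_offset


def run_closed_loop(cell: Cell, max_iterations: int = 5) -> List[str]:
    log: List[str] = []
    for _ in range(max_iterations):
        if not analyze(collect(cell)):
            break
        new_offset = decide(cell)
        if new_offset == cell.offset_db:
            # No safe action left: hand the case over to a human.
            log.append("escalate to engineer")
            break
        execute(cell, new_offset)
        log.append(f"offset -> {new_offset} dB, load -> {cell.load_pct:.0f}%")
    return log


cell = Cell("cell-A", load_pct=95.0)
for line in run_closed_loop(cell):
    print(line)
```

The escalation branch reflects the point made earlier: human involvement is reserved for the cases the loop cannot resolve within its limits.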
Data collection
If you are in charge of network management, you should focus on increasing the number and variety of data collection points within the network.
The biggest challenge facing operators on their path to accelerated implementation of telco AI for the automation of their networks is the inability to access high-quality data. (Analysys Mason)
AI-based platform
Another possible mistake is an attempt to lock in on in-house AI-based development, because most similar projects have failed.
The recommendation is to look for the right partners beyond traditional vendors, partners who have more advanced expertise in AI-based solutions.
Vodafone partnered with Google Cloud to build AI Booster, an AI platform to improve customer experience and the network performance.
The critical aspect here is that AI algorithms need to be trained on data from telco solutions.
KT, Korea’s largest telecommunications company, developed an AI-based network failure Root Cause Analysis (RCA) solution called Dr. Lauren in November 201. Dr. Lauren collects alert data, analyzes it using AI algorithms, and identifies the cause and location of network failures within a minute. This system is a product of KT’s decades of network management experience combined with AI technology, providing highly accurate analyses for identifying and locating failure causes. The implementation of Dr. Lauren has led to estimated OPEX savings of USD 1.2 million annually by offering intelligent remote monitoring and minimizing failure recovery time.
The desired state is to collect enough data volume to build Digital Twins of the network domains. These Digital Twins ensure a safe approach for testing optimization and automation initiatives, as well as validating new service provisions.
Ericsson will continue to intensify its collaboration with Google Cloud, leveraging Google Cloud tools as part of its Managed Service platform to serve customers globally. In addition, Ericsson is exploring Google Cloud’s advanced AI/ML technologies for Telco applications. (Ericsson, press release, 29.08.2023)
It should also be noted that AI use cases require lengthy development cycles and intensive testing processes in the production network. Additionally, the potential to adopt out-of-the-box AI/ML use cases, which can be immediately integrated into network operations and optimization processes to rapidly generate value, must be considered.
Nokia’s AVA Telco AI ecosystem offers an AI-as-a-Service (AIaaS) platform that provides telco-specific full lifecycle management for AI-driven applications. The AVA platform is open, decoupled and ready to integrate with an operator’s big data ecosystem.
Capabilities exposure
Automation is fundamentally about transformation and the ability to perceive things differently.
The main opportunities lie in leveraging the benefits of SDN (Software-Defined Networking) architecture. The Core Network domain has already passed the first phase of virtualization and is moving towards open cloud-native solutions.
However, the Access (RAN) and Transport network domains are still lagging behind.
All domains should undergo an 'open platformization' transformation, which would enable them to expose rich APIs northbound to OSS (Operations Support Systems). These platforms should also translate incoming API calls into low-level commands for automatic execution.
The ultimate goal is to introduce declarative, Intent-based service management in each domain. In other words, this aims to enable higher-level business services to utilize network capabilities at the domain level, rather than at the element level.
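As a rough illustration of domain-level, intent-based management, the sketch below expands one declarative intent into per-site low-level commands. The intent fields, the QoS profile names and the `SET QOS ...` command syntax are all invented for the example; a real domain controller would define its own schema and southbound commands.

```python
from typing import Dict, List


def pick_profile(max_latency_ms: int) -> str:
    """Map a latency target to a (hypothetical) QoS profile name."""
    if max_latency_ms <= 10:
        return "ultra-low-latency"
    if max_latency_ms <= 50:
        return "low-latency"
    return "best-effort"


def translate_intent(intent: Dict[str, object]) -> List[str]:
    """Expand one domain-level intent into per-site low-level commands."""
    profile = pick_profile(int(intent["max_latency_ms"]))
    return [
        f"SET QOS site={site} service={intent['service']} profile={profile}"
        for site in intent["sites"]
    ]


# The business layer states what it needs, not how each element is configured.
intent = {
    "service": "factory-video",
    "max_latency_ms": 20,
    "sites": ["site-1", "site-2"],
}

commands = translate_intent(intent)
for command in commands:
    print(command)
```

The caller never sees element-level parameters; if the domain later changes how a latency target is met, the northbound intent stays the same.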
A potential mistake at this phase is to follow the traditional approach and look only at out-of-the-box solutions from traditional vendors. Other options, such as open-source projects, should not be overlooked.
The RAN autonomous domain perspective
There is ongoing competition between RAN development paradigms: Open RAN from the O-RAN Alliance and Cloud RAN (C-RAN) advocated by traditional vendors.
There is no way to avoid addressing this issue today, both from CAPEX and architecture perspectives. Options include Nokia’s Intelligent RAN Operations, Ericsson’s Intelligent RAN Automation, or building an automation platform based on third-party projects, such as HPE Telco RAN Automation. It's important to note that while most newcomers are focused on 5G solutions, current 4G/3G SDR sites will remain operational for many years.
CSPs tend to favor traditional solutions, but competitive pressures are pushing them towards open architectures that provide SDKs for third parties. TM Forum’s Open APIs facilitate intersystem interoperability, paving the way for a larger ecosystem. Consequently, the entire architecture of RAN Self-Organizing Networks (SON) solutions is shifting from its current, tightly integrated form to a more open and flexible rApps-based architecture.
The first and mandatory step is to transition to the latest version of O&M solutions that are convergent for both legacy and 5G networks. These latest O&M solutions are built on an open architecture with Open APIs, enabling the creation of a closed automation loop. It is also important to note that vendors are adopting an 'embedded AI per NE (Network Element)' approach, meaning that legacy networks will also benefit from native AI in O&M.
Ericsson’s Intelligent RAN Automation reference architecture shows the mainstream approach for RAN automation.
Vodafone is using Nokia RAN Intelligence to boost network quality and to implement Zero Touch Operations. Etisalat, Du, STC, and Zain announced at the SAMENA Telecom Summit 2022 that they are collaborating with Huawei to bring more AI into the RAN to improve the performance, enhance the customer experience, and provide the right foundation for more RAN autonomy. (Dell’Oro Group)
Deutsche Telekom examined both proprietary and open-source solutions before choosing the open-source Open Network Automation Platform (ONAP) to automate the O-RAN Town related services. An advantage of ONAP is that it can be used as a platform in multiple network domains. (Deutsche Telekom)
The Transport autonomous domain perspective
Experts remain skeptical about the short-term feasibility of the autonomous framework. They cite several major obstacles, including a diverse range of legacy equipment, complex architecture, and a large number of CSP-specific policies resulting from fixed-mobile convergence.
Potential solutions lie in simplifying the architecture and routing stacks. The implementation of Segment Routing over IPv6 (SRv6) to streamline the communication stack is seen as promising.
The deployment of platforms such as Cisco Crosswork Network Automation, Nokia Network Services Platform (NSP), and Huawei iMaster will be beneficial in the upcoming IP network redesign and investments, driven by the global rollout of 5G networks.
Automation of legacy IP networks based on Ansible is gaining ground due to its affordability and adaptability. Ansible's agentless architecture and use of YAML for playbooks (scripts) make it highly adaptable to different network environments. In the context of network management, Ansible enables network engineers to automate various tasks such as configuration management, provisioning, and deployment of network devices.
At the current stage, the most promising investment is in specialized AI-based solutions, which aim to transition all domains from reactive to proactive fault management and provide AI assistance to network engineers, making them more efficient.
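One building block of proactive fault management is flagging a KPI that drifts away from its recent baseline before a hard alarm threshold is crossed. The sketch below uses a plain rolling z-score as a simple statistical stand-in for the AI-based solutions discussed here; it is not how any particular product works, and the window size, limit and sample values are made-up assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def early_warnings(samples: Sequence[float], window: int = 5, z_limit: float = 3.0) -> List[int]:
    """Return indexes of samples that deviate strongly from the preceding window."""
    flagged: List[int] = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        baseline = mean(history)
        # Floor the spread so a perfectly flat history does not divide by zero.
        spread = max(pstdev(history), 1e-9)
        if abs(samples[i] - baseline) / spread > z_limit:
            flagged.append(i)
    return flagged


# Hypothetical call drop rate in percent, one sample per collection interval.
drop_rate = [0.5, 0.6, 0.5, 0.4, 0.5, 0.5, 0.6, 1.8, 0.5]

for index in early_warnings(drop_rate):
    print(f"interval {index}: drop rate {drop_rate[index]}% deviates from recent baseline")
```

In the 'Network Engineer + AI' pairing, a signal like this would be a prompt for the engineer to investigate, not an automatic action.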
I think it is now mandatory to develop a 'Network Engineer + AI' pair configuration approach in order to bypass legacy network limitations and advance the level of network automation.
Conclusion
You can advance the legacy Transport and Core network domains to autonomous network Level 1+ with modest investments. Additionally, you can advance the RAN domain to Level 3 by leveraging SON functions.
I hope this article has offered new insights into the perspective of autonomous networks, and that it will inspire you to challenge network development projects and established operational practices.
Asking the right questions is fundamental to automation. Remember, automation is not an all-or-nothing approach.
[1] Autonomous Networks; Technical Architecture. TM Forum IG1230 (2022-12-09).
[2] 5G; Management and orchestration. ETSI TS 128 100 V17.0.0 (2022-05).
[3] Zero-touch network and Service Management (ZSM); Intent-driven autonomous networks. ETSI GR ZSM 011 V1.1.1 (2023-02).
[4] O-RAN Town: Piloting a High-power multivendor Open RAN solution in a Brownfield network. Whitepaper by Deutsche Telekom AG (2023-02).
[5] Leveling Up: achieving Level 3 AN. TM Forum report (2023).