Understanding 5G, A Practical Guide to Deploying and Operating 5G Networks, Strategic Importance of Network Virtualization (Part 6)
Houman S. Kaji
Founder, Board Member & Executive Vice President, Chief Innovation Architect Strategy and Ecosystem
Critical to the successful delivery of 5G is the programmability that will allow the dynamic assignment of resources to the corresponding tasks at any point in time and across the end-to-end network. However, to deliver both the incremental capabilities of 5G network programmability over legacy networks and meaningful cost savings, the cost of delivering a service must be decoupled from the gap between peak and average utilization. This means that the operator must be able to deliver service at peak utilization with the desired QoS, but as usage scales down to a much lower level, the corresponding resources should be scaled back so that costs align with traffic levels. In practice, this can only realistically be achieved by virtualizing the delivery platform.
Virtualization is not a new topic in the IT industry or the mobile telecom market, but to date, most of the discussion has been about lowering the cost of owning and maintaining networks by sourcing lower-cost components than legacy networks allow. While this is important, it is only the beginning of what virtualization can and should do for telecom and mobile operators. Lowering costs is vital, but so is creating new revenue streams that are only possible because of virtualization and 5G. This point can’t be stressed enough, as operators are at a crucial point in history regarding the financial viability of their businesses. Operators that see the true possibilities of virtualization can greatly increase their average revenue per user (ARPU) and continue to run businesses with operational profit margins of forty-five to eighty percent in the future. Those that only see virtualization as a way to buy lower-cost components will most likely become a commodity business with margins of five to eight percent, well aligned with other utilities.
Virtualization is also at a critical point due to the number of employees retiring from the telecom workforce compared to the number who are entering. For example, a global mobile carrier in Europe has stated that by 2025, seventy-five percent of its workforce will have retired. At the same time, the top ten percent of students graduating from university don’t want to work in what they perceive as the traditional telecom industry; rather, they prefer to work for leading OTT providers, who are perceived as cutting-edge in terms of automation and advanced business processes. All this creates a state of urgency to get to fully automated networks and business processes as soon as possible, and most industry sources recognize that virtualization is a key enabler of automation.
1. Thinking Differently
Mobile operators have an advantage few companies can match: they can reach the public far faster than almost anyone. But how many operators plan to exploit this strategic advantage on the path to virtualization? Put another way, if operators are willing to think differently, what can they achieve? Can they gain a significant competitive advantage that lets their company thrive in the future?
Case in point: Reliance Jio completely disrupted the mobile market in India by thinking differently. Working in a greenfield environment, Reliance Jio was able to gain 100 million subscribers in its first 100 days of operations because it built an extremely cost-effective architecture that is fully virtualized and automated. It completely changed what is considered the second-largest mobile market in the world, and it did so in less than one year of operations. Most importantly, Reliance Jio caught most of its competitors off guard.
This is an important lesson. Virtualization will make winning and losing happen at a much faster pace than has been seen before. Also, an operator may do everything right on the path to full virtualization but still lose out because another company was more innovative in its thinking. Thinking differently is a big enabler for success in virtualization. Operators must go well beyond the technical steps toward virtualization and think of the end goal they want to achieve.
How can an operator, especially in this 5G technology inflection, become more agile as a company and create new revenue streams while also lowering risk by properly leveraging all that virtualization has to offer? This chapter will answer that question by giving operators the elements they need to consider so they can create a unique roadmap for the future. No two roadmaps will look the same, as each operator has its own challenges and advantages based on business models, network architectures, and business climates in its specific part of the world.
2. A Shift in Revenue Models
Total revenue has shifted over time from mobile operators to over-the-top (OTT) businesses such as Facebook, Netflix, Apple, Amazon, and Google. Mobile operators formerly used growth to determine when to invest in their networks and services, but that growth has largely stopped as OTT companies now provide the services and capabilities that appeal to subscribers. This can be seen in three main trends:
For operators to succeed in the 5G future, they need a new way to manage revenue while also reducing risk. This concept isn’t new to the world of mobile communications. In the past, DoCoMo had a service called FOMA (Freedom of Mobile Multimedia Access) that made money from apps and app brokering in the days of 3G. Subscribers used it through a non-web-based portal. It was successful because DoCoMo had a distributed risk management system—much like Apple does today—that contained the infrastructure and the marketplace. DoCoMo was not taking on the risk of developing the apps but shared in the revenue from the successful ones. Revenue-share models and a focus on innovation that provides value-added services will be critical for operators in the virtualized world.
One challenge to innovation is finite scalability. Traditionally, mobile operators have bought solutions from other vendors as they try to evolve, such as buying a rack of special-purpose hardware. For this to work, traffic forecast planning had to be perfect, but accurate forecasting has proven to be challenging.
For example, SMS services grew much faster than operators had anticipated, and operators had a hard time scaling their SMS network assets to meet demand. The reverse happens as well. Operators overestimated the demand for Rich Communication Services-enhanced (RCS-e) incremental voice/video services and built out capacity that gained no return because demand for those services grew so slowly.
One of the great benefits of virtualization is that it gives operators a data center instead of dedicated special-purpose hardware. With virtualization, operators can try out new services with their customer base without much risk. If a service is successful, operators can quickly add more capacity to the data center. If it is not, operators can easily reallocate that capacity for something else. This model lets operators try out new services and apps without the dedicated-hardware risk they carried before.
Virtualization gives operators the starting point for this type of agility, where traffic forecasting is not as critical. With the risks greatly reduced, are operators willing to bring their own OTT services and other innovative 5G use cases to market in this new 5G era? The industry would need verticals to buy into these new 5G ecosystems, which means that operators will need to provide services that offer true value to the subscribers of each vertical.
3. A Brief History of Virtualization
To fully understand mobile network virtualization, it is important to look outside the mobile industry and see what other industries have done. If operators think they are on the cutting edge of virtualization, they have already set themselves up for mistakes. There is a lot to learn from other industries.
Virtualization has been around for a long time in the form of enterprise virtualization. At the turn of the twenty-first century, the client-server model connected remote desktops to a server as a way to better utilize resources throughout an enterprise. This was the first step towards full virtualization.
Fast forward to today, and enterprise virtualization includes vast data centers and cloud computing that can scale up and down in real time to take advantage of every market opportunity. This has created agile and resilient networks that model what virtualization can achieve.
The mobile industry took its initial steps toward virtualization by adopting network functions virtualization (NFV) technologies developed under the standards organization ETSI starting in 2012. Several operators co-authored an ETSI NFV white paper that kicked off an ETSI Industry Specification Group (ISG) focused on building a virtualization platform for operators. ETSI NFV plays an important part in replacing specialized hardware network nodes with virtualized network functions (VNFs) that greatly reduce the cost associated with a network but offer little in the way of network scalability due to the isolated nature of the technology within a network. For virtualization to succeed, ETSI NFV and its VNFs must be incorporated into an overall network and service management platform based on cloud-native principles. This will help it gain the web scale needed to offer the on-demand services that subscribers want. ETSI is beginning to address this with ETSI ISG ZSM, which will be discussed later.
Another important aspect of virtualization is software-defined networking (SDN). The concept of SDN was first considered by the Internet Engineering Task Force (IETF) in 2004. Practical SDN-type technology was originally designed and implemented by Google in 2010 for its own business. SDN is an architectural approach that separates the control and data planes of the network to maximize resource utilization. Mobile networks have incorporated SDN to some extent to replace MPLS, VLAN, or ATM. But to gain the most value from SDN, operators must connect the workloads within their networks with SDN’s management agility. The reason for this will become more apparent as we discuss intent-based orchestration later in the chapter.
The commoditization of hardware has also played an important role in virtualization. Companies that host their own data centers have worked to disaggregate the network within those data centers as a way to lower the cost of equipment, lower power consumption, and create more flexibility in the way systems operate. One initiative, created by Facebook in 2009 as a way to collaborate with other Internet content providers (ICPs), has become known as the Open Compute Project (OCP). This was a response to similar strategies deployed by Google, which were operating at an order of magnitude larger scale than Facebook’s. Google chose not to make its designs public, so Facebook and others had to start a new initiative to drive this development in the open.
OCP is now a global phenomenon and includes most of the ICPs throughout the world along with many hardware suppliers and several large financial institutions that host their own data centers, among others. The result has been a complete change to the data center model and has created an unprecedented amount of innovation—all of which is due to collaboration between groups that used to view each other as competitors.
Riding on the success of OCP within the data center, Facebook launched the Telecom Infra Project (TIP) in 2016 to create the same type of sharing and collaboration within the telecom market. The main goal is to create the same level of innovation and flexibility that was created by OCP. Facebook says that it has saved $2 billion in CapEx through the efforts of OCP. TIP could produce similar benefits for operators as hardware commoditization allows for less specialization, which brings down costs and creates systems that can be used in multiple ways.
4. The Main Value of Virtualization
To date, the main value of mobile network virtualization has not been realized on a wide scale. In some cases, operators are currently thinking of lowering costs by piecing together the cheapest open source components or bits and pieces from their current suppliers. But to gain real value out of virtualization—and thrive as a company—operators must look at the real business need for such a network. Virtualization makes networks agile and dynamic, but operators must think of the business services they want to provide and then create a network and ecosystem to meet those needs. This is the only way to gain true value out of virtualization.
Let’s take SDN as an example. In its effort to maximize resource utilization, the SDN controller watches for the moment when there is enough capacity for a certain application and then informs the business application that the capacity is available for a specific amount of time. SDN gives a specific capacity over a specific time frame for a specific business need, creating a very efficient network resource to perform a specific data transfer objective. This goes well beyond creating network optimization for the sake of network efficiency. By creating that optimization for a real business need, operators gain maximum utilization of the underlying hardware assets. As networks become fully virtualized, operators will be able to increase this utilization by automating their networks in a way that controls and orchestrates the network from a central location.
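To make the idea concrete, the sketch below models, in simplified Python with hypothetical link names and numbers, how an SDN controller might watch link headroom and offer a business application a capacity window for a bulk transfer. It illustrates the principle only; it is not the API of any real controller.

```python
from dataclasses import dataclass

@dataclass
class Link:
    name: str
    capacity_gbps: float
    current_load_gbps: float   # measured by the controller's telemetry

    @property
    def headroom_gbps(self) -> float:
        return self.capacity_gbps - self.current_load_gbps

def offer_capacity_window(links, needed_gbps, duration_min):
    """Return a (link, duration) offer if any link can absorb the transfer now."""
    for link in links:
        if link.headroom_gbps >= needed_gbps:
            # Reserve the headroom for the business application for a fixed window.
            link.current_load_gbps += needed_gbps
            return {"link": link.name, "gbps": needed_gbps, "minutes": duration_min}
    return None  # No window available yet; the application waits to be notified.

# Hypothetical example: a 2 Gbps data transfer needing a 30-minute window.
links = [Link("core-edge-1", 10.0, 9.5), Link("core-edge-2", 10.0, 6.0)]
print(offer_capacity_window(links, needed_gbps=2.0, duration_min=30))
```

The point is that the capacity grant is tied to a concrete business request and a time frame, not to generic over-provisioning.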
Mobile operator Elisa in Finland is a great example. Elisa had a business need to gain the highest data utilization rate possible while using the least amount of resources, because its two to three million subscribers generate more data traffic than the entire population of Germany. The operator sent its operations staff to training courses on the Perl and Python programming languages and asked them to write scripts that automate tasks they currently do manually. Elisa has now automated ninety percent of everything that happens in its network and within the operations team. The company moved from multiple people working in the network operations center to one person overseeing the system to make sure it runs smoothly. These tools have subsequently been made available commercially to other operators.
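To give a flavor of this kind of operational scripting, here is a minimal, hypothetical Python sketch that automates one repetitive task: scanning an alarm feed and auto-acknowledging alarms that match a known transient pattern while escalating the rest. The alarm format and rules are invented for illustration and are not Elisa’s actual tooling.

```python
import re

# Hypothetical alarm feed; in practice this would come from the NMS/OSS API.
ALARMS = [
    {"id": 101, "text": "cell-4711 temporary S1 link flap, restored"},
    {"id": 102, "text": "cell-0231 VSWR threshold crossed on antenna port 2"},
]

# Patterns the operations team has decided can be acknowledged automatically.
AUTO_ACK_PATTERNS = [re.compile(r"link flap, restored")]

def triage(alarms):
    """Auto-acknowledge known transient alarms; escalate everything else to a human."""
    for alarm in alarms:
        if any(p.search(alarm["text"]) for p in AUTO_ACK_PATTERNS):
            print(f"auto-ack {alarm['id']}: {alarm['text']}")
        else:
            print(f"escalate {alarm['id']}: {alarm['text']}")

triage(ALARMS)
```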
Through virtualization and automation, Elisa has created a network with some of the highest data utilization rates in the world today. From a management perspective, Elisa is already achieving over 4G LTE what large operators around the world hope to achieve with 5G. The success of Elisa is based on determining the real business case for its unique situation and then creating a network to best meet that need. This is the mindset operators need in order to be successful with virtualization and subsequent automation.
Figure 1. SDN controller maps virtual infrastructure to the application layer
5. Key Technology Options
To best evaluate how to virtualize a network and what parts of a network to focus on first, an operator needs to look at the key technology options available. Each technology has challenges and aspects that must be considered to gain the most value out of those technologies. In this section, we’ll talk about the technology options as they relate to business strategy; some technologies are better suited to certain tasks, depending on the end goal an operator is trying to achieve.
We spoke of NFV before, which is a broad topic that encapsulates most aspects of virtualizing a network. Within NFV, there are three main areas: virtualized network functions (VNF), which are the specific functions that a network performs; management and orchestration (MANO), which orchestrates all of the functions within a network and will become critical as networks begin to automate; and NFV infrastructure (NFVI) that comprises the virtualized layer just above the physical hardware. The NFVI does its best to maximize the utilization of those hardware resources to allow the VNFs to perform their duties. We’ll focus on NFVI here as MANO is covered in the automation chapter.
5.1. NFVI
From an operator perspective, NFVI can be any type of computing environment located anywhere across the network chain. This could be a home gateway that has a computing node, a mobile edge computing platform, an edge data center, a core data center, or even a huge centralized data center. In evaluating NFVI for a network, what does an operator need, and what should be optimized for each of these environments? To answer these questions, an operator needs to know the design criteria for a network. From an internal development cost perspective, certain parts of a network will cost money to develop and other parts will be free, based on the current network configuration of each operator. The best way to describe this is with an example from the Deutsche Telekom TeraStream design. Deutsche Telekom was redesigning its network based on two main criteria: 1) computing resources in the main data center are free to the company, and 2) bandwidth from the edge to the core is also free and unlimited from a cost perspective.
Based on these design criteria, Deutsche Telekom came up with a network design that has no computing resources between the edge and the core. All functionality for traffic engineering and quality-of-service management is generated inside the data centers that the company already owns. This design still has to stand the test of time to show that it can, and will, be widely deployed.
Regardless of which technology and architecture prevail in the mobile space, that technology is likely to be based on a set of core principles founded on a clear understanding of business strategy. Another operator might have other design criteria that influence how the network is designed. For example, an operator might already have a million edge locations with space, power, and networking that it can utilize. If so, then the NFVI needs to be characterized so that these locations are treated as free to the company. What will it cost to build out new areas of a network versus using what the operator already owns? Operators need to use current assets as a strategic advantage. This is their starting point for setting design criteria and a competitive strategy for building out the rest of their virtualized network. Once an operator is clear on its unique starting point, it needs to evaluate the different components that will best get it to its end goal for virtualization.
5.1.1. Evaluating NFVIs
Vendors are rushing to supply the hardware and other components needed to make NFVI work, but with each operator having unique needs and uses for the components, how do vendors supply the right components to the right operators at the right time? In most cases, they don’t. Although QuEST/TIA is working to change this, the group is still in its infancy. The reality is that most hardware components are designed with one major operator in mind. Those components meet the characteristics and the performance specifications that the operator has for its unique network.
The hardware is readily available, but operators must ask themselves: Does this product suite meet my need and business case? If the requirements of the type of service being deployed are the same as what that major operator is doing, then a similar component choice most likely makes sense. But if not, then an operator must figure out which specific combination of vendors will suit its needs. On top of knowing their unique needs based on performance characteristics, costs, and capacity requirements, operators must also know how to evaluate the different components.
When evaluating components, the data sheets for each vendor’s offerings will show specifications and performance levels, but an evaluation based on those data sheets can be misleading. QuEST/TIA set out to evaluate the different components vendors offer as a way to help operators get to the right endpoint with virtualization. The group found that the information on the data sheets was neither trustworthy nor actionable. The vendors weren’t lying, but the data sheet only represented using the component in a very specific environment. For anything outside that environment, the performance was much different, and often worse.
This shows that operators must choose the components in their NFVI hardware and software configurations carefully, based on the specific type of application they plan to run on that platform. For example, an NFVI receives a data packet on the network interface. The packet needs to be read on the interface, forwarded up to the CPU, processed, and then possibly written to disk. If the NFVI doesn’t have sufficient capacity at each of these points, the operator may get good network and CPU performance while the storage falls short because data can’t be written to disk fast enough.
In addition, evaluating an NFVI based only on items such as raw CPU performance or memory won’t tell an operator how that NFVI will perform with specific applications. An operator must know the services and applications it plans to run to properly evaluate NFVIs. Once it does, the person doing the evaluation can go back to the decision-makers and recommend which NFVI platform to source for each situation in the network. They can also recommend when it makes sense to have a vendor swap out certain components to make the platform a perfect fit for that situation.
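The point about balanced capacity can be captured in a simple back-of-the-envelope model. The sketch below, in Python with invented throughput figures, estimates the end-to-end packet-processing rate of a candidate NFVI as the minimum of its NIC, CPU, and storage stages, which is why a strong CPU number on a data sheet can still hide a disk bottleneck for a write-heavy VNF.

```python
def effective_throughput(stage_rates_mpps):
    """End-to-end rate is limited by the slowest stage in the NIC -> CPU -> disk chain."""
    bottleneck = min(stage_rates_mpps, key=stage_rates_mpps.get)
    return bottleneck, stage_rates_mpps[bottleneck]

# Hypothetical candidate platform, rates in million packets per second (Mpps),
# measured for the specific VNF workload being evaluated (not data-sheet peaks).
candidate = {"nic_rx": 12.0, "cpu_processing": 9.0, "disk_write": 3.5}

stage, rate = effective_throughput(candidate)
print(f"Effective rate {rate} Mpps, limited by {stage}")
# A logging- or CDR-heavy VNF would be limited here by disk_write, even though
# the NIC and CPU figures look healthy in isolation.
```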
There are certain steps operators can follow to evaluate NFVI platforms for their unique situations.
North American operator Verizon went through a similar process and decided to use white box hardware from Dell along with ADVA Optical Networking’s Ensemble Connector software to act as the brains of its NFVI platform. Verizon decided this was the best solution, among many other choices, based on its unique business and network needs. If other operators blindly follow this lead without sharing the same or similar business requirements, they will likely make a suboptimal decision, leading to excessive costs or insufficient resources to meet the localized requirements of their NFVI platform.
5.2. Virtualized Infrastructure Managers (VIM)
Once NFVIs have been evaluated and selected, they need to be managed along with software to deliver network services to customers. This is done by implementing a virtualized infrastructure manager (VIM). There are several types of VIMs and each has its strengths in certain situations. The oldest and most deployed to date are virtual machines (VMs).
5.2.1. Virtual Machines (VM)
With VMs, a hypervisor sits on top of a physical machine and creates multiple virtual machines. The operating systems and any relevant applications share hardware resources from one physical server or a group of servers. A defining characteristic of VMs is that each requires its own operating system and that the hardware is virtualized. This arrangement allows an operator to drive up the utilization of the hardware compared with having one machine powering one application. Operators that use VMs tend to use a model that ties certain hardware infrastructure back to a specific virtual machine to leverage specific hardware-based acceleration techniques.
The challenge VMs pose for operators is that traditional mobile networks use dedicated processors to perform certain tasks that need high performance levels. When those processes are generalized into software on a general-purpose CPU, the performance levels are generally much lower. Companies like Intel have tried to solve this problem by building acceleration capabilities into general-purpose hardware. The Data Plane Development Kit (DPDK) is an example of this for network processing optimization. In this case, the acceleration is a function of the network interface card, which gives a server the ability to reach high performance levels in network applications. Without this feature, operators would get only a fraction of the performance out of the same machine.
If an operator has one to five network cards in a server, it can assign those resources to the VM, and the VM then utilizes the acceleration capability. The problem is that a one-to-one ratio of server to VM rarely occurs in a virtualized network, as it becomes too costly and reduces the dynamic scalability of a network. VMs are also notorious for using a lot of RAM and processing power, since each needs to run a virtual copy of all the hardware and its own operating system.
It is more common to see multiple VMs sharing a server. For operators, this creates an issue for acceleration. For example, if a server has only one network card with acceleration capabilities but two VMs sitting on top of it, then one VM will get the acceleration and the other won’t, and the applications running on the second VM will suffer. This creates network constraints that must be taken into consideration. Because of this, VMs are better suited to the core of a network, where performance is needed and there are more hardware and computing resources available. Various industry groups are working to abstract this further and provide non-blocking APIs into the acceleration components. It remains to be seen whether this delivers one further level of flexibility and agility or just another possibly cumbersome integration point for developers.
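A toy allocation model makes the constraint visible: with one acceleration-capable NIC and two VMs on the same host, only one VM can be pinned to the accelerated path. The sketch below is purely illustrative; real deployments would rely on hypervisor mechanisms such as SR-IOV or PCI passthrough rather than this simplified logic.

```python
def assign_accelerated_nics(accelerated_nics, vms):
    """Pin each accelerated NIC to at most one VM; the rest fall back to software switching."""
    assignments = {}
    for vm, nic in zip(vms, accelerated_nics):
        assignments[vm] = nic           # gets hardware-accelerated packet I/O
    for vm in vms[len(accelerated_nics):]:
        assignments[vm] = "vswitch"     # software path only, lower performance
    return assignments

# Hypothetical host: one acceleration-capable NIC, two VNF virtual machines.
print(assign_accelerated_nics(["nic0"], ["vnf-firewall", "vnf-video-opt"]))
# {'vnf-firewall': 'nic0', 'vnf-video-opt': 'vswitch'}
```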
Figure 2. Virtual Machine (VM) stack with individual guest OS
5.2.2. Containers
Containers are a newer technology than VMs and don’t include the operating system. A container contains only the binaries and libraries needed to run an application. For example, if an application runs on Linux, an operator can run it in a container on a Linux-based host operating system without any further Linux environment setup; the container reuses the Linux of the host operating system on the server. By comparison, VMs have virtual hardware with all of the characteristics of normal hardware, which means an operator would need to install a complete operating system on the VM along with the full environment it needs.
A container is limited to what the host has installed and can provide, but it doesn’t have the same overhead requirements as a VM. This makes a container more efficient as long as it doesn’t need additional capabilities or a different operating system. A container makes it easy to build something and is an agile way to deliver items such as microservices.
For example, a subscriber wants to install a new app on their phone. It doesn’t matter what operating system the phone uses because the platform hosting the app takes care of that in a container environment. This makes some implementation of containers a smart choice in areas such as subscriber-facing application platforms where flexibility is needed, since subscribers may be using different operating systems to access a certain app.
For this reason, containers are becoming more popular. With Intel, among others, the industry is working on a type of API that reaches into the hardware and can then be exposed to a container. Over time, this will give containers capabilities similar to a VM’s, as those capabilities are exposed up through the layers of software, making it possible for the container to use the same type of underlying APIs.
Figure 3. Container stack with reused OS from Host
5.2.3. Software-Defined Infrastructure (SDI)
One of the interesting aspects of both VMs and containers is that they try to optimize the finite components in a server such as CPU, memory, networking, or storage. This creates network constraints and service challenges depending on what a network is trying to achieve. As networks continue to become more dynamic and automated through virtualization, trying to optimize finite resources within a server will become a limiting factor on a network.
To resolve this issue, Intel has created a new concept called Software-Defined Infrastructure (SDI). Instead of having a rack in a data center filled with individual servers, SDI utilizes a rack in which one part is made up of memory, another of disks, another of CPUs, and so on. Then, based on a specific need within the network, these components are assembled by a hardware orchestrator to create the optimal server for the network function. In this SDI environment, all of the components are interconnected by an extremely high-speed backplane, enabling the instantaneous assembly of a specific configuration for a specific task.
SDI makes it easy to allocate components to certain tasks when they are needed, and each configuration looks like a stand-alone server to the operating system. From the software side, no one would be able to tell this is a software-composed hardware architecture.
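Conceptually, the hardware orchestrator treats the rack as pools of parts and composes a "server" per request. The following minimal Python sketch, with invented pool sizes and request shapes, illustrates that composition step; it is not the interface of any actual SDI product.

```python
# Hypothetical disaggregated rack: pools of CPUs, memory and disks on a fast backplane.
pool = {"cpus": 64, "memory_gb": 2048, "disks": 40}

def compose_server(pool, cpus, memory_gb, disks):
    """Carve a logical server out of the pools if enough parts remain."""
    request = {"cpus": cpus, "memory_gb": memory_gb, "disks": disks}
    if all(pool[k] >= v for k, v in request.items()):
        for k, v in request.items():
            pool[k] -= v
        return request            # looks like a stand-alone server to the OS above
    return None                   # not enough free parts; the request is queued or rejected

# Compose a server tailored to a packet-core VNF, then one for a storage-heavy function.
print(compose_server(pool, cpus=16, memory_gb=256, disks=2))
print(compose_server(pool, cpus=8, memory_gb=128, disks=12))
print("remaining pool:", pool)
```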
In the near term, SDI will provide more flexibility to VMs and containers from a hardware point of view. Several networking vendors, including Ericsson and Nokia, are already bringing out data center solutions based on SDI. In the not-so-distant future, SDI may overtake containers and VMs by providing the same capabilities in a more holistic environment, provided the cost overhead of an SDI model is less than the efficiency gained in resource utilization.
In deciding the best way to move forward with NFVI and VIMs, operators need to take a step back and focus on how network functions can help drive their business goals.
Operators usually focus their optimization efforts on areas of the network where they know they can make money. But a more important question is: Where are the areas in a network where an operator can’t make money? These are the areas that are critical to optimize so that they take up as few business resources as possible.
5.3. Private and Public Cloud Environments
All of this talk about evaluating which parts of a network to optimize and the pieces needed to make it happen leads to discussions about private and public cloud environments. Just as what to optimize in a network will have an impact on the business model moving forward, so does the strategy used in determining whether to build a private cloud environment or buy into the public cloud.
The cloud provides the scalability and redundancy needed to create dynamic and resilient networks to handle all that 5G will bring. For operators that excel at evaluating NFVIs and matching those assets with real business needs, building a private cloud environment is the best solution as it allows the operator to create a network best suited to its individual business needs. They could also build a private cloud environment that leverages their competitive advantages. But some operators may find that they are good at providing services but not good at evaluating NFVIs to build the ideal network. They may also find that it costs too much to build such a network themselves. For these operators, buying into a public cloud environment might be a better solution. It should be noted that in many cases the large public cloud operators have added proprietary hardware acceleration capabilities to their servers to accelerate specific functions.
But each public cloud platform has its strengths and weaknesses, which means that an operator would need to evaluate each public cloud solution based on the VNFs or applications they plan to run. This is the only way to determine the value of each public cloud platform.
In the past, public cloud platforms have not been a good fit for operators to run core or RAN functions due to latency issues. Low latency and on-demand services are the main goals of 5G, so operators might be better off using their own networks, which include a Mobile Edge Computing (MEC) platform they can use and control, to make low latency a reality.
Recently, Amazon has tried to address the low-latency issue in its Amazon Web Services (AWS) cloud platform by buying the grocery store chain Whole Foods. This may seem like an unusual move, but Whole Foods can provide hosting locations for state-of-the-art localized data centers in each of its stores that can serve as a platform for MEC-type services. Whole Foods has locations in densely populated areas near subscribers and could serve the services that need low latency while sending the rest of the traffic back to the main data centers in the cloud.
AWS could also provide the compute environment or provide the complete service for mobile operators. AWS has been working to make its network more appealing to mobile providers by providing the Evolved Packet Core (EPC) as a service. In this case, mobile operators would only need to own the base stations in the RAN; AWS would provide the complete service to operate on the operator-owned spectrum or possibly in an unlicensed spectrum. Depending on the proximity between the public cloud host location and the end user’s RAN node, this may not allow for 5G Ultra-Reliable Low Latency services. In such cases, the operator would need a MEC-like distributed NFVI to support certain autonomous driving scenarios, for example.
AWS is currently the largest cloud provider in the world, but other cloud providers are building the necessary infrastructure needed by mobile operators. For operators, decisions need to be made about how to put the right compute environment and software architecture around a certain amount of geographic assets and how to best utilize those assets. With this in mind, operators need to decide if a private or public cloud environment is the best solution for their business.
6. Open Source Projects and Standardization
The mobile industry has historically developed new technologies with each company trying to make money by getting its technology adopted as the new industry standard and then licensing that technology. This made creating standards a lengthy process, as each company fought to have its technology become the standard.
Mobile standards group 3GPP has a philosophy of standardizing everything from the big systems of a network down to each component, and the members of the group must agree upon each standard created. Standards group IETF, on the other hand, has taken the position of standardizing the components but not the overall systems. Within the IETF, there is usually only one type of each component, and the systems are developed in an open-source environment. IP standardization happened through companies proposing an RFC to the IETF that explained the value their technology brought to the industry. That RFC was then backed up with a software implementation, evaluated for some time, and either accepted into the process flow or not.
But the push toward virtualization is happening too quickly for these traditional methods to be used. Virtualization is being led by open-source projects where ownership of the intellectual property rights (IPR) does not in itself lead to any commercial licenses. Instead, it leads to technology that is being developed for the overall good of the industry. That doesn’t mean that everyone is working in the same direction, though. Several open-source groups are working on very different strategies to get to what they consider to be the ultimate environment for virtualization. It’s worth looking at some of these projects to see whether any of the various strategies can benefit what a specific operator is trying to achieve.
Most of the open-source organizations relevant to this discussion were started as development projects within telecommunications companies. These companies then brought that development to the open-source community by going to the Linux Foundation and asking to make a project out of their development work. The Linux Foundation already has a structure in place for this; companies wanting to start a new project must adhere to that structure and the rules of the organization. This helps each project become beneficial to the mobile industry as a whole, as everyone is using the same structure for development. Outside of telecom, this model of open-sourcing projects has been common for a long time, with significant contributions from companies such as Netflix, Facebook, and Google.
6.1. ONAP
Open Network Automation Platform (ONAP) is one of the projects to come out of this process. ONAP originally started as internal development work by AT&T, called ECOMP internally and AT&T Domain 2.0 externally, with written specifications of what needed to be built to enhance mobile networks. AT&T wrote close to eight-and-a-half million lines of code, most of which was later donated to the open-source community.
At roughly the same time, China Mobile started the OPEN-O project, whose main goal was to develop an orchestrator to drive other aspects of a network. China Mobile donated four million lines of code to the open-source community. OPEN-O and ECOMP eventually merged to create ONAP. According to ONAP’s website, the organization “provides a platform for real-time, policy-driven orchestration and automation of physical and virtual network functions that will enable software, network, IT and cloud providers and developers to rapidly automate new services and support complete lifecycle management.”
The system was architected by many different people, and the quality of the code is not quite on target, as it was originally written without the expectation that it would be used in an open-source environment. Significant portions of the code and architecture are now being reworked to make it more viable.
One of the nice aspects of ONAP is that its architecture can, after the re-architecture, be broken down into smaller components with well-defined interfaces. This is useful because operators will be able to use only the parts of the architecture that suit their individual network needs. ONAP’s design does not force compliance with the ETSI NFV project at the MANO level; one of the main reasons for this was the lack of standardization of the OSS processes in MANO, an area where research and standardization efforts are now underway. More information on ONAP can be found at https://www.onap.org/.
Figure 4. Comparing traditional telco cycle and the OTT provider cycle
6.2. OSM
Open Source MANO (OSM) was created to focus on the MANO deliverable within ETSI NFV. MANO is the technology that coordinates the efforts of the virtual machines that host the VNFs in a network; for example, MANO can control a machine that is in charge of spinning up a network slice. This group was originally started by Telefonica as an internal effort, the UNICA project, at a time when Telefonica was trying to find a more efficient way to manage all of its operations and business support systems (OSS/BSS).
Telefonica created the new architecture internally, but when the company decided to go down the path to virtualization, it donated large portions of the code to OSM. OSM now includes operators such as Telenor and Verizon, among others. Currently, OSM is considered by many industry analysts to be better architected, with a better code base, than ONAP. But over time, the re-architected ONAP will not only be able to function alongside OSM, it will most likely also have components that extend beyond OSM’s capabilities. This will allow operators to use OSM if they prefer while also using the ONAP components that go beyond it to build their ideal network. More information on OSM can be found at https://osm.etsi.org.
6.3. CORD
Some operators will go solely with ONAP as their main framework, and others will use only OSM as a way to simplify their architectural strategies. This may change once ONAP is re-architected for areas beyond OSM. Other operators might choose a framework that is somewhere in between, such as CORD. Central Office Re-architected as a Datacenter (CORD) was started by bringing in components of its framework from IT companies that had already virtualized their networks, creating an interesting combination of IT and telecom with contributors such as Google, Verizon, and China Unicom, among others.
CORD states that it “combines NFV, SDN and the elasticity of commodity clouds to bring datacenter economics and cloud agility to the Telco central office.” More information on CORD can be found at opencord.org.
6.4. Standardization
The road to creating standards for virtualization will look much different than the methods used for legacy telecom standards in 2G, 3G, or 4G networks. Open source plays a big role in this, but so does the difference between the IT and telecom industries. The IT industry is involved in the virtualization of mobile networks and has been doing automation and orchestration in enterprise and web-scale environments for several years. Most telecom vendors are coming at the problem from a traditional telecom angle by trying to create an ecosystem similar to what they have done in the past and to leverage their current position of strength.
This is creating a lot of tension in the industry due to cultural differences between the two groups as well as differences in what the fundamental building blocks should be and what is required to make virtualization happen. The telecom group is more interested in the outcome and how to get there. The IT group is more interested in patterns and repeatability so that they can create a system to be put in place that continues the evolution of a network beyond the initial virtualization phase.
For example, some of the companies from the IT industry want to create a data model that can be used over and over, provided with different inputs to create the needed outcome for each operator. The telecom group would rather build an entire architecture each time a specific outcome is needed. Both approaches have their positive aspects, but the IT model could provide a more flexible environment that can adjust to market conditions and to new technology as it is developed. For the IT model to work, an operator must define what specific outcomes are needed. The challenge is that the possible variations and outcomes are significant, and many in the telecom industry think it is close to impossible to know everything that is needed ahead of time. This difference of opinion is slowing down progress toward creating standards for virtualization. It has become clear that certain operators are not waiting for a standard on every aspect of their network and are instead moving forward with implementations based on architectures from the open-source community. Whether to wait for a standard or move forward using open source is largely up to each operator. Waiting could put an operator behind the competition, but moving forward might take an operator down a less successful path as the technologies evolve.
7. Operational Considerations and Patterns
One way for an operator to get ahead of the competition is to rethink how it handles its development, operations, and network management internally. The network of the future will be less about trying to create the perfect network for each stage of that network’s lifecycle and more about creating a flexible network and business organization that can easily evolve as subscriber needs change. Several aspects of this will be covered in this section.
7.1. DevOps and NetOps
DevOps is a term popularized by web-scale companies such as Facebook and Netflix. It’s a management philosophy that says the same people who develop a capability or service must also be put in charge of operating that service. Traditionally, mobile operators have had separate development units that hand off new services to an operations team, which must then deal with the challenges of that service on its own once it goes live.
Using the DevOps management approach, the team developing a new capability or service knows that it is ultimately responsible for all aspects of that service, including once it goes live. This creates an environment where the team can quickly validate the capability or service and make the needed changes, ensuring that what is built is ready for live operations. If anything doesn’t work as planned, the team can easily make corrections at any time during the life of that service.
DevOps can create a nimbler organization with much smaller teams and is better suited for web-scale-type services and network configurations. It also lends itself to creating automated operations for repetitive tasks, which naturally leads to the automated operations environment needed once networks are fully virtualized and automated.
NetOps is the same concept as DevOps. It is the term being used to apply the same management principles to the networking part of an organization.
7.2. Radically Different Concepts
The thought of a network going down due to a disaster is not something most operators want to think about. Currently, operators plan for disasters and then test and validate the resiliency of a network in the controlled environment of the lab. Once the network is live, tests are done once a year during off-peak hours to make sure a network can handle a potential disaster. What operators don’t tend to do is test in real-life scenarios with peak traffic running on the network.
Of the mobile networks that have gone down, most failed in situations the network was designed to survive. For example, a nationwide European operator was upgrading three home subscriber servers (HSS/HLR) that held all of the subscriber information for the entire country. During the upgrade, the technicians accidentally tripped the main power supply, and the secondary power supply for one of the HSS units failed to start. This forced all of the traffic onto the other two units, as the design intends in this scenario.
But the second HSS unit immediately failed as well, pushing all of the traffic onto the last remaining HSS unit. The third unit became overwhelmed and shut down, taking down the network. Subscribers were without service for several hours—all because the redundant power supply for one HSS unit failed. In another example, a Swedish operator suffered a power failure at its main data center. The backup generators failed to start, which created a nationwide service outage.
As virtualization and automation transform the industry, operators must test their networks in real time with peak traffic to make them as resilient as possible. Operators will no longer be able to design for the best-case scenario and hope for the best. Constant testing will be needed as networks become more dynamic due to virtualization and 5G services. It will be much easier for operators to start doing this now as they transition to fully virtualized networks, as networks will become more complex as they evolve.
An intriguing solution to this problem can be found in concepts like Chaos Monkey and the Simian Army, open-source projects from Netflix. These tools were created to answer the question: Will the network survive a disaster? They travel through a network and randomly shut down parts of it at any given time, including peak and off-peak traffic periods. Netflix does this an average of fifty times per month, and the timing is completely random so that the company cannot prepare ahead of time. This helps Netflix make sure its systems can deal with any problem that may arise and design the network accordingly. Over time, this way of testing creates a network that is resilient to any disruption that may occur. Several OTT providers are already using such solutions in their cloud networks. Operators need to weigh the total cost of a major outage against the cost of training staff to operate and test using these radically different concepts.
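A minimal sketch of the idea is shown below, in Python with hypothetical component names: at random intervals a random component is disabled, and the round passes only if a service check still succeeds. It illustrates the principle behind tools like Chaos Monkey rather than reproducing Netflix’s actual implementation.

```python
import random

# Hypothetical virtualized components that the resiliency test may disable.
COMPONENTS = ["hss-vm-1", "hss-vm-2", "hss-vm-3", "edge-dc-north", "sdn-controller-b"]

def service_still_up(disabled):
    """Placeholder health check; a real test would probe live KPIs (attach success, latency)."""
    return len(disabled) < len(COMPONENTS)   # trivially true unless everything is down

def chaos_round(kill_count=1):
    disabled = random.sample(COMPONENTS, kill_count)
    print(f"disabling at random: {disabled}")
    assert service_still_up(disabled), "redundancy failed -- redesign before it fails for real"
    print("service survived, restoring components")

# Run a few random rounds, including during simulated peak hours.
for _ in range(3):
    chaos_round(kill_count=random.randint(1, 2))
```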
In addition, operators need to think about how they can reduce costs during a disaster by leveraging the architecture of a cloud environment. In a cloud environment, standardized data centers run software that is transferable between virtual machines or between containers. In this environment, there is no longer a need for support contracts that say emergency staff will be in place within minutes to fix the problem.
If part of a network goes down, the software and data are switched to another data center owned by the operator or to a third-party cloud provider so that service is not disrupted.
This gives the operator time to fix the problem during normal business hours without the need or expense of an emergency team or the emergency service clauses in hardware contracts that are common today. In this scenario, there could be maintenance staff in the data center who get a list once a month showing what needs to be fixed from a hardware point of view. There is no longer a rush to make this happen, given that another data center is already handling the traffic. This is one more way that the flexibility and agility of virtualization can help operators thrive in the future.
7.3. Model and Intent-Driven Orchestration
Another way to add flexibility and agility to a network is through model- and intent-based orchestration. As discussed in the standardization section, the IT groups involved in network virtualization wish to create a data model or system that can be used over and over, with different inputs added each time to create the needed outcome for that specific situation. This model is not as concerned with the individual components needed to make this happen. Instead, it uses intent as the main goal and lets the system figure out the best way to reach that goal.
For example, an operator wants to deliver the best-quality, high-speed video of a sporting event. This becomes the operator’s intent. The intent is given to the system, which then figures out the best way to make it happen based on the resources available at that time. In this model, humans no longer need to figure out exactly what needs to be done; the system figures out the most efficient way to make it happen on its own.
An operator must trust that the system is better at orchestrating and fixing a network than the operator could be manually. As networks become more complicated, there will simply be too many scenarios for humans to deal with by hand. The nice aspect of intent-based orchestration is that it saves operators from needing to figure out every possible scenario that could happen in every situation, such as trying to deliver the best-quality high-speed video. Instead, operators can make overall decisions as events unfold and let the system decide the best way to resolve a network problem or achieve an intent-based goal. For this to work, operators must become comfortable with handing over control of the decision-making to the network.
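The following simplified Python sketch shows the shape of the idea: the operator states an intent ("best video quality for a sporting event"), and the orchestrator, not a human, picks the concrete network actions from whatever resources are currently available. The intent names, resources, and actions are all hypothetical.

```python
# Hypothetical snapshot of currently available resources.
resources = {"edge_cache_free": True, "spare_slice_capacity_gbps": 4.0, "transcoder_slots": 2}

def realize_intent(intent, resources):
    """Translate a high-level intent into concrete actions based on what is free right now."""
    actions = []
    if intent == "best_quality_video":
        if resources["edge_cache_free"]:
            actions.append("place video origin replica at edge cache")
        if resources["spare_slice_capacity_gbps"] >= 2.0:
            actions.append("allocate 2 Gbps slice for the event area")
        if not actions:
            actions.append("degrade other best-effort traffic per policy")
    return actions

print(realize_intent("best_quality_video", resources))
```

The operator expresses only the goal; the mapping to actions changes as the resource snapshot changes.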
What is right for each operator and how fast this intent-based orchestration can be put in place will depend on a few factors. Operators must look at their current installed base of vendors and see what can be retired from the network and replaced at what time.
The faster this happens on the path towards full virtualization, the faster intent-based orchestration can be put in place.
The speed at which this model can be leveraged will also depend on the abilities of the operator’s current employees. In the new intent-based model, operators will need employees who require little direction and can act on their own once given a goal or directive. How that employee base currently looks will determine how quickly an operator can make the transition to intent-based orchestration.
Employees must also know what questions to ask the system so that it starts working on the correct solution. Going back to the example of best quality high-speed video, if an employee asks the system to deliver the best quality video, the system may slow down the service or other services to make this happen. If an employee asks the system to deliver the lowest-latency video service, the system might degrade the video quality to meet the goal of low latency. The employees must be skilled enough to know how to ask the question to get the desired results.
7.4. Reactive Versus Proactive Assurance
Until recently, most assurance has been reactive, identifying a fault that has already happened in a network. But proactive assurance solutions are now being developed that can look for areas in a network with a high probability of breaking and notify operations before the fault happens. As networks become more complex and dynamic, it will be impossible for operators to manually keep tabs on all aspects of a network’s service assurance. Because of this, proactive assurance solutions are critical to a successful virtualization strategy.
If an operator knows how pieces of a network can break and has expectations of normal network performance that are accurate most of the time, then network failures can be predicted before they happen. This is how most proactive assurance works. For example, a node might become slightly slower in responding; recent past performance gives the operator a good idea of how that node should be performing at current network levels. If the node is reaching its maximum capacity, the slower response is a leading indicator that it could soon fail, and the operator has a small window of time to decide what to do.
Increasingly, software takes the necessary action in this scenario, but if the operator is not yet at the level of intent-based operations, it still needs to apply a policy that tells the software what to do when this type of scenario arises. For example, should the system block some subscribers so that the subscribers already using the service can continue without degradation, or should it constrain the network, which degrades service quality for everyone? Most operators would choose the first option in this example, but this is an area operators must think through as systems become more complex and automated.
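A proactive-assurance check can be as simple as comparing a node’s latest response time against its learned baseline and applying the operator’s chosen policy before the node fails. The sketch below is a hedged illustration with made-up thresholds and node timings, not a production assurance product.

```python
from statistics import mean

def check_node(history_ms, latest_ms, policy="block_new_subscribers"):
    """Flag a node whose latest response time drifts well above its recent baseline."""
    baseline = mean(history_ms)
    if latest_ms > 1.5 * baseline:          # leading indicator: 50% slower than normal
        if policy == "block_new_subscribers":
            return "pre-emptively steer new attaches to a neighbouring node"
        return "throttle all sessions on this node (degrades everyone)"
    return "no action"

# Hypothetical HSS front-end response times in milliseconds.
print(check_node(history_ms=[12, 13, 11, 12, 14], latest_ms=21))
```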
Another area of consideration is the data center. As mentioned previously, data centers can now scale up resources as they are needed by adding servers or components of servers, as is the case with SDI. In a fully virtualized and automated environment, policies should be put in place to tell the system what to do as leading indicators hit their trigger points. For example, at what service level should the data center add more servers as traffic increases, or take those servers offline as demand decreases? This is known as scale-in/scale-out; systems will be able to do this on their own but need an operator to decide what the trigger points should be. Once a trigger point is determined, the system can take the needed action whenever that trigger point is reached.
This idea can be taken one step further: a virtualized and automated network can also be predictive when it scales in or out, based on recent past trends. This ensures that a network proactively manages capacity to accommodate upcoming usage. For example, a network might know that subscribers watch a lot of video at a certain time each Monday night based on past network usage. Knowing this, the network could proactively scale out the needed resources slightly ahead of time to make sure subscribers get the video quality they desire. Going back to the idea of intent-based systems, an operator in this scenario might tell the system to provide the best video quality to most users. The system would then look at its options based on currently available resources and initiate the needed network changes to make that intent happen.
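Both behaviours can be expressed as simple policies: a reactive trigger point on current utilization, plus a predictive rule keyed to a known recurring peak. The Python sketch below uses invented thresholds and a Monday-evening example purely for illustration.

```python
from datetime import datetime

SCALE_OUT_AT = 0.75   # add servers when utilization crosses 75%
SCALE_IN_AT = 0.30    # remove servers when it drops below 30%

def scaling_decision(utilization, now: datetime):
    # Predictive rule: known Monday-evening video peak, pre-warm capacity in advance.
    if now.weekday() == 0 and now.hour == 19:
        return "scale out ahead of expected video peak"
    # Reactive trigger points set by the operator.
    if utilization >= SCALE_OUT_AT:
        return "scale out: add servers / composed SDI resources"
    if utilization <= SCALE_IN_AT:
        return "scale in: release servers back to the pool"
    return "hold"

print(scaling_decision(0.82, datetime(2019, 3, 5, 14, 0)))   # Tuesday afternoon, high load
print(scaling_decision(0.40, datetime(2019, 3, 4, 19, 0)))   # Monday 19:00, pre-warm
```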
As operators build out their 5G networks, proactive assurance should be incorporated into a network to create more flexibility and resiliency, because the systems will be able to make decisions that not only keep services at optimal performance levels but also keep network faults from happening in the first place.
7.5. Hybrid Operational Models
All of the items so far in this chapter can help an operator thrive in a fully virtualized and automated environment, but what can an operator do to ensure success as it transitions from a legacy network to a fully virtualized one? As that transition happens, part of the network will be run under DevOps or NetOps and the other part will be legacy systems. This creates a very complicated environment that operators must get through as quickly as possible.
To do so, operators must have clear goals of what the final network and organization should look like. What parts of the network are virtualized and automated? What does the employee count look like and what are their responsibilities? How open are the employees to change and how fast can that change happen without the employees deciding they want to leave the company? There must be clear goals on what should be optimized and built first based on an operator’s current network. Goals must also specify what gets optimized second, third, and so on.
The recommendation is that an operator take their best employees and have them build a virtualized environment in a restricted market that can be used as a development area. Most large operators already have trial markets that can be used for this purpose. If not, choose a city or region as a test market for virtualization. If it doesn’t go well, the damage is minimal and doesn’t impact the main markets or revenue streams. If it does go well, it becomes a learning experience that can be applied to the network as a whole.
8. Zero Touch – The Vision of Full Automation
Getting to full automation is the ultimate goal of virtualization. But in the current operations models of most mobile operators, many tasks are both manual and repetitive. These are often managed by strict processes at the lower tiers of the operations staff.
These models have been developed, tuned, and optimized for years. During periods of growth and high operational margins, the cost of operations was comparatively acceptable, and most management attention was focused on other aspects of the business, such as customer care, which received significant scrutiny because each interaction was relatively expensive and could lead to customer churn if not properly managed.
As operational profits have declined, cost scrutiny has intensified over the last ten years. Most operators have worked step-by-step to identify significant cost drivers and have tried various methods to engineer them out one by one, often achieving stepwise improvements but not moving the needle enough to be comparable to the operational agility of OTTs. To achieve a comparable level of operational agility, operators need to rethink everything from scratch, starting with a blank sheet of paper and building up from there. This requires significant soul searching about which operational qualities are desired, and operators will need to properly identify the patterns and models required to make this happen going forward.
As of the end of 2018, it is still unclear whether any established operator can effectively turn its operational organization around and achieve full automation. Greenfield operators such as Reliance Jio have shown that full automation of most, if not all, of their operational processes is achievable, but at the added expense of an expanded DevOps or NetOps team. In the case of Reliance Jio, they acquired the specialist firm Radisys to build and extend the overall OSS and BSS landscape as well as to build out their infrastructure.
The cost-effective solution to full automation might be ETSI’s zero-touch network and service management (ZSM). In a white paper from December 2017 (https://bit.ly/2TTf8PS), ETSI says, “The goal is to have all operational processes and tasks (e.g., planning and design, delivery, deployment, provisioning, monitoring, and optimization) executed automatically, ideally with 100% automation and without human intervention.” The white paper also points to the critical need for industry collaboration to drive standards, best practices, and open-source work around the overall industry requirements. This is a daunting task with a strong call to action. The publication has since drawn significant attention to the cause and has grown the membership of the ETSI ZSM industry specification group (ISG) to sixty-five companies and organizations (full list: https://bit.ly/2I5bW1h), with work ongoing. The first set of requirements has been published together with a draft target architecture (read the blog post: https://bit.ly/2TK6UJz), as seen in the diagram below.
The key driving principle of the ETSI ZSM ISG work is the separation of concerns into various domains. Any implementation of a zero-touch framework or architecture will likely be founded on cloud-based principles. The interested reader is encouraged to study the concept of the 12FactorApp by visiting https://12factor.net. Doing so will provide a better understanding of this model as well as the overall process.
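To make the reference a little more concrete, the minimal sketch below illustrates just one of the twelve factors, factor III (store configuration in the environment), which is typical of the cloud-based principles a zero-touch implementation would likely follow. The variable names are hypothetical and are not part of any ETSI ZSM specification or vendor API.

```python
import os

# Twelve-factor principle III ("Config") suggests that anything that varies
# between deployments -- endpoints, credentials, scaling limits -- lives in
# the environment rather than in code, so the same artifact can be promoted
# from a trial market to production unchanged. These names are illustrative.
ORCHESTRATOR_URL = os.environ.get("ORCHESTRATOR_URL", "http://localhost:8080")
MAX_VNF_INSTANCES = int(os.environ.get("MAX_VNF_INSTANCES", "10"))
TELEMETRY_INTERVAL_S = float(os.environ.get("TELEMETRY_INTERVAL_S", "15"))

print(f"Connecting to {ORCHESTRATOR_URL}, "
      f"scaling up to {MAX_VNF_INSTANCES} instances, "
      f"polling every {TELEMETRY_INTERVAL_S}s")
```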
An implementation following these principles would have the highest probability of success, given the nature of the industry and the significant software development process maturity driven by web-scale providers. Another significant resource to study is the Google SRE book (https://bit.ly/2TKEFKZ). It provides a unique insight into how one of the world’s largest environments is built and operated and, more importantly, into the process and methodology behind its current operational model. Not every operator will operate at the scale of Google, and thus not all of the same characteristics will apply, but the overall concepts and thought process carry over well.
Figure 5. E2E ZSM framework according to the ETSI ZSM ISG
As previously noted in this chapter, it is possible to take a different approach to automation based on the current assets of an operator. For example, if a small or mid-sized operator has not been excessively process-driven in its legacy operational model, but has instead focused on building strong expertise and has a group of talented operations staff, then it is possible to automate current processes to a very high degree. The example given earlier about the Finnish operator Elisa illustrates that by taking the right steps based on the individual strengths of the local organization, one can achieve significant results.
The challenge is how larger, more siloed organizations will be able to succeed at this task. Many of the incumbent mobile operators that grew up via the GSM to 4G transition have a significant number of aging staff who are expected to retire over the next five to ten years. This, combined with the challenge of attracting new competent staff members, forces operators to focus on “brutal automation,” as Deutsche Telekom phrased it in October 2017. (Read about it in Light Reading: https://www.lightreading.com/automation/dt-brutal-automation-is-only-way-to-succeed/d/d-id/737111)
As can be seen from the architecture proposed by ETSI, or in the ONAP or OSM models, automation is based on various decision and control loops that seek to find an optimal state defined in one policy or another. One of the challenges is to define the policies and states well enough. This has evolved into the concept of intent-based management, covered briefly earlier in this chapter. The intent-based model adds yet another level of separation of concerns and must be built on trust that the underlying system is capable of managing within its domain and that the problem at hand can be handled inside that domain.
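A minimal sketch of such a decision/control loop is shown below, assuming a policy-defined target state and generic `read_metric`/`apply_action` callables that stand in for whatever telemetry and orchestration interfaces a real domain (ONAP, OSM, or an ETSI ZSM management domain) would actually expose; none of these names come from those projects.

```python
import time

def control_loop(read_metric, apply_action, target: float,
                 tolerance: float = 0.05, interval_s: float = 10.0,
                 iterations: int = 3):
    """Skeleton of the decision/control loop described above: observe the
    current state, compare it against the policy-defined target, and act
    until the system converges. The callables are placeholders for real
    telemetry and orchestration APIs."""
    for _ in range(iterations):
        current = read_metric()
        error = target - current
        if abs(error) > tolerance:
            apply_action(error)     # e.g. scale out/in, reroute, re-balance
        time.sleep(interval_s)

# Toy usage: drive a simulated utilization value toward a 70% target.
state = {"utilization": 0.95}
control_loop(read_metric=lambda: state["utilization"],
             apply_action=lambda err: state.update(
                 utilization=state["utilization"] + 0.5 * err),
             target=0.70, interval_s=0.0)
print(round(state["utilization"], 3))   # moves toward 0.70 (here ~0.731)
```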
At this time, artificial intelligence (AI) and machine learning are hyped as the main drivers of automation and zero-touch networks. But it matters less which underlying system performs the tasks, as long as the operational efficiencies can be achieved at an acceptable cost structure.