How to understand the operation and maintenance problems and challenges brought by cloud architecture?

Fancy Wang 1705 2022

Operation and maintenance problems and challenges brought by cloud architecture

As the "Internet +" wave sweeps the financial industry, operation and maintenance (O&M) teams must support the rapid launch of business systems, flexible capacity expansion, and stricter SLA requirements, all on a limited IT operation and maintenance budget. As a result, operation and maintenance personnel face greater pressure than before. When operating a highly complex cloud data center with a massive number of devices, the biggest challenge for the team is how to deliver high-quality network services while improving efficiency and reducing costs.


Compared with the traditional network, the network in the SDN era has three characteristics. First, it is dynamic: logical networks are created and deleted on demand according to application requirements. Second, it must respond in real time: traditional networks were designed mainly around human-oriented interfaces such as SNMP polling, and this slow mechanism has become a weakness at the fast pace of the SDN era. Third, it is large-scale, in two senses: the number of managed objects is large (counting logical as well as physical network elements, it has grown by a factor of 50), and the number of faults to be handled is high.



Driven by new services and new technologies, the architecture of the cloud-based data center network (DCN) and the objects of operation and maintenance have changed tremendously. These changes in data center structure and technology make operation and maintenance much harder, mainly in the following aspects.

Virtualization and distributed computing pose great challenges to data center operation and maintenance. In a traditional data center, computing and storage resources are fixed, traffic is mainly north-south, and operation and maintenance are concentrated within a single data center. In a cloud data center, resources are shared to improve utilization. Through automatic elastic scaling policies, cloud computing balances resource sharing, user experience, and business availability, which is one of its core advantages. However, this also brings new challenges: operation and maintenance personnel often do not know which hardware a business system is running on, so locating faults becomes very difficult.

Traditional operation and maintenance methods can no longer meet the needs of managing massive ICT resources. Traditional data centers maintain the network through "manual configuration + traditional network management + tools". Cloud DCN services, by contrast, span multiple physical data centers, and the equipment scale of a cloud data center ranges from tens and hundreds of devices to tens of thousands, approaching the order of millions. Such massive numbers of hardware devices make it very hard to locate and isolate hardware faults quickly, and O&M becomes more and more complicated. Cloud DCN instead uses "cloud platform + controller + operation and maintenance platform": services are delivered automatically, and a dedicated O&M platform handles operation and maintenance.


Traditional O&M methods provide no monitoring based on application flows or logical networks (tunnels and bridge domains, BDs), no monitoring of the physical paths between VTEPs, and the monitoring period is long (5 minutes). In addition, the sheer number of network elements to be monitored is itself a challenge for the monitoring system.
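
A small worked example may help show why a 5-minute monitoring period is too coarse. The sketch below uses made-up utilization numbers (an assumption for illustration, not measured data) and compares the same per-second samples averaged over a 300-second window and over 10-second windows.

```python
def window_averages(samples, window):
    """Average per-second samples over consecutive windows of `window` seconds."""
    averages = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Hypothetical link utilization in percent, one sample per second for 5 minutes:
# the link sits at 10% except for a 10-second burst at 100%.
per_second = [10.0] * 300
per_second[120:130] = [100.0] * 10

five_minute_view = window_averages(per_second, 300)  # one value per 5 minutes
ten_second_view = window_averages(per_second, 10)    # one value per 10 seconds

print("5-minute average:", five_minute_view)
print("peak 10-second average:", max(ten_second_view))
```

With these numbers the 5-minute view reports 13% utilization, while the 10-second view shows a window at 100%. A burst that saturates the link is invisible at the coarser granularity.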

The boundary between IT and CT is becoming increasingly blurred, which makes problems hard to locate and delimit. SDN uses overlay technology to virtualize the network, and there are multiple equal-cost paths between VMs. When a fault occurs, it is impossible to tell precisely which segment is affected: the server, the VM access location, the physical network, the virtual network, or the logical network. Especially when VMs migrate frequently, network engineers cannot know exactly where a VM is located. Currently they can only check the port description on the access switch, which is inefficient and inaccurate.
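
The effect of multiple equal-cost paths can be sketched as follows. This is an illustration only: real switches use vendor-specific hash functions, and the `ecmp_pick` function, the spine names, and the addresses are all hypothetical. The point is that the path depends on a hash of the flow's fields, so two flows between the same pair of VMs may cross different physical devices.

```python
import hashlib


def ecmp_pick(flow, paths):
    """Pick one equal-cost path from a hash of the flow 5-tuple (illustrative hash)."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(paths)
    return paths[index]


# Hypothetical equal-cost next hops between two leaf switches.
paths = ["spine-1", "spine-2", "spine-3", "spine-4"]

# Flows between the same two VMs that differ only in source port.
# Tuple layout: (source IP, destination IP, protocol, source port, destination port)
flows = [("10.0.0.11", "10.0.1.22", "tcp", sport, 443) for sport in range(40000, 40008)]

for flow in flows:
    print("source port", flow[3], "->", ecmp_pick(flow, paths))
```

Because the chosen path is a function of the full 5-tuple, knowing only the two VM addresses is not enough to say which spine a failing flow traversed. That is why per-flow path visibility matters for fault location.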

In addition, when a fault occurs, it is not immediately clear whether the network, storage, or a server caused it. Each possibility has to be checked one by one, which wastes a great deal of time and manpower. Networks need automated means of self-diagnosis. When automation that depends on multiple cooperating components fails, the symptoms include failure to create a network, subnet, interface, or VM; a VM being unable to obtain an IP address; server access timeouts; and packet loss. There are many possible causes, such as inconsistent data between the cloud platform and the controller, or inconsistent configuration between the controller and the network devices.
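
One way to picture an automated consistency check is a simple comparison of what each layer believes to be configured. The sketch below is a minimal illustration under assumed data: the record layout, the network names, and the `diff_records` helper are hypothetical and do not correspond to any particular cloud platform or controller API.

```python
def diff_records(expected, actual):
    """Compare two {name: config} mappings and report where they disagree."""
    shared = expected.keys() & actual.keys()
    return {
        "missing": sorted(expected.keys() - actual.keys()),
        "unexpected": sorted(actual.keys() - expected.keys()),
        "mismatched": sorted(name for name in shared if expected[name] != actual[name]),
    }


# Hypothetical view held by the cloud platform.
cloud_platform = {
    "net-a": {"vni": 5001, "subnet": "10.1.0.0/24"},
    "net-b": {"vni": 5002, "subnet": "10.2.0.0/24"},
    "net-c": {"vni": 5003, "subnet": "10.3.0.0/24"},
}

# Hypothetical view held by the controller.
controller = {
    "net-a": {"vni": 5001, "subnet": "10.1.0.0/24"},
    "net-b": {"vni": 5020, "subnet": "10.2.0.0/24"},  # VNI differs
    "net-d": {"vni": 5004, "subnet": "10.4.0.0/24"},  # unknown to the platform
}

report = diff_records(cloud_platform, controller)
print(report)
```

The same comparison could be repeated between the controller's view and the configuration read back from the devices, so that a fault can be attributed to the layer where the two views first diverge.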

Configuration is relatively complicated. Currently, the underlying network of a cloud-based DCN is configured manually or in batches through the network management system, while service configuration is delivered automatically by the controller. When a fault occurs, it is hard to determine quickly whether the manual or the automatic configuration is at fault.


In short, in the move from traditional DCN to cloud-based DCN, technologies such as virtualization and distributed computing have blurred the boundaries between network, computing, and storage, making faults hard to delimit. Compared with traditional DCN, cloud-based DCN handles exponentially more data, and network faults are difficult to locate and isolate quickly.

In addition to the above problems, the operation and maintenance of cloud-based DCN also has the following five trends.

  • Massive scale: O&M requires network automation that can collect and process massive amounts of information, support second-level monitoring, and deliver accurate configuration.
  • Intelligence: intelligent alarms, flexible scheduling, fault self-healing, unattended operation, and intelligent fault handling.
  • Refinement: end-to-end monitoring, fine-grained management, and O&M analysis.
  • Visualization: visibility into the physical topology, logical topology, forwarding paths, and forwarding quality of tenants and tenant services.
  • Service orientation: O&M services are productized, O&M service interfaces are developed and opened up, and users can customize O&M methods.

We are a factory in Shenzhen, China, producing 100G switches (with NOS), 100G modules, and network cards. We can provide one-stop service covering products, transportation, customs clearance, and tariffs.
