
How SD-WAN improves Mean Time To Repair: WHILE outage { CASE detect(); diagnose(); resolve(); i++ }

Fusion Broadband South Africa

Our proprietary Antares & Illuminate portal transforms SD-WAN site deployment, management & support.

Published: 15 June 2022

The key consideration when working to improve Mean Time to Repair (MTTR) is understanding the time in between. MTTR is not simply the gap between an outage occurring at one time and the link coming back online at another. That gap is what MTTR measures, but a meaningful conversation about it requires knowing what happens during that interval. In the context of software-defined wide area networks (SD-WAN), the useful comparison is how much an SD-WAN deployment improves MTTR over a legacy wide area network (WAN) installation built on old-school routers.

Based on risk mitigation and industry norms, ISPs often contract SLAs around these MTTRs. A poorly managed MTTR can result in heavy penalties, or in additional costs from bringing excessive repair times down with more resources (either headcount or automated service tools), which might not be optimal. Another negative consequence is customer churn.
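As a rough illustration of the arithmetic behind an MTTR-based SLA, the sketch below computes MTTR as the mean of (restore time minus outage time) over a set of incidents and compares it with a contracted target. The `Incident` type, the timestamps and the 4-hour target are hypothetical values chosen for illustration, and the breach rule (mean repair time exceeding the target) is an assumption; real SLAs may instead cap each individual incident.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record: when the link went down and when it was restored."""
    outage_at: datetime
    restored_at: datetime


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean Time To Repair: the average outage-to-restore duration."""
    if not incidents:
        raise ValueError("MTTR is undefined without incidents")
    total = sum((i.restored_at - i.outage_at for i in incidents), timedelta())
    return total / len(incidents)


def breaches_sla(incidents: list[Incident], target: timedelta) -> bool:
    """Assumed SLA rule: breached when the mean repair time exceeds the contracted target."""
    return mttr(incidents) > target


if __name__ == "__main__":
    t0 = datetime(2022, 6, 1, 8, 0)
    # Three illustrative incidents lasting 2, 5 and 8 hours.
    incidents = [Incident(t0, t0 + timedelta(hours=h)) for h in (2, 5, 8)]
    print(mttr(incidents))                                     # 5:00:00
    print(breaches_sla(incidents, target=timedelta(hours=4)))  # True
```

A single long outage can push the mean over the target even when most repairs are quick, which is why it pays to know where the time in between is spent.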

Incident life cycle

To understand the times involved in MTTR, we need to fully understand every step from outage to repair, which in ITIL terms is often referred to as the incident life cycle. At a high level, the steps are:

Outage occurs;
The outage is detected either by human notification or automated systems such as Network Management Systems;
A process of diagnosis occurs whereby resources determine the outage causation and repair process. During this step, a number of tools can potentially assist. Causation can be immediate (visual), intermediate (underlying) or root (underpinning);
Typically, a repair can be initiated once the underlying causation is determined;
If appropriate, a workaround might be available to temporarily return the link/connectivity to service as a short-term alternative while the full repair is completed at a later stage;
The link is ready for repair when diagnosis is complete, the repair process determined and any logistics such as delivery of spare parts/components completed;
The components that have caused the outage are then repaired and this includes restoring the required configuration for normal operations; and
The link is operating normally again once traffic flows over it as it did before the outage.

Programmatically this would be:

WHILE outage {
    step; i++
}
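The pseudocode above can be sketched in Python as a loop that advances one life-cycle step per iteration (the `i++`) until the link is back in service, recording the time spent in each step so it is visible where the MTTR actually goes. The `Phase` names, the `run_incident` helper and the per-phase durations are hypothetical and only mirror the steps listed earlier.

```python
from enum import Enum, auto


class Phase(Enum):
    """Steps of the incident life cycle, in order (names are illustrative)."""
    DETECT = auto()     # outage noticed by a person or a management system
    DIAGNOSE = auto()   # causation and repair process determined
    LOGISTICS = auto()  # spares delivered, field visit arranged
    REPAIR = auto()     # components replaced, configuration restored
    VERIFY = auto()     # traffic flowing as it did before the outage


def run_incident(durations_min: dict[Phase, int]) -> dict[Phase, int]:
    """WHILE outage { step; i++ }: walk the phases in order, recording minutes per phase."""
    phases = list(Phase)
    breakdown: dict[Phase, int] = {}
    i = 0
    outage = True
    while outage:
        step = phases[i]
        breakdown[step] = durations_min.get(step, 0)  # phases not supplied count as 0
        i += 1
        outage = i < len(phases)  # the outage ends once the last step completes
    return breakdown


if __name__ == "__main__":
    # Hypothetical minutes spent per phase for a single incident.
    example = {Phase.DETECT: 30, Phase.DIAGNOSE: 120, Phase.LOGISTICS: 240,
               Phase.REPAIR: 90, Phase.VERIFY: 15}
    breakdown = run_incident(example)
    for step, minutes in breakdown.items():
        print(f"{step.name:<10} {minutes:>4} min")
    print(f"time to repair: {sum(breakdown.values())} min")  # 495 min
```

Comparing such breakdowns for a legacy site and an SD-WAN site shows which phase an improvement actually shortens.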

The video below explains this in greater detail, and the same life cycle can also be applied to security-related incidents:


The SD-WAN architecture inherently improves MTTR in a number of ways. Connectivity is controlled and managed from aggregators/concentrators located in data centres. Thus, unlike in a legacy distributed wide area network, a link outage is detected immediately by the aggregators/concentrators, without the need for a remote polling system.
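As a minimal sketch of why no separate polling system is needed: if every site keeps a tunnel up to the concentrator and sends periodic keepalives over it, the concentrator can declare a link down as soon as those keepalives stop arriving. The `Concentrator` class, the tunnel names and the 3-second dead interval are assumptions for illustration; actual SD-WAN products implement their own liveness mechanisms and timers.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concentrator:
    """Hypothetical concentrator tracking the last keepalive seen on each site tunnel."""
    timeout_s: float = 3.0  # assumed dead interval; real products use their own timers
    last_seen: dict[str, float] = field(default_factory=dict)

    def on_keepalive(self, tunnel: str, now: Optional[float] = None) -> None:
        # Record the arrival time of a keepalive received over a site's tunnel.
        self.last_seen[tunnel] = time.monotonic() if now is None else now

    def down_tunnels(self, now: Optional[float] = None) -> list[str]:
        # A tunnel counts as down once no keepalive has arrived within timeout_s.
        now = time.monotonic() if now is None else now
        return sorted(t for t, seen in self.last_seen.items() if now - seen > self.timeout_s)


if __name__ == "__main__":
    c = Concentrator(timeout_s=3.0)
    c.on_keepalive("site-a/fibre", now=100.0)
    c.on_keepalive("site-a/lte", now=100.0)
    c.on_keepalive("site-a/lte", now=104.0)  # fibre keepalives have stopped
    print(c.down_tunnels(now=104.5))  # ['site-a/fibre']
```

Explicit `now` values keep the example deterministic; in use, the monotonic clock default applies.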

Configuration

Setting up and configuring an SD-WAN is simple at an administrative level. There are no reams of text to copy and paste via Telnet/SSH sessions. Diagnosis is immediately partitioned between the lower transport protocol layers and the higher connectivity protocol layers. SD-WAN makes this split immediately apparent, so there is none of the extended finger-pointing between layer 2 and layer 3 that so often befalls legacy wide area network deployments.
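The split between the transport (underlay) and connectivity (overlay) layers can be expressed as a simple triage rule. This is a minimal sketch under stated assumptions: the two boolean inputs stand for whatever probes a given platform exposes, and the `classify` function and `Fault` categories are hypothetical rather than part of any product.

```python
from enum import Enum


class Fault(Enum):
    NONE = "no fault"
    TRANSPORT = "transport/underlay: escalate to the last-mile provider"
    OVERLAY = "overlay: check the SD-WAN edge device or its configuration"


def classify(tunnel_up: bool, underlay_reachable: bool) -> Fault:
    """Hypothetical triage rule partitioning a fault between transport and overlay layers.

    tunnel_up: the site's overlay tunnel to the concentrator is established.
    underlay_reachable: the site's transport address answers probes (e.g. ping of the WAN IP).
    """
    if tunnel_up:
        return Fault.NONE
    if not underlay_reachable:
        # Nothing above the transport can work, so the fault sits below the overlay.
        return Fault.TRANSPORT
    # Transport works but the tunnel does not: the fault is in the overlay.
    return Fault.OVERLAY


if __name__ == "__main__":
    for tunnel_up, reachable in [(True, True), (False, False), (False, True)]:
        print(tunnel_up, reachable, "->", classify(tunnel_up, reachable).value)
```

The rule always yields exactly one owner for a fault, which is the property that cuts short the layer 2 versus layer 3 debate.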

Logistics

Logistics and spare parts are common to both SD-WAN and legacy wide area network deployments, and neither scenario is necessarily better optimised. However, since SD-WAN hardware is more likely to be built on white-box rather than proprietary hardware, overall parts availability could improve. Another benefit of SD-WAN is that the product set's diagnosis and management capabilities are more up to date, which will result in a higher first-time resolution rate for rolling wheels (field technician visits). One of the biggest curses of legacy WAN installations is the disproportionate number of second visits that rolling wheels must make because of component mismatches: some of these installations have been in the field for years, and new stock often does not interoperate with what is already deployed.

Automation

Restoring the link is highly optimised and automated within SD-WAN. This is a result of the simple provisioning mechanism that is used to deploy SD-WAN initially and is leveraged again to restore service: the device automatically connects to the aggregator/concentrator, downloads its configuration, and service is restored. In a legacy environment, there is often a process involving laptops with specialised cables, remote console sessions over 3G using tools such as TeamViewer, and the cursed cut and paste required by legacy consoles. The skill level required of remote hands for SD-WAN is therefore lower, and such staff are more readily available.
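A minimal sketch of the restore path described above, assuming a controller that stores each site's intended configuration and binds device serial numbers to sites: a replacement device only needs to be registered against the site, after which it pulls and applies the same configuration on first boot. The `Controller` and `EdgeDevice` classes, the serial-to-site binding and the configuration keys are hypothetical, and the network connection is simulated.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical aggregator/concentrator holding each site's intended configuration."""
    site_configs: dict[str, dict] = field(default_factory=dict)  # site -> config
    assignments: dict[str, str] = field(default_factory=dict)    # device serial -> site

    def assign(self, serial: str, site: str) -> None:
        # The only manual step when hardware is swapped: bind the new serial to the site.
        self.assignments[serial] = site

    def config_for(self, serial: str) -> dict:
        # Raises KeyError for a device that has not been assigned to a site.
        return dict(self.site_configs[self.assignments[serial]])


@dataclass
class EdgeDevice:
    serial: str
    running_config: dict = field(default_factory=dict)

    def boot(self, controller: Controller) -> None:
        # Simulated: connect to the controller over any working uplink,
        # download the site's configuration and apply it. No console,
        # special cables or copy-and-paste are involved.
        self.running_config = controller.config_for(self.serial)


if __name__ == "__main__":
    ctl = Controller(site_configs={"branch-42": {"wan1": "fibre", "wan2": "lte"}})
    ctl.assign("SN-0001", "branch-42")  # replacement unit shipped to site
    dev = EdgeDevice("SN-0001")
    dev.boot(ctl)
    print(dev.running_config == ctl.site_configs["branch-42"])  # True
```

Because the intended configuration lives centrally, the person on site only has to cable and power the unit, which is why the required skill level drops.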

SD-WAN links are often deployed using multiple paths and media. Given this inherent ability, a workaround is more readily available in SD-WAN deployments than in legacy WAN installations. In many situations, SD-WAN protects overall availability: when more than one last mile is in place, it is unlikely that they all suffer outages simultaneously!
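The claim that several last miles are unlikely to fail at once can be quantified. Assuming link failures are independent (which does not hold when, for example, two links share a duct, a power feed or an upstream provider), the probability that at least one path is up is 1 minus the product of each link's unavailability. The availability figures below are illustrative, not measured.

```python
import math


def combined_availability(link_availabilities: list[float]) -> float:
    """Probability that at least one link is up: 1 - prod(1 - A_i).

    Assumes link failures are independent of one another.
    """
    return 1.0 - math.prod(1.0 - a for a in link_availabilities)


if __name__ == "__main__":
    # Illustrative availabilities, not measurements.
    print(f"{combined_availability([0.99]):.4%}")        # 99.0000%
    print(f"{combined_availability([0.99, 0.98]):.4%}")  # 99.9800%
```

Adding a second, independent 98% link to a 99% link cuts the expected unavailability from 1% to 0.02%.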

At a basic and practical level, SD-WAN improves MTTR. Contributions and comments are welcome.

This article was written by Ronald Bartels, who works connecting Internet-inhabiting things at Fusion Broadband.
