Enhancing the Availability and Reliability of ISP Network Infrastructures
Introduction
This article offers some thoughts on correctly sizing one of the most fundamental "functional towers" of an ISP's networking infrastructure. Among an ISP's key performance indicators (KPIs), two are considered primary: "Performance" and "Availability." The reasons are quite simple, as we can see by analyzing the effects of each case as follows:
As we can see, these two indicators are the most basic ones in networking infrastructures, because subscribers/customers can easily perceive them when using data, video, or voice services.
This article focuses on the relationship between the Availability and Reliability indicators ("functional towers" of a technical design). I will cover Performance and other areas in future articles.
Definition of the Availability Concept and other Peripheral Functional Towers
I treat Availability as a "functional tower" because it is possible to identify, classify, and group sets of technical specifications and processes under this concept, which together lead to much better network uptime. This strategy starts with proper physical, electromechanical, and logical specifications (e.g., hardware in redundant configurations, power and cooling requirements, reliability block diagrams, device clusters, and network links). These are then combined with software-level mechanisms, such as protocols, services, and related features, to raise the Availability indicator to the desired state. Improving this indicator has a direct effect on customer satisfaction, competitiveness, and infrastructure costs!
Availability aims to provide the obvious: ideally, whenever a user (customer or subscriber, whichever term you prefer) wants to use the contracted product or service, it is available and ready for whatever that user needs. On the other hand, when users try to access something online and find it unavailable, the frequency of that downtime is what they remember the service by, and we all know what happens next. Networks are not immune to failures, so we need to predict and anticipate these incidents so that service restoration times meet users' expectations and tolerances.
The Availability indicator is shaped by two other functional towers that share the same mission and support each other: keeping users satisfied with their contracted services. These disciplines are Reliability and Resilience.
When studying network reliability, we can identify factors such as the manufacturing quality of networking gear and the presence or absence of specialized technologies, both physical and logical, along with other mechanisms, peripheral resources, and processes that contribute to the intended redundancy + reliability + resilience = availability set. In my view, network reliability is a functional tower in its own right, with its own indicator. Still, it contributes positively to the overall (and desired) state of network availability.
Resilience, in turn, describes how a device and the network as a whole react when infrastructure failures (links, devices) occur, whether these failures involve equipment components or incidents of a logical nature.
I particularly like to treat these three as follows: the intended beacon indicator is Availability, which can be calculated and improved by sets of technological specifications derived from the principles of Reliability and Resilience.
The Challenges of Providers in the Question of Network Availability
Internet Service Providers (ISPs) need to understand the fundamentals of redundancy + reliability + resilience = availability with absolute clarity so that their infrastructures can be adapted to meet or exceed their customers' expectations and desired service level agreements. Among the many challenges, we can list some situations or truths on the subject:
How much availability does your network infrastructure need, and how much are you willing to pay for it?
One thing that may not seem obvious to many individuals and companies: too much redundancy can be harmful. In addition to significantly increasing the cost of the project and of the infrastructure as a whole, it makes the logical functions of the network far too complex. Think about it! Excess redundancy can even work against your intended network uptime and operational management goals.
Perhaps one of the biggest challenges here is designing a redundant, reliable, and resilient infrastructure that reaches the desired Availability indicator. The quantity and quality of redundancy in a network cannot be chosen the way you answer "how do you like your steak done?" (rare, medium, well-done); that is, it is not exactly a matter of personal preference. Infrastructure projects aiming at better availability need well-founded physical and logical redundancy standards that are neither too scarce nor excessive. The costs of adopting these approaches must be understood and must be compatible with the business mission and intended outcomes. The same rule applies to the financial reality of the network operator (you or your company).
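To make the "how much availability" question more concrete, it helps to translate an availability target into the downtime it actually allows per year. Below is a minimal Python sketch of that arithmetic. The function name and the sample targets are my own choices for illustration, and the calculation assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years (assumption)


def annual_downtime_minutes(availability_percent: float) -> float:
    """Return the downtime per year (in minutes) allowed by an availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Illustrative targets only; use the ones that match your own SLAs.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = annual_downtime_minutes(target)
        print(f"{target}% -> {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, a 99.9% target leaves about 8.76 hours of downtime per year, while 99.999% leaves a little over five minutes. Each additional "nine" shrinks the budget tenfold, which is usually where the cost discussion starts.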
Here's my first tip:
Determine DOWNTIME COSTS first, then determine and balance the Availability costs. It will be easier to accept the harsh reality of the required investments once you clearly understand the business impact of a failure, whether it is a minor annoyance or a catastrophe on your network.
Work through the three questions above before you even try to design your next infrastructure project!
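The tip above can be sketched as a simple comparison between what downtime costs you and what a more available design would cost. This is only a rough Python sketch under my own assumptions: the function names and every figure are hypothetical, the hourly downtime cost must come from your own business assessment, and treating cost as linear in downtime hours ignores longer-term effects such as customer churn.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years (assumption)


def annual_downtime_cost(availability: float, cost_per_hour: float) -> float:
    """Expected yearly downtime cost for an availability given as a fraction (e.g. 0.999)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    downtime_hours = (1.0 - availability) * HOURS_PER_YEAR
    return downtime_hours * cost_per_hour


def upgrade_pays_off(current: float, proposed: float,
                     cost_per_hour: float, annual_upgrade_cost: float) -> bool:
    """True if the avoided downtime cost exceeds the yearly cost of the upgrade."""
    savings = (annual_downtime_cost(current, cost_per_hour)
               - annual_downtime_cost(proposed, cost_per_hour))
    return savings > annual_upgrade_cost


if __name__ == "__main__":
    # Hypothetical numbers: moving from 99.9% to 99.99% with downtime at 10,000 per hour.
    print(upgrade_pays_off(0.999, 0.9999, cost_per_hour=10_000, annual_upgrade_cost=50_000))
```

With these hypothetical numbers, the avoided downtime is worth roughly 78,840 per year, so an upgrade costing 50,000 per year would pay off and one costing 100,000 would not.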
Matching Downtime Costs versus Availability Costs
Above all, seek to identify and quantify the following impacts on your business.
Immediate impacts:
Long-term impacts:
The rest of this discussion continues in the full version of this article, available on the Brasil Peering Forum (BPF) wiki and written in Brazilian Portuguese:
https://wiki.brasilpeeringforum.org/w/Aprimorando_a_Disponibilidade_da_rede_do_ISP
In the full version of the article, I present some critical fundamentals about MTBF, MTTR, and MDT, the concepts of parallel and serial physical and logical redundancy, technological facilities (protocols, services), and many other ideas related to this subject. Ultimately, it shows where all of this fits in and how it affects or improves the availability metrics.
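As a small taste of how MTBF, MTTR, and serial or parallel redundancy relate to availability, here is a minimal Python sketch using the standard textbook formulas. It is not taken from the full article: the function names and sample values are hypothetical, and the serial and parallel formulas assume that components fail independently of one another, which real networks do not always honor.

```python
from math import prod
from typing import Iterable


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def serial(availabilities: Iterable[float]) -> float:
    """Chain of components that must all work: multiply the availabilities."""
    return prod(availabilities)


def parallel(availabilities: Iterable[float]) -> float:
    """Redundant components where one working is enough: 1 - product of unavailabilities."""
    return 1.0 - prod(1.0 - a for a in availabilities)


if __name__ == "__main__":
    # Hypothetical router: fails on average every 9,990 hours, takes 10 hours to repair.
    router = availability(mtbf_hours=9_990, mttr_hours=10)
    print(f"single router:        {router:.6f}")
    print(f"two in series:        {serial([router, router]):.6f}")
    print(f"two in parallel:      {parallel([router, router]):.6f}")
```

The sketch illustrates the general pattern: every extra element in series lowers availability, while redundancy in parallel raises it, with diminishing returns as more redundant elements are added.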
Let me know your thoughts about this subject!
Until next time!
Leonardo Furtado
Network Engineer