MSTP review - part.1
Putting it into context: why the need for MSTP?
Historically, STP as defined by 802.1D was created to prevent bridging loops. The protocol as defined runs a single instance to which all VLANs are mapped, which results in poor utilisation of resources in redundant topologies. Cisco introduced PVST, which enhanced the standard by running one instance per VLAN, first over ISL only, and later PVST+, which added compatibility with Dot1q.
The issue with Cisco's proposal is that we went from "one ring to rule them all" (aka 802.1D) to a perfect state of fairness with one STP instance for each VLAN. Even if it gives amazing freedom for tweaking paths, it is not simple at large scale, and the resources needed to maintain all those instances are considerable.
To emphasise the overkill aspect of it, let's say we have a network of 3 switches. The number of possible logical topologies in a fully meshed network of n switches is n**(n-2) (multiple links between two switches are not considered, for simplicity :)), which gives us 3 possible topologies. If we had 20 VLANs, PVST/PVST+/RPVST would create 20 instances, of which 17 are obviously redundant. (More on STP, Kirchhoff's matrix tree generalization and Cayley's formula.)
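To make the n**(n-2) count less abstract, here is a minimal sketch that simply enumerates the loop-free topologies of a full mesh and compares the result with Cayley's formula. The function names are mine and the brute force only makes sense for a handful of switches.

```python
from itertools import combinations


def find(parent, x):
    """Return the representative of x in a simple union-find forest."""
    while parent[x] != x:
        x = parent[x]
    return x


def count_spanning_trees(n):
    """Count the loop-free topologies of a full mesh of n switches by brute force."""
    links = list(combinations(range(n), 2))
    count = 0
    # A spanning tree uses exactly n - 1 links and contains no loop.
    for subset in combinations(links, n - 1):
        parent = list(range(n))
        loop_free = True
        for a, b in subset:
            ra, rb = find(parent, a), find(parent, b)
            if ra == rb:  # both ends already connected: this link closes a loop
                loop_free = False
                break
            parent[ra] = rb
        if loop_free:
            count += 1
    return count


for n in (3, 4, 5):
    print(n, "switches:", count_spanning_trees(n), "trees, Cayley says", n ** (n - 2))
```

With 3 switches there is nothing more than 3 trees to choose from, which is why any instance beyond the third cannot bring a new topology.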
Cisco introduced MISTP (Multiple Instance Spanning Tree), where the idea was to abstract the VLANs using instances: an instance behaves as a container for a set or range of VLANs. This concept made its way to standardization, and here we are with MSTP (IEEE 802.1s).
Now, to compare the 3 approaches, let's consider 30 VLANs on a physical topology of 3 interconnected switches. As discussed above, we can have only 3 different logical topologies:
Path 1: Switch A --> Switch B --> Switch C
Path 2: Switch A --> Switch C --> Switch B
Path 3: Switch C --> Switch A --> Switch B
- 802.1D and RSTP: we use only one topology, either 1, 2 or 3, resulting in a link which will be unused.
- Cisco per-VLAN approach: creates 30 fully independent STP instances, so we can assign 10 VLANs to path 1, 10 VLANs to path 2 and 10 to path 3. This results in load sharing far better than 802.1D / RSTP (in fact, it is as optimal as the prediction of the load can be), but the burden of maintaining 30 instances of STP is not negligible.
- MSTP: instance creation is not tied to the VLAN; instances are administratively created containers. In this case, we create three instances, to which we assign the VLANs as we like. This results in a better use of topology resources through traffic engineering and a better use of computing resources.
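The comparison above can be sketched in a few lines. This is only an illustration of the bookkeeping, not of any real switch behaviour: the helper name and the choice of contiguous VLAN blocks are my own assumptions, and the count for MSTP leaves out instance 0 (the IST, described further down).

```python
def mstp_mapping(vlans, instance_count):
    """Split the VLANs into contiguous blocks, one block per MST instance (IDs start at 1)."""
    vlans = sorted(vlans)
    block = -(-len(vlans) // instance_count)  # ceiling division
    return {vlan: 1 + index // block for index, vlan in enumerate(vlans)}


vlans = range(1, 31)
mapping = mstp_mapping(vlans, 3)

# Number of spanning tree instances each approach has to maintain for 30 VLANs.
instances = {
    "802.1D / RSTP": 1,                   # one tree for everything
    "PVST+": len(vlans),                  # one tree per VLAN
    "MSTP": len(set(mapping.values())),   # one tree per administratively created instance
}

for approach, count in instances.items():
    print(f"{approach}: {count} instance(s)")
print("VLAN 1 ->", mapping[1], "| VLAN 11 ->", mapping[11], "| VLAN 30 ->", mapping[30])
```

Each of the three MSTP instances can then be tuned to follow one of the three paths, which gives the same load sharing as the per-VLAN approach with a tenth of the instances.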
How does MSTP work?
First of all, we must define some common language. MSTP has two concepts:
- Intra-region: how MSTP operates and the processes happening inside each region.
- Inter-region: how MSTP regions interact with each other and how backward compatibility with STP and RSTP is handled.
1 - MST Region:
Defined as all the switches that share the same configuration identifier, which means that they appear to the external topology as a single switch. To be clearer: if we consider a topology with three regions, even if each of them has 10 switches, from a global topology view each region sees each of the other two as one switch. In a sense, a region hides its complexity from other switches (MST, RSTP or STP).
For two switches to become members of the same region, the following attributes must be identical:
- Configuration name, a variable length string (max 32 bytes)
- Revision level (Cisco: configuration revision number) (two bytes)
- VLANs to instance mapping
- Format selector, a one-byte field encoded as 0 (reserved for future use), so it always matches.
An important question to ask is: how do switches know that their VLAN-to-instance mappings match? Here lies an important difference between pre-standard Cisco MISTP and MSTP 802.1s.
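The IEEE standard introduces the concept of an MST configuration table, which contains 4096 consecutive two-byte elements representing the VLAN-to-instance mapping. For example, let's say we have VLANs 1, 2 and 3 in instance 1 and VLANs 4, 5 and 6 in instance 2. Written as (instance, VLAN) pairs, the mapped part of the table looks as follows: (1,1) (1,2) (1,3) (2,4) (2,5) (2,6). In the table itself, the position of an element is the VLAN ID and its value is the instance, so elements 1 to 3 hold 1, elements 4 to 6 hold 2, and every other element holds 0.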
This MST configuration table is hashed to create a 16-byte HMAC-MD5 signature. The receiving switch compares the received hash with its own, and if they are the same, the mappings are considered identical. Note that the standard comments on the probability of two different MST configuration tables producing the same hash (due to the size of the hash space): that probability is negligible, and the configuration name and the revision level can also be used to keep regions apart.
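A minimal sketch of that comparison, under a few assumptions that should be checked against the standard before relying on them: the table is indexed by VLAN ID with each element holding the instance number in network byte order, unmapped VLANs stay in instance 0, and the fixed signature key below is the value I remember from the standard. The class and function names are hypothetical.

```python
import hashlib
import hmac
import struct
from dataclasses import dataclass

# Assumption: fixed key published in the standard, quoted from memory. Verify it.
DIGEST_KEY = bytes.fromhex("13AC06A62E47FD51F95D2BA243CD0346")


def config_digest(vlan_to_instance):
    """Build the 4096-element table and return its 16-byte HMAC-MD5 signature."""
    table = [0] * 4096  # element position = VLAN ID, element value = instance
    for vlan, instance in vlan_to_instance.items():
        if not 1 <= vlan <= 4094:
            raise ValueError(f"invalid VLAN ID: {vlan}")
        table[vlan] = instance
    packed = struct.pack(">4096H", *table)  # 4096 big-endian two-byte elements
    return hmac.new(DIGEST_KEY, packed, hashlib.md5).digest()


@dataclass(frozen=True)
class MstConfigId:
    """The attributes two switches compare to decide whether they share a region."""
    name: str
    revision: int
    digest: bytes
    format_selector: int = 0


def same_region(a, b):
    return a == b  # every attribute must be identical


mapping = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2}
sw1 = MstConfigId("REGION-A", 1, config_digest(mapping))
sw2 = MstConfigId("REGION-A", 1, config_digest(dict(mapping)))
sw3 = MstConfigId("REGION-A", 1, config_digest({**mapping, 7: 2}))

print(same_region(sw1, sw2))  # identical name, revision and mapping
print(same_region(sw1, sw3))  # VLAN 7 is mapped differently on sw3
```

The point of the digest is that the BPDU carries 16 bytes instead of the whole 8192-byte table.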
2 - Internal Spanning Tree (IST):
It is in itself an instance of MSTP. In the standard and in any compliant implementation, the IST is referred to as MST instance zero (MSTI0). Previously we discussed how two switches can find out that they have the same mapping (instances to VLANs): via a hash that is added to the MST BPDU. The IST, or MSTI0, is the only instance that carries that information; in fact, the IST is the only instance that can send or receive MST BPDUs inside a region. But if the IST is the only instance which can send and receive BPDUs, how do the other MSTIs (1 to 4094) work?
The IST works like a carrier for all the MSTIs. Let's consider a "region A" with 10 switches. To be part of region A, all the switches must agree on the attributes discussed earlier. All the switches exchange their information based on the timers of the IST, and the configuration of each other MSTI is carried in what is called an M-record (MST record). This is a neat and elegant solution to the massive number of BPDUs that would otherwise have to be exchanged between the switches.
We can think of it this way: all the switches agree to use the rules defined by the IST, and each switch sends any complementary information about how the MSTIs are configured locally via its instance 0. So the IST can be thought of as a representative.
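As a rough picture of the carrier idea, the sketch below models one BPDU sent on behalf of the IST with one M-record per MSTI attached. The field names are simplified and hypothetical (a real M-record carries more fields), and the switch names and costs are made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MRecord:
    """Simplified per-instance record carried inside the IST's BPDU."""
    instance_id: int
    regional_root: str
    internal_root_path_cost: int


@dataclass
class MstBpdu:
    """Simplified MST BPDU: IST information plus one M-record per MSTI."""
    cist_root: str
    config_name: str
    revision: int
    m_records: List[MRecord] = field(default_factory=list)


def build_mst_bpdu(cist_root, config_name, revision, instances):
    """instances maps an MSTI number to (regional root, internal path cost)."""
    bpdu = MstBpdu(cist_root, config_name, revision)
    for instance_id, (root, cost) in sorted(instances.items()):
        bpdu.m_records.append(MRecord(instance_id, root, cost))
    return bpdu


def bpdus_per_hello(per_vlan, vlan_count):
    """BPDUs sent on one trunk port per hello interval."""
    return vlan_count if per_vlan else 1


bpdu = build_mst_bpdu(
    "SW-A", "REGION-A", 1,
    {1: ("SW-A", 0), 2: ("SW-B", 20000), 3: ("SW-C", 20000)},
)
print(len(bpdu.m_records), "M-records in a single BPDU")
print("per-VLAN:", bpdus_per_hello(True, 30), "BPDUs, MSTP:", bpdus_per_hello(False, 30))
```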
3 - Common Spanning Tree (CST):
The CST is the global topology, in which each region behaves like a virtual switch. The CST is the spanning tree which connects all MST regions and all 802.1w and 802.1D switches together. This means that if we have two regions, each with 5 switches inside, plus one 802.1w switch and one 802.1D switch, from the point of view of the CST topology we have 4 nodes: the 5 switches inside each region are hidden from the CST.
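The node count in that example can be reproduced with a tiny sketch: every MST region collapses into one virtual node, while every non-MST switch stays a node of its own. The data layout and switch names are assumptions made for the illustration.

```python
def cst_nodes(switches):
    """switches maps a switch name to its MST region, or to None for an STP/RSTP switch."""
    nodes = set()
    for name, region in switches.items():
        if region is not None:
            nodes.add(("region", region))  # the whole region is seen as one virtual switch
        else:
            nodes.add(("switch", name))    # legacy switches are seen individually
    return nodes


switches = {f"A{i}": "REGION-A" for i in range(1, 6)}
switches.update({f"B{i}": "REGION-B" for i in range(1, 6)})
switches["RSTP-1"] = None  # 802.1w switch
switches["STP-1"] = None   # 802.1D switch

print(len(switches), "physical switches,", len(cst_nodes(switches)), "CST nodes")
```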
4 - Common and Internal Spanning Tree (CIST):
Defined in the Cisco documentation as "A CIST is a collection of the ISTs in each MST region. The CIST is the same as an IST inside an MST region and the same as a CST outside an MST region.". The way I understood this (I hope) is by first asking why we need the CIST, so let's begin:
We know that, from an outside view, the CST takes care of creating the topology, while on the inside the IST (MSTI0) is used to propagate the information about each MSTI in order to create different topologies depending on the need. BUT the CST is just a simple spanning tree after all, so the QUESTION is:
Given that a region may have multiple switches which are shrunk down to a single virtual switch from the point of view of the CST, what is the BID of that virtual switch? Or, simply put, how does the CST do its work? This is where the CIST comes into play, by defining two roles:
- CIST Root
- CIST Regional Root
Inside a given region, each switch at initialization declares itself as CIST Root and CIST Regional Root, via the IST. The switches in region A then elect the CIST Root and the CIST Regional Root; it is as simple as a root election can be. Note that the CIST Root and the CIST Regional Root are the same switch at this point, because the selection is based on the IST parameters, not on the MSTIs, hence the definition "The CIST is the same as an IST inside an MST region."
Now let's say that in region A one of the switches was elected as CIST Root and CIST Regional Root. What if, in region B, the switch which was elected as CIST Root and CIST Regional Root sends a BPDU superior to that of the switch elected in region A?
What will happen is that in region B everything stays as is, and in region A the switch which is nearest to region B becomes the CIST Regional Root. The process is as follows:
- In region A, we have the CIST Root BID; it can be any switch in the region. The region itself is connected to region B via switches that are at the edge of region A. Each port of a switch in region A that connects to a switch which is NOT in the same region is called a "boundary port". On each boundary port, the BID of the switch elected as CIST Root is sent to the neighboring region (region B).
- In region B, the same process as in region A is done.
- Let's say that "Switch-1", which is part of region A and located at its edge, receives a BPDU superior to that of the switch elected in its own region. Switch-1 forwards the information into the IST and promotes itself to the role of CIST Regional Root, because cost-wise it is the switch of its region nearest to the CIST Root.
- In region B, nothing changes, because the CIST Root is in that region.
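The outcome of the steps above can be sketched as a plain computation. This is a simplification under stated assumptions, not the protocol's actual state machine: it computes the final result directly, each boundary switch is simply given an external path cost toward the CIST Root (a hypothetical field), and all names, priorities, MAC addresses and costs are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bridge:
    name: str
    region: str
    priority: int
    mac: str  # same lowercase format everywhere, so string order matches numeric order
    external_cost: Optional[int] = None  # cost toward the CIST Root; None without a boundary port

    @property
    def bid(self):
        return (self.priority, self.mac)  # lower is better


def elect(bridges):
    """Return the CIST Root and the CIST Regional Root of every region."""
    cist_root = min(bridges, key=lambda b: b.bid)
    regional_roots = {}
    for region in {b.region for b in bridges}:
        if region == cist_root.region:
            # In the CIST Root's own region both roles stay on the same switch.
            regional_roots[region] = cist_root
            continue
        boundary = [b for b in bridges if b.region == region and b.external_cost is not None]
        # Lowest external cost wins; the BID only breaks ties.
        regional_roots[region] = min(boundary, key=lambda b: (b.external_cost, b.bid))
    return cist_root, regional_roots


bridges = [
    Bridge("A-best", "A", 4096, "00:00:00:00:00:0a"),                   # best BID inside region A
    Bridge("Switch-1", "A", 61440, "00:00:00:00:00:0b", external_cost=20000),
    Bridge("A-edge-2", "A", 32768, "00:00:00:00:00:0c", external_cost=40000),
    Bridge("B-root", "B", 0, "00:00:00:00:00:0d"),
]

cist_root, regional_roots = elect(bridges)
print("CIST Root:", cist_root.name)
for region in sorted(regional_roots):
    print("Regional Root of region", region + ":", regional_roots[region].name)
```

In this made-up topology Switch-1 has the worst BID of region A and still ends up as its CIST Regional Root, because only the cost toward the CIST Root matters at that point.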
Recap:
- In a given topology, the switches are divided into regions; switches in a region have to share the same attributes.
- In each region, an election is held to elect the CIST Root/Regional Root, based on its IST.
- Each switch that has an interface at the "boundary" of its region sends BPDUs to participate in the CST election.
- When the CIST Root ("the root of all the switches") is determined, in every other region the switch nearest to the region where the CIST Root resides becomes the CIST Regional Root of its region, even if, on a local scope (internal to its region), that switch has the worst BID. This is because the election of the CIST Regional Root is done through the lowest root path cost (as usual in a common spanning tree).
- Each region can have its own internal topologies to manage VLAN traffic, by changing the parameters of each MSTI.