BGP timers and convergence

This post explores the process of BGP convergence and how it interacts with the IGP to achieve network convergence. In any large-scale network, route reflectors (RRs) are typically employed, so this discussion focuses on setups that incorporate RRs. In a full mesh network, all paths are available, enabling hot potato routing and rapid convergence, though this comes with challenges in scalability and management. It's also possible to combine full mesh and RRs, such as running a full mesh within a point of presence (PoP) while using RRs for central peering.

BGP is used for both internal (iBGP) and external (eBGP) peerings, with different convergence behaviors and timer settings depending on the type of peering. As a path vector protocol, BGP functions similarly to a distance vector protocol, advertising only routes installed in the RIB. By default, BGP installs only the best path into the RIB, even when multiple equal candidates exist, and it advertises only this selected route.

Relevant BGP timers

When BGP establishes its initial peer connection, it activates a timer known as update-delay. By default, this timer is set to 120 seconds. During this period, the BGP best path algorithm does not execute until either the timer expires or the peer signals that all routes have been transmitted. The peer can indicate completion by sending either a BGP Keepalive or a BGP End of RIB message, with the latter often used in conjunction with graceful restart (GR). The update-delay timer helps to reduce the frequency of BGP best path calculations, streamline update generation, and enhance the efficiency of packing routes into TCP segments. It’s important to note that the update-delay timer is triggered only when the first peer is established, and it typically requires a large volume of routes to exceed the 120-second duration.
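The gating rule described above can be sketched in a few lines of Python. This is only an illustrative model, not vendor code: the class name `ReadOnlyMode`, its methods, and the choice to wait for every established peer are assumptions made for the example; only the 120-second default comes from the text.

```python
from dataclasses import dataclass, field

UPDATE_DELAY_S = 120.0  # default update-delay quoted in the text


@dataclass
class ReadOnlyMode:
    """Hypothetical model of the window that starts when the first peer comes up."""
    started_at: float
    update_delay: float = UPDATE_DELAY_S
    pending_peers: set = field(default_factory=set)

    def peer_established(self, peer: str) -> None:
        # The peer has started sending its initial routes.
        self.pending_peers.add(peer)

    def peer_done(self, peer: str) -> None:
        # Called when the peer sends a Keepalive or an End-of-RIB marker.
        self.pending_peers.discard(peer)

    def may_run_best_path(self, now: float) -> bool:
        # Best path runs once the timer expires OR no peer is still sending.
        timer_expired = now - self.started_at >= self.update_delay
        all_peers_done = not self.pending_peers
        return timer_expired or all_peers_done


mode = ReadOnlyMode(started_at=0.0)
mode.peer_established("rr1")
mode.peer_established("rr2")
print(mode.may_run_best_path(now=10.0))  # False: timer running, peers still sending
mode.peer_done("rr1")
mode.peer_done("rr2")
print(mode.may_run_best_path(now=12.0))  # True: every peer signalled completion
```

In this model the timer acts only as an upper bound: with a small table the peers finish long before 120 seconds and best path runs early.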

Another key timer in BGP is the Minimum Route Advertisement Interval (MRAI). For iBGP, this timer defaults to 0 seconds, allowing updates to be sent immediately. In contrast, for eBGP, the MRAI is set to 30 seconds, meaning updates may be delayed by up to 30 seconds. The MRAI's main role is to minimize route churn and reduce the volume of BGP updates, though this can lead to slower convergence. The MRAI can be adjusted for each neighbor individually. It’s important to note that the MRAI timer monitors the time elapsed since the last set of updates was sent; if this period exceeds 30 seconds, updates can be dispatched without further delay.
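As a rough illustration of that last point, the helper below computes the earliest moment an update could leave the router. The function name `next_send_time` and its parameters are hypothetical; the defaults are the ones quoted above, and real implementations may differ in detail (for example by adding jitter).

```python
from typing import Optional

# MRAI defaults quoted in the text, in seconds.
MRAI_DEFAULT_S = {"ibgp": 0.0, "ebgp": 30.0}


def next_send_time(change_at: float, last_sent_at: Optional[float], peer_type: str) -> float:
    """Earliest time an update for a route change can be sent to a peer."""
    mrai = MRAI_DEFAULT_S[peer_type]
    if last_sent_at is None:
        # Nothing was sent before, so there is no interval to wait out.
        return change_at
    # The update waits only for whatever is left of the interval.
    return max(change_at, last_sent_at + mrai)


print(next_send_time(100.0, 95.0, "ebgp"))  # 125.0: 25 s of the interval remain
print(next_send_time(100.0, 60.0, "ebgp"))  # 100.0: more than 30 s already elapsed
print(next_send_time(100.0, 95.0, "ibgp"))  # 100.0: iBGP sends immediately
```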

BGP Next Hop Tracking

BGP uses a mechanism known as the BGP Scanner to periodically walk the BGP table and verify the reachability of BGP next hops. For MPLS VPNs, this process also handles the import and export of routes into a VRF. By default, the BGP Scanner runs every 60 seconds, and this fixed interval can delay the detection of changes in the network.

A BGP prefix originated by a node (e.g., Node A) is valid as long as its next hop remains reachable, and in the classic scenario the next hop is learned through an IGP. If a remote router (e.g., Node B) fails, for example because of a power or hardware failure, the IGP quickly recognizes that the next hop for that prefix is no longer valid. BGP, which relies on the BGP Scanner, does not immediately recognize that the router is down. Since BGP timers are typically set to 60 seconds for keepalives and 180 seconds for the hold time, it could take up to three minutes for BGP to detect the failure. The core issue is that the BGP Scanner process is not event-driven.

BGP Next Hop Tracking (NHT) addresses this by essentially turning the BGP Scanner into an event-driven process. BGP NHT reacts to various events, such as changes in next-hop reachability and changes in the IGP metric to the next hop. In IOS-XR, a distinction is made between critical and non-critical events (e.g., a next-hop change is critical, while a metric change is non-critical), whereas IOS responds the same way to both. In the previous example, BGP on the local router is notified by the Routing Information Base (RIB) whenever next-hop details change, so it can react quickly to updates from the IGP. If a next hop becomes unreachable, the corresponding route is marked as invalid.

However, BGP NHT introduces a brief delay before prompting BGP to scan the table and adjust routes. This delay, which depends on the platform and operating system, gives the IGP time to disseminate information, consolidate events, and complete convergence. On Cisco IOS, the delay is typically 5 seconds for any event, while on IOS-XR it is 3 seconds for critical events and 10 seconds for non-critical ones.
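To put these numbers side by side, here is a small back-of-the-envelope sketch. The delay table uses the values quoted above; the function `nht_reaction_time` and the one-second IGP convergence figure are assumptions chosen only for illustration.

```python
# NHT trigger delays quoted in the text, in seconds.
NHT_DELAY_S = {
    "ios": {"critical": 5.0, "non_critical": 5.0},
    "ios_xr": {"critical": 3.0, "non_critical": 10.0},
}
HOLD_TIME_S = 180.0  # typical BGP hold time quoted in the text


def nht_reaction_time(platform: str, event: str, igp_convergence_s: float) -> float:
    """Time until BGP rescans its table: IGP detection plus the NHT delay."""
    return igp_convergence_s + NHT_DELAY_S[platform][event]


# Assumed IGP convergence of 1 second (hypothetical value).
with_nht = nht_reaction_time("ios_xr", "critical", igp_convergence_s=1.0)
print(with_nht)                # 4.0
print(HOLD_TIME_S - with_nht)  # 176.0 seconds saved versus waiting for hold-time expiry
```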

PIC Edge and PIC Core: Enhancing BGP Convergence

In large-scale networks, fast convergence is critical to maintaining high availability and minimizing downtime. Two important concepts in achieving fast convergence within BGP are PIC Edge (Prefix Independent Convergence Edge) and PIC Core (Prefix Independent Convergence Core). These mechanisms are designed to improve BGP convergence times significantly in the event of a network failure, particularly for networks using MPLS (Multiprotocol Label Switching).

PIC Edge

PIC Edge is a mechanism that improves convergence times for edge routers in a network. Typically, when a failure occurs in a network, BGP routers must recalculate the best path for each prefix, which can be time-consuming. PIC Edge, however, allows routers to precompute backup paths for each prefix. These backup paths are stored in the forwarding information base (FIB), enabling the router to quickly switch to the backup path if the primary path fails, without having to wait for a full BGP convergence.

PIC Edge is particularly beneficial in scenarios where a large number of prefixes are advertised from a single edge router, as it enables rapid failover by using precomputed alternate paths. This mechanism is especially useful in service provider networks where edge routers handle thousands of customer routes.
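The "prefix independent" part is easier to see with a toy data structure. In the sketch below, many prefixes point to one shared path-list object, so a single change repairs all of them at once. The names `PathList` and `Fib`, and the labels `PE1` and `PE2`, are hypothetical; real FIB implementations are considerably more involved.

```python
from dataclasses import dataclass


@dataclass
class PathList:
    """Shared object holding a primary path and a precomputed backup."""
    primary: str
    backup: str
    primary_up: bool = True

    def active(self) -> str:
        return self.primary if self.primary_up else self.backup


class Fib:
    """Toy FIB: every prefix stores a reference to a path list, not a copy."""

    def __init__(self) -> None:
        self.prefixes = {}

    def install(self, prefix: str, path_list: PathList) -> None:
        self.prefixes[prefix] = path_list

    def lookup(self, prefix: str) -> str:
        return self.prefixes[prefix].active()


shared = PathList(primary="PE1", backup="PE2")
fib = Fib()
for i in range(1000):
    fib.install(f"10.{i // 256}.{i % 256}.0/24", shared)

print(fib.lookup("10.0.0.0/24"))  # PE1
shared.primary_up = False         # one update, regardless of the prefix count
print(fib.lookup("10.3.231.0/24"))  # PE2
```

The failover cost here is a single write to the shared object, which is why convergence time does not grow with the number of prefixes.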

PIC Core

While PIC Edge focuses on the edge of the network, PIC Core deals with improving convergence within the network's core. Similar to PIC Edge, PIC Core precomputes backup paths but does so for the core routers. In the event of a core link or node failure, core routers can instantly switch to the precomputed backup paths, thereby minimizing packet loss and downtime.

PIC Core is particularly effective in MPLS networks where the core network is responsible for forwarding traffic across multiple paths. By precomputing alternate paths within the MPLS core, PIC Core ensures that traffic can be rerouted quickly, maintaining high levels of network resilience and reducing the impact of core failures on the overall network.
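One way to picture this is a two-level lookup: BGP prefixes resolve to a BGP next hop, and the next hop resolves to an IGP path. The following sketch assumes that model; the class `HierarchicalFib`, the address `192.0.2.1`, and the interface names are all hypothetical and chosen only for the example.

```python
class HierarchicalFib:
    """Toy two-level FIB: prefix -> BGP next hop -> ordered IGP paths."""

    def __init__(self) -> None:
        self.bgp = {}  # prefix -> BGP next hop (e.g., loopback of the remote PE)
        self.igp = {}  # BGP next hop -> [primary interface, backup interface]

    def resolve(self, prefix: str):
        next_hop = self.bgp[prefix]
        return next_hop, self.igp[next_hop][0]

    def core_link_failed(self, interface: str) -> int:
        """Promote the backup wherever the failed link was primary.

        Returns the number of entries rewritten.
        """
        rewrites = 0
        for paths in self.igp.values():
            if paths[0] == interface and len(paths) > 1:
                paths.pop(0)
                rewrites += 1
        return rewrites


fib = HierarchicalFib()
fib.igp["192.0.2.1"] = ["eth0", "eth1"]
for i in range(500):
    fib.bgp[f"10.0.{i % 256}.{(i // 256) * 128}/25"] = "192.0.2.1"

print(fib.resolve("10.0.0.0/25"))    # ('192.0.2.1', 'eth0')
print(fib.core_link_failed("eth0"))  # 1: one IGP entry rewritten, 500 prefixes repaired
print(fib.resolve("10.0.0.0/25"))    # ('192.0.2.1', 'eth1')
```

The BGP entries are never touched during the core failure; only the IGP level changes.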

Relevant RFCs

Several RFCs provide the foundation for the principles and implementation of PIC Edge and PIC Core:

  1. RFC 5701 - "IPv6 Address Specific BGP Extended Community Attribute" This RFC discusses the extended community attribute for BGP, which can be utilized in scenarios involving PIC mechanisms, although it is not exclusively about PIC.
  2. RFC 7432 - "BGP MPLS-Based Ethernet VPN" This RFC outlines the use of BGP in MPLS networks, which is directly relevant to how PIC Core can be implemented in MPLS environments.
  3. RFC 8277 - "Using BGP to Bind MPLS Labels to Address Prefixes" RFC 8277 covers how BGP can be used to distribute MPLS labels, which is crucial for the functioning of PIC Core in MPLS networks.
  4. RFC 4090 - "Fast Reroute Extensions to RSVP-TE for LSP Tunnels" Although primarily focused on fast reroute in MPLS TE, the concepts here are related to how rapid failover mechanisms like PIC Core can operate.
