Case Study - BGP Routing Policy
Introduction
BGP routing policy is a very interesting topic. I get asked about it formally and informally all the time. I have to admit, there are lots of ways to organize an autonomous system. Vendors have unique features and templating / procedural functions, but in the end, BGP routing policy all boils down to two+two things: deciding which prefixes to accept and which prefixes to announce, and then:
> For those prefixes accepted, ensure they have correct attributes.
> For those prefixes announced, ensure they have correct attributes.
At IPng Networks, I’ve cycled through a few iterations and landed on a specific setup that works well for me. It provides sufficient information to enable our downstream (customers) to make good decisions on what they should accept from us, as well as enough expressivity for them to determine which prefixes we should propagate for them, where, and how.
This article describes one approach to a relatively feature-rich routing policy which is in use at IPng Networks (AS8298). It uses the Bird2 configuration language, although the concepts would be implementable in ~any modern routing suite (e.g. FRR, Cisco, Juniper, Arista, Extreme, et cetera).
Interested in one operator’s opinion? Read on!
1. Concepts
There are three basic pieces of routing filtering, which I’ll describe briefly.
Prefix Lists
A prefix list (also sometimes referred to as an access-list in older software) is a list of IPv4 or IPv6 prefixes, often with a prefixlen boundary, that determines if a given prefix is “in” or “out”.
An example could be: 2001:db8::/32{32,48} which describes any prefix in the supernet 2001:db8::/32 that has a prefix length of anywhere between /32 and /48, inclusive.
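As a minimal sketch, this is how that pattern could be written as a Bird2 prefix set. The names EXAMPLE_PREFIXES and in_example_prefixes are hypothetical and only here for illustration:

```bird
# Hypothetical prefix set, using the pattern from the example above.
define EXAMPLE_PREFIXES = [ 2001:db8::/32{32,48} ];

function in_example_prefixes() {
  # True for 2001:db8::/32 and 2001:db8:1::/48 (inside the supernet, length 32..48).
  # False for 2001:db8:1:1::/64 (too long) and 2001:db9::/32 (outside the supernet).
  return net ~ EXAMPLE_PREFIXES;
}
```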
AS Paths
In BGP, each prefix learned comes with an AS path on how to reach it. If my router learns a prefix from a peer with AS number 65520, it’ll see every prefix that peer sends as a list of AS numbers starting with 65520. With AS paths, the very first one in the list is the one the router directly learned the prefix from, and the very last one is the origin of the prefix. Often the path is shown as a regular expression, starting with ^ and ending with $, and to help readability, spaces are often written as _.
Examples: ^25091_1299_3301$ and ^58299_174_1299_3301$
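Bird2 does not use regular expressions for this, but path masks and the first/last accessors. A minimal sketch of the first example, with a hypothetical function name:

```bird
function aspath_example() {
  # Same meaning as ^25091_1299_3301$: exactly this path and nothing else.
  if bgp_path ~ [= 25091 1299 3301 =] then return true;
  # Looser: learned directly from AS25091 and originated by AS3301,
  # with anything (or nothing) in between.
  if (bgp_path.first = 25091 && bgp_path.last = 3301) then return true;
  return false;
}
```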
BGP Communities
When learning (or originating) a prefix in BGP, zero or more so-called communities can be added to it along the way. The Routing Information Base or RIB carries these communities and can share them between peering sessions. Communities can be added, removed and modified. Some communities have special meaning (which is agreed upon by everyone), and some have local meaning (agreed upon by only one or a small set of operators).
There are three types of communities: normal communities are a pair of 16-bit integers; extended communities are 8 bytes, split into one 16-bit integer and an additional 48-bit value; and finally large communities consist of a triplet of 32-bit values.
Examples: (8298, 1234) (normal), or (8298, 3, 212323) (large)
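In a Bird2 filter, these two examples can be tested for as shown in this small sketch (the function name is hypothetical):

```bird
function community_example() {
  # Normal community: a pair of 16-bit integers, matched exactly.
  if (8298, 1234) ~ bgp_community then return true;
  # Large community: a triplet of 32-bit integers, here matched with a
  # wildcard so that (8298, 3, 212323) and any other (8298, 3, *) matches.
  if bgp_large_community ~ [(8298, 3, *)] then return true;
  return false;
}
```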
Routing Policy
Now that I’ve explained a little bit about the ingredients we have to work with, let me share an observation that took me a few decades to make: BGP sessions are really all the same. As such, every single one of the BGP sessions at IPng Networks is generated from one template. What makes the difference between ‘Transit’, ‘Customer’, ‘Peer’ and ‘Private Interconnect’ really all boils down to what types of filtering are applied to inbound and outbound updates. I will demonstrate this by means of two main functions in Bird: ebgp_import(), discussed first in the section Inbound: Learning Routes, and ebgp_export() in the section Outbound: Announcing Routes.
2. Inbound: Learning Routes
Let’s consider this function:
function ebgp_import(int remote_as) {
  if aspath_bogon() then return false;
  if (net.type = NET_IP4 && ipv4_bogon()) then return false;
  if (net.type = NET_IP6 && ipv6_bogon()) then return false;

  if (net.type = NET_IP4 && ipv4_rpki_invalid()) then return false;
  if (net.type = NET_IP6 && ipv6_rpki_invalid()) then return false;

  # Demote certain AS nexthops to lower pref
  if (bgp_path.first ~ AS_LOCALPREF50 && bgp_path.len > 1) then bgp_local_pref = 50;
  if (bgp_path.first ~ AS_LOCALPREF30 && bgp_path.len > 1) then bgp_local_pref = 30;
  if (bgp_path.first ~ AS_LOCALPREF10 && bgp_path.len > 1) then bgp_local_pref = 10;

  # Graceful Shutdown (RFC8326)
  if (65535, 0) ~ bgp_community then bgp_local_pref = 0;

  # Scrub BLACKHOLE community
  bgp_community.delete((65535, 666));

  return true;
}
The function works by process of elimination – for each prefix that is offered on the session, it will either be rejected (by means of returning false), or modified (by means of setting attributes like bgp_local_pref) and then accepted (by means of returning true).
AS-Path Bogon filtering is a way to remove prefixes that have an invalid AS number in their path. The main examples are the private, documentation and reserved AS numbers (64496-131071) and the 32-bit private range (4200000000-4294967295). In case you haven’t come across this yet, AS number 23456 is also magic, see RFC4893 for details:
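The sets AS_LOCALPREF50, AS_LOCALPREF30 and AS_LOCALPREF10 are referenced by ebgp_import() but their contents are not part of this article. A minimal sketch of how such sets could be defined, with made-up members from the documentation AS range:

```bird
# Hypothetical contents, for illustration only. These must be defined before
# ebgp_import() in the configuration. The documentation ASNs used here would
# never actually match, because aspath_bogon() rejects them first.
define AS_LOCALPREF50 = [ 64496 ];
define AS_LOCALPREF30 = [ 64497 ];
define AS_LOCALPREF10 = [ 64498, 64499 ];
```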
function aspath_bogon() {
  return bgp_path ~ [0, 23456, 64496..131071, 4200000000..4294967295];
}
Prefix Bogon filtering comes next, as certain prefixes are not publicly routable (you know, such as RFC1918, but there are many others). They look different for IPv4 and IPv6:
function ipv4_bogon() {
  return net ~ [
    0.0.0.0/0,          # Default
    0.0.0.0/32-,        # RFC 5735 Special Use IPv4 Addresses
    0.0.0.0/0{0,7},     # RFC 1122 Requirements for Internet Hosts -- Communication Layers 3.2.1.3
    10.0.0.0/8+,        # RFC 1918 Address Allocation for Private Internets
    100.64.0.0/10+,     # RFC 6598 IANA-Reserved IPv4 Prefix for Shared Address Space
    127.0.0.0/8+,       # RFC 1122 Requirements for Internet Hosts -- Communication Layers 3.2.1.3
    169.254.0.0/16+,    # RFC 3927 Dynamic Configuration of IPv4 Link-Local Addresses
    172.16.0.0/12+,     # RFC 1918 Address Allocation for Private Internets
    192.0.0.0/24+,      # RFC 6890 Special-Purpose Address Registries
    192.0.2.0/24+,      # RFC 5737 IPv4 Address Blocks Reserved for Documentation
    192.168.0.0/16+,    # RFC 1918 Address Allocation for Private Internets
    198.18.0.0/15+,     # RFC 2544 Benchmarking Methodology for Network Interconnect Devices
    198.51.100.0/24+,   # RFC 5737 IPv4 Address Blocks Reserved for Documentation
    203.0.113.0/24+,    # RFC 5737 IPv4 Address Blocks Reserved for Documentation
    224.0.0.0/4+,       # RFC 1112 Host Extensions for IP Multicasting
    240.0.0.0/4+        # RFC 6890 Special-Purpose Address Registries
  ];
}

function ipv6_bogon() {
  return net ~ [
    ::/0,               # Default
    ::/96,              # IPv4-compatible IPv6 address - deprecated by RFC4291
    ::/128,             # Unspecified address
    ::1/128,            # Local host loopback address
    ::ffff:0.0.0.0/96+, # IPv4-mapped addresses
    ::224.0.0.0/100+,   # Compatible address (IPv4 format)
    ::127.0.0.0/104+,   # Compatible address (IPv4 format)
    ::0.0.0.0/104+,     # Compatible address (IPv4 format)
    ::255.0.0.0/104+,   # Compatible address (IPv4 format)
    0000::/8+,          # Pool used for unspecified, loopback and embedded IPv4 addresses
    0100::/8+,          # RFC 6666 - reserved for Discard-Only Address Block
    0200::/7+,          # OSI NSAP-mapped prefix set (RFC4548) - deprecated by RFC4048
    0400::/6+,          # RFC 4291 - Reserved by IETF
    0800::/5+,          # RFC 4291 - Reserved by IETF
    1000::/4+,          # RFC 4291 - Reserved by IETF
    2001:10::/28+,      # RFC 4843 - Deprecated (previously ORCHID)
    2001:20::/28+,      # RFC 7343 - ORCHIDv2
    2001:db8::/32+,     # Reserved by IANA for special purposes and documentation
    2002:e000::/20+,    # Invalid 6to4 packets (IPv4 multicast)
    2002:7f00::/24+,    # Invalid 6to4 packets (IPv4 loopback)
    2002:0000::/24+,    # Invalid 6to4 packets (IPv4 default)
    2002:ff00::/24+,    # Invalid 6to4 packets
    2002:0a00::/24+,    # Invalid 6to4 packets (IPv4 private 10.0.0.0/8 network)
    2002:ac10::/28+,    # Invalid 6to4 packets (IPv4 private 172.16.0.0/12 network)
    2002:c0a8::/32+,    # Invalid 6to4 packets (IPv4 private 192.168.0.0/16 network)
    3ffe::/16+,         # Former 6bone, now decommissioned
    4000::/3+,          # RFC 4291 - Reserved by IETF
    5f00::/8+,          # RFC 5156 - used for the 6bone but was returned
    6000::/3+,          # RFC 4291 - Reserved by IETF
    8000::/3+,          # RFC 4291 - Reserved by IETF
    a000::/3+,          # RFC 4291 - Reserved by IETF
    c000::/3+,          # RFC 4291 - Reserved by IETF
    e000::/4+,          # RFC 4291 - Reserved by IETF
    f000::/5+,          # RFC 4291 - Reserved by IETF
    f800::/6+,          # RFC 4291 - Reserved by IETF
    fc00::/7+,          # Unicast Unique Local Addresses (ULA) - RFC 4193
    fe80::/10+,         # Link-local Unicast
    fec0::/10+,         # Site-local Unicast - deprecated by RFC 3879 (replaced by ULA)
    ff00::/8+           # Multicast
  ];
}
That’s a long list!! But operators on the DFZ should really never be accepting any of these, and we should all collectively yell at those who propagate them.
RPKI Filtering is a fantastic routing security feature, described in RFC6810 (with the origin validation logic itself in RFC6811) and relatively straightforward to implement. For each originating AS number, we can check in a table of known <origin,prefix> mappings whether it is the correct ISP to originate the prefix. The lookup can match (which makes the prefix RPKI valid), it can fail because the prefix is missing from the table (which makes the prefix RPKI unknown), or it can specifically mismatch (which makes the prefix RPKI invalid). Operators are encouraged to flag and drop invalid prefixes:
function ipv4_rpki_invalid() {
  return roa_check(t_roa4, net, bgp_path.last) = ROA_INVALID;
}

function ipv6_rpki_invalid() {
  return roa_check(t_roa6, net, bgp_path.last) = ROA_INVALID;
}
NOTE: In NLNOG my post sparked a bit of debate on the use of bgp_path.last_nonaggregated versus simply bgp_path.last. Job Snijders did some spelunking and offered this post and a reference to RFC6907 for details, and Tijn confirmed that Coloclue (on which many of my approaches have been modeled) indeed uses bgp_path.last. I’ve updated my configs, with many thanks for the discussion.
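The tables t_roa4 and t_roa6 used by roa_check() have to be declared and filled from an RPKI validator over the RTR protocol. That part is not shown in this article, so here is a minimal sketch under assumptions: the protocol name, the validator hostname and the port are all hypothetical.

```bird
roa4 table t_roa4;
roa6 table t_roa6;

# Hypothetical RTR session; replace the remote with your own validator.
protocol rpki rpki_example {
  roa4 { table t_roa4; };
  roa6 { table t_roa6; };
  remote "rpki.example.net" port 3323;
}
```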
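Alright, now that I’ve determined the as-path and prefix are kosher, and that it is not known to be hijacked (i.e. is either ROA_VALID or ROA_UNKNOWN), I’m ready to set a few attributes, notably:
AS_LOCALPREF: If the first AS in the path is in one of the lists AS_LOCALPREF50, AS_LOCALPREF30 or AS_LOCALPREF10, and the path is longer than one hop, I’ll demote the route by setting bgp_local_pref to 50, 30 or 10 respectively.
Graceful Shutdown: If the prefix carries the well-known community (65535, 0) from RFC8326, I’ll set bgp_local_pref to 0.
BLACKHOLE: I’ll remove the well-known community (65535, 666) from the prefix.
Alright, based on this one template, I’m now ready to implement all three types of BGP session: Peer, Upstream, and Downstream.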
Peers
function ebgp_import_peer(int remote_as) {
  # Scrub BGP Communities (RFC 7454 Section 11)
  bgp_community.delete([(8298, *)]);
  bgp_large_community.delete([(8298, *, *)]);

  return ebgp_import(remote_as);
}
It’s dangerous to accept communities for my own AS8298 from peers. This is because several of them can actively change the behavior of route propagation (these types of communities are commonly called?action?communities). So with peering relationships, I’ll just toss them all.
Now, working my way up to the actual BGP peering session, taking for example a peer that I’m connecting to at LSIX (the routeserver, in fact) in Amsterdam:
filter ebgp_lsix_49917_import {
  if ! ebgp_import_peer(49917) then reject;

  # Add IXP Communities
  bgp_community.add((8298,1036));
  bgp_large_community.add((8298,1,1036));

  accept;
}

protocol bgp lsix_49917_ipv4_1 {
  description "LSIX IX Route Servers (LSIX)";
  local as 8298;
  source address 185.1.32.74;
  neighbor 185.1.32.254 as 49917;
  default bgp_med 0;
  default bgp_local_pref 200;
  ipv4 {
    import keep filtered;
    import filter ebgp_lsix_49917_import;
    export filter ebgp_lsix_49917_export;
    receive limit 100000 action restart;
    next hop self on;
  };
};
Parsing this through: the ipv4 import filter is called ebgp_lsix_49917_import and its job is to run the whole kittenkaboodle of filtering I described above, and then if the ebgp_import_peer() function returns false, to simply drop the prefix. But if it is accepted, I’ll tag it with a few communities. As I’ll show later, any other peer will receive these communities if I decide to propagate the prefix to them. This is specifically useful for downstreams (customers), who can decide to accept/deny the prefix based on a well-known set of communities we tag.
IXP Community: If the prefix is learned at an IXP, I’ll add a large community (8298,1,*) and a backwards compatible normal community (8298,10XX).
One last thing I’ll note, and this is a matter of taste, is that peering routes picked up at internet exchanges (like LSIX) are typically much cheaper per megabit than transit routes, so I will set a default bgp_local_pref of 200 (a route with a higher localpref is more likely to be selected as the active route).
Upstream
An interesting observation: from Peers and from Upstreams I typically am happy to take all the prefixes I can get (but see the epilog below for an important note on this). For a Peer, this is mostly “their own prefixes” and for a Transit, this is mostly “all prefixes”, but there are things in between, say partial transit of “all prefixes learned at IXP A, B and C”. Really, all inbound sessions are very similar:
function ebgp_import_upstream(int remote_as) {
  # Scrub BGP Communities (RFC 7454 Section 11)
  bgp_community.delete([(8298, *)]);
  bgp_large_community.delete([(8298, *, *)]);

  return ebgp_import(remote_as);
}
… is in fact identical to the ebgp_import_peer() function above, so I’ll not discuss it further. But for the sessions to upstream (==transit) providers, it can make sense to use slightly different BGP community tags and a lower localpref:
filter ebgp_ipmax_25091_import {
  if ! ebgp_import_upstream(25091) then reject;

  # Add BGP Large Communities
  bgp_large_community.add((8298,2,25091));

  # Add BGP Communities
  bgp_community.add((8298,2000));

  accept;
}

protocol bgp ipmax_25091_ipv4_1 {
  description "IP-Max Transit";
  local as 8298;
  source address 46.20.242.210;
  neighbor 46.20.242.209 as 25091;
  default bgp_med 0;
  default bgp_local_pref 50;
  ipv4 {
    import keep filtered;
    import filter ebgp_ipmax_25091_import;
    export filter ebgp_ipmax_25091_export;
    next hop self on;
  };
};
Again, a very similar pattern; the only material difference is that the inbound prefixes are tagged with an Upstream Community which is of the form (8298,2,*) and backwards compatible (8298,20XX). Downstream customers can use this, if they wish, to select or reject routes (maybe they don’t like routes coming from AS25091, although they should know better because IP-Max rocks!).
The other slight change here is that the bgp_local_pref is set to 50, which implies that the route will be used only if there are no alternatives in the RIB with a higher localpref, or with the same localpref but a shorter as-path, or many other scenarios which I won’t get into here, because BGP selection criteria 101 is a whole blogpost of its own.
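To make the “select or reject” idea concrete, here is a sketch of what a downstream could do on its own router with the communities that AS8298 attaches. This is an assumption on my part about how a customer might use them: the filter name and the localpref value are hypothetical.

```bird
# Runs on the downstream's router, on its session with AS8298.
filter import_from_as8298 {
  # Do not use routes that AS8298 learned from its upstream AS25091.
  if (8298, 2, 25091) ~ bgp_large_community then reject;

  # Prefer routes that AS8298 learned at an IXP (hypothetical value).
  if bgp_large_community ~ [(8298, 1, *)] then bgp_local_pref = 150;

  accept;
}
```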
Downstream
That brings us to the third type of BGP sessions – commonly referred to as customers except that not everybody pays :) so I just call them downstreams:
function ebgp_import_downstream(int remote_as) {
  # We do not scrub BGP Communities (RFC 7454 Section 11) for customers
  return ebgp_import(remote_as);
}
Here, I have a special relationship with the remote_as, and I do not scrub the communities, letting the downstream operator set whichever they like. As I’ll demonstrate in the next chapter, they can use these communities to drive certain types of behavior.
Here’s how I use this ebgp_import_downstream() function in the full filter for a downstream:
# bgpq4 -Ab4 -R 24 -m 24 -l 'define AS201723_IPV4' AS201723
define AS201723_IPV4 = [
  185.54.95.0/24
];

# bgpq4 -Ab6 -R 48 -m 48 -l 'define AS201723_IPV6' AS201723
define AS201723_IPV6 = [
  2001:678:3d4::/48,
  2001:67c:6bc::/48
];

filter ebgp_raymon_201723_import {
  if (net.type = NET_IP4 && ! (net ~ AS201723_IPV4)) then reject;
  if (net.type = NET_IP6 && ! (net ~ AS201723_IPV6)) then reject;
  if ! ebgp_import_downstream(201723) then reject;

  # Add BGP Large Communities
  bgp_large_community.add((8298,3,201723));

  # Add BGP Communities
  bgp_community.add((8298,3500));

  accept;
}

protocol bgp raymon_201723_ipv4_1 {
  local as 8298;
  source address 185.54.95.250;
  neighbor 185.54.95.251 as 201723;
  default bgp_med 0;
  default bgp_local_pref 400;
  ipv4 {
    import keep filtered;
    import filter ebgp_raymon_201723_import;
    export filter ebgp_raymon_201723_export;
    receive limit 94 action restart;
    next hop self on;
  };
};
OK, so this is a mouthful, but the one thing that I really need to do with customers is ensure that I only accept prefixes from them that they’re supposed to send me. I do this with a prefix-list for IPv4 and IPv6, and in the importer, I simply reject any prefixes that are not in the list. From then on, it looks very much like a peer, with identical filtering and tagging, except now I’m using yet another Customer Community which starts with (8298,3,*) and a vanilla (8298,3500) community. Anybody who wishes to can act on the presence of these communities to know that it’s a downstream of IPng Networks AS8298.
A note on Peers and Downstreams:
Some ISPs will not peer with their customers (as in: once you become a transit customer they will terminate all BGP sessions at public internet exchanges), and I find that silly. However, for me the situation becomes a little bit more complex if I were to have AS201723 both as a Downstream (as shown here) as well as a Peer (which in fact, I do, at multiple Amsterdam based internet exchanges). Note how the bgp_local_pref is 400 on this session, and it will always be lower on other types of sessions. The implication is that the route in the RIB which carries (8298,3,201723) will be selected, while the ones I learn from LSIX will carry (8298,1,*) and the ones I learn from A2B (a transit provider) will carry (8298,2,51088), and neither of those will be selected because they have a lower localpref. As I’ll demonstrate below, I can make smart use of these communities when announcing prefixes to my own peers and upstreams, … read on :)
3. Outbound: Announcing Routes
Alright, the RIB is now filled with lots of prefixes that have the right localpref and communities, for example from having been learned at an IXP, from an Upstream, or from a Downstream. Now let’s consider the following generic exporter:
function ebgp_export(int remote_as) {
  # Remove private ASNs
  bgp_path.delete([64512..65535, 4200000000..4294967295]);

  # Well known BGP Large Communities
  if (8298, 0, remote_as) ~ bgp_large_community then return false;
  if (8298, 0, 0) ~ bgp_large_community then return false;

  # Well known BGP Communities
  if (0, 8298) ~ bgp_community then return false;
  if (remote_as < 65536 && (0, remote_as) ~ bgp_community) then return false;

  # AS path prepending
  if ((8298, 103, remote_as) ~ bgp_large_community ||
      (8298, 103, 0) ~ bgp_large_community) then {
    bgp_path.prepend( bgp_path.first );
    bgp_path.prepend( bgp_path.first );
    bgp_path.prepend( bgp_path.first );
  } else if ((8298, 102, remote_as) ~ bgp_large_community ||
             (8298, 102, 0) ~ bgp_large_community) then {
    bgp_path.prepend( bgp_path.first );
    bgp_path.prepend( bgp_path.first );
  } else if ((8298, 101, remote_as) ~ bgp_large_community ||
             (8298, 101, 0) ~ bgp_large_community) then {
    bgp_path.prepend( bgp_path.first );
  }

  return true;
}
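Oh, wow! There’s some really cool stuff to unpack here. As a belt-and-braces type safety, I will remove any private AS numbers from the as-path - this keeps my own announcements from tripping any as-path bogon filtering. But then, there are a few well-known communities that help determine if the announcement is made or not, and there are three-and-a-half ways of doing this:
Large community (8298, 0, remote_as): do not announce the prefix to the AS on this session.
Large community (8298, 0, 0): do not announce the prefix on any session.
Normal community (0, 8298): do not announce the prefix on any session.
Normal community (0, remote_as): do not announce the prefix to the AS on this session. This is the half, because a normal community only holds 16-bit values, so it only works if the remote AS number is below 65536.
All four of these methods will tell the router to refuse announcing the prefix on this session. Note that downstreams are allowed to set (8298,*,*) and (8298,*) communities (and they’re the only ones who are allowed to do so). So here is where some of the cool magic starts to happen.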
Then, to drive prepending of the prefix on this session, I’ll again match certain communities: (8298, 103, *) will prepend the customer’s AS number three times, using 102 will prepend twice, and 101 will prepend once. If the third number is 0, then any session with this filter will prepend. If the third number is an AS number, then only sessions to this AS number will be prepended.
Using these types of communities allows downstreams (customers) incredibly fine-grained propagation actions, at the per-IPng-session level. Not many ISPs offer this functionality!
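As an illustration of how a downstream might drive this, here is a sketch of an export filter on the customer’s own router, towards AS8298. The filter name is hypothetical and the choice of actions is just an example; the community values follow the ebgp_export() function above.

```bird
# Runs on the downstream's router (AS201723 in this sketch), towards AS8298.
filter export_to_as8298 {
  if ! (net ~ [ 185.54.95.0/24 ]) then reject;

  # Ask AS8298 not to announce this prefix on sessions with AS25091.
  bgp_large_community.add((8298, 0, 25091));

  # Ask AS8298 to prepend twice on sessions with AS49917.
  bgp_large_community.add((8298, 102, 49917));

  accept;
}
```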
Peers
Exporting to peers, I really need to make sure that I don’t send too many prefixes. Most of us have at some point gone through the embarrassing motions of being told by a fellow operator “hey you’re sending a full table”. It is paramount to good peering hygiene that I do not leak. So I’ll define a healthy set of defense in depth principles here:
# bgpq4 -A4b -R 24 -m 24 -l 'define AS8298_IPV4' AS8298
define AS8298_IPV4 = [ 92.119.38.0/24, 194.1.163.0/24, 194.126.235.0/24 ];

# bgpq4 -A6bR 48 -m 48 -l 'define AS8298_IPV6' AS8298
define AS8298_IPV6 = [ 2001:678:d78::/48, 2a0b:dd80::/29{29,48} ];

# bgpq4 -A4b -R 24 -m 24 -l 'define AS_IPNG_IPV4' AS-IPNG
define AS_IPNG_IPV4 = [ ... ## Removed for brevity ];

# bgpq4 -A6bR 48 -m 48 -l 'define AS_IPNG_IPV6' AS-IPNG
define AS_IPNG_IPV6 = [ .. ## Removed for brevity ];

# bgpq4 -t4b -l 'define AS_IPNG' AS-IPNG
define AS_IPNG = [112, 8298, 50869, 57777, 60557, 201723, 212323, 212855];

function aspath_first_valid() {
  return (bgp_path.len = 0 || bgp_path.first ~ AS_IPNG);
}

# A list of well-known tier1 transit providers
function aspath_contains_tier1() {
  return bgp_path ~ [
    174,   # Cogent
    209,   # Qwest (HE carries this on IXPs IPv6 (Jul 12 2018))
    701,   # UUNET
    702,   # UUNET
    1239,  # Sprint
    1299,  # Telia
    2914,  # NTT Communications
    3257,  # GTT Backbone
    3320,  # Deutsche Telekom AG (DTAG)
    3356,  # Level3
    3549,  # Level3
    3561,  # Savvis / CenturyLink
    4134,  # Chinanet
    5511,  # Orange opentransit
    6453,  # Tata Communications
    6762,  # Seabone / Telecom Italia
    7018 ]; # AT&T
}

# The list of our own uplink (transit) providers
# Note: This list is autogenerated by our automation.
function aspath_contains_upstream() {
  return bgp_path ~ [ 8283,25091,34549,51088,58299 ];
}

function ipv4_prefix_valid() {
  # Our (locally sourced) prefixes
  if (net ~ AS8298_IPV4) then return true;

  # Customer prefixes in AS-IPNG must be tagged with customer community
  if (net ~ AS_IPNG_IPV4 &&
      (bgp_large_community ~ [(8298, 3, *)] || bgp_community ~ [(8298, 3500)])
     ) then return true;

  return false;
}

function ipv6_prefix_valid() {
  # Our (locally sourced) prefixes
  if (net ~ AS8298_IPV6) then return true;

  # Customer prefixes in AS-IPNG must be tagged with customer community
  if (net ~ AS_IPNG_IPV6 &&
      (bgp_large_community ~ [(8298, 3, *)] || bgp_community ~ [(8298, 3500)])
     ) then return true;

  return false;
}

function prefix_valid() {
  # as-path based filtering
  if !aspath_first_valid() then return false;
  if aspath_contains_tier1() then return false;
  if aspath_contains_upstream() then return false;

  # prefix (and BGP community) based filtering
  if (net.type = NET_IP4 && !ipv4_prefix_valid()) then return false;
  if (net.type = NET_IP6 && !ipv6_prefix_valid()) then return false;

  return true;
}

function ebgp_export_peer(int remote_as) {
  if !prefix_valid() then return false;

  return ebgp_export(remote_as);
}
Wow, alrighty then!! All I’m doing here is checking if the call to prefix_valid() returns true. That function isn’t very complex. It takes a look at three as-path based filters and then a prefix-list based filter. Let’s go over them in turn:
aspath_first_valid() takes a look at the first hop in the as-path. I need to make sure that I’ve received this prefix from an actual downstream, and those are collected in a RIPE as-set called AS-IPNG. So if the first BGP hop in the path is not one of these, I’ll refuse to announce the prefix.
aspath_contains_tier1() is a belt-and-braces style check. How on earth would I provide transit for any prefix for which there’s already a global Tier1 provider in the path? I mean, in no universe would AS174 or AS1299 need me to reach any of their customers, or indeed, any place in the world. So this filter helps me never announce the prefix, if it has one of these ISPs in the path.
aspath_contains_upstream() is similar: if I am receiving a full table from an upstream provider, I should not be passing this prefix along - I would for similar reasons never be a transit provider for A2B or IP-Max or Meerfarbig. I had a bug in my configuration here, and my buddy Erik kindly pointed out the issue to me, so hat-tip to him for the intelligence.
ipv[46]_prefix_valid() is the main thrust of prefix-based filtering. At this point we’ve already established that the as-path is clean, but it could be that the downstream is sending prefixes they should not (possibly leaking a full table) so let’s take a look at a good way to avoid this.
So before I announce anything on such a session, all four of as-path, inbound prefix-list, outbound prefix-list and bgp-community are checked. This makes it incredibly unlikely that AS8298 ever leaks prefixes – knock on wood!
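The protocol definition for LSIX earlier refers to an export filter called ebgp_lsix_49917_export, which is not shown in this article. Assuming it mirrors the import filter, a minimal sketch could look like this:

```bird
# Assumed shape, mirroring ebgp_lsix_49917_import; the real filter may differ.
filter ebgp_lsix_49917_export {
  if ! ebgp_export_peer(49917) then reject;
  accept;
}
```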
Upstream
Interestingly, and if you think about it, unsurprisingly, an upstream configuration is identical to that of a peer:
function ebgp_export_upstream(int remote_as) {
  if !prefix_valid() then return false;

  return ebgp_export(remote_as);
}
Alright, nothing to see here, moving on …
Downstream
Now the difference between a Peer and an Upstream on the one hand, and a Downstream on the other, is that the former two will only see a very limited set of prefixes, heavily guarded by all of that filtering I described. But a downstream typically has the luxury of getting to learn every prefix I’ve learned:
function ipv4_acceptable_size() {
  if net.len < 8 then return false;
  if net.len > 24 then return false;
  return true;
}

function ipv6_acceptable_size() {
  if net.len < 12 then return false;
  if net.len > 48 then return false;
  return true;
}

function ebgp_export_downstream(int remote_as) {
  if (source != RTS_BGP && source != RTS_STATIC) then return false;
  if (net.type = NET_IP4 && ! ipv4_acceptable_size()) then return false;
  if (net.type = NET_IP6 && ! ipv6_acceptable_size()) then return false;

  return ebgp_export(remote_as);
}
So here I’ll assert that the prefix has to be either from the RTS_BGP source, or from the RTS_STATIC source. This latter source is what Bird uses for locally generated routes (i.e. the ones in AS8298 itself). Locally generated routes are not known from BGP, but known instead because they are blackholed / null-routed on the router itself. And from these routes, I further deselect those prefixes that are too short or too long, which differs slightly by address family (IPv4 is anywhere between /8 and /24, and IPv6 is anywhere between /12 and /48).
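For completeness, this is roughly what such a locally generated route could look like in Bird2. It is a sketch under assumptions: the protocol name is hypothetical, and I’m using one of the prefixes from AS8298_IPV4 as the example.

```bird
# Hypothetical protocol name. Routes from a static protocol carry the
# RTS_STATIC source that ebgp_export_downstream() checks for.
protocol static static_as8298_ipv4 {
  ipv4;
  route 194.1.163.0/24 blackhole;
}
```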
Now, I will note that I’ve seen many operators who inject OSPF or connected or static routes into BGP, and all of those folks will have to maintain elaborate egress “bogon” route filters, for example for those IXP prefixes that they picked up due to them being directly connected. If those operators would simply not propagate directly connected routes, their life would be so much simpler .. but I digress and it’s time for me to wrap up.
Epilog
I hope this little dissertation proves useful for other Bird enthusiasts out there. I myself had to fiddle a bit over the years with the idiosyncrasies (and bugs) of Bird and Bird2. I wanted to make a few comments: