Operating AS112
I’m one of those people who is a fan of low-latency, high-performance distributed service architectures. After building out the IPng Network across Europe, I noticed a rather stark difference in the presence of one particular service: AS112 anycast nameservers. In particular, I have only one Internet Exchange in common with a direct AS112 presence: FCIX in California. Big-up to the kind folks in Fremont who operate www.as112.net.
The Problem
Looking around Switzerland, no internet exchange actually has AS112 as a direct member, so you’ll find the service tucked away behind several ISPs, with AS paths such as 13030 29670 112, 6939 112, or 34019 112. A traceroute from a popular Swiss ISP, Init7, goes to Germany at a round-trip latency of 18.9ms. My own latency is 146ms, as my queries are served from FCIX:
pim@spongebob:~$ traceroute prisoner.iana.org
traceroute to prisoner.iana.org (192.175.48.1), 64 hops max, 40 byte packets
1 fiber7.xe8.chbtl0.ipng.ch (194.126.235.33) 2.658 ms 0.754 ms 0.523 ms
2 1790bre1.fiber7.init7.net (81.6.42.1) 1.132 ms 1.077 ms 3.621 ms
3 780eff1.fiber7.init7.net (109.202.193.44) 1.238 ms 1.162 ms 1.188 ms
4 r1win12.core.init7.net (77.109.181.155) 2.096 ms 2.1 ms 2.1 ms
5 r1zrh6.core.init7.net (82.197.168.222) 2.086 ms 3.904 ms 2.183 ms
6 r1glb1.core.init7.net (5.180.135.134) 2.043 ms 3.621 ms 2.088 ms
7 r2zrh2.core.init7.net (82.197.163.213) 2.353 ms 2.522 ms 2.289 ms
8 r2zrh2.core.init7.net (5.180.135.156) 2.08 ms 2.299 ms 2.202 ms
9 r1fra3.core.init7.net (5.180.135.173) 7.65 ms 7.582 ms 7.546 ms
10 r1fra2.core.init7.net (5.180.135.126) 7.928 ms 7.831 ms 7.997 ms
11 r1ber1.core.init7.net (77.109.129.8) 19.395 ms 19.287 ms 19.558 ms
12 octalus.in-berlin.a36.community-ix.de (185.1.74.3) 18.839 ms 18.717 ms 29.615 ms
13 prisoner.iana.org (192.175.48.1) 18.536 ms 18.613 ms 18.766 ms
pim@chumbucket:~$ traceroute blackhole-1.iana.org
traceroute to blackhole-1.iana.org (192.175.48.6), 30 hops max, 60 byte packets
1 chbtl1.ipng.ch (194.1.163.67) 0.247 ms 0.158 ms 0.107 ms
2 chgtg0.ipng.ch (194.1.163.19) 0.514 ms 0.474 ms 0.419 ms
3 usfmt0.ipng.ch (194.1.163.23) 146.451 ms 146.406 ms 146.364 ms
4 blackhole-1.iana.org (192.175.48.6) 146.323 ms 146.281 ms 146.239 ms
This path goes to FCIX because it’s the only place where AS8298 picks up AS112 directly at an internet exchange, so the local preference makes this route preferred. But that’s a long way to go for my DNS queries!
I think I can do better.
Introduction
Taken from RFC7534:
Many sites connected to the Internet make use of IPv4 addresses that are not globally unique. Examples are the addresses designated in RFC 1918 for private use within individual sites.
Deployment
It’s actually quite straightforward; the deployment consists of roughly three steps: the hardware and guest operating system, the nameserver itself, and the network with its BGP announcements. Let me discuss each in turn.
Hardware
For the hardware, I’ve decided to use the existing server platform at IPng Networks, which typically runs on Dell PowerEdge R630 or R730 machines (depending on the need for more or fewer disks/SSDs).
Considering that each vendor ships specific parts and every machine is a little different, many appliance vendors choose to virtualize their environment so that the guest operating system finds a very homogeneous configuration. For my purposes, the virtualization platform is Qemu/KVM and the guest is a (para)virtualized Debian.
I will be starting with three nodes, one in Geneva, one in Zurich, and one in Amsterdam, hosted on hypervisors of IPng. I have a feeling a few more places will follow.
Install the OS
KVM makes this repeatable and straightforward. Other systems, such as Xen, have very similar installers; VMBuilder, for example, is popular. Both work roughly the same way and install a guest in a matter of minutes.
I’ll install to a ZFS block device volume on all machines, backed by pairs of SSDs for throughput and redundancy. I give the guest 4GB of memory and 4 CPUs, and I grab a stock Debian Bookworm image that I maintain to cut down on the time to bring up a new VM. I love how the machine boots fully on serial and is up and running in 20 seconds.
[email protected]:~$ ssh [email protected] 'zfs send ssd-vol0/bookworm-proto-disk0' | pv | sudo zfs recv ssd-vol1/libvirt/as112-disk0
[email protected]:~$ virsh start --console as112-chrma0
After logging in, I installed the following additional software. I’m going to be using Bird2, which is packaged in Debian Bookworm. Otherwise, the machines are pretty vanilla:
pim@as112-chrma0:~$ sudo apt update
pim@as112-chrma0:~$ sudo apt install tcpdump sudo net-tools \
bridge-utils nsd bird2 netplan.io traceroute ufw curl \
bind9-dnsutils
pim@as112-chrma0:~$ sudo apt purge ifupdown
I removed the /etc/network/interfaces approach and configured Netplan instead, a personal choice which aligns these machines more closely with other servers in the IPng fleet. I’ll add the machines to Ansible as well, to ensure the configuration stays consistent. But since Ansible is a whole other ball of yarn, I’ll save describing that for another day.
With regards to the AS112 node itself, really the only trick is to ensure that the anycast IP addresses are available for the nameserver to listen on, so at the top of Netplan’s configuration file, I will add them like so:
network:
  version: 2
  renderer: networkd
  ethernets:
    lo:
      addresses:
        - 127.0.0.1/8
        - ::1/128
        - 192.175.48.1/32       # prisoner.iana.org (anycast)
        - 2620:4f:8000::1/128   # prisoner.iana.org (anycast)
        - 192.175.48.6/32       # blackhole-1.iana.org (anycast)
        - 2620:4f:8000::6/128   # blackhole-1.iana.org (anycast)
        - 192.175.48.42/32      # blackhole-2.iana.org (anycast)
        - 2620:4f:8000::42/128  # blackhole-2.iana.org (anycast)
        - 192.31.196.1/32       # blackhole.as112.arpa (anycast)
        - 2001:4:112::1/128     # blackhole.as112.arpa (anycast)
Nameserver
My nameserver of choice is NSD, and its configuration is similar to that of BIND, which is what RFC 7534 describes. In fact, the zone files are identical, so all I need to do is add a few listen statements and load up the zones:
$ cat << EOF | sudo tee /etc/nsd/nsd.conf.d/listen.conf
server:
    ip-address: 127.0.0.1
    ip-address: ::1
    ip-address: 46.20.249.197
    ip-address: 2a02:2528:a04:202::197
    ip-address: 192.175.48.1      # prisoner.iana.org (anycast)
    ip-address: 2620:4f:8000::1   # prisoner.iana.org (anycast)
    ip-address: 192.175.48.6      # blackhole-1.iana.org (anycast)
    ip-address: 2620:4f:8000::6   # blackhole-1.iana.org (anycast)
    ip-address: 192.175.48.42     # blackhole-2.iana.org (anycast)
    ip-address: 2620:4f:8000::42  # blackhole-2.iana.org (anycast)
    ip-address: 192.31.196.1      # blackhole.as112.arpa (anycast)
    ip-address: 2001:4:112::1     # blackhole.as112.arpa (anycast)
    server-count: 4
EOF
$ cat << EOF | sudo tee /etc/nsd/nsd.conf.d/as112.conf
zone:
    name: "hostname.as112.net"
    zonefile: "/etc/nsd/master/db.hostname.as112.net"

zone:
    name: "hostname.as112.arpa"
    zonefile: "/etc/nsd/master/db.hostname.as112.arpa"

zone:
    name: "10.in-addr.arpa"
    zonefile: "/etc/nsd/master/db.dd-empty"

# etcetera
EOF
While all of the zones are captured by db.dd-empty or db.dr-empty, which can be found in the RFC text, I’ll note that the top two are special, as they are specific to the instance. For example, on my Geneva instance:
$ cat << 'EOF' | sudo tee /etc/nsd/master/db.hostname.as112.arpa
$TTL 1W
@   SOA   as112.chplo01.ipng.ch. noc.ipng.ch. (
          1     ; serial number
          1W    ; refresh
          1M    ; retry
          1W    ; expire
          1W )  ; negative caching TTL
    NS    blackhole.as112.arpa.
    TXT   "AS112 hosted by IPng Networks" "Geneva, Switzerland"
    TXT   "See https://www.as112.net/ for more information."
    TXT   "See https://ipng.ch/ for local information."
    TXT   "Unique IP: 194.1.163.147"
    TXT   "Unique IP: [2001:678:d78:7::147]"
    LOC   46 9 55.501 N 6 6 25.870 E 407.00m 10m 100m 10m
EOF
This is super helpful to users who want to know which server, exactly, is serving their request. Not all operators add the Unique IP details, but I found them useful when launching the service, as several anycast nodes quickly become confusing otherwise :-)
After this was all done, the nameserver could be started. I rebooted the guest for good measure, and about 19 seconds later (a fact that continues to amaze me), the server was up and serving queries, albeit only from localhost, because there was no way to reach the server over the network yet.
To validate things work, I can perform a few SOA or TXT queries, like this one:
pim@as112-nlams1:~$ ping -c5 -q prisoner.iana.org
PING prisoner.iana.org(prisoner.iana.org (2620:4f:8000::1)) 56 data bytes
--- prisoner.iana.org ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 34ms
rtt min/avg/max/mdev = 0.041/0.045/0.053/0.004 ms
pim@as112-nlams1:~$ dig @prisoner.iana.org hostname.as112.net TXT +short +norec
"AS112 hosted by IPng Networks" "Amsterdam, The Netherlands"
"See https://www.as112.net/ for more information."
"Unique IP: 94.142.241.187"
"Unique IP: [2a02:898:146::2]"
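The empty-zone files do the real work here: a reverse lookup for an RFC 1918 address such as 10.1.2.3 becomes a PTR query for 3.2.1.10.in-addr.arpa, which falls inside the 10.in-addr.arpa zone and is answered locally from db.dd-empty with NXDOMAIN. A small shell sketch of the name construction (the address is just an example):

```shell
# Build the in-addr.arpa owner name for an RFC 1918 address; this is the
# name a resolver sends to the AS112 servers when doing a reverse lookup.
ip="10.1.2.3"   # example address; any 10/8 address works the same way
ptr=$(echo "$ip" | awk -F. '{printf "%s.%s.%s.%s.in-addr.arpa", $4, $3, $2, $1}')
echo "$ptr"     # 3.2.1.10.in-addr.arpa

# With bind9-dnsutils installed, the equivalent query is:
#   dig @blackhole-1.iana.org -x 10.1.2.3 +norec
# which should come back NXDOMAIN, served from the db.dd-empty zone.
```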
Network
Now comes the fun part! I will be running instances of the nameserver in a few locations, and to ensure that I don’t route traffic to the wrong location, I will announce the prefixes with BGP, as recommended by RFC 7534.
My routing suite of choice is Bird2, which offers a lot of extensibility and programmatic validation of routing policies.
I decided to use only the static and BGP routing protocols in Bird, so the configuration is relatively straightforward. First, I create a routing table export for IPv4 and IPv6. Then I define some static nullroutes, which ensure that our prefixes are always present in the RIB (otherwise BGP will not export them). Next, I create some filter functions: one for routeserver sessions, one for peering sessions, and one for transit sessions. Finally, I include a few specific configuration files, one per environment where the nameserver will be active.
pim@as112-chrma0:~$ cat << EOF | sudo tee /etc/bird/bird.conf
router id 46.20.249.197;

protocol kernel fib4 {
  ipv4 { export all; };
  scan time 60;
}

protocol kernel fib6 {
  ipv6 { export all; };
  scan time 60;
}

protocol static static_as112_ipv4 {
  ipv4;
  route 192.175.48.0/24 blackhole;
  route 192.31.196.0/24 blackhole;
}

protocol static static_as112_ipv6 {
  ipv6;
  route 2620:4f:8000::/48 blackhole;
  route 2001:4:112::/48 blackhole;
}

include "bgp-freeix.conf";
include "bgp-ipng.conf";
include "bgp-ipmax.conf";
EOF
The configuration file per environment, say bgp-freeix.conf, can (and will) be autogenerated, but the pattern is of the following form:
pim@as112-chrma0:~$ cat << EOF | sudo tee /etc/bird/bgp-freeix.conf
#
# Bird AS112 configuration for FreeIX
#
define my_ipv4 = 185.1.205.252;
define my_ipv6 = 2001:7f8:111:42::70:1;

protocol bgp freeix_as51530_1_ipv4 {
  description "FreeIX - AS51530 - Routeserver #1";
  local as 112;
  source address my_ipv4;
  neighbor 185.1.205.254 as 51530;
  ipv4 {
    import where fn_import_routeserver( 51530 );
    export where proto = "static_as112_ipv4";
    import limit 120000 action restart;
  };
}

protocol bgp freeix_as51530_1_ipv6 {
  description "FreeIX - AS51530 - Routeserver #1";
  local as 112;
  source address my_ipv6;
  neighbor 2001:7f8:111:42::c94a:1 as 51530;
  ipv6 {
    import where fn_import_routeserver( 51530 );
    export where proto = "static_as112_ipv6";
    import limit 120000 action restart;
  };
}

# etcetera
EOF
If you’ve seen IXPManager’s approach to routeserver configuration generation, you’ll notice I borrowed the fn_import() function and its dependents from there. This allows imports to be restricted with prefix lists and AS paths, and ensures some belt-and-braces checks are in place (no invalid or tier-1 ASN in the path, a valid next hop, no tricks with AS path truncation, and so on).
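To give a flavor of one of those checks, here is an illustrative shell sketch, not the actual fn_import_routeserver code (which is written in Bird's filter language); the tier-1 ASN list and the AS path are abbreviated examples of my own:

```shell
# Reject a route if a well-known tier-1 ASN shows up in the AS path: an
# IXP routeserver should never hand us a path that crosses a transit ASN.
TIER1_ASNS="174 701 1299 2914 3257 3320 3356 6453 6461 6762 7018"
aspath="13030 29670"   # hypothetical path learned from the routeserver
ok=yes
for asn in $aspath; do
  for t1 in $TIER1_ASNS; do
    if [ "$asn" = "$t1" ]; then ok=no; fi
  done
done
echo "path-ok=$ok"     # a "no" here would mean the import is rejected
```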
After bringing up the service, the prefixes make their way into the routeserver and get distributed to the FreeIX participants:
pim@as112-chrma0:~$ sudo systemctl start bird
pim@as112-chrma0:~$ sudo birdc show protocol
BIRD 2.0.12 ready.
Name Proto Table State Since Info
fib4 Kernel master4 up 2021-06-28 11:01:35
fib6 Kernel master6 up 2021-06-28 11:01:35
device1 Device --- up 2021-06-28 11:01:35
static_as112_ipv4 Static master4 up 2021-06-28 11:01:35
static_as112_ipv6 Static master6 up 2021-06-28 11:01:35
freeix_as51530_1_ipv4 BGP --- up 2021-06-28 11:01:17 Established
freeix_as51530_1_ipv6 BGP --- up 2021-06-28 11:01:19 Established
freeix_as51530_2_ipv4 BGP --- up 2021-06-28 11:01:32 Established
freeix_as51530_2_ipv6 BGP --- up 2021-06-28 11:01:37 Established
Internet Exchanges
Having one configuration file per group helps a lot with IXPManager integration, where I might autogenerate the IXP versions of these files and install them periodically. That way, when members enable the AS112 peering checkmark, the servers will automatically download and set up those sessions without human involvement. Typically this is the best way to avoid outages: never tinker with production config files by hand. I decided to test this out with FreeIX, but hope to offer the service to other internet exchanges as well, notably SwissIX and CIXP.
One of the huge benefits of operating within the IP-Max network is the ability to do L2VPN transport from any place on-net to any other router. As such, connecting these virtual machines to other places, like SwissIX, CIXP, CHIX-CH, Community-IX, or other, further-away places is a piece of cake. All I need to do is create an L2VPN and offer it to the hypervisor (which is usually connected via a LACP bundle-ethernet) on some VLAN, after which I can bridge it into the guest OS by creating a new virtio NIC. This is how, in the example above, these AS112 machines were introduced to FreeIX. This scales very well, requiring only one guest reboot per internet exchange, and greatly simplifies operations.
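As a sketch of that last step, attaching a new IXP-facing virtio NIC to a guest can look like the following; the bridge name is made up for illustration and the actual names will differ per hypervisor, so I print the command rather than execute it:

```shell
# Hypothetical example: persistently attach a virtio NIC, bridged to the
# VLAN carrying the new IXP's L2VPN, to the AS112 guest.
DOMAIN="as112-chrma0"   # guest name from this post
BRIDGE="br-swissix"     # made-up bridge name for the IXP VLAN
cmd="virsh attach-interface --domain $DOMAIN --type bridge \
--source $BRIDGE --model virtio --config"
echo "$cmd"
```

With --config the NIC lands in the persistent domain XML and appears at the next boot, which matches the one-reboot-per-IXP workflow described above.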
Monitoring
Of course, one would not want to run a production service, certainly not on the public internet, without a bit of introspection and monitoring.
There are four things that I will want to ensure are in order, from the health of the nameserver daemons to the state of the BGP announcements.
In a followup post, I’ll demonstrate how these come together into a comprehensive anycast monitoring and alerting solution. As a fringe benefit, I can then show contemporary graphs and dashboards. But seeing as the service hasn’t yet gotten a lot of mileage, it deserves its own followup post, some time in August.
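To give a flavor of such monitoring, here is a minimal health-check sketch of my own, not the actual monitoring stack: it asks each anycast service address for the hostname.as112.net TXT record and verifies that this node identifies itself. EXPECTED_LOC is a placeholder for the per-node location string.

```shell
# Query each anycast service address and check that the answer names this
# node; anything else means the nameserver is down or traffic is landing
# on the wrong instance.
EXPECTED_LOC="Zurich, Switzerland"   # placeholder; set per node

check_node() {
  local addr="$1" txt
  txt=$(dig +time=2 +tries=1 @"$addr" hostname.as112.net TXT +short +norec 2>/dev/null)
  if printf '%s' "$txt" | grep -qF "$EXPECTED_LOC"; then
    printf 'OK   %s\n' "$addr"
  else
    printf 'FAIL %s\n' "$addr"
  fi
}

report=""
for ip in 192.175.48.1 192.175.48.6 192.175.48.42 192.31.196.1; do
  report="${report}$(check_node "$ip")
"
done
printf '%s' "$report"
```

Wired into cron or an alerting pipeline, a FAIL line for any address is the signal to investigate that node.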
The results
First things first: latency went waaaay down:
pim@squanchy:~$ traceroute blackhole-1.iana.org
traceroute to blackhole-1.iana.org (192.175.48.6), 30 hops max, 60 byte packets
1 chbtl1.ipng.ch (194.1.163.67) 0.257 ms 0.199 ms 0.159 ms
2 chgtg0.ipng.ch (194.1.163.19) 0.468 ms 0.430 ms 0.430 ms
3 chrma0.ipng.ch (194.1.163.8) 0.648 ms 0.611 ms 0.597 ms
4 blackhole-1.iana.org (192.175.48.6) 1.272 ms 1.236 ms 1.201 ms
pim@chumbucket:~$ dig -6 @prisoner.iana.org hostname.as112.net txt +short +norec +tcp
"AS112 hosted by IPng Networks" "Zurich, Switzerland"
"See https://www.as112.net/ for more information."
"See https://ipng.ch/ for local information."
"Unique IP: 46.20.246.67"
"Unique IP: [2a02:2528:1703::67]"
And this demonstrates why it’s super useful to have the hostname.as112.net entry populated well. If I’m in Amsterdam, I’ll be served by the local node there:
pim@pencilvester:~$ traceroute6 blackhole-2.iana.org
traceroute6 to blackhole-2.iana.org (2620:4f:8000::42), 64 hops max, 60 byte packets
1 nlams0.ipng.ch (2a02:898:146::1) 0.744 ms 0.879 ms 0.818 ms
2 blackhole-2.iana.org (2620:4f:8000::42) 1.104 ms 1.064 ms 1.035 ms
pim@pencilvester:~$ dig -4 @prisoner.iana.org hostname.as112.net txt +short +norec +tcp
"Hosted by IPng Networks" "Amsterdam, The Netherlands"
"See https://www.as112.net/ for more information."
"Unique IP: 94.142.241.187"
"Unique IP: [2a02:898:146::2]"
Of course, due to anycast, and me being in Zurich, I will be served primarily by the Zurich node. If it were to go down for maintenance or a hardware failure, BGP would immediately converge on alternate paths; there are currently three to choose from:
pim@chrma0:~$ show protocols bgp ipv4 unicast 192.31.196.0/24
BGP routing table entry for 192.31.196.0/24
Paths: (10 available, best #2, table default)
Advertised to non peer-group peers:
185.1.205.251 194.1.163.1 [...]
112
194.1.163.32 (metric 137) from 194.1.163.32 (194.1.163.32)
Origin IGP, localpref 400, valid, internal
Community: 8298:3500 8298:4099 8298:5055
Last update: Mon Jun 28 11:13:14 2021
112
185.1.205.251 from 185.1.205.251 (46.20.246.67)
Origin IGP, localpref 400, valid, external, bestpath-from-AS 112, best (Local Pref)
Community: 8298:3500 8298:4099 8298:5000 8298:5020 8298:5060
Last update: Mon Jun 28 11:00:45 2021
112
185.1.205.251 from 185.1.205.253 (185.1.205.253)
Origin IGP, localpref 200, valid, external
Community: 8298:1061
Last update: Mon Jun 28 11:00:20 2021
(and more)
I am expecting a few more direct paths to appear as I harden this service and offer it to other Swiss internet exchange points in the future. But mostly, my mission of reducing the round-trip time from 146ms to 1ms from my desktop at home was accomplished.