FreeIX Remote - Part 1
Introduction
Tier1 and aspiring Tier2 providers interconnect only in large metropolitan areas, due to commercial incentives and politics. They won’t often peer with smaller providers, because why peer with a potential customer? Due to this, it’s entirely likely that traffic between two parties in Thessaloniki is sent to Frankfurt or Milan and back.
One possible antidote to this is to connect to a local Internet Exchange point. Not all ISPs have access to large metropolitan datacenters where larger internet exchanges have a point of presence, and it doesn’t help that the datacenter operator is happy to charge a substantial amount of money each month, just for the privilege of having a passive fiber cross connect to the exchange. Many Internet Exchanges these days ask for per-month port costs and meter the traffic with policers and rate limiters, such that the total cost of peering starts to exceed what one might pay for transit, especially at low volumes, which further exacerbates the problem. Bah.
This is an unfortunate market effect (the race to the bottom), where transit providers are continuously lowering their prices to compete. And while transit providers can make up for this to some extent through economies of scale, at some point they are mostly all of equal size, and thus the only thing that can flex is quality of service.
The benefit of using an Internet Exchange is to reduce the portion of an ISP’s (and CDN’s) traffic that must be delivered via their upstream transit providers, thereby reducing the average per-bit delivery cost as well as the end-to-end latency seen by their users or customers. Furthermore, the increased number of paths available through the IXP improves routing efficiency and fault-tolerance, and it avoids traffic going the scenic route to a large hub like Frankfurt, London, Amsterdam, Paris or Rome, if it could very well remain local.
IPng Networks really believes in an open and affordable Internet, and I would like to do my part in ensuring the internet stays accessible for smaller parties.
Smol IXPs
One notable problem with small exchanges, like for example [FNC-IX] in the Paris metro, or [CHIX-CH], [Community IX] and [Free-IX] in the Zurich metropolitan area, is that they are, well, small. They may be cheaper to connect to, in some cases even free, but they don’t have a sizable membership, which means that there is inherently less traffic flowing, which in turn makes them less appealing for prospective members to connect to.
At IPng, I have partnered with a few super cool ISPs and carriers to offer a Free Internet Exchange platform. Just to head the main question off at the pass: Free here actually does mean “Free as in beer” or [Gratis], a gift to the community that does not cost money. It also more philosophically wants to be “Free as in open, and transparent” or [Libre].
Two examples are:
.. but there are actually quite a few out there once you start looking :)
Growing Smol IXPs
Some internet exchanges break through the magical 1Tbps barrier (and get a courtesy callout on Twitter from Dr. King), but many remain smol. Perhaps it’s time to break the chicken-and-egg problem. What if there was a way to interconnect these exchanges?
Let’s take for example the Free IX in Greece that was announced at GRNOG16 in Athens on April 19th. This exchange initially targets Athens and Thessaloniki, with 2x100G between the two cities. Members can connect to either site for the cost of only a cross connect. The 1G/10G/25G ports will be Gratis. But I will be connecting one very special member to Free IX Greece: AS50869.
Free IX: Remote
Here’s what I am going to build. The Free IX Remote project offers an outreach infrastructure which connects to internet exchange points, and allows members to benefit from that in the following way:
Members at smaller internet exchanges greatly benefit from this type of outreach, by receiving large portions of the public internet directly at their preferred peering location. Similarly, the Free IX Remote routers will carry their traffic to these remote internet exchanges.
Detailed Design
Peer types
There are two types of BGP neighbor adjacency: members and peers.
BGP sessions with members use strict ingress filtering by means of bgpq4, and the prefixes learned on them will be tagged with a set of informational BGP communities, such as where the prefix was learned and which propagation permissions it received (e.g. at which internet exchanges it will be allowed to be announced). Of course, prefixes that are RPKI invalid will be dropped, while valid and unknown prefixes will be accepted. Members are granted permissions by FreeIX, which determine where their prefixes will be announced by AS50869. Further, members can perform optional actions by means of BGP communities at their ingress point, to inhibit announcements to a certain peer or at a given exchange point.
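To make the RPKI part of that ingress policy concrete, here is a minimal sketch of route origin validation in the style of RFC 6811. It is not how FreeIX implements it; the `Roa` type, the function names and the documentation prefixes and ASNs are all assumptions made up for illustration, and the bgpq4 prefix-list filtering is left out entirely.

```python
import ipaddress
from typing import NamedTuple


class Roa(NamedTuple):
    """One Route Origin Authorization (hypothetical in-memory form)."""
    prefix: str       # e.g. "192.0.2.0/24"
    max_length: int   # longest prefix length the ROA authorizes
    asn: int          # authorized origin AS


def rpki_state(prefix: str, origin_asn: int, roas: list[Roa]) -> str:
    """Return "valid", "invalid" or "unknown" for a route announcement."""
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # Only ROAs of the same address family that cover the route count.
        if net.version != roa_net.version or not net.subnet_of(roa_net):
            continue
        covered = True
        if roa.asn == origin_asn and net.prefixlen <= roa.max_length:
            return "valid"
    # Covered by at least one ROA but matched by none: invalid.
    return "invalid" if covered else "unknown"


def accept_from_member(prefix: str, origin_asn: int, roas: list[Roa]) -> bool:
    """Drop RPKI invalids; accept valid and unknown prefixes."""
    return rpki_state(prefix, origin_asn, roas) != "invalid"
```

With a single ROA for 192.0.2.0/24 (max length 24, AS64496), a /25 out of that range or the same /24 from another origin comes back invalid and is dropped, while a prefix with no covering ROA is unknown and accepted.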
Peers on the other hand are not granted any permissions and all action BGP communities will be stripped on prefixes learned. Informational communities will still be tagged on learned prefixes.
Two things happen when these prefixes are offered to members. Firstly, members will be offered only those prefixes for which they have permission. In other words, I will create a configuration file that says member AS8298 may receive prefixes learned from Frys-IX. Secondly, even for those prefixes that are advertised, the member AS8298 can use the informational communities to further filter what they accept from Free IX Remote AS50869.
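The first of those two steps can be pictured as a simple lookup. The sketch below assumes a hypothetical permission table keyed by member ASN, and routes represented as plain dictionaries. The table layout, the field names and the exchange called "Example-IX" are invented for illustration, and the real configuration format is not described here.

```python
# Hypothetical permission table: member ASN -> exchanges whose routes
# that member may receive. Only the AS8298 / Frys-IX pairing comes from
# the text; the layout itself is an assumption.
RECEIVE_PERMISSIONS = {
    8298: {"Frys-IX"},
}


def routes_for_member(member_asn: int, routes: list[dict]) -> list[dict]:
    """Return only the routes this member has permission to receive."""
    allowed = RECEIVE_PERMISSIONS.get(member_asn, set())
    return [route for route in routes if route["learned_at"] in allowed]


# Example routes learned from peers at two exchanges (documentation prefixes).
learned = [
    {"prefix": "192.0.2.0/24", "learned_at": "Frys-IX"},
    {"prefix": "198.51.100.0/24", "learned_at": "Example-IX"},
]
offered = routes_for_member(8298, learned)
```

Here `offered` contains only the route learned at Frys-IX. A member without an entry in the table is offered nothing, which matches the default-deny spirit of the design.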
BGP Classic Communities
Members are allowed to set the following legacy action BGP communities for coarse grained distribution of their prefixes through the FreeIX network.
Peers, on the other hand, are not allowed to set any communities, so all classic BGP communities from them are stripped on ingress.
BGP Large Communities
Free IX Remote will use three types of BGP Large Communities (informational, permission and action), which each serve a distinct purpose:
Regular peers of AS50869 at exchange points and private network interconnects will not be able to set any communities, so all large BGP communities from them are stripped on ingress.
Informational Communities
When FreeIX routers learn prefixes, they will annotate them with certain communities. For example, the router at Amsterdam NIKHEF (which is router #1, country #2), when learning a prefix at FrysIX (which is ixp #1152), will set the following BGP large communities:
When propagating these prefixes to neighbors (both members and peers), these informational communities can be used to determine local policy, for example by setting a different localpref or dropping prefixes from a certain location. Informational communities can be read, but they can’t be set by peers or members – they are always cleared by FreeIX routers when learning prefixes, and as such the only routers which will set them are the FreeIX ones.
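As a rough illustration of how a member could act on these tags, the sketch below models large communities as three-integer tuples (global administrator, function, value), following the RFC 8092 layout. The function codes `INFO_ROUTER`, `INFO_COUNTRY` and `INFO_IXP` are placeholders and not the real FreeIX numbering. Only AS50869, router #1, country #2 and ixp #1152 come from the text.

```python
FREEIX_ASN = 50869

# Placeholder function codes, invented for this sketch; the real
# FreeIX values are not given here.
INFO_ROUTER, INFO_COUNTRY, INFO_IXP = 1, 2, 3


def informational_communities(router_id: int, country_id: int, ixp_id: int) -> set:
    """Tags a FreeIX router would attach when learning a prefix."""
    return {
        (FREEIX_ASN, INFO_ROUTER, router_id),
        (FREEIX_ASN, INFO_COUNTRY, country_id),
        (FREEIX_ASN, INFO_IXP, ixp_id),
    }


def member_local_pref(communities: set, preferred_ixp: int,
                      default: int = 100, preferred: int = 200) -> int:
    """Member-side policy: prefer routes learned at one particular IXP."""
    if (FREEIX_ASN, INFO_IXP, preferred_ixp) in communities:
        return preferred
    return default


# Prefix learned by router #1 in country #2 at ixp #1152.
tags = informational_communities(router_id=1, country_id=2, ixp_id=1152)
```

A member that prefers routes learned at ixp #1152 would call `member_local_pref(tags, 1152)` and get the higher value, while routes from any other exchange keep the default. Dropping prefixes from a given location works the same way, with a membership test on the country or router tag.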
Permission Communities
FreeIX maintains a list of permissions per member. When members announce their prefixes to FreeIX routers, these permissions communities are set. They determine what the member is allowed to do with FreeIX propagation - notably which routers, countries, and internet exchanges the member will be allowed to propagate to.
Usually, member prefixes are allowed to propagate everywhere, so the following communities might be set by the FreeIX router on ingress:
If the member prefixes are allowed to propagate only to certain places, the ’everywhere’ communities will not be set, and instead lists of communities with finer grained permissions can be used, for example:
Permission communities can’t be set by peers, nor by members – they are always cleared by FreeIX routers when learning prefixes, and are configured explicitly by FreeIX operators.
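One way to think about the permission check is sketched below. It rests on two assumptions that the text does not confirm: that each of the three dimensions (router, country, internet exchange) has its own "everywhere" setting, and that an egress point is eligible only when all three dimensions permit it. The `Permissions` type and `may_propagate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Permissions:
    """Per-member propagation permissions; None means 'everywhere'."""
    routers: Optional[FrozenSet[int]] = None
    countries: Optional[FrozenSet[int]] = None
    ixps: Optional[FrozenSet[int]] = None


def _allows(allowed: Optional[FrozenSet[int]], value: int) -> bool:
    # None is the 'everywhere' case; otherwise the value must be listed.
    return allowed is None or value in allowed


def may_propagate(perm: Permissions, router_id: int,
                  country_id: int, ixp_id: int) -> bool:
    """Assumed semantics: every dimension must permit the egress point."""
    return (_allows(perm.routers, router_id)
            and _allows(perm.countries, country_id)
            and _allows(perm.ixps, ixp_id))


everywhere = Permissions()
only_frysix = Permissions(ixps=frozenset({1152}))
```

With these definitions, `everywhere` is eligible at any router, country and exchange, while `only_frysix` is eligible only where the egress exchange is ixp #1152.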
Action Communities
Based on the permission communities, zero or more egress routers, countries and internet exchanges are eligible to propagate member prefixes by AS50869 to its peers. Members can define very fine grained action communities to further tweak which prefixes propagate on which routers, in which countries and towards which internet exchanges and private network interconnects:
Four actions can be placed on a per-remote-asn basis:
Peers cannot set these actions, as all action communities will be stripped on ingress. Members can set these action communities on their sessions with FreeIX routers, however in some cases they may also be set by FreeIX operators when learning prefixes.
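Putting permissions and actions together, the egress decision can be sketched as follows. This is an illustration under assumptions, not the FreeIX implementation: it models only the "inhibit announcement to a certain peer or at a given exchange point" actions mentioned earlier, and the names `Egress`, `effective_actions` and `should_announce` are made up.

```python
from typing import NamedTuple


class Egress(NamedTuple):
    """One candidate egress session: an exchange and the peer ASN on it."""
    ixp_id: int
    peer_asn: int


def effective_actions(neighbor_kind: str, actions: set) -> set:
    """Action communities survive ingress only on member sessions."""
    return set(actions) if neighbor_kind == "member" else set()


def should_announce(permitted: bool, inhibit_ixps: set,
                    inhibit_asns: set, egress: Egress) -> bool:
    """Announce only if permitted and not inhibited by a member action."""
    if not permitted:
        return False  # permissions are decided by FreeIX operators
    if egress.ixp_id in inhibit_ixps:
        return False  # member asked not to announce at this exchange
    if egress.peer_asn in inhibit_asns:
        return False  # member asked not to announce to this peer
    return True
```

Actions can only narrow what the permissions already allow: a member can inhibit an announcement it is permitted to make, but no action turns a missing permission into an announcement.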
What’s next
Perhaps this interaction between informational, permission and action BGP communities gives you an idea on how such a network may operate. It’s somewhat different to a classic Transit provider, in that AS50869 will not carry a full table. It’ll merely provide a form of partial transit from member A at IXP #1, to and from all peers that can be found at IXPs #2-#N. Makes the mind boggle? Don’t worry, we’ll figure it out together :)
In an upcoming article I’ll detail the programming work that goes into implementing this complex peering policy in Bird2, driving VPP routers (duh), with an IGP that is IPv4-less, because at this point, I [may as well] put my money where my mouth is.
If you’re interested in this kind of stuff, take a look at the IPng Networks AS8298 [Routing Policy]. Similar to that one, this one will use a combination of functional programming, templates, and clever expansions to make a customized per-member and per-peer configuration based on a YAML input file which dictates which member and which prefix is allowed to go where.
First, I need to get a replacement router for the Thessaloniki router, which will run VPP of course. My buddy Antonis noticed that there are CPU and/or DDR errors on that chassis, so it may need to be RMAd. But once it’s operational, I will start by deploying one instance in Amsterdam NIKHEF, and another in Thessaloniki Balkan Gate, with a 100G connection between them, graciously provided by [LANCOM]. Just look at that FD.io hound runnnnn!!1
Network Engineer @ Cloudflare
2 months ago: Great stuff! I've observed at least 1 IX currently offering a similar service in the wild. Unfortunately this actually led to a worse experience for our peers, drawing select traffic via another region despite local peering presence (manifesting essentially as a route leak), due to differential routing policies. It also adds a layer of complexity to any smart egress traffic engineering. I think for this to work, communities at the very least need to be industry standardised. Looking at the other IX offering this, I can see their informational communities are structured differently. Until then, I'd suggest not proactively bilating with peers on other fabrics unless you've explicitly informed them, so they can maintain any unicorn configurations as necessary.
Cyber Security, Networking & Datacenter. Infrastructure Leader.
2 months ago: Exciting. Following as I dive deeper into BGP, Routing, Communities etc etc. Officially entered the rabbit hole.
Independent Consultant - Telecommunications & Carrier
2 months ago: Do you allow remote peers via IP tunnel or anything?
Infrastructure Engineer
2 months ago: OK; your christmas project trumps mine?? Does a ‘member tagged inhibit community’ also apply in the reverse direction? So if M1 from IXP1 doesn’t want to be propagated to IXP4, does it suffice for M1 to attach the appropriate community? Or will this tag only work in the ‘Member-to-IXP’ direction?