ASICs vs. CPU Routing

As leaps and bounds are made on the technological front, and as the Linux kernel becomes more efficient, people are looking toward Linux-based routers to help lower costs. The idea of an open source operating system, running (for the most part) open source software that can be fine-tuned and hacked to your preference, is highly appealing. After all, no matter how many APIs and how much automation Cisco, Juniper, Fortinet, or any of the other tier 1 vendors provide, they will never give us full control over how things operate. With that said, more and more people hit bottlenecks when attempting to route traffic through these highly efficient, extremely powerful servers, and they are stumped as to why.

The way the TCP protocol works is quite interesting, and it is a main reason why Linux-based routers fall short. The very first packet sent out is a kind of hello, saying "I would like to request something from you," and, as in any fruitful conversation, you await an acknowledgement before you continue with your next statement. That is essentially how TCP works: you request data, the other end replies, and once the other end receives an acknowledgement (confirming that what it sent arrived uncorrupted) it goes on to send the next packet, and so on. This process is very taxing on the CPU, because enterprise routers usually handle several million packets per second, and each exchange requires at least twice the number of operations (every packet received triggers an ACK back to the sender before the next packet can come). The CPU, being a general-purpose processor, can handle the load only up to a point. As a rough rule of thumb, it takes around 1 MHz of processing power to handle 1 Mbps of traffic. Great! We have multi-processor, multi-core systems scaling up to 128 cores, and in some cases 256. Simple math says that with a minimum clock speed of 2,000 MHz, each core should handle 2 Gbps of traffic, and 2 Gbps multiplied by 256 cores equals 512 Gbps on a single server. Awesome! Unfortunately, that's not how it works.
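To make that back-of-the-envelope math concrete, here is a tiny Python sketch. The 1 MHz per 1 Mbps figure is only a rough rule of thumb, and the clock speed and core count are the example numbers used above, not measurements from real hardware.

    # Back-of-the-envelope throughput estimate using the rough
    # "1 MHz of CPU handles about 1 Mbps of traffic" rule of thumb.
    MHZ_PER_MBPS = 1.0        # rule of thumb, not a measured constant

    core_clock_mhz = 2000     # 2 GHz per core (example figure from above)
    core_count = 256          # large multi-socket server (example figure)

    per_core_mbps = core_clock_mhz / MHZ_PER_MBPS         # ~2,000 Mbps
    naive_total_gbps = per_core_mbps * core_count / 1000  # ~512 Gbps

    print(f"{per_core_mbps / 1000:.0f} Gbps per core, "
          f"{naive_total_gbps:.0f} Gbps naive total")
    # Prints "2 Gbps per core, 512 Gbps naive total" -- the number you will
    # never actually reach, because packet processing does not scale
    # linearly across cores.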

Without going into too much technical detail, certain processes (like the TCP/IP stack) cannot be distributed efficiently over multiple cores. What usually happens is that one particular CPU or core ends up handling a really daunting task, hits 100%, and we run into a bottleneck; a small sketch after the list below shows one way to observe this on a live Linux system. When that CPU hits 100%, two things happen:

1) We can't push any more TCP packets through the server, because the CPU handling the SYN/ACK processing is busy and cannot respond.

2) Even the packets that are being processed start seeing drops, because of the tight latency windows that modern applications require. When packets are dropped, retransmissions follow, which puts even more pressure on the already burdened CPU.
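One way to see this imbalance on a live Linux box is to watch how the NET_RX softirq counters grow per CPU; if one counter climbs far faster than the rest, that is the core absorbing most of the packet processing. Below is a minimal Python sketch along those lines. It only assumes the standard /proc/softirqs interface, and the 5-second sampling window is an arbitrary choice for illustration.

    # Sample /proc/softirqs twice and show how NET_RX work is spread
    # across CPUs over a 5 second window.
    import time

    def net_rx_counts():
        with open("/proc/softirqs") as f:
            lines = f.read().splitlines()
        cpus = lines[0].split()                  # header row: CPU0 CPU1 ...
        for line in lines[1:]:
            fields = line.split()
            if fields and fields[0] == "NET_RX:":
                counts = [int(x) for x in fields[1:1 + len(cpus)]]
                return dict(zip(cpus, counts))
        return {}

    before = net_rx_counts()
    time.sleep(5)
    after = net_rx_counts()

    deltas = {cpu: after[cpu] - before.get(cpu, 0) for cpu in after}
    total = sum(deltas.values()) or 1
    for cpu, d in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{cpu}: {d:>10}  ({100 * d / total:.1f}% of NET_RX softirqs)")
    # One CPU taking the lion's share is the bottleneck described above.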

So how do modern routers like the Cisco ASR9K or the Juniper MX series push terabits of throughput without breaking a sweat? They use ASICs. ASIC is an abbreviation for Application Specific Integrated Circuit, the keyword being "specific." Each ASIC has exactly one task. In its own words: "Anything connected to my communication port (in this case Ethernet, SFP(+), or QSFP(+)) is handled by me. I have nothing to do in life except examine the packets coming in and out of this port and make routing or switching decisions." This specialization makes the process extremely fast and extremely efficient. So how do we go about combining the power of the CPU for things such as array processing, string processing, and arithmetic with the packet-pushing power of ASICs? Modern routers and switches add regular CPUs into the mix and give them tasks such as NAT, ACLs, prefix-lists, and route-maps, while leaving the packet pushing to the ASICs.
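To give a feel for the per-packet work the ASIC takes off the CPU, here is a minimal longest-prefix-match route lookup in Python. A forwarding ASIC typically resolves this kind of lookup in dedicated hardware (for example a TCAM) in a handful of clock cycles; a software router has to do the equivalent search for every single packet. The prefixes and next-hop names below are made-up examples, and real software routers use trie-based tables rather than a linear scan, but the point stands: every lookup burns CPU cycles that the ASIC's dedicated pipeline would otherwise absorb.

    import ipaddress

    # A toy routing table: (prefix, next hop). Entries are made-up examples.
    ROUTES = [
        (ipaddress.ip_network("0.0.0.0/0"),   "upstream-isp"),
        (ipaddress.ip_network("10.0.0.0/8"),  "core-switch"),
        (ipaddress.ip_network("10.1.0.0/16"), "distribution-1"),
        (ipaddress.ip_network("10.1.2.0/24"), "access-sw-42"),
    ]

    def longest_prefix_match(dst_ip):
        """Return the next hop of the most specific prefix covering dst_ip."""
        dst = ipaddress.ip_address(dst_ip)
        best_len, best_hop = -1, None
        for prefix, next_hop in ROUTES:
            if dst in prefix and prefix.prefixlen > best_len:
                best_len, best_hop = prefix.prefixlen, next_hop
        return best_hop

    print(longest_prefix_match("10.1.2.7"))   # access-sw-42 (the /24 wins)
    print(longest_prefix_match("10.9.9.9"))   # core-switch  (the /8 wins)
    print(longest_prefix_match("8.8.8.8"))    # upstream-isp (default route)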

Only after understanding this predicament can one truly make progress on the major problems we face in the cost-versus-performance arm wrestle. With this in mind, I have done some intensive research and written a project plan for a router that fully utilizes the highly efficient Linux kernel and can push several million packets per second, resulting in 80 Gbps+ of throughput on modern hardware. It is possible today to turn a modern server into a PPP server that handles 100,000+ subscribers, pushes 80 Gbps+ of bandwidth, and costs no more than $15,000. It is also possible to turn that same machine into a deep packet inspection engine that can easily outperform many of the prohibitively expensive appliances, using open source software.

If you are interested in discussing this project in more depth, please reach out to me, and we can talk through what's left to do and how to take these concepts from POCs into production.

Ihab Al-Refai

Technology Strategic Planning Specialist at LPTIC

4y

But I have a question, if you don't mind: on Cisco's Sup720, isn't it the PFC or DFC that implements the ACLs and prefix-lists? I thought the PFC/DFC were types of ASICs, and the TCAMs as well, aren't they?

Ihab Al-Refai

Technology Strategic Planning Specialist at LPTIC

4y

Clear and easy to understand, thanks for sharing it.

Hadi Arnaout

IT Specialist, ISP Consultant

4y

Nice job Maher, hope to meet you soon to get more info about the project.
