Explaining sch_cake's statistics

One universal complaint I have about nearly the entire fq_codel and cake userbase nowadays is the lack of statistics collection. While some graphing tools exist, few people post results, even though we sank ages into coming up with valuable and useful statistics on how the link is behaving, in the cake implementation in particular. But we haven't explained them all that well before, so perhaps the onus is on us to describe their value to the user. So here's a quick run-through of what they mean, and perhaps you'll be inspired to take a look at your own stats.

root@turris:~# tc -s qdisc show dev eth2

qdisc cake 8010: root refcnt 9 bandwidth 35Mbit diffserv3 triple-isolate nat nowash ack-filter split-gso rtt 100.0ms raw overhead 0

... I went into what these options can mean over here: https://forum.mikrotik.com/viewtopic.php?t=179307

 Sent 597930110 bytes 2985748 pkt (dropped 958, overlimits 1327048 requeues 10)
 backlog 0b 0p requeues 10
 memory used: 140800b of 4Mb

... every drop generally represents saving a latency excursion measured in 100s of milliseconds, lasting for potentially many minutes. This particular link is not highly loaded at the moment. If there is a persistent backlog, you might consider looking harder at your traffic.

 capacity estimate: 35Mbit

... This shaper is configured for 35Mbit.

 min/max network layer size:          42 /   1514
 min/max overhead-adjusted size:      42 /   1514
 average network hdr offset:          14

... Doing framing right is especially important on DSL, PPPoE and cable. Here the two size ranges are identical because this instance is configured with raw overhead 0.

... As for the below: this instance is configured to take advantage of the most common diffserv markings and is a superset of the venerated wondershaper tool. Remarkably, some traffic somewhere on this network is actually trying to mark some packets appropriately!

                 Bulk Best Effort       Voice
 thresh      2187Kbit      35Mbit    8750Kbit

... Bulk is guaranteed a minimum of 1/16 (6.25%) of the bandwidth, which is the 2187Kbit shown here. Voice holds the most common set of diffserv marks for voice (some cell phones do use them); its 8750Kbit threshold is gross overkill in this case (64Kbit is the limit for most voice), so it is hard to exceed this figure. Our hope was that more videoconferencing traffic would mark appropriately. Best Effort is where everything else goes.
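The thresh row is consistent with Bulk getting 1/16 of the shaped rate and Voice getting 1/4. A minimal sketch of that arithmetic follows; the function name is made up for illustration, and the fractions are inferred from the figures above rather than read out of the kernel.

```python
def diffserv3_thresholds(rate_bps):
    """Per-tin threshold rates in bit/s for a diffserv3 cake instance.

    Assumption: Bulk = rate/16, Voice = rate/4, Best Effort = full rate.
    These fractions reproduce the 2187Kbit / 35Mbit / 8750Kbit row above.
    """
    return {
        "Bulk": rate_bps // 16,
        "Best Effort": rate_bps,
        "Voice": rate_bps // 4,
    }


for tin, bps in diffserv3_thresholds(35_000_000).items():
    # tc truncates to whole Kbit when printing, so do the same here.
    print(f"{tin:12s} {bps // 1000} Kbit")
```

Plugging in your own shaped rate gives the thresholds to expect in your own thresh row.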

 target         8.3ms       5.0ms       5.0ms

... this is the "codel target" for queuing latency. It is a target, not a fixed figure. At really low rates (below 4Mbit), cake autoscales the target to account for the largest packet possible.
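The 8.3ms Bulk target, and the 103.3ms Bulk interval in the next row, can be reproduced with a little arithmetic. This is a sketch under two assumptions that match the numbers shown but should be checked against the kernel source: the target is never allowed below 1.5 times the serialization time of one full-size packet, and the interval grows by the same amount the target grew. The function and constant names are illustrative.

```python
MTU_BYTES = 1514        # largest packet size reported in the stats above
BASE_TARGET_MS = 5.0    # default codel target at high rates
RTT_MS = 100.0          # the "rtt 100.0ms" setting on this instance


def target_and_interval(rate_bps):
    """Return (target_ms, interval_ms) for a tin shaped to rate_bps."""
    # Time to put one full-size packet on the wire at this rate, in ms.
    mtu_time_ms = MTU_BYTES * 8 / rate_bps * 1000
    # Assumed floor: 1.5 packet times, so the target is always reachable.
    target = max(1.5 * mtu_time_ms, BASE_TARGET_MS)
    # Assumed: interval stretches by however much the target was raised.
    interval = max(RTT_MS + target - BASE_TARGET_MS, 2 * target)
    return target, interval


bulk_target, bulk_interval = target_and_interval(35_000_000 / 16)
print(f"Bulk: target {bulk_target:.1f}ms interval {bulk_interval:.1f}ms")
```

Under these assumptions the scaling starts once 1.5 packet times exceed 5ms, which for 1514 byte packets is a bit under 4Mbit.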

 interval     103.3ms     100.0ms     100.0ms

... Interval is an assumption of the max RTT on the path. Now we get into more detailed stats:

 pk_delay        11us        24us       1.3ms
 av_delay         3us         5us        90us
 sp_delay         3us         2us         2us
 backlog           0b          0b          0b

... Peak delay (pk_delay) measures the impact of recent bursts on the system. Average delay (av_delay) is just that. Sparse delay (sp_delay) is probably the most important stat of the three: if your sparse packets are getting delayed, you have a really large workload on the system. All of these are EWMAs, so to make sense of them you need to sample and plot them every few seconds.

... a persistent backlog is not an error, but a sign you have one or more long-running flows, like a backup or BitTorrent. If it is really big and stays that way, you might have an unresponsive flow on the network.
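Since these numbers only make sense as a time series, here is a minimal sketch of a sampler for the text output shown in this post. It assumes three tins (diffserv3) and only the unit suffixes that appear above; parse_tins, to_number and sample are made-up helper names, not part of any tool.

```python
import re
import subprocess

# A per-tin row is a lowercase name followed by one value per tin,
# e.g. " pk_delay        11us        24us       1.3ms".
ROW = re.compile(r"^[a-z_]+$")
VALUE = re.compile(r"^([0-9]+(?:\.[0-9]+)?)([A-Za-z]*)$")

# Suffixes seen in the sample output, scaled to seconds, bit/s or bytes.
SCALE = {"": 1, "b": 1, "us": 1e-6, "ms": 1e-3, "s": 1,
         "bit": 1, "Kbit": 1e3, "Mbit": 1e6, "Gbit": 1e9}


def to_number(token):
    """Convert '24us' or '35Mbit' to a float; leave unknown tokens as-is."""
    m = VALUE.match(token)
    if not m or m.group(2) not in SCALE:
        return token
    return float(m.group(1)) * SCALE[m.group(2)]


def parse_tins(text, tins=("Bulk", "Best Effort", "Voice")):
    """Return {stat: {tin: value}} for every per-tin row in the text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == len(tins) + 1 and ROW.match(fields[0]):
            stats[fields[0]] = dict(zip(tins, map(to_number, fields[1:])))
    return stats


def sample(dev="eth2"):
    """Run the same tc command as above and parse its per-tin table."""
    out = subprocess.run(["tc", "-s", "qdisc", "show", "dev", dev],
                         capture_output=True, text=True, check=True).stdout
    return parse_tins(out)
```

Calling sample() every few seconds and logging, say, the sp_delay and backlog entries gives you something to plot. The qdisc-wide "backlog 0b 0p requeues 10" line is skipped because it does not have exactly one value per tin.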

 pkts            7720     2965829       13157
 bytes        3919616   589592983     4665886

... just bytes and packets. Seeing stuff actually fall into these classes indicates you are using them. There are tools to reclassify certain kinds of traffic into these tins like https://forum.openwrt.org/t/qosify-new-package-for-dscp-marking-cake/111789/ - I note that I just prefer to slam cake on an interface first, get it configured properly, and then, maybe, worry about further classification.

 way_inds           0       41685           0
 way_miss           3      181466         281
 way_cols           0           0           0

... these are statistics on how well the 8-way set associativity of cake's flow hashing is working. A lot of way_cols means you have a LOT of different flows going through and most likely a persistent backlog. It's really rare to see way_cols except under a sophisticated DDoS. Another subtle point: with big numbers in way_inds and way_miss, the fair queuing portion of cake is doing its job much, much better than a FIFO ever could.
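To make the three counters concrete, here is a toy model of an 8-way set-associative flow table. It is a sketch of the general technique, not the kernel code: the table size, the probing order and the FlowTable class are assumptions made for illustration.

```python
SET_WAYS = 8        # 8-way set associativity, as described above
NUM_QUEUES = 1024   # assumed table size for this toy model


class FlowTable:
    def __init__(self):
        self.tags = [None] * NUM_QUEUES    # flow hash that owns each queue
        self.busy = [False] * NUM_QUEUES   # True while the queue holds packets
        self.way_inds = self.way_miss = self.way_cols = 0

    def lookup(self, flow_hash):
        idx = flow_hash % NUM_QUEUES
        if self.tags[idx] == flow_hash:
            return idx                     # direct hit, nothing counted
        base = idx - idx % SET_WAYS        # first slot of this 8-way set
        ways = [base + (idx + k) % SET_WAYS for k in range(SET_WAYS)]
        for q in ways:                     # same flow elsewhere in the set?
            if self.tags[q] == flow_hash:
                self.way_inds += 1         # indirect hit
                return q
        for q in ways:                     # otherwise claim an idle slot
            if not self.busy[q]:
                self.way_miss += 1
                self.tags[q] = flow_hash
                return q
        self.way_cols += 1                 # set is full: two flows share
        self.tags[idx] = flow_hash
        return idx


table = FlowTable()
for k in range(9):                         # nine flows hashing to one set
    queue = table.lookup(5 + NUM_QUEUES * k)
    table.busy[queue] = True
print(table.way_inds, table.way_miss, table.way_cols)
```

In this model a collision only happens when more than eight simultaneously backlogged flows land in the same set, which is why way_cols staying at zero is the normal case.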

 drops              0         125           0

... We have saved 125 latency excursions (bufferbloat) on this cake instance thus far. Not a lot, but I've only been running it for a few hours with just me as a workload!

 marks              0           0           0

... No ECN-enabled transports are in use on this link. We expect to see this number grow in the future if L4S is rolled out.

 ack_drop           0         833           0

... This cake instance is configured to drop redundant TCP ACKs under pressure (the ack-filter option). These 833 filtered ACKs plus the 125 drops above add up to the 958 dropped in the summary line. ACK filtering helps more and more on asymmetric links with down/up ratios worse than 10:1.
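A rough way to see why the down/up ratio matters is to estimate how much of the uplink a saturating download's ACK stream consumes. Every number here is an assumption for illustration: 1514 byte data frames, 66 byte ACK frames, and one ACK per two data segments. The function name is made up.

```python
def ack_share_of_uplink(down_bps, up_bps, frame_bytes=1514,
                        ack_bytes=66, segments_per_ack=2):
    """Fraction of the uplink consumed by ACKs for a full-rate download.

    Assumed defaults: full-size Ethernet frames, ACKs carrying TCP
    timestamps (66 bytes on the wire), and delayed ACKs (1 per 2 segments).
    """
    ack_bps = down_bps * ack_bytes / (frame_bytes * segments_per_ack)
    return ack_bps / up_bps


for ratio in (5, 10, 20, 40):
    share = ack_share_of_uplink(ratio * 1_000_000, 1_000_000)
    print(f"{ratio}:1 link -> ACKs use {share:.0%} of the uplink")
```

Under these assumptions ACKs take a bit over a fifth of the uplink at 10:1, and the share keeps climbing with the ratio, which is where thinning them out starts to pay off.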

 sp_flows           1           2           1
 bk_flows           0           1           0
 un_flows           0           0           0

... This is the number of currently active flows in each category cake tracks: sparse, bulk and unresponsive.

 max_len         1514        6056        1514

... The 6056 figure (exactly four 1514 byte packets) indicates this router does GSO/GRO, bulking up sequential packets into one big packet. While GSO and GRO save on CPU, the big packets can really hurt interflow latencies, so cake splits them back up into individual packets.

 quantum          300        1068         300

... At low bandwidths, a smaller quantum interleaves packets better, but costs CPU. A 300 byte quantum takes about 6x as many passes through the loop as a 1500 byte quantum. Cake has a heuristic to set this, which we could possibly allow to grow larger than an MTU, but we haven't got around to that.
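The quantum row above can be reproduced with a small sketch of that heuristic. The constants are assumptions chosen because they match the 300 / 1068 / 300 figures shown (rate in bytes per second divided by 4096, clamped between 300 and 1514); check the kernel source before relying on them. The function name is illustrative.

```python
def flow_quantum(rate_bps):
    """Per-flow quantum in bytes for a tin shaped to rate_bps.

    Assumed heuristic: (bytes per second) >> 12, clamped to [300, 1514].
    """
    rate_bytes_per_sec = int(rate_bps) // 8
    return max(min(rate_bytes_per_sec >> 12, 1514), 300)


for tin, rate in (("Bulk", 35_000_000 // 16),
                  ("Best Effort", 35_000_000),
                  ("Voice", 35_000_000 // 4)):
    print(f"{tin:12s} quantum {flow_quantum(rate)}")
```

With these constants the quantum only reaches the 1514 byte ceiling at roughly 50Mbit and above, so on this 35Mbit link even the Best Effort tin runs with a sub-MTU quantum.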

I hope y'all find this useful, and check your stats when your network is behaving badly... AND when it's behaving well!
