A guide to diagnosing network issues using MTR
Priyanka Shyam
Network Geek with a robust skill set | CCDE (Written) | CCIE | CWNA | Cisco SCOR | Cisco SD-WAN Expert | Technical Writer | Multitasker | Considerate & Empathic Communicator
In my previous article, I discussed two important diagnostic tools, ping and traceroute. One tool that offers extra features is MTR (formerly known as Matt's Traceroute; now My Traceroute).
With MTR, administrators can diagnose and isolate network errors and report network status to upstream providers. MTR combines the ping and traceroute utilities into a single robust troubleshooting tool. Because of that, we will first look at these two utilities individually (I covered both in my previous article) and see how they can be used for troubleshooting.
The most common tool for testing network connectivity is ping. The sender sends ICMP echo request packets (ICMP type 8 code 0) to the receiver, and the receiver replies with ICMP echo reply packets (ICMP type 0 code 0) if it is available.
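As a rough illustration of the echo request described above, the sketch below builds the raw bytes of an ICMP echo request (type 8, code 0) with the standard Internet checksum. The function names, identifier, sequence number, and payload are assumptions chosen for the example, and nothing is sent on the wire.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """Ones'-complement sum over 16-bit words, as used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to an even number of bytes
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Return the bytes of an ICMP echo request (type 8, code 0)."""
    icmp_type, icmp_code = 8, 0
    # The checksum is computed with the checksum field set to zero first.
    header = struct.pack("!BBHHH", icmp_type, icmp_code, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", icmp_type, icmp_code, checksum, identifier, sequence)
    return header + payload


# Hypothetical identifier, sequence number, and payload.
packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"mtr-demo")
print(packet.hex())
```

A receiver can validate the packet by running the same checksum over all of its bytes; a correctly built packet sums to zero.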
Network diagnostic tools such as ping, traceroute, and MTR use ICMP packets to test contention and traffic between two points on the Internet. To ping a host, a user sends ICMP packets, and the host sends packets in response. The user's client can then calculate the round-trip time between the two points.
Note that ping is not always accurate - a firewall may be in the path between the sender and the receiver, filtering ICMP packets. Thus, a host is not unavailable just because it does not respond to ICMP.
Depending on your operating system, ping works differently. By default, ping sends four packets and ends by itself on Windows OS. In Unix-based systems and MacOS, the ping will run until you stop it (using CTRL+C). With the -t option, you can also run a continuous ping on Windows OS.
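To make the operating system difference concrete, here is a minimal sketch of a helper that builds a bounded ping command for the current platform. The helper name `ping_command` is an assumption for this example. It relies on `-n` being the count option on Windows and `-c` on Unix-like systems and macOS.

```python
import platform


def ping_command(host, count=4, system=None):
    """Build a ping argument list that stops after `count` packets."""
    system = system or platform.system()
    if system == "Windows":
        # Windows ping uses -n for the number of echo requests.
        return ["ping", "-n", str(count), host]
    # Unix-like systems and macOS use -c; without it ping runs until stopped.
    return ["ping", "-c", str(count), host]


print(ping_command("8.8.8.8", system="Windows"))
print(ping_command("8.8.8.8", system="Linux"))
```

The returned list can be passed to `subprocess.run` on a machine where ping is available.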
Traceroute
Ping and traceroute work differently. Unlike ping, traceroute tells you the path between the sender and receiver. This is especially useful if you have administrative control over the entire network.
On Unix-based systems and macOS, traceroute sends UDP packets from the sender to the destination. On Windows systems, traceroute uses ICMP echo requests. The command to invoke traceroute on Windows is "tracert", whereas on most other operating systems it is "traceroute".
MTR
Now that we've seen the two utilities that make it up, let's examine MTR itself. Unlike ping and traceroute, which are available by default on most systems, MTR may require installation.
In the same way that you run ping and traceroute, you run MTR by using the mtr command followed by the destination address.
You get real-time connectivity information when you run MTR, since it continuously polls the destination (and devices in the path). By pressing CTRL+C or the Q key, you can stop it at any time.
Let's run mtr against 8.8.8.8:
According to the output above, MTR combines ping (RTT and packet loss) with traceroute (devices in the path). On your network, you can determine the following:
Connectivity:
You know there is connectivity between source and destination if MTR successfully reaches the destination. However, if it cannot reach the destination, that does not mean there is no connectivity, since something could be blocking the probes along the path. Later, we'll discuss other options.
Packet Loss:
If there is significant packet loss between the source and destination, you may need to troubleshoot further. Be aware that loss can also appear along the path because some devices rate limit or filter the ICMP packets used by ping, traceroute, and MTR.
In general, ICMP rate limiting is configured to mitigate DDoS attacks. A built-in denial-of-service protection mechanism limits the number of ICMP packets transmitted out of an interface. With a default of one ICMP destination unreachable message every 500 milliseconds (1/2 second), the router silently discards any packet that would trigger a second message within that window.
Round-Trip Time:
Your link may be malfunctioning if packets are taking too long to get from source to destination. The distance between source and destination could also be quite large.
Report Mode
The default interactive mode of MTR can result in a large number of packets being sent continuously, which can have a negative impact on network performance. Thus, we can run MTR in "report" mode, which sends 10 packets by default to each device and displays the network statistics:
mtr --report 8.8.8.8
You enable report mode by using two hyphens (--) followed by report, i.e. --report. This report was generated using mtr --report 8.8.8.8, which sends 10 packets to the IP address 8.8.8.8 and generates a report. Without the --report option, MTR runs continuously in interactive mode, where each host's round-trip time is updated live. In most cases, the --report mode provides sufficient data in a useful format.
Each numbered line in the report represents a hop. Hops are the nodes that packets pass through to reach their destination. Reverse DNS lookups determine the names of the hosts (e.g. a72-247-36-1.deploy.stati and xe2-3-0.hh-sjc5-a.netarch in the example). After the path itself, MTR provides valuable statistics about the connection in seven columns. The Loss% column shows the percentage of packets lost at each hop. The Snt column counts the packets sent. With the --report option, MTR sends 10 packets unless you specify --report-cycles=[number-of-packets], where [number-of-packets] is the number of packets you want to send.
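The seven statistics columns of a hop line can be pulled apart with a few lines of code. The sketch below assumes the whitespace-separated layout that mtr's text report typically prints (hop marker, host, Loss%, Snt, Last, Avg, Best, Worst, StDev). The sample line is made up for illustration and uses a documentation-range address, and the names `Hop` and `parse_report_line` are hypothetical.

```python
from typing import NamedTuple


class Hop(NamedTuple):
    number: int
    host: str
    loss_pct: float
    sent: int
    last: float
    avg: float
    best: float
    worst: float
    stdev: float


def parse_report_line(line: str) -> Hop:
    """Parse one hop line of an mtr text report (assumed layout)."""
    fields = line.split()
    number = int(fields[0].split(".")[0])      # "3.|--" -> 3
    host = fields[1]
    loss_pct = float(fields[2].rstrip("%"))    # "10.0%" -> 10.0
    sent = int(fields[3])
    last, avg, best, worst, stdev = (float(f) for f in fields[4:9])
    return Hop(number, host, loss_pct, sent, last, avg, best, worst, stdev)


# Illustrative line only, not taken from a real report.
SAMPLE = "  3.|-- 203.0.113.7   10.0%    10   12.4  11.9  10.8  14.2   1.1"
hop = parse_report_line(SAMPLE)
print(hop)
```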
The next four columns measure latency in milliseconds (ms): Last, Avg, Best, and Worst. Last is the latency of the last packet sent, Avg is the average latency of all packets, and Best and Worst display the best (shortest) and worst (longest) round-trip times. Most of the time, you should focus on the average (Avg) column.
The last column, StDev, shows the standard deviation of the latencies to each host. The higher the standard deviation, the greater the difference between individual latency measurements. The standard deviation lets you judge whether the mean (average) represents the true center of the data set or has been skewed by some phenomenon or measurement error. A high standard deviation indicates inconsistent latency measurements. In that case, the average over the 10 packets sent may look normal yet not represent the data accurately. If the standard deviation is high, look at the best and worst latency measurements to make sure the average is a good representation of the true latency.
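A small worked example shows why the average alone can mislead. The two latency samples below are invented for illustration: both average 20 ms over 10 packets, but one is steady and the other hides a single large spike. The sketch uses the population standard deviation from Python's statistics module, which may differ slightly from how MTR computes its own StDev column.

```python
import statistics

# Hypothetical latencies in milliseconds for 10 packets each.
steady = [20, 19, 21, 20, 20, 19, 21, 20, 20, 20]
spiky = [10, 10, 10, 10, 10, 10, 10, 10, 10, 110]

for name, sample in (("steady", steady), ("spiky", spiky)):
    avg = statistics.mean(sample)
    stdev = statistics.pstdev(sample)
    # Best and Worst expose the spike that the average hides.
    print(f"{name}: avg={avg:.1f} best={min(sample)} worst={max(sample)} stdev={stdev:.1f}")
```

Both samples report the same Avg, but the spiky one has a much larger StDev and a Worst value far from its Best, which is the cue to distrust the average.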
Increase Test Speed
By default, MTR uses reverse DNS to resolve IP addresses to hostnames. If you are not interested in hostnames or do not use DNS on your network, this can slow down your troubleshooting. With -n or --no-dns, we can disable DNS resolution.
mtr -r -n 8.8.8.8
By default, MTR also sends one packet per second. That may be fine when a network is operating normally, but under congestion packets usually arrive at a faster rate. To simulate a busier network, we can use the -i option (or --interval) to specify how many seconds MTR waits between packets:
mtr -r -i 0.1 4.2.2.2
With a shorter interval between packets, and 50 packets sent, I now notice some packet loss at some devices in the path.
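The options covered so far can be combined programmatically. Below is a minimal sketch of a helper that assembles an mtr report command from the long options --report, --report-cycles, --no-dns, and --interval. The helper name `mtr_report_command` and its defaults are assumptions for this example, and it only builds the argument list without running anything.

```python
def mtr_report_command(host, cycles=10, interval=None, resolve_dns=True):
    """Build an mtr argument list for report mode."""
    cmd = ["mtr", "--report", "--report-cycles", str(cycles)]
    if not resolve_dns:
        cmd.append("--no-dns")          # skip reverse DNS lookups
    if interval is not None:
        cmd += ["--interval", str(interval)]  # seconds between packets
    cmd.append(host)
    return cmd


print(" ".join(mtr_report_command("4.2.2.2", cycles=50, interval=0.1)))
```

On a machine with MTR installed, the list can be handed to `subprocess.run(cmd, capture_output=True, text=True)` to collect the report text.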
Analyze the MTR reports
Verify the packet loss
Two things should be considered when analyzing MTR output: loss and latency. If you see a percentage of loss at a particular hop, there may be a problem with that router. However, it is common practice among some service providers to rate limit the ICMP traffic MTR uses, so packets can appear to be lost when they are not. To determine whether the loss is real or due to rate limiting, look at the subsequent hop. If the subsequent hop shows a loss of 0.0%, you are likely seeing ICMP rate limiting rather than actual loss:
The loss reported between hops 1 and 2 is likely due to rate limiting on the second hop. Traffic to the remaining eight hops passes through the second hop, yet no packets are lost. If the loss persists for more than one hop, it may be caused by real packet loss or routing issues. Rate limiting and loss can occur simultaneously. To determine the actual loss, take the lowest percentage of loss in the sequence:
Between hops 2 and 3 and between hops 3 and 4, there is a 60% loss. No subsequent host reports zero loss, so you can assume the third and fourth hops are losing some traffic. However, part of that loss is due to rate limiting, because several of the final hops experience only 40% loss. When different loss amounts are reported, always trust the reports from later hops.
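The rule of trusting later hops can be written as a running minimum taken from the destination backwards: the loss that really affects a hop is no higher than the lowest loss reported at that hop or any hop after it. The sketch below is one way to express that heuristic. The function name and the per-hop percentages are illustrative, chosen to mirror the 60% and 40% figures discussed above.

```python
def effective_loss(reported):
    """Return, per hop, the lowest loss seen at that hop or any later hop."""
    result = []
    running_min = float("inf")
    for loss in reversed(reported):
        running_min = min(running_min, loss)
        result.append(running_min)
    return result[::-1]


# Loss% per hop: rate limiting only (a single hop shows loss).
print(effective_loss([0.0, 50.0, 0.0, 0.0, 0.0]))

# Loss% per hop: real loss plus rate limiting on hops 3 and 4.
print(effective_loss([0.0, 0.0, 60.0, 60.0, 50.0, 40.0, 40.0, 40.0]))
```

In the first case every hop ends up at 0.0, which points to rate limiting only. In the second, hops 3 onward settle at 40.0, the loss that persists to the destination.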
Problems with the return route can also explain some loss. It is not uncommon for packets to reach their destination without error, but they have trouble returning. If you have an issue, it is often best to collect MTR reports in both directions.
A route's latency may also be affected by the connection quality. A high latency is shown in the following MTR report:
Network Latency
With MTR, you can also assess the latency of the connection between your host and the target host. Latency generally increases with the number of hops in a route. What matters is that the increases are consistent and linear. Unfortunately, latency is relative and depends heavily on both the quality of each host's connection and the physical distance between them.
Between hops 3 and 4, latency jumps significantly and remains high. Considering that round trip times remain high after the fourth hop, we can assume, based on the data, that the latency might be caused by a poorly configured router or a congested link.
However, high latency does not always indicate a problem with the current route. Reports like the one above show that traffic is still reaching the destination host and returning to the source host despite some sort of issue at the 4th hop. The return route could also cause latency. Your MTR report will not show the return route, and packets can take completely different routes to and from a destination.
There is a large jump in latency between hosts 3 and 4, but the latency does not increase unusually in subsequent hops. As a result, we can assume that the 4th router is malfunctioning.
Like packet loss, ICMP rate limiting can also create the appearance of latency:
At first glance, the latency between hops 4 and 5 stands out. However, the latency drops dramatically after the fifth hop, and the actual latency measured to the destination here is about 40 milliseconds. In such cases, MTR draws attention to an issue that does not affect the service. When evaluating an MTR report, always consider the latency to the final hop.
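One rough way to apply this advice is to compare every intermediate hop against the final hop: an intermediate hop whose average is far above the latency to the destination is probably slow to answer probes rather than slow to forward traffic. The sketch below assumes a list of per-hop Avg values in milliseconds. The function name, the 10 ms margin, and the sample values are all illustrative.

```python
def hops_above_final(avg_ms, margin_ms=10.0):
    """Return 1-based hop numbers whose Avg exceeds the final hop's Avg by more than the margin."""
    final = avg_ms[-1]
    return [i + 1 for i, avg in enumerate(avg_ms[:-1]) if avg > final + margin_ms]


# Hop 5 spikes, but the destination answers in about 40 ms.
spike_only = [1.2, 2.0, 3.1, 9.8, 310.5, 41.0, 39.7, 40.3, 40.1]
print(hops_above_final(spike_only))

# Latency jumps at hop 4 and stays high all the way to the destination.
persistent = [1.0, 2.0, 3.0, 250.0, 251.0, 252.0]
print(hops_above_final(persistent))
```

A hop flagged by this check is likely an artifact. When latency rises and stays high to the final hop, nothing is flagged, and the jump deserves a closer look.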
An incorrect configuration of the destination host's network
An incorrectly configured router appears to cause a 100% loss to the destination host in the next example. It appears that the packets are not reaching the host, but this is not true.
Traffic does reach the destination host. The MTR report shows loss only because the destination host does not respond. This can be caused by improperly configured networking or firewall (iptables) rules that drop ICMP packets. You can tell the loss is due to a misconfigured host by looking at the hop that shows 100% loss: it is the final hop, and MTR does not attempt any additional hops after it. Without a baseline measurement, it is difficult to isolate this issue, but these types of errors are quite common.
Routers for residential or business use
Reports from residential gateways can sometimes be misleading:
The 100% loss at the second hop does not indicate a problem, because there is no loss on subsequent hops.
Incorrectly configured ISP router
Your packets may never reach their destination if a router on the route your packet takes is incorrectly configured:
The question marks appear when there is no additional route information. Poorly configured routers can also send packets around in a loop. The following example illustrates that:
According to these reports, the router at hop 4 is not configured correctly. In these situations, the only way to resolve the issue is to contact the source host's network administrator.
ICMP Rate Limiting
The purpose of ICMP rate limiting is generally to mitigate DDoS attacks. A built-in denial-of-service protection mechanism limits the rate of ICMP packets transmitted out of an interface. Because the default is one unreachable message per 500 milliseconds (1/2 second), the destination router silently discards the second packet, and 1 in 3 requests to the destination appear as a timeout.
There can be apparent packet loss caused by ICMP rate limiting. ICMP limiting causes packet loss to one hop that does not persist to subsequent hops. The following example illustrates this:
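A deliberately simplified model makes the effect visible. The sketch below assumes a router that answers a probe only if at least 500 ms have passed since its previous answer and ignores everything else. Real devices differ in how they implement limiting. The function names and probe schedules are invented for the example, and times are in whole milliseconds to avoid floating-point drift.

```python
def simulate_rate_limit(probe_times_ms, min_gap_ms=500):
    """Return one boolean per probe: True if the router answered it."""
    answered = []
    last_reply = None
    for t in probe_times_ms:
        if last_reply is None or t - last_reply >= min_gap_ms:
            answered.append(True)
            last_reply = t
        else:
            answered.append(False)  # silently discarded by the limiter
    return answered


def loss_percent(answered):
    return 100.0 * answered.count(False) / len(answered)


slow = list(range(0, 10_000, 1_000))  # 10 probes, one per second
fast = list(range(0, 1_000, 100))     # 10 probes, one every 100 ms

print(loss_percent(simulate_rate_limit(slow)))
print(loss_percent(simulate_rate_limit(fast)))
```

In this model the one-per-second schedule sees no loss at the limited hop, while the 100 ms schedule appears to lose most of its probes even though forwarded traffic is unaffected.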
The following points should be kept in mind:
Techniques for advanced MTR
Newer versions of MTR can run in TCP mode on a specified TCP port instead of using ICMP (ping), which remains the default. In most cases, this mode should not be used because TCP reports can be misleading. In TCP mode, MTR sends SYN packets instead of ICMP pings, and most internet-level routers won't respond to them, so the report erroneously shows loss.
The purpose of a TCP test is to determine whether firewall rules on a router somewhere are blocking a protocol or port, perhaps due to improper port forwarding settings. A TCP test over a certain port would reveal this more clearly than an ICMP test.
MTR vs traceroute: what's the difference?
The traceroute command (tracert in Windows) prints the route packets take in a TCP/IP network.
The command traceroute hostname sends three UDP packets with a TTL of 1. The closest router decreases the TTL by one, making it 0, discards the packets, and sends back ICMP Time Exceeded messages. Traceroute notes the IP address of the router that returned those 3 ICMP messages and the time taken to receive each one, then sends three more UDP packets with a TTL of 2.
MTR, however, combines the functionality of traceroute and ping. As soon as it starts, MTR investigates the network connection between the host on which it runs and a user-specified destination host. After determining the address of each network hop, it sends ICMP ECHO requests to each machine to determine the quality of the link to it. MTR relies on ICMP Time Exceeded (type 11) packets returning from routers, or ICMP Echo Reply packets once the probes reach their destination. As it does this, it prints running statistics about each machine.
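The TTL mechanism can be simulated without touching the network. In the sketch below, a path is just a list of addresses ending with the destination. Each router decrements the TTL and answers with a time-exceeded reply when it hits zero. The function names and the documentation-range addresses are assumptions for illustration.

```python
def probe(path, ttl):
    """Return (responder, reply_type) for one probe sent with the given TTL."""
    for index, node in enumerate(path):
        if index == len(path) - 1:
            return node, "reached-destination"
        ttl -= 1                      # each router decrements the TTL
        if ttl == 0:
            return node, "time-exceeded"
    raise ValueError("path must not be empty")


def trace(path, max_ttl=30):
    """Discover the path hop by hop by raising the TTL one step at a time."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder, reply = probe(path, ttl)
        hops.append((ttl, responder))
        if reply == "reached-destination":
            break
    return hops


# Hypothetical two-router path to a destination.
route = ["192.0.2.1", "198.51.100.1", "203.0.113.9"]
for ttl, responder in trace(route):
    print(ttl, responder)
```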
What makes mtr faster than traceroute
A primary reason for this is the way traceroute runs. UDP (or ICMP on Windows) packets are sent toward the first host with a TTL of one, and only when that host replies with a Time Exceeded message (or an internal timeout passes) is the next packet generated with a TTL of two, and so on. Traceroute's total time is therefore the sum of sending and receiving packets for each host, one after another.
The MTR sends all the ICMP ECHO packets in parallel once it determines the path the packets will take.
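A back-of-the-envelope model of the two approaches described above: if probes are sent strictly one after another, the total time is roughly the sum of all round trips, whereas probes sent in parallel finish in about the time of the slowest one. This sketch ignores timeouts and processing overhead. The function names and per-hop round-trip times are invented for the example.

```python
def sequential_time(rtts_ms, probes_per_hop=3):
    """Total time when each probe waits for the previous reply."""
    return sum(rtt * probes_per_hop for rtt in rtts_ms)


def parallel_time(rtts_ms):
    """Total time for one round of probes sent to all hops at once."""
    return max(rtts_ms)


# Hypothetical round-trip times in milliseconds for a four-hop path.
rtts = [1, 5, 20, 40]
print(sequential_time(rtts), "ms sequential")
print(parallel_time(rtts), "ms parallel")
```

The gap widens further when a hop does not answer, because a sequential tool then waits out a full timeout before moving on.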
Tools like traceroute and MTR send packets with incrementally increasing TTLs to view the route, or series of hops, between origin and destination. TTL, or time to live, controls how many hops a packet can make before it is discarded. MTR assembles the route that traffic takes between hosts on the Internet by sending packets and watching them return after one hop, two hops, three hops, and so on.
Advantage of MTR over ping or traceroute
In comparison to ping or traceroute, MTR shows exactly where packet loss occurs in the route to the destination host. In addition to showing the loss percentage for each host, it gives us valuable insight into which specific provider is experiencing a problem with their network. Moreover, since MTR uses ICMP ECHO requests, it will go through routers that block UDP traffic. MTR may work where traceroute does not.