Diagnosing Network Issues with MTR
Priyanka Shyam
Network Geek with a robust skill set | CCDE (Written) | CCIE | CWNA | Cisco SCOR | Cisco SD-WAN Expert | Technical Writer | Multitasker | Considerate & Empathic Communicator
In my previous article, I discussed two very important diagnostic tools: ping and traceroute. One tool that offers extra features, however, is MTR (formerly known as Matt's Traceroute; now My Traceroute).
With MTR, administrators can diagnose and isolate network errors and report network status to upstream providers. MTR combines the ping and traceroute utilities into a single robust troubleshooting tool. Because of that, we will first look at these two utilities individually (I covered both in my previous article) and see how they can be used for troubleshooting.
The most common tool for testing network connectivity is ping. The sender sends ICMP echo request packets (ICMP type 8 code 0) to the receiver, and the receiver replies with ICMP echo reply packets (ICMP type 0 code 0) if it is available.
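To make the echo request format concrete, here is a minimal sketch that builds the bytes of an ICMP echo request (type 8, code 0) with the standard Internet checksum. It only constructs the packet in memory; actually sending it would need a raw socket and elevated privileges. The function names, the identifier, and the payload are illustrative choices, not part of any tool mentioned here.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, as used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to an even number of bytes
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request: type 8, code 0, checksum, id, sequence."""
    icmp_type, icmp_code = 8, 0
    # The checksum is computed with the checksum field set to zero.
    header = struct.pack("!BBHHH", icmp_type, icmp_code, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", icmp_type, icmp_code, checksum, identifier, sequence)
    return header + payload


# Hypothetical identifier and payload, chosen only for illustration.
packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"mtr-demo")
```

A receiver can validate the packet by running the same checksum over all of its bytes; a correct packet yields zero.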
Network diagnostic tools such as ping, traceroute, and MTR use ICMP packets to test contention and traffic between two points on the Internet. When a user pings a host, the user's client sends ICMP packets and the host sends packets in response. From the time between request and reply, the client can calculate the round-trip time between the two points.
Note that ping is not always accurate: a firewall in the path between the sender and the receiver may be filtering ICMP packets. Thus, a host that does not respond to ICMP is not necessarily unavailable.
Ping behaves differently depending on your operating system. On Windows, ping sends four packets by default and then stops by itself. On Unix-based systems and macOS, ping runs until you stop it (using CTRL+C). With the -t option, you can also run a continuous ping on Windows.
Traceroute
Ping and traceroute work differently. Unlike ping, traceroute shows you the path between the sender and the receiver. This is especially useful if you have administrative control over the entire network.
On Unix-based systems and macOS, traceroute sends UDP packets from the sender to the destination. On Windows systems, traceroute uses ICMP echo requests. The command to invoke traceroute on Windows is "tracert" whereas on most other operating systems it is "traceroute".
MTR
Now that we've seen the two utilities that make it up, let's examine MTR itself. Unlike ping and traceroute, which are available by default on most systems, MTR may require installation.
In the same way that you run ping and traceroute, you run MTR by using the mtr command followed by the destination address.
You get real-time connectivity information when you run MTR, since it continuously polls the destination (and devices in the path). By pressing CTRL+C or the Q key, you can stop it at any time.
Let's run mtr against 8.8.8.8:
As you can see from the output above, MTR combines ping (RTT and packet loss) with traceroute (devices in the path between sender and receiver). Using this information, you can determine the following on your network:
Connectivity to destination device: If MTR successfully reaches the destination, you know there is connectivity between source and destination. However, if it cannot reach the destination, that does NOT mean there is no connectivity; something in the path could be blocking the probes. We will talk about other options around this later.
Packet Loss: The packet loss column tells you about the quality of the link between source and destination. If there is too much packet loss, you may need to troubleshoot further. Some packet loss along the path is common, however, because some devices rate limit or filter the ICMP packets used by ping, traceroute, and MTR.
Note that ICMP rate limiting is generally configured to protect against DDoS attacks. It is a built-in denial-of-service protection mechanism that limits the rate at which ICMP packets are transmitted out of an interface. On Cisco IOS, for example, the default is one ICMP destination unreachable message per 500 milliseconds (half a second). This is why one in three responses from the destination can appear as a timeout: the destination router silently discards the second packet.
Round-Trip Time: If packets take too long to go from source to destination, there may be something wrong with the quality of your link. It could also simply be that the distance between source and destination is large.
Report Mode
Running MTR in the default interactive mode can result in a lot of packets being sent continuously which may affect network performance. Therefore, we can run MTR in the “report” mode where 10 packets are sent by default to each device and the report of the network statistics is shown to us:
mtr --report 8.8.8.8
Note: You enable report mode using two hyphens (--) followed by report, i.e. --report, or you can simply use the -r option. The report was generated with mtr --report 8.8.8.8. The report option sends 10 packets to the IP address 8.8.8.8 and generates a report. Without the --report option, mtr runs continuously in an interactive environment, as explained above. The interactive mode reflects current round-trip times to each host. In most cases, the --report mode provides sufficient data in a useful format.
Each numbered line in the report represents a hop. Hops are the Internet nodes that packets pass through to get to their destination. The names for the hosts (e.g. a72-247-36-1.deploy.stati and xe2-3-0.hh-sjc5-a.netarch in the example) are determined by reverse DNS lookups. Beyond simply showing the path that packets take to reach their host, MTR provides valuable statistics regarding the durability of that connection in the seven columns that follow. The Loss% column shows the percentage of packet loss at each hop. The Snt column counts the number of packets sent. The --report option sends 10 packets unless a different count is specified with --report-cycles=[number-of-packets], where [number-of-packets] represents the total number of packets that you want to send to the remote host.
The next four columns, Last, Avg, Best, and Wrst, are all measurements of latency in milliseconds (ms). Last is the latency of the last packet sent, Avg is the average latency of all packets, while Best and Wrst display the best (shortest) and worst (longest) round-trip time for a packet to this host. In most cases, the average (Avg) column should be the focus of your attention.
The final column, StDev, provides the standard deviation of the latencies to each host. The higher the standard deviation, the greater the difference between latency measurements. Standard deviation lets you assess whether the mean (average) represents the true center of the data set or has been skewed by some phenomenon or measurement error. If the standard deviation is high, the latency measurements were inconsistent: after averaging the latencies of the 10 packets sent, the average may look normal but in fact not represent the data very well. In that case, look at the best and worst latency measurements to make sure the average is a good representation of the actual latency and not the result of too much fluctuation.
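The columns described above can be reproduced from raw probe results. The sketch below is an illustration only: the function name and sample values are made up, lost probes are represented as None, and it uses the population standard deviation, which is an assumption rather than a statement about how mtr computes StDev internally.

```python
import statistics
from typing import Dict, Optional, Sequence


def summarize_hop(rtts_ms: Sequence[Optional[float]]) -> Dict[str, Optional[float]]:
    """Summarize one hop; each entry is an RTT in ms, or None for a lost probe."""
    sent = len(rtts_ms)
    replies = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (sent - len(replies)) / sent if sent else 0.0
    if not replies:
        # Nothing came back, so there is no latency to report.
        return {"Loss%": loss_pct, "Snt": sent, "Last": None, "Avg": None,
                "Best": None, "Wrst": None, "StDev": None}
    return {
        "Loss%": loss_pct,
        "Snt": sent,
        "Last": replies[-1],                  # most recent reply received
        "Avg": statistics.fmean(replies),
        "Best": min(replies),
        "Wrst": max(replies),
        "StDev": statistics.pstdev(replies),  # assumption: population std dev
    }


# Two hypothetical hops with the same average but very different consistency.
steady = summarize_hop([50.0] * 10)           # Avg 50.0 ms, StDev 0.0
spiky = summarize_hop([10.0] * 9 + [410.0])   # Avg 50.0 ms, StDev 120.0
```

Both hops report the same Avg, yet only the first one actually behaves like a 50 ms link; the second is usually at 10 ms with one large outlier, which is exactly what a high StDev next to a wide Best/Wrst gap is meant to reveal.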
Increase test speed
By default, when you run MTR, it tries to use reverse DNS to resolve IP addresses to hostnames. This can slow down your troubleshooting, especially if you are not interested in hostnames or are not using DNS on your network. We can disable DNS resolution using the -n or --no-dns option.
mtr -r -n 8.8.8.8
Also, by default MTR sends one packet every second. This may be fine when a network is operating normally. However, during network congestion, packets usually arrive at a faster rate. We can use the -i option (or --interval) to specify how often we want MTR to send packets, thus simulating a congested network:
mtr -r -c 50 -i 0.1 4.2.2.2
Notice that with a shorter interval between packets and 50 packets sent, I now see some packet loss between the source and some devices in the path.
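If you run these reports often, it can help to assemble the command line programmatically. The sketch below is one possible helper, using only the options discussed so far (--report, --no-dns, --report-cycles, --interval). The function name and its parameters are hypothetical; it only builds the argument list and does not run mtr.

```python
from typing import List, Optional


def build_mtr_command(target: str,
                      cycles: Optional[int] = None,
                      interval: Optional[float] = None,
                      resolve_dns: bool = True) -> List[str]:
    """Build an argument list for a report-mode mtr run (hypothetical helper)."""
    cmd = ["mtr", "--report"]
    if not resolve_dns:
        cmd.append("--no-dns")                     # same as -n
    if cycles is not None:
        cmd += ["--report-cycles", str(cycles)]    # packets sent per hop
    if interval is not None:
        cmd += ["--interval", str(interval)]       # seconds between packets
    cmd.append(target)
    return cmd


# Fast report without DNS lookups: 50 packets, 0.1 s apart.
command = build_mtr_command("4.2.2.2", cycles=50, interval=0.1, resolve_dns=False)
```

The resulting list can be handed to subprocess.run(command, capture_output=True, text=True) on a machine where mtr is installed. Passing a list rather than a shell string avoids quoting problems with the target address.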
Analyze MTR Reports
Verify Packet Loss
When analyzing MTR output, you are looking for two things: loss and latency. If you see a percentage of loss at any particular hop, that may be an indication that there is a problem with that particular router. However, it is common practice among some service providers to rate limit the ICMP traffic that MTR uses. This can give the illusion of packet loss when there is in fact no loss. To determine if the loss you’re seeing is real or due to rate limiting, take a look at the subsequent hop. If that hop shows a loss of 0.0%, then you are likely seeing ICMP rate limiting and not actual loss:
In this case, the loss reported between hops 1 and 2 is likely due to rate limiting on the second hop. Although traffic to the remaining eight hops all passes through the second hop, there is no packet loss. If the loss continues for more than one hop, then it is possible that there is some packet loss or a routing issue. Remember that rate limiting and loss can happen concurrently. In this case, take the lowest percentage of loss in a sequence as the actual loss:
In this case, there is 60% loss between hops 2 and 3 as well as between hops 3 and 4. You can assume that the third and fourth hops are likely losing some amount of traffic because no subsequent host reports zero loss. However, some of the loss is due to rate limiting, as several of the final hops are only experiencing 40% loss. When different amounts of loss are reported, always trust the reports from later hops.
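The rule of thumb above (trust later hops, and take the lowest loss in a sequence as the actual loss) can be written as a running minimum taken from the last hop backwards. This is a sketch of that heuristic only, with a made-up function name and made-up loss figures; it is not something mtr computes for you.

```python
from typing import List


def estimate_real_loss(loss_pct_by_hop: List[float]) -> List[float]:
    """For each hop, cap the reported loss at the lowest loss seen at or after it."""
    estimated: List[float] = []
    floor = float("inf")
    # Walk from the destination back to the source, carrying the minimum along.
    for loss in reversed(loss_pct_by_hop):
        floor = min(floor, loss)
        estimated.append(floor)
    estimated.reverse()
    return estimated


# Hypothetical report 1: loss only at hop 2, so it is treated as rate limiting.
rate_limited = estimate_real_loss([0.0, 50.0, 0.0, 0.0, 0.0])
# -> [0.0, 0.0, 0.0, 0.0, 0.0]

# Hypothetical report 2: 60% at hops 3 and 4, 40% on the final hops.
mixed = estimate_real_loss([0.0, 0.0, 60.0, 60.0, 40.0, 40.0])
# -> [0.0, 0.0, 40.0, 40.0, 40.0, 40.0]
```

In the second case the estimate says roughly 40% of traffic is really being lost from hop 3 onward, and the extra 20% shown at hops 3 and 4 is attributed to rate limiting.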
Some loss can also be explained by problems in the return route. Packets will reach their destination without error, but have a hard time making the return trip. For this reason it is often best to collect MTR reports in both directions when you’re experiencing an issue.
The connection quality may also affect the amount of latency you experience for a particular route. The following MTR report shows a high latency:
Network Latency
MTR will also help assess the latency of a connection between your host and the target host. Latency generally increases with the number of hops in a route. However, the increases should be consistent and roughly linear. Unfortunately, latency is often relative and very dependent on the quality of both hosts' connections and their physical distance.
The amount of latency jumps significantly between hops 3 and 4 and remains high. This may point to a network latency issue, as round-trip times remain high after the fourth hop. A poorly configured router or a congested link are frequent causes of this pattern.
Unfortunately, high latency does not always mean a problem with the current route. A report like the one above means that despite some sort of issue with the 4th hop, traffic is still reaching the destination host and returning to the source host. Latency could be caused by a problem with the return route as well. The return route will not be seen in your MTR report, and packets can take completely different routes to and from a particular destination.
In the above example, while there is a large jump in latency between hops 3 and 4, the latency does not increase unusually in any subsequent hops. From this it is logical to assume that there is some issue with the 4th router.
ICMP rate limiting can also create the appearance of latency, similar to the way that it can create the appearance of packet loss:
At first glance, the latency between hops 4 and 5 draws attention. However, after the fifth hop, the latency drops drastically. The actual latency measured here is about 40ms. In cases like this, MTR draws attention to an issue that does not affect the service. Always consider the latency to the final hop when evaluating an MTR report.
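The distinction between a latency jump that persists to the final hop and one that disappears again can be sketched as a small check. The function name, the 50 ms threshold, and the sample latencies are all assumptions chosen for illustration.

```python
from typing import List, Optional


def first_persistent_jump(avg_ms_by_hop: List[float],
                          threshold_ms: float = 50.0) -> Optional[int]:
    """Return the 1-based hop number where latency jumps and stays high, else None."""
    for i in range(1, len(avg_ms_by_hop)):
        baseline = avg_ms_by_hop[i - 1]
        # The jump only counts if every later hop, including the final one,
        # stays more than threshold_ms above the hop before the jump.
        if min(avg_ms_by_hop[i:]) - baseline > threshold_ms:
            return i + 1
    return None


# Hypothetical report: latency jumps at hop 4 and remains high afterwards.
persistent = first_persistent_jump([1.0, 2.0, 3.0, 120.0, 125.0, 122.0])

# Hypothetical report: hop 5 looks slow, but the final hops are back near 40 ms.
transient = first_persistent_jump([1.0, 2.0, 3.0, 30.0, 300.0, 40.0, 41.0])
```

With these samples, persistent is 4 and transient is None: the spike at hop 5 in the second report is ignored because the latency to the final hop is unaffected.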
Destination Host Networking Improperly Configured
In the next example, it appears that there is 100% loss to the destination host because of an incorrectly configured router. At first glance it appears that the packets are not reaching the host but this is not the case.
The traffic does reach the destination host. However, the MTR report shows loss because the destination host is not sending a reply. This may be the result of improperly configured networking or firewall (iptables) rules that cause the host to drop ICMP packets. The way to tell that the loss is due to a misconfigured host is to look at the hop which shows 100% loss. From previous reports, you can see that this is the final hop and that MTR does not try additional hops. While it is difficult to isolate this issue without a baseline measurement, these kinds of errors are quite common.
Residential or Business Router
Residential gateways sometimes cause misleading reports:
The 100% loss reported at the second hop does not indicate that there is a problem. You can see that there is no loss on subsequent hops.
An ISP Router Is Not Configured Properly
Sometimes a router on the route your packet takes is incorrectly configured and your packets may never reach their destination:
The question marks appear when there is no additional route information. Sometimes, a poorly configured router will send packets in a loop. You can see that in the following example:
These reports show that the router at hop 4 is not properly configured. When these situations occur, the only way to resolve the issue is to contact the team of network operators responsible for the source host.
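A routing loop shows up in a report as the same address appearing at more than one hop. A minimal way to spot that in a list of hop addresses is sketched below; the function name and the addresses (taken from documentation ranges) are hypothetical, and hops shown as question marks are skipped because they carry no address to compare.

```python
from typing import Dict, List, Optional, Tuple


def find_routing_loop(hops: List[str]) -> Optional[Tuple[int, int]]:
    """Return (first_hop, repeat_hop) for the first repeated address, else None."""
    first_seen: Dict[str, int] = {}
    for number, host in enumerate(hops, start=1):
        if host == "???":
            continue  # no route information for this hop
        if host in first_seen:
            return first_seen[host], number
        first_seen[host] = number
    return None


# Hypothetical path in which two routers bounce packets between each other.
looping_path = [
    "192.0.2.1",
    "198.51.100.1",
    "198.51.100.5",
    "198.51.100.1",
    "198.51.100.5",
]
loop = find_routing_loop(looping_path)
```

Here loop is (2, 4): the address first seen at hop 2 reappears at hop 4, which is the pattern to include when you report the problem to the network operators.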
ICMP Rate Limiting
ICMP rate limiting can cause apparent packet loss. When there is packet loss at one hop that doesn’t persist to subsequent hops, the loss is most likely caused by ICMP rate limiting. See the following example:
A Few Important Points to Note
Advanced MTR techniques
Newer versions of MTR are now capable of running in TCP mode on a specified TCP port, compared to the default use of the ICMP (ping) protocol. However, in most cases this mode shouldn’t be used, as TCP reports can be misleading in diagnosing inter-route issues. A TCP MTR will use SYN packets in place of ICMP pings, and most internet-level routers will not respond to these, erroneously indicating loss.
What a TCP test is useful for is determining whether firewall rules on a router somewhere are blocking a protocol or port, perhaps because port forwarding has not been configured properly. Running a TCP test over a certain port could more clearly reveal this whereas an ICMP test may not.
What is the difference between mtr and traceroute
Traceroute (tracert in Windows) prints the route that packets take in a TCP/IP network on their way to the destination.
The command traceroute hostname sends three UDP packets with a TTL value of 1. When the packets arrive at the closest router, the router decreases the TTL value by one, making it 0. When a router sees a packet with a TTL value of 0, it responds by sending an ICMP “time exceeded” packet (Type 11 Code 0), meaning “time to live exceeded in transit.” The traceroute utility notes the IP address of the router that sends back the 3 ICMP packets. It then calculates the time taken to receive each of the packets and sends out three more UDP packets, this time with a TTL value of 2.
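The TTL mechanism described above can be illustrated with a small simulation. No packets are sent here: the path is just a list of addresses with the destination last, each router decrements the TTL, and the router that brings it to zero is the one that "answers". All names and addresses are made up, and the path is assumed to be non-empty.

```python
from typing import List, Tuple


def simulate_probe(path: List[str], ttl: int) -> Tuple[str, str]:
    """Return (responder, reply_kind) for one simulated probe with the given TTL."""
    for router in path[:-1]:
        ttl -= 1  # every router on the way decrements the TTL by one
        if ttl == 0:
            return router, "time-exceeded"
    # The TTL survived all intermediate routers, so the destination answers.
    return path[-1], "destination-reached"


def simulate_traceroute(path: List[str], max_ttl: int = 30) -> List[str]:
    """Discover the route by probing with TTL 1, 2, 3, ... until the destination replies."""
    route: List[str] = []
    for ttl in range(1, max_ttl + 1):
        responder, kind = simulate_probe(path, ttl)
        route.append(responder)
        if kind == "destination-reached":
            break
    return route


# Hypothetical three-hop path; the last entry is the destination.
example_path = ["192.0.2.1", "198.51.100.1", "203.0.113.9"]
discovered = simulate_traceroute(example_path)
```

Each TTL value reveals exactly one more hop, so discovered ends up equal to example_path. A real traceroute does the same thing with actual probes and also times each reply.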
On the other hand, mtr combines the functionalities of the ‘traceroute’ and ‘ping’ utilities. When mtr starts, it investigates the network connection between the host on which it runs and a user-specified destination host. After determining the address of each network hop between these machines, it sends out a sequence of ICMP ECHO requests to each machine to check the quality of the link to each of them. mtr uses ICMP Time Exceeded (type 11) packets returning from routers, or ICMP Echo Reply packets when the packets have hit their destination host. Running statistics about each machine are printed out as the process runs.
Why mtr is faster than traceroute
The primary reason is the way traceroute runs. It sends a UDP (or ICMP on Windows) packet with a TTL of one to the first host, and when it receives a timeout reply (or it passes an internal timeout), it then generates the next packet for the next host with a TTL of two, and so on (adding one to the TTL for each host). So traceroute's total time includes the sending and receiving of packets for each host, sequentially.
mtr, after determining the path the packets take, sends all of the ICMP ECHO packets in parallel.
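As a rough, idealized model of that difference: if each hop takes some time to answer (or to time out), probing one hop after another costs about the sum of those waits, while probing all hops at once costs about the longest single wait. The figures below are invented, and real tools add pacing and retries on top of this.

```python
from typing import List


def sequential_wait(per_hop_wait_s: List[float]) -> float:
    """Total wait when each hop is probed only after the previous one has answered."""
    return sum(per_hop_wait_s)


def parallel_wait(per_hop_wait_s: List[float]) -> float:
    """Total wait when all hops are probed at the same time."""
    return max(per_hop_wait_s, default=0.0)


# Hypothetical waits in seconds; the third hop never answers and times out after 5 s.
waits = [0.01, 0.02, 5.0, 0.04]
one_by_one = sequential_wait(waits)   # about 5.07 s
all_at_once = parallel_wait(waits)    # 5.0 s
```

With a single silent hop the gap is small, but every additional hop that has to time out adds its full timeout to the sequential total and nothing to the parallel one.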
Tools such as traceroute and MTR send packets with incrementally increasing TTLs in order to view the route, or series of hops, that a packet makes between the origin and its destination. The TTL, or time to live, controls how many hops a packet will make before “dying” and triggering a reply to the sending host. By sending a series of packets and causing them to expire after one hop, then two, then three, MTR is able to assemble the route that traffic takes between hosts on the Internet.
Advantage of MTR over ping or traceroute
The real advantage of mtr over ping or traceroute is that it shows, in real time, exactly where packet loss is happening along the route to the destination host. It shows the loss percentage at each host, which can give us valuable information about which specific provider is having a network issue. Also, since mtr uses ICMP ECHO requests, it can get through routers that block UDP packets. So mtr may work where traceroute does not.