Introduction to MQTT: Chapter 7 (Keep Alive & Client Takeover)
GARRETT SCHMIDT
A common problem in device-to-device communication arises when one device crashes or loses its ability to transmit while the other remains active. In the language of the Transmission Control Protocol (TCP), on which MQTT runs, this is known as a half-open connection. The problem of half-open connections is exacerbated in mobile networks, where TCP sessions can appear to be open when, in fact, they are not.
In a half-open connection, the functioning side of the connection continues to send messages and waits for acknowledgment that they have been received. This can go on forever if the device is not notified of the failure at the other end of the connection. This is not an acceptable state of affairs in the type of dynamic networks for which MQTT was designed. For that reason, the Keep Alive feature was incorporated into the protocol.
What is the Keep Alive Function?
Simply put, MQTT’s keep alive function provides a method to address the issue of half-open connections. It does not prevent half-open connections from occurring, but it lets each side determine the connection’s state. Proper use of keep alive lets the broker and client confirm that the connection between them is still open and that each side is aware of the other.
Keep alive is defined in MQTT as the maximum time interval that can elapse between the point a client finishes transmitting a control packet and the point it starts to send the next one. The keep alive interval is a numeric value that indicates the number of seconds that a client can be dormant and still be considered to be functioning correctly. Setting the value to zero deactivates keep alive functionality. The maximum keep alive interval that can be specified is 65,535 seconds, or 18 hours, 12 minutes, and 15 seconds.
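The maximum above follows from the way the interval is carried on the wire: in the MQTT CONNECT packet, keep alive is a 16-bit unsigned integer sent most significant byte first. The following is a minimal sketch of that arithmetic; the function names encode_keep_alive and as_hms are hypothetical helpers for illustration, not part of any MQTT library.

```python
import struct

# Largest value a 16-bit unsigned field can hold.
MAX_KEEP_ALIVE = 0xFFFF


def encode_keep_alive(seconds: int) -> bytes:
    """Encode a keep alive interval as a two-byte big-endian field."""
    if not 0 <= seconds <= MAX_KEEP_ALIVE:
        raise ValueError("keep alive must be between 0 and 65535 seconds")
    return struct.pack("!H", seconds)


def as_hms(seconds: int) -> tuple:
    """Split a number of seconds into (hours, minutes, seconds)."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return hours, minutes, secs


print(as_hms(MAX_KEEP_ALIVE))      # (18, 12, 15)
print(encode_keep_alive(60))       # b'\x00<' (0x00 0x3C)
```

A value of zero encodes as two zero bytes, which is how a client signals that keep alive is switched off.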
The broker takes the keep alive interval and multiplies it by 1.5 to determine the length of time that can elapse between messages from the given client. A keep alive interval of 60 seconds therefore allows a gap of at most 90 seconds between messages from a specific client. It is the client’s responsibility to ensure that the interval is not exceeded. If the client does not have another control packet to send, it sends a PINGREQ packet to inform the broker that it is still available and to verify that the broker is as well. The broker replies to the PINGREQ message by sending the client a PINGRESP message.
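The client-side rule can be sketched in a few lines. This is an illustrative sketch rather than code from a real client: the helpers broker_deadline and next_packet are hypothetical, and timestamps are plain numbers in seconds. The two-byte PINGREQ and PINGRESP values are the fixed headers defined by the MQTT specification (packet types 12 and 13 with a remaining length of zero).

```python
# Fixed headers for the two ping packets: type nibble, then remaining length 0.
PINGREQ = bytes([0xC0, 0x00])
PINGRESP = bytes([0xD0, 0x00])


def broker_deadline(keep_alive: int) -> float:
    """Seconds of silence the broker tolerates: 1.5 times the keep alive."""
    return keep_alive * 1.5


def next_packet(keep_alive, last_sent, now, pending=None):
    """Return the packet the client should send at time `now`, or None.

    `pending` is any application packet waiting to go out. If there is none
    and a full keep alive interval has passed since the last transmission,
    the client falls back to a PINGREQ.
    """
    if pending is not None:
        return pending
    if keep_alive > 0 and now - last_sent >= keep_alive:
        return PINGREQ
    return None


print(broker_deadline(60))                  # 90.0
print(next_packet(60, last_sent=0, now=30)) # None, still inside the interval
print(next_packet(60, last_sent=0, now=60)) # b'\xc0\x00', time to ping
```

Real clients often send the ping somewhat before the interval runs out to leave room for network delay; this sketch pings exactly at the interval to keep the rule visible.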
Brokers that do not receive, at a minimum, a PINGREQ message from a client within one and a half times the keep alive interval terminate the connection. At this point, the broker will publish the client’s last will and testament message if one has been defined. Sending PINGREQ messages within the keep alive interval will maintain the connection between client and broker indefinitely, even if no other messages are generated.
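The broker-side check might look roughly like the sketch below. It is a simplified model, not the implementation of any particular broker: the Session class, the expire function, and the topic name are hypothetical, and a real broker would also close the underlying socket.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Session:
    client_id: str
    keep_alive: int                            # seconds; 0 disables the check
    last_seen: float                           # time of the last packet received
    will: Optional[Tuple[str, bytes]] = None   # (topic, payload), if defined


def expire(sessions: dict, now: float) -> list:
    """Drop sessions that stayed silent too long; return the wills to publish."""
    wills = []
    for client_id, session in list(sessions.items()):
        silent_for = now - session.last_seen
        if session.keep_alive and silent_for > session.keep_alive * 1.5:
            del sessions[client_id]            # terminate the connection
            if session.will is not None:
                wills.append(session.will)     # last will and testament
    return wills


sessions = {
    "sensor-a": Session("sensor-a", 60, last_seen=0.0,
                        will=("alerts/sensor-a", b"offline")),
    "sensor-b": Session("sensor-b", 60, last_seen=50.0),
}
print(expire(sessions, now=100.0))   # [('alerts/sensor-a', b'offline')]
print(sorted(sessions))              # ['sensor-b']
```

Here sensor-a has been silent for 100 seconds, which exceeds its 90-second limit, so it is dropped and its will is returned for publication. sensor-b was heard from 50 seconds ago and stays connected.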
Inside a Client Take-Over
Under normal circumstances, clients that disconnect from the broker attempt to reconnect. If the broker still holds a half-open connection for that client, MQTT lets the broker perform a client take-over: when a client connects with a client identifier that is already in use, the broker closes the existing connection and accepts the new one. It’s a fail-safe to guard against devices falling off the network as the result of whatever issue caused the half-open connection.
Enabling brokers to engage in client take-overs can prove to be extremely important in maintaining communication in an MQTT network. The take-over ensures that the problem that led to a half-open connection will not impact the ability of the client to reestablish its conversation with the broker. It removes a potential point of failure from the network when clients cannot terminate the connection themselves.
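A take-over can be pictured as a registry lookup on every new connection. The sketch below assumes the broker recognizes a returning client by its client identifier, as the MQTT specification describes; the Connection and Broker classes are hypothetical stand-ins, with no real sockets involved.

```python
class Connection:
    """Stand-in for a TCP connection; a real broker would wrap a socket."""

    def __init__(self, label):
        self.label = label
        self.open = True

    def close(self):
        self.open = False


class Broker:
    def __init__(self):
        self.connections = {}   # client identifier -> Connection

    def on_connect(self, client_id, connection):
        """Register a new connection, closing any older one with the same ID.

        Returns the connection that was taken over, or None.
        """
        old = self.connections.get(client_id)
        if old is not None and old.open:
            old.close()         # take-over: drop the stale connection
        self.connections[client_id] = connection
        return old


broker = Broker()
stale = Connection("before the outage")
fresh = Connection("after the outage")
broker.on_connect("sensor-1", stale)
broker.on_connect("sensor-1", fresh)
print(stale.open, fresh.open)   # False True
```

The client does not have to wait for the keep alive timeout to expire on the old connection; the broker replaces it as soon as the new connection arrives.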
An Example of Keep Alive in Action
Here is a simple example of the advantages that MQTT’s keep alive functionality contributes to maintaining network communication. A sensor connected to the network is expected to send a message when certain conditions, perhaps related to temperature, are met. The timing of the triggering condition cannot be predicted, so the message can arrive at any time. This sensor is particularly important in maintaining the safety of the automated processes connected to the network.
Setting the keep alive interval to a small value ensures that the broker is constantly kept updated as to the health of the client’s connection. Either an informative message dictated by conditions will be sent, or a steady stream of PINGREQ messages will ensure that the client is connected. If the interval is exceeded, the broker will close the connection and send the last will and testament message. In the case of a critical network component like this sensor, this should include warnings or alerts so preventative measures can be taken.
Keep alive functionality is yet another reason that MQTT is an excellent choice for modern distributed networks and inter-device communication. With proper configuration, you can design a viable network implementation that can withstand instances of unreliable connectivity.