Use case on AWS SQS
Anudeep Nalla
Open Source Contributor | Platform Engineer | EX-NPCI | RHCA Level III | OpenShift | CEPH | CK{S,A,AD} | 3x Microsoft Certified | AWS CSA | Rancher | Nirmata | DevOps | Ansible | Jenkins | DevSecOps | Kyverno | Rook-Ceph
What is Amazon SQS?
Amazon Simple Queue Service (SQS) is a managed message queue service offered by Amazon Web Services (AWS). It provides an HTTP API over which applications can submit items into and read items out of a queue. The queue itself is fully managed by AWS, which makes SQS an easy solution for passing messages between different parts of software systems that run in the cloud.
How does SQS work?
SQS provides an API endpoint to submit messages and another endpoint to read messages from a queue. Once a consumer receives a message, it is hidden from other consumers until the visibility timeout expires, and you can have many clients submitting messages to and reading messages from a queue at the same time.
The messages that SQS handles can be unformatted strings, XML or JSON. Because each message is normally processed by a single consumer (and FIFO queues offer exactly-once processing), and because you can concurrently submit messages to and read messages from a given queue, SQS is a good option for integrating multiple independent systems.
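To make this concrete, here is a minimal sketch of writing to and reading from a queue with the AWS SDK for Python (boto3); the queue URL is a placeholder you would replace with your own.

```python
import boto3

# Minimal sketch; the queue URL below is a placeholder for your own queue.
sqs = boto3.client("sqs", region_name="us-east-1")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"

# Submit a message to the queue.
sqs.send_message(QueueUrl=queue_url, MessageBody='{"orderId": "1234", "status": "created"}')

# Read up to one message back; it stays hidden from other consumers
# for the duration of the visibility timeout.
response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)

for message in response.get("Messages", []):
    print(message["Body"])
    # Deleting the message tells SQS it has been processed successfully.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
```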
You might well be asking: why use SQS if you can have an internal HTTP API for each service? While HTTP APIs are an accessible way to expose software systems to external users, they are not the most efficient mechanism when it comes to integrating purely internal systems. A messaging queue is more lightweight. SQS also handles things like automated retries, preserving queue state across multiple availability zones in AWS, and keeping track of expiration timeouts on all messages.
How does SQS integrate with other AWS services?
Most interesting for Serverless developers is SQS integration with AWS Lambda: SQS can act as an AWS Lambda event source. When configured, Lambda polls the queue on your behalf and invokes your function with a batch of SQS messages to process.
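As an illustration, a Lambda function subscribed to an SQS queue receives records in the shape shown below; this is a minimal Python handler sketch, with the actual processing logic left as a hypothetical handle_order placeholder.

```python
import json

def handler(event, context):
    # Lambda delivers SQS messages in event["Records"]; each record carries
    # the message body plus metadata such as the messageId.
    for record in event["Records"]:
        payload = json.loads(record["body"])
        handle_order(payload)  # hypothetical business-logic function

def handle_order(payload):
    print(f"processing order {payload}")
```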
Another useful integration is with SNS: an SQS queue can subscribe to an SNS topic. We cover the differences between SQS and SNS below, in the When to use SQS vs. SNS section.
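For instance, subscribing an SQS queue to an SNS topic takes a single API call; the ARNs below are placeholders, and the queue access policy that allows the topic to deliver messages is omitted for brevity.

```python
import boto3

sns = boto3.client("sns", region_name="us-east-1")

# Placeholder ARNs; in a real setup the queue also needs an access policy
# that allows the SNS topic to call sqs:SendMessage on it.
topic_arn = "arn:aws:sns:us-east-1:123456789012:example-topic"
queue_arn = "arn:aws:sqs:us-east-1:123456789012:example-queue"

sns.subscribe(
    TopicArn=topic_arn,
    Protocol="sqs",
    Endpoint=queue_arn,
    # RawMessageDelivery passes the original message body through
    # instead of wrapping it in an SNS envelope.
    Attributes={"RawMessageDelivery": "true"},
)
```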
SQS also provides standard integrations for monitoring and debugging SQS queues using Amazon CloudWatch and AWS X-Ray.
Serverless developers can manually integrate an SQS queue with any other AWS service (or a third-party service) by writing code that uses the AWS SDK to submit messages to SQS and read them from there, or by using the SQS API directly.
The benefits of using SQS
For Serverless developers, using SQS generally provides a wealth of benefits, which you can read about below.
Scalability. Your SQS queues scale to the volume of messages you’re writing and reading. You don’t need to scale the queues; all the scaling and performance-at-scale aspects are taken care of by AWS.
Pay for what you use. When using SQS, you only get charged for the messages you read and write (see the details in the Pricing section). There aren’t any recurring or base fees.
Ease of setup. Since SQS is a managed service, you don’t need to set up any infrastructure to start using it. You can simply use the API to read and write messages or use the SQS <-> Lambda integration.
Options for Standard and FIFO queues. When creating an SQS queue, you can choose between a standard queue and a FIFO queue out of the box. Both queue types can be useful for different purposes.
Automatic deduplication for FIFO queues. Deduplication is important when using queues, and for FIFO queues SQS will do the work to remove any duplicate messages for you. This makes FIFO queues on SQS suitable for tasks where it’s critical to have each task done exactly once.
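As a brief illustration, here is a sketch of creating a FIFO queue with content-based deduplication enabled and sending an ordered message to it; the queue name and message group are made up for the example.

```python
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")

# FIFO queue names must end in ".fifo"; ContentBasedDeduplication lets SQS
# derive the deduplication ID from a SHA-256 hash of the message body.
queue = sqs.create_queue(
    QueueName="orders.fifo",  # hypothetical name
    Attributes={
        "FifoQueue": "true",
        "ContentBasedDeduplication": "true",
    },
)

# Messages with the same MessageGroupId are delivered in order; identical
# bodies sent within the 5-minute deduplication window are dropped.
sqs.send_message(
    QueueUrl=queue["QueueUrl"],
    MessageBody='{"orderId": "1234"}',
    MessageGroupId="customer-42",
)
```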
A separate queue for unprocessed messages. This feature of SQS is useful for debugging. All messages that could not be processed are sent to a "dead-letter" queue where you can inspect them. This queue has all the usual integrations enabled, so you can subscribe to it with an AWS Lambda event source, for example, to send a notification when an item cannot be processed.
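Configuring a dead-letter queue amounts to attaching a redrive policy to the source queue; the sketch below assumes both queues already exist and uses placeholder ARNs and URLs.

```python
import json
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")

# Placeholder identifiers for an existing source queue and dead-letter queue.
source_queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"
dead_letter_queue_arn = "arn:aws:sqs:us-east-1:123456789012:example-dlq"

# After a message has been received (but not deleted) maxReceiveCount times,
# SQS moves it to the dead-letter queue for inspection.
sqs.set_queue_attributes(
    QueueUrl=source_queue_url,
    Attributes={
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dead_letter_queue_arn,
            "maxReceiveCount": "5",
        })
    },
)
```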
Disadvantages of using SQS
Using SQS can also create challenges for Serverless developers, as described hereafter.
High cost at scale. With pay-per-use pricing, if the number of messages you send is very high, your SQS bill can be quite significant. Part of SQS pricing is data transfer charges, and those can add up if you send larger messages, or if you process messages from outside the main AWS region in which the queue is located. In some cases, when running at scale with millions of messages processed every day, the cost of using SQS might be higher than the cost of operating your own queue system, even including the overhead of managing your own solution.
Lack of support for broadcast messages. Because each message is delivered to a single consumer, SQS doesn’t provide a way for multiple entities to retrieve the same message, making it not so useful for one-to-many broadcasts.
To address this, in the cases where one-to-many delivery is important, developers can use SQS alongside SNS.
Reduced control over performance. When running a message queue system at scale, something you may well end up wanting to do is to fine-tune its performance to suit your needs. With SQS this is not an option: the service is fully managed, and you don’t get to look under the hood.
What is SQS used for?
The most common ways to use SQS, and of course other messaging systems, in cloud applications are:
Decoupling microservices. In a microservice architecture, messages represent one of the easiest ways to set up communication between different parts of the system. If your microservices run in AWS, and especially if those are Serverless services, SQS is a great choice for that aspect of the communication.
Sending tasks between different parts of your system. You don’t have to be running a microservices-oriented application to take advantage of SQS. You can also use it in any kind of application that needs to communicate tasks to other systems.
Distributing workloads from a central node to worker nodes. You can frequently find messaging systems in the flows of distributed large workloads like map-reduce operations. For these kinds of operations, it’s essential to be able to maintain a queue of all the tasks that need to be processed, efficiently distribute the tasks between the machines or functions doing the work and guarantee that every part of the work is only done once.
Scheduling batch jobs. SQS is a great option for scheduling batch jobs for two reasons. First, it maintains a durable queue of all the scheduled jobs, which means you do not need to keep track of job status yourself: you can rely on SQS to pass the jobs through and to handle retries, since a job whose execution fails is simply returned to the queue and delivered again. Second, it integrates with AWS Lambda; if you are using AWS Lambda to process the batch jobs, SQS automatically triggers your Lambda functions once the data is available for them to process.
Amazon SQS limits
Amazon SQS has a few service limits that you should consider before using SQS in production:
Queue delay. Minimum 0 seconds, maximum 15 minutes. The built-in delay functionality in SQS queues exists to allow inserting a pause between when a message is produced and when it becomes visible in the queue. If you need a delay longer than 15 minutes, SQS can’t offer it, so you’ll need to implement the delay in the system producing the messages before it pushes them to SQS.
In-flight messages. SQS has a maximum of 120,000 messages (standard queue) or 20,000 messages (FIFO queue) that can be in flight: in other words, messages that have been received by a consumer but not yet removed from the queue. In most well-architected systems, it would be hard to reach this number of in-flight messages unless you are in an outage scenario. Before using SQS in production, determine if this could become an issue for you, especially in edge cases and possible error states for your application.
Message attributes. Each message can have a maximum of 10 metadata attributes. If you would like to have more metadata than that, consider including it in the message itself rather than as a field attached to the message.
Message content. Messages you submit to SQS queues can be in plain text (unformatted) or in JSON or XML format. Other formats are not supported.
Message retention. Minimum: 60 seconds. Maximum: 14 days. If you think you might need shorter or longer retention times, SQS might not work for you.
Maximum message size is 256KB. If you need to send larger messages, consider uploading the larger payload to S3 and including a reference to the S3 object as part of the message (a sketch of this pattern follows this list).
Message visibility timeout. You might be wondering 'What is SQS visibility timeout?' The visibility timeout is a configurable time period during which SQS temporarily "hides" a message that has been received by a consumer to prevent it from being processed by other consumers. After the visibility timeout expires, if the message hasn’t been removed from the queue by the consumer that received it, it becomes visible to other queue consumers. The default visibility timeout is 30 seconds, the minimum is 0 seconds, and the maximum is 12 hours (a sketch of working with delays and visibility timeouts also follows this list).
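To make the delay and visibility-timeout settings concrete, here is a minimal sketch; the queue URL is a placeholder and the timeout values are arbitrary.

```python
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"  # placeholder

# Delay delivery of this particular message by 60 seconds (0-900 seconds allowed).
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody='{"job": "nightly-report"}',
    DelaySeconds=60,
)

# Receive a message and hide it from other consumers for 120 seconds.
response = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=1,
    VisibilityTimeout=120,
)

for message in response.get("Messages", []):
    # If processing turns out to take longer than expected, the consumer can
    # extend the timeout instead of letting the message become visible again.
    sqs.change_message_visibility(
        QueueUrl=queue_url,
        ReceiptHandle=message["ReceiptHandle"],
        VisibilityTimeout=300,
    )
```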
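For payloads above the 256KB limit, a common workaround is the pattern mentioned above: store the payload in S3 and send only a pointer through the queue. The bucket, key, and queue URL below are placeholders; AWS also publishes an extended SQS client library that packages this pattern.

```python
import json
import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs", region_name="us-east-1")

bucket = "example-large-payloads"   # placeholder bucket
key = "payloads/order-1234.json"    # placeholder object key
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"  # placeholder

# Stand-in for a payload that is too large to send through SQS directly.
large_payload_bytes = json.dumps({"orderId": "1234", "items": []}).encode()

# Store the large payload in S3...
s3.put_object(Bucket=bucket, Key=key, Body=large_payload_bytes)

# ...and send only a small pointer through SQS.
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody=json.dumps({"payloadBucket": bucket, "payloadKey": key}),
)
```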
Consider all these limits for your specific use case, and take future growth into account, especially if you are building a growth-oriented system.
Application Integration Using Queues and Messages
Building Blocks
In the digital world, even the most basic design of web-based systems requires the use of queues to integrate applications. SQS is a secure, serverless, durable, and highly available message queue service. It provides a simple REST API to create a queue as well as send, receive, and delete messages.
Producing Messages
Message producers are processes that call the SendMessage APIs of SQS. SQS supports two types of queues: standard and first-in-first-out (FIFO). Standard queues provide best effort ordering while FIFO provides first-in-first-out ordering. Messages can be sent either as single messages or in batches. Standard queues can support unlimited throughput by adding as many concurrent producers as needed, whereas FIFO queues can support up to 300 TPS without batching and 3000 TPS with batching.
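As a sketch of batching on the producer side, a single SendMessageBatch call can carry up to 10 messages, which is how FIFO queues reach the higher throughput figure; the queue URL is a placeholder.

```python
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"  # placeholder

# Send up to 10 messages in one API call.
response = sqs.send_message_batch(
    QueueUrl=queue_url,
    Entries=[
        {"Id": str(i), "MessageBody": f'{{"taskId": {i}}}'}
        for i in range(10)
    ],
)

# Individual entries can fail even when the call as a whole succeeds,
# so check the Failed list and retry those entries if needed.
for failed in response.get("Failed", []):
    print("failed to enqueue entry", failed["Id"], failed.get("Message"))
```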
In terms of message delivery, SQS standard supports “at-least-once” delivery, meaning that messages will be delivered at least once, but occasionally more than one copy of a message may be delivered. SQS FIFO provides “exactly once” processing, meaning it can detect and drop duplicate messages.
Consuming Messages
Message consumers are processes that make the ReceiveMessage API call on SQS. Messages from queues can be processed either in batch or as a single message at a time. Each approach has its advantages and disadvantages.
- Batch processing: This is used where each message can be processed independently, so an error on a single message need not disrupt the entire batch. Batch processing provides the most throughput and makes the most efficient use of the resources involved in reading messages (as sketched after this list).
- Single message processing: Single message processing is commonly used in scenarios where each message may trigger multiple processes within the consumer. In case of errors, the retry is confined to the single message.
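The sketch below illustrates batch consumption where an error on one message does not disrupt the rest: only successfully processed messages are deleted, so failed ones reappear after the visibility timeout (and can eventually be moved to a dead-letter queue by a redrive policy). The queue URL and the process function are placeholders.

```python
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"  # placeholder

def process(body):
    # Hypothetical per-message business logic.
    print("processing", body)

response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10)

for message in response.get("Messages", []):
    try:
        process(message["Body"])
    except Exception as error:
        # Leave the message in the queue; it becomes visible again after the
        # visibility timeout and will be retried (or dead-lettered).
        print("failed to process message", message["MessageId"], error)
        continue
    # Delete only the messages that were processed successfully.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
```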
To maintain the order of processing within a message group, FIFO queues are typically consumed by a single process per group; messages in different message groups can still be processed in parallel. However, the overall throughput limit still applies to FIFO queues.
SQS supports long and short polling on queues. Short polling samples a subset of SQS servers and returns any messages found there, or an empty response if those servers hold no messages. Long polling queries all servers and waits up to a configurable time (at most 20 seconds), returning as soon as messages are available. Long polling can reduce cost by reducing the number of calls that come back empty.
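A minimal long-polling sketch, with a placeholder queue URL: setting WaitTimeSeconds on the receive call holds the connection open until a message arrives or the wait time elapses.

```python
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"  # placeholder

while True:
    # Long polling: wait up to 20 seconds (the maximum) for messages
    # instead of returning immediately with an empty response.
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,
    )
    for message in response.get("Messages", []):
        print("received", message["Body"])
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
```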
It’s simple to poll for messages, but doing so means running a process continuously, which can be hard to monitor and troubleshoot when errors occur. The SQS integration with AWS Lambda moves this undifferentiated heavy lifting of polling into the Lambda service, which polls the queue on your behalf and invokes your function with the messages it finds.
Benefits:
Queue-based processing can protect backend systems like relational databases from unexpected surge in front-end traffic. You can decouple the processing logic by using an intermediate queue.
Consider a scenario that involves a web application to order products related to a TV show. It’s possible that the ordering and payment systems rely on a traditional relational database. During peak show seasons, queues can be used to control the traffic to the ordering and payment system. In the following diagram we show how the web application requests are staged in the queue, while the backend consumes them at a rate based on the capacity of the database.
Error Handling
Asynchronous message processing presents unique challenges for error handling, since errors often manifest only as log entries on the producer or consumer and as growing backlogs in processing. To handle errors gracefully, the first step is to determine the nature of the error. If the error is transient, it may help to retry after a brief delay and eventually move the message to a dead-letter queue if repeated attempts fail. The other scenario is when the data itself is bad and the error will not go away even after repeated attempts; in such cases, the consumer process needs to make this determination and move the message to an error queue.
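The sketch below illustrates this split between transient and permanent failures; the exception types, queue URLs, and process function are hypothetical placeholders, and the transient path simply leaves the message in the queue so that a redrive policy can dead-letter it after repeated failures.

```python
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
source_queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"  # placeholder
error_queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-errors"  # placeholder

class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. a timeout)."""

class BadDataError(Exception):
    """Hypothetical marker for errors that retries will never fix."""

def process(body):
    ...  # hypothetical business logic that raises one of the errors above

response = sqs.receive_message(QueueUrl=source_queue_url, MaxNumberOfMessages=10)
for message in response.get("Messages", []):
    try:
        process(message["Body"])
    except TransientError:
        # Leave the message in the queue; it will be retried after the
        # visibility timeout and dead-lettered once maxReceiveCount is hit.
        continue
    except BadDataError:
        # Retrying won't help, so park the message in an error queue
        # for manual inspection instead.
        sqs.send_message(QueueUrl=error_queue_url, MessageBody=message["Body"])
    # In both the success and bad-data cases, remove the message from the source queue.
    sqs.delete_message(QueueUrl=source_queue_url, ReceiptHandle=message["ReceiptHandle"])
```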
In some cases, due to its highly distributed architecture, standard SQS can deliver a duplicate message, which can result in duplicate key errors for the consumer. The best way to mitigate this is to make consumers idempotent, so that processing the same message multiple times produces the same result.
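One common way to achieve idempotency is to record the identifiers of messages that have already been processed and skip repeats. The sketch below keeps that record in memory for brevity; a real consumer would typically use a durable store (for example, a conditional write to a database table) and might key on a business identifier from the message body rather than the SQS message ID. Names are placeholders.

```python
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"  # placeholder

# In-memory record of processed message IDs; a production consumer would use
# a durable store shared across consumer instances.
processed_ids = set()

def process(body):
    ...  # hypothetical business logic

response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10)
for message in response.get("Messages", []):
    if message["MessageId"] in processed_ids:
        # Duplicate delivery: skip processing, but still delete this copy.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        continue
    process(message["Body"])
    processed_ids.add(message["MessageId"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
```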