ScalabilityMessaging

Competing Consumers

A single worker cannot keep up with a bursty or growing stream of background work, but processing that work inline in the request path blocks users and exposes the application to load spikes it has no way to shed.

Rickvian Aldi·Software engineer·6 min read

Problem

Many features produce work that can be done eventually but should not be done synchronously: image processing, email sending, invoice generation, ML feature computation, report rendering. Running these inside the HTTP request that triggered them ties user-perceived latency to whichever background task happens to be slow that day, and collapses under load when hundreds of users hit the same endpoint simultaneously.

Moving the work to a single background worker helps - the request returns instantly and the worker processes in the background - but shifts the bottleneck downstream. One worker has a fixed throughput ceiling. A traffic spike that produces ten thousand jobs in a minute will take the worker hours to drain. During that time the queue grows unboundedly, users wait longer and longer for their image to process, and the application owner has no lever to pull except "deploy more." The single-worker design does not scale with demand; it only delays the collapse.

Forces

Throughput must scale with demand. A fixed-capacity worker cannot absorb spikes that exceed its processing rate. Something must grow horizontally.
Job cost is non-uniform. Some jobs finish in 50ms; some take 30 seconds. A design that routes work to a specific worker (affinity, sharding) will underuse fast workers while slow workers accumulate backlog.
Workers fail. Instances crash, get deployed over, run out of memory. A design that loses in-flight work when a worker dies is unacceptable for anything business-critical.
Delivery semantics matter. Brokers that guarantee delivery must do so even across worker crashes, which means messages get redelivered. Workers therefore cannot assume they see each message exactly once.

Solution

Producers drop messages onto a single queue. Multiple consumer instances pull from that same queue concurrently. The broker hands each message to exactly one consumer, but which consumer gets which message is determined dynamically - whichever worker is currently free pulls the next message. Faster workers naturally pull more; slower ones pull less. Throughput scales by adding consumer instances.

// Producer - drops jobs onto the queue
await sqs.sendMessage({
  QueueUrl: IMAGE_QUEUE,
  MessageBody: JSON.stringify({ jobId, imageUrl, ops }),
});
 
// Consumer - many instances of this run in parallel
async function workerLoop() {
  while (running) {
    const { Messages } = await sqs.receiveMessage({
      QueueUrl: IMAGE_QUEUE,
      MaxNumberOfMessages: 10,
      WaitTimeSeconds: 20, // long polling
    });
 
    await Promise.all(
      (Messages ?? []).map(async (msg) => {
        const job = JSON.parse(msg.Body);
        try {
          await processImage(job);
          await sqs.deleteMessage({ QueueUrl: IMAGE_QUEUE, ReceiptHandle: msg.ReceiptHandle });
        } catch (err) {
          // Leave the message; it becomes visible again and another worker retries.
          log.error({ err, jobId: job.jobId }, 'processing failed');
        }
      })
    );
  }
}

The broker handles the hard part: exclusive delivery (only one consumer sees a message at a time), visibility timeouts (if a consumer doesn't ack in N seconds, the message becomes available again for a different consumer), and backpressure (slow consumers naturally pull less work).

Competing consumers is the smallest scalability pattern that actually scales. Add more workers, go faster. Remove workers, save money. The queue absorbs the mismatch.

Idempotency is mandatory. Queues almost universally provide at-least-once delivery. A worker that processes a message and then crashes before acknowledging will see that message again - possibly on a different worker. If running the handler twice produces the wrong answer (double-charging a card, double-sending an email), the pattern is broken. Pair this pattern with the Idempotency Key pattern so duplicates are safe.

Ordering is not preserved across the fleet. Because any worker may pull any message, two messages about the same entity may be processed out of order or in parallel on different workers. If you need per-entity ordering (all events for user 42 processed in order), use a partitioned queue keyed on entity id - Kafka consumer groups, SQS FIFO with message groups, RabbitMQ consistent-hash exchange. Each partition is its own competing-consumers pool, typically of one, which trades horizontal scale for ordering within a partition.

Tune the consumer count. The right number of consumers is determined by the external resources the job touches - database connections, third-party rate limits, CPU per job - not by the queue. Scaling consumers past what downstream systems can absorb simply moves the bottleneck to a worse place.

Dead-letter queues. Messages that fail repeatedly should move to a DLQ after N attempts. Without this, a poison message can consume a worker's capacity indefinitely as it fails, times out, and is redelivered forever.

When NOT to Use

Strict total ordering is required. If every message must be processed in the exact global order it was produced - financial tape reconstruction, serialized event logs - competing consumers is the wrong pattern. Use a single consumer or a partitioned design with one consumer per partition.
Jobs are sub-millisecond and extremely high volume. The broker round-trip overhead (network latency + ack) is often larger than the job itself. Batch them into coarser work items or process them synchronously.
Consumers cannot be made idempotent. If the downstream operation is fundamentally non-idempotent and cannot be made so, at-least-once delivery will cause duplicates the business cannot tolerate. Either move to a broker with exactly-once semantics (Kafka EOS within its constraints) or redesign the operation.
The queue replaces a trivial fire-and-forget. Not every async task needs a broker. An in-process job runner with a persistent table is often enough for low-volume, single-server workloads. Reach for competing consumers when horizontal scaling of the workers is a real requirement.

Outbox is the reliable source of the events that feed the queue - a producer that loses events before they reach the queue defeats the entire fleet behind it. Idempotency Key makes at-least-once delivery safe at the consumer: every job carries an identifier, each consumer records keys it has processed, and duplicates become no-ops. In production systems, the three patterns almost always appear together - Outbox to publish, queue with competing consumers to scale, Idempotency Key to tolerate the redeliveries that the queue inevitably produces.

References

Hohpe, G. and Woolf, B. Enterprise Integration Patterns. Addison-Wesley, 2003. "Competing Consumers" pattern.
AWS documentation. "Amazon SQS best practices." docs.aws.amazon.com/AWSSimpleQueueService
Kreps, Jay. "The Log: What every software engineer should know about real-time data's unifying abstraction." engineering.linkedin.com, 2013.
Kleppmann, Martin. Designing Data-Intensive Applications. O'Reilly, 2017. Chapter 11 (Stream Processing).

messaging scalability workers queues throughput

Related patterns

Idempotency Key

Retrying a failed API request can trigger duplicate side effects - charging a card twice, creating two accounts, or sending the same email multiple times.

apisidempotencyreliabilitypaymentsdistributed-systems

Transactional Outbox

Events published directly inside a database transaction can be lost if the broker is unavailable, leaving the database and downstream consumers permanently out of sync.

messagingdistributed-systemsconsistencyevent-driven

Related patterns

Get essays in your inbox