Five Message Broker Patterns

I kept dropping names like Saga, CQRS, and Outbox in design reviews without being fully honest about which one solved what. A ByteByteGo infographic pushed me to stop faking it and draw each one from memory. These are the diagrams - and the use cases - that finally made them stick.

Rickvian Aldi·Software engineer·April 20, 2026·13 min read

I have been saying the words "we'll just fire an event" in design reviews for years. The words come out cleanly. The underlying picture in my head, if I am honest, has always been a little blurry. Is the event fired inside the database transaction, or after? Who consumes it? If the broker is down for thirty seconds, does anything actually break?

This week I stumbled on a ByteByteGo infographic that lined up five of these patterns side by side - Transactional Outbox, CQRS, CQRS with Event Sourcing, Saga, Competing Consumers - and I realized I could recite all of them and confidently describe maybe two.

So I did the thing that always works for me when I half-understand something. I sat down and drew each one until I could explain, in plain language, what it is, what it is for, and when I would actually reach for it. This article is that exercise, written out.

Why These Patterns Exist At All

Message brokers - Kafka, RabbitMQ, SQS, NATS, Pulsar, pick your favorite - are plumbing. They move messages from producers to consumers without either side having to know about the other. That is the whole value proposition, and it is a big one.

But the moment you put a broker in the middle of a system, new questions appear that a single-database design never had to answer:

What if my database commit succeeds but the broker publish fails?
What if two services need the same data but want completely different shapes of it?
What if a "transaction" has to span four services that each own their own database?
What if one producer emits faster than a single consumer can process?

Each of the five patterns below is a specific answer to one of those questions. None of them are new. All of them show up in real architectures under different names. What helped me was seeing them as answers to questions rather than as architectural vocabulary to name-drop.

1. Transactional Outbox - Stop Losing Events Between the DB and the Broker

The first time I shipped a service that fired events, my code looked something like this:

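```typescript
await db.orders.insert(newOrder);                 // step 1: commits to the database
await broker.publish("order.created", newOrder);  // step 2: a separate system that can fail on its own
```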

It works. Most of the time. The problem is the tiny window between those two lines. If the process crashes after the insert commits but before the publish, the row exists and the event does not. Flip the order, or publish from inside a transaction that later rolls back, and you get the opposite failure: an event for an order that was never created. There is no way to make both lines atomic, because the database and the broker do not share a transaction.

The fix is to stop trying to publish inside the request path at all. Write the event to an outbox table in the same database transaction as the business row. A separate relay process reads the outbox and publishes to the broker afterwards.

[Figure: Transactional Outbox - the broker publish is moved out of the request path and into a relay that reads from a table written in the same transaction as the business row.]
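Here is a minimal, self-contained sketch of that shape, in the same TypeScript style as the snippet above. `InMemoryDb`, `createOrder`, and `relayOnce` are hypothetical names made up for illustration: the "transaction" just stages copies of two arrays and swaps them in only if nothing threw, and the relay is a simple polling loop rather than CDC. Real code would use your database's transaction API and a real broker client.

```typescript
// Transactional Outbox sketch. InMemoryDb, createOrder, and relayOnce are
// hypothetical stand-ins for a real database transaction and relay process.

interface Order { id: number; total: number }
interface OutboxRow { id: number; topic: string; payload: string; published: boolean }
interface Tx { orders: Order[]; outbox: OutboxRow[]; nextOutboxId: () => number }

class InMemoryDb {
  orders: Order[] = [];
  outbox: OutboxRow[] = [];
  private seq = 1;

  // Stage writes to both tables, then commit them together, or not at all if fn throws.
  transaction(fn: (tx: Tx) => void): void {
    const orders = this.orders.slice();
    const outbox = this.outbox.slice();
    let next = this.seq;
    fn({ orders, outbox, nextOutboxId: () => next++ });
    this.orders = orders;
    this.outbox = outbox;
    this.seq = next;
  }
}

// Request path: the business row and the event row share one transaction.
// There is no broker call here at all.
function createOrder(db: InMemoryDb, order: Order): void {
  db.transaction((tx) => {
    tx.orders.push(order);
    tx.outbox.push({
      id: tx.nextOutboxId(),
      topic: "order.created",
      payload: JSON.stringify(order),
      published: false,
    });
  });
}

// Relay: runs outside the request path (a poller here; CDC is the alternative).
// A failed publish leaves the row pending, so a broker outage becomes latency.
function relayOnce(db: InMemoryDb, publish: (topic: string, payload: string) => void): number {
  let sent = 0;
  for (const row of db.outbox) {
    if (row.published) continue;
    try {
      publish(row.topic, row.payload);
    } catch {
      break; // broker unavailable: stop here and retry on the next run, in order
    }
    // A crash between publish and this line republishes the row later,
    // so delivery is at-least-once and consumers should be idempotent.
    row.published = true;
    sent++;
  }
  return sent;
}

// Usage: the broker is down for the first relay run, then comes back.
const db = new InMemoryDb();
createOrder(db, { id: 42, total: 99 });

let brokerUp = false;
const published: string[] = [];
const publish = (topic: string, payload: string): void => {
  if (!brokerUp) throw new Error("broker unavailable");
  published.push(`${topic}:${payload}`);
};

relayOnce(db, publish); // returns 0: the event waits in the outbox
brokerUp = true;
relayOnce(db, publish); // returns 1: the backlog drains
```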

Use case I actually hit. Order service writes an order, publishes order.created, and the email service sends a confirmation. Before the outbox: the broker had a thirty-second blip during a rolling restart and we dropped roughly two hundred confirmation emails. After the outbox: the blip became latency instead of loss - the events piled up in the table and the relay drained them once the broker came back.

The pattern already has its own deep-dive in the pattern library if you want the schema and the tradeoff between polling and CDC. The thing I want to flag here is cultural rather than technical: once a team has the outbox, await broker.publish(...) inside a request handler stops being acceptable. The guarantee the pattern gives you only holds if everybody uses it.

2. CQRS - Different Models for Writing and Reading

CQRS stands for Command Query Responsibility Segregation, which is a long way of saying: the thing you use to change data should not be the same thing you use to read data.

In a traditional app, one model and one database serve both. You write Order, you read Order, both through the same schema. That works until your reads get complicated enough that they start deforming your writes. You add denormalized columns to make a dashboard fast. You add indexes that slow down inserts. You cache. The write model becomes a compromise between two jobs.

CQRS separates them. Writes go through a command side that models business operations and lands them in a write-optimized store. Reads go through a query side that pulls from a store shaped for the reads your product actually does. The two sides stay synchronized asynchronously - usually via events flowing through the broker.

[Figure: CQRS - the command side owns state changes, the query side owns read shapes, and events on the broker keep the two eventually consistent.]

Use case I actually hit. A product catalog where the write model was Postgres (authoritative, relational, transactional) and the read model was Elasticsearch (great at fuzzy product search, typo tolerance, faceted filters). Every catalog update was a write against Postgres. A stream of events replayed the updates into Elasticsearch. The write side never had to care about search. The read side never had to care about business rules.
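A toy version of that split, assuming an in-memory `Map` on each side and a plain array standing in for the broker. `CommandSide`, `SearchProjection`, and `ProductUpdated` are hypothetical names, and the word-matching "search" is nowhere near what Elasticsearch does; the point is only that the read side changes exclusively by applying events, which is also why it can lag behind the write side.

```typescript
// CQRS sketch. ProductUpdated, CommandSide, and SearchProjection are hypothetical;
// the pending array stands in for the broker, and the Map-based projection
// stands in for a search index.

interface ProductUpdated { sku: string; name: string; priceCents: number }
interface Product { name: string; priceCents: number }

// Command side: enforces business rules, owns the authoritative state, emits events.
class CommandSide {
  private readonly products = new Map<string, Product>();
  private readonly emit: (e: ProductUpdated) => void;

  constructor(emit: (e: ProductUpdated) => void) {
    this.emit = emit;
  }

  updateProduct(sku: string, name: string, priceCents: number): void {
    if (priceCents < 0) throw new Error("price must be non-negative"); // a business rule
    this.products.set(sku, { name, priceCents });
    this.emit({ sku, name, priceCents });
  }

  // Read-after-write escape hatch: ask the write side when staleness is unacceptable.
  getAuthoritative(sku: string): Product | undefined {
    return this.products.get(sku);
  }
}

// Query side: a read shape built for search, updated only by applying events.
class SearchProjection {
  private readonly bySku = new Map<string, { product: Product; words: string[] }>();

  apply(e: ProductUpdated): void {
    this.bySku.set(e.sku, {
      product: { name: e.name, priceCents: e.priceCents },
      words: e.name.toLowerCase().split(/\s+/),
    });
  }

  search(word: string): string[] {
    const hits: string[] = [];
    this.bySku.forEach((entry, sku) => {
      if (entry.words.indexOf(word.toLowerCase()) !== -1) hits.push(sku);
    });
    return hits;
  }
}

// Usage: the query side is stale until the pending events are applied.
const pending: ProductUpdated[] = [];
const commands = new CommandSide((e) => pending.push(e));
const searchIndex = new SearchProjection();

commands.updateProduct("sku-1", "Red Wool Scarf", 2500);
searchIndex.search("scarf");         // [] - the projection has not caught up yet
commands.getAuthoritative("sku-1");  // the new record, immediately
while (pending.length > 0) searchIndex.apply(pending.shift()!);
searchIndex.search("scarf");         // ["sku-1"]
```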

The trap to name out loud: CQRS gives you eventual consistency between the two sides. A user who updates a product and immediately lists products may see the old record. For most domains that is fine. For domains where it is not fine - a seller who needs to confirm their price change took effect - you need to either read-after-write from the write side, or make the query-side update synchronous, which defeats most of the point.

3. CQRS With Event Sourcing - Store What Happened, Not Just What Is

Regular CQRS keeps the current state on the write side. Event Sourcing goes one step further: it throws away the idea of storing the current state at all. What you store is the sequence of events that produced the state. Current state is a function of replaying those events.

Instead of a row saying "order 42: status=shipped, total=$99," you have an append-only log of events: OrderCreated, OrderPaid, OrderShipped. The current state is whatever you get by folding those events in order. Your read models are projections over the same event log.

[Figure: CQRS with Event Sourcing - the event log is the source of truth. Read models are projections that can be rebuilt from the log at any time.]

Use case where it clearly earns its keep. A banking ledger. You do not want to store "current balance" as a number that someone can UPDATE. You want an append-only log of every debit and credit, and the balance is whatever that log folds to. When an auditor shows up and asks "why was this account at $412.73 on March 2nd?", you have an answer more precise than any state table can give you.
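A small sketch of the ledger idea, assuming an in-memory append-only array as the event store and ISO date strings as timestamps. `Ledger`, `Credited`, and `Debited` are hypothetical names, and the amounts are made up so that the fold lands on the balance from the auditor question above. There is no stored balance anywhere; it is computed by folding the history, and the same history answers "why?".

```typescript
// Event-sourced ledger sketch. The event names, the in-memory log, and the
// amounts are made up for illustration; money is integer cents so folds are exact.

type LedgerEvent =
  | { type: "Credited"; account: string; amountCents: number; on: string } // on: "YYYY-MM-DD"
  | { type: "Debited"; account: string; amountCents: number; on: string };

class Ledger {
  private readonly log: LedgerEvent[] = []; // append-only: no update, no delete

  append(e: LedgerEvent): void {
    if (!(e.amountCents > 0)) throw new Error("amount must be a positive number of cents");
    this.log.push(e);
  }

  // Every event for the account up to and including asOf.
  // Fixed-width ISO dates compare correctly as strings.
  history(account: string, asOf = "9999-12-31"): LedgerEvent[] {
    return this.log.filter((e) => e.account === account && e.on <= asOf);
  }

  // State is not stored; it is a fold over the history.
  balance(account: string, asOf = "9999-12-31"): number {
    return this.history(account, asOf).reduce(
      (sum, e) => (e.type === "Credited" ? sum + e.amountCents : sum - e.amountCents),
      0,
    );
  }
}

const ledger = new Ledger();
ledger.append({ type: "Credited", account: "acc-1", amountCents: 50000, on: "2026-03-01" });
ledger.append({ type: "Debited", account: "acc-1", amountCents: 8727, on: "2026-03-02" });
ledger.append({ type: "Debited", account: "acc-1", amountCents: 1000, on: "2026-03-05" });

ledger.balance("acc-1", "2026-03-02"); // 41273, i.e. $412.73
ledger.balance("acc-1");               // 40273: today's balance from the same log
ledger.history("acc-1", "2026-03-02"); // the two events that explain $412.73
```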

When I would not reach for it. A CRUD app. A settings screen. A user profile. The value of event sourcing scales with how much you care about the history of the data, not the current value. If nobody will ever ask "how did we get here?", storing events is a lot of operational weight for benefits you will never consume.

The other honest caveat: event sourcing changes how you think about schema migrations, projections, and deployment. You need to be able to rebuild read models from the log. Events become part of your versioned API to your future self. I have only seen it go well in teams that deliberately invested in the tooling for it.

4. Saga - Distributed Transactions Without a Distributed Transaction

Sagas answer a specific question: how do you keep a multi-step business operation consistent when each step runs in a different service with its own database?

Take a checkout: reserve inventory, charge the customer, arrange shipping, send a confirmation. In a monolith this is one database transaction. In a microservice architecture, each step is a different service with its own database and no shared transaction. If the charge succeeds but shipping fails, you have a customer who paid for something that will never arrive - and no automatic rollback.

A saga decomposes the business operation into a sequence of local transactions; each one commits in its own service and then triggers the next step, either by emitting an event or by reporting back to a coordinator. If any step fails, compensating transactions run in reverse to undo the completed steps. There is no global lock; there is a protocol.

[Figure: Saga (choreography style) - each service completes its local transaction and emits an event. Failure at any step triggers compensation events running in reverse.]

There are two flavors of saga. Choreography (shown above) has each service listen for events and decide what to do next - no central coordinator. Orchestration has a dedicated saga orchestrator that issues commands to each service and owns the state machine. Choreography is simpler to deploy but harder to observe. Orchestration is easier to reason about but introduces a coupling point. My rule of thumb: fewer than four steps, choreography. Four or more, orchestration, because the debuggability is worth it.
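To make "compensating transactions run in reverse" concrete, here is a minimal orchestrator sketch. `SagaStep`, `runSaga`, and the step names are hypothetical; each `action` stands in for a call to another service's local transaction, and everything is synchronous and in memory. A real orchestrator would call services asynchronously, persist the saga's progress to a state table after each step, and retry compensations that fail instead of assuming they succeed.

```typescript
// Orchestrated saga sketch. SagaStep and runSaga are hypothetical names.

interface SagaStep {
  name: string;
  action: () => void;     // local transaction in the owning service
  compensate: () => void; // semantic undo (refund, release stock), not a DB rollback
}

type SagaResult =
  | { status: "completed"; completed: string[] }
  | { status: "compensated"; failedAt: string; compensated: string[] };

function runSaga(steps: SagaStep[]): SagaResult {
  const done: SagaStep[] = [];
  for (const current of steps) {
    try {
      current.action();
      done.push(current);
    } catch {
      // Undo the steps that already committed, newest first.
      const compensated: string[] = [];
      for (const s of done.slice().reverse()) {
        s.compensate();
        compensated.push(s.name);
      }
      return { status: "compensated", failedAt: current.name, compensated };
    }
  }
  return { status: "completed", completed: done.map((s) => s.name) };
}

// Usage: shipping fails, so the charge and the reservation are undone in reverse.
const calls: string[] = [];
const makeStep = (name: string, fails = false): SagaStep => ({
  name,
  action: () => {
    if (fails) throw new Error(`${name} failed`);
    calls.push(`do ${name}`);
  },
  compensate: () => {
    calls.push(`undo ${name}`);
  },
});

const result = runSaga([
  makeStep("reserveInventory"),
  makeStep("chargeCustomer"),
  makeStep("arrangeShipping", true),
]);
// calls: do reserveInventory, do chargeCustomer, undo chargeCustomer, undo reserveInventory
```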

Use case I actually hit. An e-commerce checkout with five services. We started choreographed. By step four we could not answer the question "where did saga X get stuck?" without stitching logs from five systems by hand. We moved to an orchestrator with a state table. The code got slightly more boring and the on-call pages got dramatically shorter.

The pattern library has a longer saga entry with both styles in code. The thing worth repeating here: sagas give you eventual consistency, not atomicity. For a moment, the order exists and the payment does not. Your product, your dashboards, and your customer service team need to understand that.

5. Competing Consumers - One Queue, Many Workers

This is the most mechanically simple pattern of the five, and the one I reach for most often. It is also the one that needs the least explanation, so I will keep this short.

Producers drop messages onto a queue. Multiple consumer instances pull from the same queue. Each message is handed to only one consumer at a time (it can come back after a failure; more on that below). The load balances itself: faster consumers pull more, slower ones pull less. Add more consumers to go faster. Remove some to save cost.

[Figure: Competing Consumers - a single queue, many worker instances. The broker load-balances messages across whichever consumers are currently pulling.]
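An in-process sketch of the mechanics, assuming an array as the queue and async loops as workers; `runWorkers` and the delays are made up for illustration. Because each check-and-`shift()` runs without an `await` in between, no two workers can take the same job, which is the in-process analog of the broker handing each message to one consumer. A real broker adds acknowledgements and redelivery, which this sketch leaves out.

```typescript
// Competing Consumers sketch. runWorkers is a hypothetical helper; an array
// stands in for the broker queue and each worker is an async pull loop.

const sleep = (ms: number): Promise<void> => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runWorkers(jobs: string[], workerDelaysMs: number[]): Promise<Map<number, string[]>> {
  const queue = jobs.slice();
  const handledBy = new Map<number, string[]>();

  const worker = async (id: number, delayMs: number): Promise<void> => {
    const mine: string[] = [];
    handledBy.set(id, mine);
    while (queue.length > 0) {
      // Checking and taking happen with no await in between, so a job goes to one worker.
      const job = queue.shift()!;
      await sleep(delayMs); // simulated processing time
      mine.push(job);
    }
  };

  // One pull loop per worker; faster workers return to the queue more often.
  await Promise.all(workerDelaysMs.map((delayMs, id) => worker(id, delayMs)));
  return handledBy;
}

// Usage: two fast workers and one slow one share twenty photo jobs.
const photoJobs = Array.from({ length: 20 }, (_, i) => `photo-${i}`);
runWorkers(photoJobs, [5, 5, 20]).then((handledBy) => {
  handledBy.forEach((jobs, id) => console.log(`worker ${id} handled ${jobs.length} jobs`));
});
```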

Use case I actually hit. Image processing pipeline. Users upload photos, a queue holds "process this photo" jobs, a fleet of worker containers pulls from the queue. On a slow day, three workers. On a campaign spike, forty. The queue absorbs the bursts so the upload endpoint never blocks.

Two things I wish I had internalized earlier. First, messages must be processable in any order: the broker spreads them across workers, so message order is not preserved across the fleet. If you need per-entity ordering (all events for user 42 processed in order), use a partitioned queue keyed on entity id, where each partition is consumed by exactly one worker at a time; you scale by adding partitions, not by adding workers to a single partition. Second, consumers must be idempotent. Most queues give at-least-once delivery. A worker that crashes after processing but before acknowledging will see the same message again. If running the handler twice produces the wrong answer, your consumer is broken - not the broker.
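Both fixes fit in a few lines. This sketch uses hypothetical names (`Message`, `partitionFor`, `IdempotentConsumer`) and a simple illustrative string hash, not the partitioner of any specific broker. The processed-id `Set` is in memory for brevity; to survive the crash described above, it would need to live in durable storage updated in the same transaction as the side effect.

```typescript
// Keyed partitioning plus an idempotent consumer. All names are hypothetical.

interface Message { id: string; entityId: string; body: string }

// Same entityId -> same partition -> same single consumer, so per-entity order holds.
function partitionFor(entityId: string, partitionCount: number): number {
  let hash = 0;
  for (let i = 0; i < entityId.length; i++) {
    hash = (hash * 31 + entityId.charCodeAt(i)) >>> 0; // keep it an unsigned 32-bit int
  }
  return hash % partitionCount;
}

class IdempotentConsumer {
  private readonly processed = new Set<string>(); // durable storage in production
  readonly effects: string[] = [];

  // Returns true if the message did work, false if it was a duplicate.
  // Either way the caller acks, so the broker stops redelivering it.
  handle(msg: Message): boolean {
    if (this.processed.has(msg.id)) return false;
    this.effects.push(`${msg.entityId}: ${msg.body}`); // the side effect
    this.processed.add(msg.id);
    return true;
  }
}

const consumer = new IdempotentConsumer();
const msg: Message = { id: "msg-1", entityId: "user-42", body: "resize photo" };
consumer.handle(msg); // true: processed
consumer.handle(msg); // false: redelivered after a missed ack, skipped
```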

How I Decide Which One I Actually Need

Drawing them side by side made the decision tree obvious in a way that reading the names never did:

I need to atomically record a state change and notify others? → Outbox. Almost always. It is the default, not the exotic choice.
My reads and writes have genuinely different shapes and my write model is suffering for it? → CQRS. Accept eventual consistency on the read side.
The history of changes is a first-class product concern (audit, debugging, rebuildable projections)? → CQRS with Event Sourcing. Pay the tooling cost deliberately.
A business operation spans multiple services and must either all succeed or all compensate? → Saga. Orchestrated if it has more than three steps.
I have a stream of independent tasks and I want to scale processing horizontally? → Competing Consumers. Bake in idempotency from day one.

Most real systems use more than one of these at the same time. The checkout flow I mentioned earlier uses four of the five: the outbox to publish events atomically with the write, a saga to coordinate the flow, competing consumers on each step to scale, and a small CQRS slice for the order-lookup read model the customer service team uses. None of them felt clever in isolation. Together they made the system boring, which is the highest compliment I can give a distributed design.

The Quiet Lesson

The thing that surprised me most, drawing these out, is how much of "distributed systems design" is not really about algorithms or consensus or exotic infrastructure. It is about naming the kind of inconsistency you are willing to tolerate - across a transaction boundary, across two databases, across five services - and then picking the pattern that turns that inconsistency into something your system recovers from on its own.

The broker is the plumbing. These patterns are the plumbing conventions that keep the water going the right direction when one of the pipes shakes.

If you only take one thing from this article: the next time you write await broker.publish(...) directly after a database write, pause. That line is a micro-decision about consistency, availability, and what you owe the next engineer who has to debug this at 3am. The patterns above are the menu of better answers.

distributed-systems messaging patterns event-driven architecture
