The Pattern Worth Paying For

Idempotency is the single most underrated contract in distributed systems - and ignoring it is how you end up charging customers twice at 3am.

Rickvian Aldi·Software engineer·April 10, 2026·8 min read

There is a class of distributed systems bug that appears only under load, only in production, and always at the worst possible moment. It looks like a race condition. It shows up in your Stripe dashboard as a duplicate charge. Your on-call engineer escalates it at 3am, and by the time the postmortem is written, three engineers have burned a weekend and a customer has filed a chargeback.

The pattern that prevents it has a name. It is called idempotency. And despite being the foundational primitive for safe distributed operations, it is consistently the last thing teams think about when designing new endpoints.

This essay is about why idempotency matters, what it actually costs to implement correctly, and when the pattern is worth every bit of the investment.

What Idempotency Actually Means

An operation is idempotent if applying it more than once produces the same result as applying it once. This sounds obvious until you try to design a payment API. Charging a card is not naturally idempotent - if you call POST /charges twice with the same payload, you get two charges. Networks time out. Clients retry. Servers crash mid-write. The gap between "the server received the request" and "the client received the acknowledgment" is where money disappears.

The classic solution is the idempotency key - a client-generated token, typically a UUID, that the server uses to deduplicate requests within a time window.

// The client passes an idempotency key with every mutating request
const charge = await stripe.charges.create({
  amount: 2000,
  currency: "usd",
  source: "tok_visa",
}, {
  // Must stay the same across every retry of this order's charge
  idempotencyKey: "order_abc123",
});

If the network drops and the client retries, Stripe detects the same key and returns the original response rather than creating a new charge. The client gets the same response body whether this was the first execution or a replay - which is exactly the point.

But the simplicity of the interface hides considerable complexity on the server side.

The Three Layers of Idempotency

Implementing idempotency correctly requires thinking at three distinct layers, and most teams only address one or two of them.

Layer 1: The deduplication store. You need somewhere to record that you've seen a key and what the result was. This is usually a fast key-value store - Redis, DynamoDB, or a dedicated column in your Postgres transactions table. The entry must be written atomically with the underlying operation, or you have a new race condition: two concurrent requests with the same key, both checking before either has written.

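import { addDays } from "date-fns"; // assumed source of addDays

// `db`, `processCharge`, `Charge`, and `ChargeParams` are application-defined.
async function createCharge(
  idempotencyKey: string,
  params: ChargeParams
): Promise<Charge> {
  // Check-and-set inside one database transaction. The lookup alone does not
  // stop two concurrent requests from both finding nothing, so `key` must
  // also carry a unique constraint that makes the second insert fail.
  // Even then, both requests may already have called processCharge;
  // claiming the key before executing is what closes that gap.
  return await db.transaction(async (tx) => {
    const existing = await tx.idempotencyKeys.findUnique({
      where: { key: idempotencyKey },
    });

    if (existing) {
      // Return the stored result - not a new execution
      return JSON.parse(existing.responseBody) as Charge;
    }

    // Execute the actual operation
    const charge = await processCharge(params);

    // Record the key and result in the same transaction
    await tx.idempotencyKeys.create({
      data: {
        key: idempotencyKey,
        responseBody: JSON.stringify(charge),
        expiresAt: addDays(new Date(), 7),
      },
    });

    return charge;
  });
}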

Layer 2: Request fingerprinting. Idempotency keys only work if you tie them to a specific request shape. If a client sends the same key with different parameters - say, the same idempotencyKey but a different amount - you have to decide: honor the key and ignore the parameter mismatch, or reject the request with a 422? Stripe rejects. That is the correct answer. A mismatched fingerprint means the client has a bug, and surfacing that error immediately is better than silently ignoring the mismatch.
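One way to sketch the fingerprint check, assuming the fingerprint is a SHA-256 hash over the method, path, and a canonical form of the body. The names canonicalize, fingerprint, decide, and StoredKey are illustrative, not taken from any particular library, and the 422 mirrors the option discussed above rather than any specific provider's status code.

```typescript
import { createHash } from "node:crypto";

// Serialize with object keys sorted, so property order does not change the result.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value) ?? "null";
}

// Hash of the request shape that the idempotency key is bound to.
function fingerprint(method: string, path: string, body: unknown): string {
  return createHash("sha256")
    .update(`${method.toUpperCase()} ${path}\n${canonicalize(body)}`)
    .digest("hex");
}

// Hypothetical shape of a row in the deduplication store.
type StoredKey = { fingerprint: string; responseBody: string };

type Decision =
  | { kind: "execute" }
  | { kind: "replay"; responseBody: string }
  | { kind: "reject"; status: 422 };

// Same key and same fingerprint replays; same key with a different fingerprint is rejected.
function decide(stored: StoredKey | undefined, incoming: string): Decision {
  if (!stored) return { kind: "execute" };
  if (stored.fingerprint !== incoming) return { kind: "reject", status: 422 };
  return { kind: "replay", responseBody: stored.responseBody };
}
```

Storing the fingerprint next to the key costs one extra column, and it turns a silent client bug into an immediate, visible error.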

Layer 3: Side effects beyond your database. Your charge endpoint probably sends an email confirmation, increments a counter in a metrics system, and triggers a webhook. None of those systems have your idempotency guarantee. You need to ensure that downstream side effects are either naturally idempotent (setting a value guarded by a version check, rather than blindly incrementing a counter) or wrapped in their own deduplication logic. Idempotency at the API layer does not automatically propagate to async workers.
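A minimal sketch of giving each downstream side effect its own deduplication key, assuming the worker can see the original idempotency key. EffectKind, effectId, and EffectRunner are hypothetical names, and the in-memory Set stands in for a durable store with a unique constraint.

```typescript
// Hypothetical side effects triggered by a successful charge.
type EffectKind = "confirmation_email" | "webhook";

// Derive a stable ID per effect, so a replayed request maps to the same effect IDs.
function effectId(idempotencyKey: string, kind: EffectKind): string {
  return `${idempotencyKey}:${kind}`;
}

class EffectRunner {
  // In-memory stand-in; a real worker would need a durable, shared store.
  private readonly done = new Set<string>();

  // Runs `effect` only if this ID has not been recorded yet; returns whether it ran.
  // The ID is recorded after the effect succeeds, so a crash in between can
  // still cause one repeat: this narrows the window, it does not remove it.
  run(id: string, effect: () => void): boolean {
    if (this.done.has(id)) return false;
    effect();
    this.done.add(id);
    return true;
  }
}
```

Usage would look like runner.run(effectId(key, "confirmation_email"), sendEmail), where sendEmail is whatever the worker already does today.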

Everything fails, all the time. Design for it.

When the Pattern Pays Off

Idempotency is not free. The deduplication store adds a round trip. The fingerprinting logic adds complexity. The key expiration policy adds a new class of edge case - what happens when a key expires after seven days and the client retries on day eight?

So when is it worth the cost?

The answer is almost always "yes" for any operation that changes state and could be retried by an upstream caller. In practice, this means:

Payment and billing operations
Order creation
Message sends (email, SMS, push notifications)
Provisioning actions (create VM, allocate resource)
Any operation that touches external systems you do not control

The operations where idempotency is less critical are read operations (which are already naturally idempotent) and internal operations that are naturally safe to repeat. But even there, building the habit of asking "what happens if this runs twice?" is worth the mental overhead.

The Subtle Failure Mode Nobody Talks About

There is a failure mode that trips up even experienced engineers: partial completion.

Your charge endpoint calls the payment processor, writes the result to the database, then sends the confirmation email. The payment processor call succeeds, but the database write that records it fails (a transient network partition, a rollback triggered by a constraint violation). The customer's card has been charged. Your database does not know about it. When the client retries with the same idempotency key, your deduplication store has nothing - and you charge the card again.

The solution is to write the idempotency key entry before you start executing the operation, not after. Set its status to pending, execute the operation, then update the status to completed with the stored result. If the system crashes mid-execution, subsequent attempts will find the pending entry and can either wait (if the operation might still complete) or attempt recovery.
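The pending-then-completed flow can be sketched as a small state machine. This is an illustration under simplifying assumptions: IdempotencyStore is an in-memory Map standing in for a durable store whose claim step is atomic (for example, an insert against a unique constraint), and runOnce, KeyRecord, and Outcome are hypothetical names.

```typescript
type KeyRecord =
  | { status: "pending"; startedAt: number }
  | { status: "completed"; responseBody: string };

type Outcome =
  | { kind: "executed"; responseBody: string }
  | { kind: "replayed"; responseBody: string }
  | { kind: "in_progress" };

class IdempotencyStore {
  private readonly records = new Map<string, KeyRecord>();

  // Claim the key before any work starts. Returns the existing record if the
  // key is already taken, or undefined if this caller now owns it.
  claim(key: string, now: number): KeyRecord | undefined {
    const existing = this.records.get(key);
    if (existing) return existing;
    this.records.set(key, { status: "pending", startedAt: now });
    return undefined;
  }

  complete(key: string, responseBody: string): void {
    this.records.set(key, { status: "completed", responseBody });
  }
}

async function runOnce(
  store: IdempotencyStore,
  key: string,
  operation: () => Promise<string>,
): Promise<Outcome> {
  const existing = store.claim(key, Date.now());
  if (existing) {
    if (existing.status === "completed") {
      return { kind: "replayed", responseBody: existing.responseBody };
    }
    // Still pending: the caller can wait and retry, or start recovery.
    return { kind: "in_progress" };
  }
  // If this throws or the process dies, the record stays pending on purpose.
  const responseBody = await operation();
  store.complete(key, responseBody);
  return { kind: "executed", responseBody };
}
```

How long a pending record may sit before recovery kicks in is a policy decision; startedAt is kept on the record so that such a timeout can be built on top.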

This is the pattern Stripe describes in their engineering blog. It is sometimes called the "idempotency receipt" pattern, and it is the detail that separates robust implementations from ones that fail under concurrent retries.

What This Means for Your API Design

If you are designing a new service today, these are the decisions that matter:

Use client-generated keys, not server-generated. Servers cannot create idempotency keys because they cannot know whether a request is a retry. Only the client knows that. The client generates a UUID, stores it locally, and sends it with every retry attempt until it receives a definitive response.
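On the client side, the important detail is that the key is generated once per logical operation and reused on every attempt. A minimal sketch, where Send and withIdempotentRetry are hypothetical names and the retry policy is deliberately naive (no backoff, every error treated as retryable):

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical transport: sends one request carrying the given idempotency key.
type Send<T> = (idempotencyKey: string) => Promise<T>;

// The key defaults to a fresh UUID, created once, outside the retry loop.
async function withIdempotentRetry<T>(
  send: Send<T>,
  maxAttempts = 3,
  key: string = randomUUID(),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // Same key on every attempt, so the server can deduplicate.
      return await send(key);
    } catch (err) {
      // For example a timeout: the server may or may not have processed the request.
      lastError = err;
    }
  }
  throw lastError;
}
```

Generating the key inside the loop would hand the server a new key on each attempt, which is the same as sending no key at all.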

Be explicit about key scope. Does your key apply across all operations or just a specific endpoint? Stripe scopes keys to the calling API key, which prevents one integration from colliding with another. Define your scope explicitly in your documentation.
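One possible way to make the scope explicit is to fold it into the key you store. The sketch below assumes a scope of tenant, method, and route; KeyScope and scopedKey are hypothetical names, and narrower or wider scopes are equally valid choices.

```typescript
// Hypothetical scope: who is calling, and which endpoint the key applies to.
interface KeyScope {
  tenantId: string;
  method: string;
  route: string;
}

// Build the storage key from the scope plus the client-supplied key.
// JSON-encoding the parts as an array means a separator inside one value
// cannot make two different scopes produce the same string.
function scopedKey(scope: KeyScope, clientKey: string): string {
  return JSON.stringify([
    scope.tenantId,
    scope.method.toUpperCase(),
    scope.route,
    clientKey,
  ]);
}
```

With this shape, two tenants that happen to send the same UUID never see each other's stored responses.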

Choose a reasonable expiration window. Seven days is a common choice. Long enough to cover any realistic retry window, short enough that your deduplication store does not grow unbounded. Paginate your key cleanup job.
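A batched cleanup job can be sketched as a loop that deletes a bounded number of expired keys per round trip. ExpiringKeyStore and its deleteExpired method are assumptions made for this example, not an API from any real library.

```typescript
// Hypothetical store interface for this sketch.
interface ExpiringKeyStore {
  // Deletes up to `limit` keys that expired at or before `now`; resolves to the count deleted.
  deleteExpired(now: Date, limit: number): Promise<number>;
}

// Delete expired keys in bounded batches rather than in one long-running statement.
async function purgeExpiredKeys(
  store: ExpiringKeyStore,
  now: Date,
  batchSize = 1000,
): Promise<number> {
  if (batchSize < 1) throw new Error("batchSize must be at least 1");
  let total = 0;
  for (;;) {
    const deleted = await store.deleteExpired(now, batchSize);
    total += deleted;
    // A short batch means nothing expired is left for this run.
    if (deleted < batchSize) return total;
  }
}
```

Small batches keep each delete short, so the cleanup job does not hold locks that compete with live traffic.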

Communicate idempotency status to callers. Return a header like Idempotent-Replayed: true when you return a cached response. This helps clients distinguish "the operation ran" from "the operation ran and I'm giving you the result from last time." It also helps your debugging story: a duplicate alert with Idempotent-Replayed: true is a client bug, not a server bug.
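A small sketch of attaching the replay header when serving a stored response. StoredResponse, HttpResponse, and toHttpResponse are hypothetical names, and the response shape is framework-agnostic on purpose.

```typescript
// Hypothetical shapes: what the deduplication store holds, and what the handler returns.
interface StoredResponse {
  status: number;
  body: string;
}

interface HttpResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Status and body are identical for a first execution and a replay;
// only the header tells the caller which one it received.
function toHttpResponse(stored: StoredResponse, replayed: boolean): HttpResponse {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (replayed) headers["Idempotent-Replayed"] = "true";
  return { status: stored.status, headers, body: stored.body };
}
```

Keeping the body byte-for-byte identical means clients that ignore the header still behave correctly.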

The Real Cost of Not Implementing It

The cost of skipping idempotency is not just duplicate charges. It is the technical debt that accumulates when you build the wrong primitive into your foundation. Every endpoint you add without idempotency is an endpoint that needs to be retrofitted when you start handling retries. Every integration that assumes single delivery is an integration that will behave incorrectly under network partitions.

The engineers who understand this build retry logic correctly from day one. Their systems degrade gracefully. Their on-call rotations are quieter. Their postmortems are shorter.

Idempotency is not glamorous. It does not make it into architecture talks at conferences. But it is one of the clearest examples of a pattern that pays for itself many times over - in the operational incidents it prevents, the customer trust it preserves, and the engineering time it does not consume.

The pattern worth paying for is the one you never have to think about again.

If this resonated, the follow-up is about exactly-once delivery - what it means, what it costs, and why distributed systems practitioners generally settle for at-least-once with idempotency instead.
