
Idempotency Is Your System’s Seatbelt: Building Reliable APIs & Job Queues

The Bug Everyone Ships Once

It’s 2:07 AM.

A customer is trying to pay for something on their phone. They tap Pay.

Your backend does what it always does:

  1. Create a payment intent
  2. Call the payment provider
  3. Persist the result
  4. Return a 200

Except tonight there’s a tiny plot twist: the user’s network is having a moment.

The request takes just long enough that the client times out. The UI spins, feels stuck, and the user does the most human thing imaginable:

They tap Pay again.

Now you’ve got two requests. Two jobs. Two callbacks.

And a morning waiting for you where “Why did you charge me twice?” becomes the only meeting on your calendar.

This is the universe reminding you of a core truth:

Networks are unreliable. Retries are inevitable. Duplicates are normal.

Idempotency is the pattern that turns that truth into a non-event.


What “Idempotent” Means (the only definition that matters)

In math, an operation is idempotent if applying it multiple times has the same effect as applying it once.

In backend engineering, we translate that into something more operational:

An idempotent API/job can be safely retried without causing duplicate side effects.

Side effects are the expensive, irreversible-ish things:

  • charging a card
  • sending an email
  • provisioning infrastructure
  • creating a record that should be unique
  • publishing an event that triggers downstream work

Most reads are naturally safe. Writes are where idempotency earns its paycheck.


Why Duplicates Happen (even if your code is “correct”)

The sneaky part is that retries don’t only come from “a user clicked twice.”

Retries appear anywhere there’s a layer that says “maybe that didn’t go through, let me try again”:

  • browsers and mobile clients (timeout, refresh, background/foreground)
  • reverse proxies (retry on upstream failure)
  • load balancers
  • SDKs with retry policies
  • message queues (at-least-once delivery)
  • serverless platforms (retry on error)

So if your write path isn’t idempotent, you’re implicitly claiming:

“I trust every layer in my stack to never retry.”

That’s not a strategy. That’s a wish.


Two Flavors of Idempotency

1) Natural idempotency (make the operation inherently safe)

If your endpoint sets state to an absolute value rather than incrementing or appending, it's often naturally idempotent.

Example:

  • PUT /users/123/email with { "email": "a@b.com" }

Run it once, run it ten times — the final state is the same.
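A minimal in-memory sketch of that idea (the Map stands in for a real datastore; names are illustrative): run the same PUT ten times and you still end up with one user holding one email.

```typescript
type User = { email: string }

// Stands in for a real users table.
const users = new Map<string, User>()

// PUT /users/:id/email — sets the email to an absolute value.
// Overwrite, never append or increment: that is what makes replays safe.
function putEmail(userId: string, email: string): User {
  const user: User = { email }
  users.set(userId, user)
  return user
}
```

Contrast with something like "append a shipping address" or "add 100 credits", where each replay visibly changes the outcome.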

2) Synthetic idempotency (dedupe using a key)

If your endpoint creates side effects, you usually need synthetic idempotency.

Example:

  • POST /payments creates a new payment each time by default.

So you introduce an Idempotency-Key (or equivalent) to say:

“These requests represent the same user intent. Treat them as one.”

This is the Stripe-style approach, and it’s the workhorse for “create” endpoints.
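On the client side, generating that key might look like this sketch (function names are hypothetical; the one rule is that a new intent gets a new key, while retries of the same intent reuse the old one):

```typescript
import { randomUUID } from 'crypto'

// One checkout attempt = one intent = one key, generated once.
function newCheckoutAttempt(amountCents: number, currency: string) {
  return {
    idempotencyKey: randomUUID(), // new intent => new key
    body: { amountCents, currency },
  }
}

// Every retry of the SAME attempt reuses the same key.
function buildRequest(attempt: ReturnType<typeof newCheckoutAttempt>) {
  return {
    method: 'POST',
    headers: {
      'Idempotency-Key': attempt.idempotencyKey,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(attempt.body),
  }
}
```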


A Practical Design: Idempotency Keys for Payments

Let’s design the endpoint you wish you had before that 2 AM incident.

You want:

  • First request: do the work, return result.
  • Duplicate request (same intent): return the same result without charging again.

The contract

  • Client sends a unique key per user-intent (one checkout attempt).
  • Server stores the outcome keyed by (userId, idempotencyKey).
  • Server returns the stored outcome for duplicates.

Request

```http
POST /api/payments
Idempotency-Key: 7b36f0d3-4a2d-4f5c-9c77-7f2a7c9c2f2a
Content-Type: application/json

{ "amountCents": 4999, "currency": "USD" }
```

Minimal schema

```sql
CREATE TABLE idempotency_keys (
  user_id           TEXT NOT NULL,
  idem_key          TEXT NOT NULL,
  request_hash      TEXT NOT NULL,
  status            TEXT NOT NULL,          -- IN_PROGRESS | COMPLETED | FAILED
  response_body     JSONB,
  response_code     INT,
  created_at        TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at        TIMESTAMPTZ NOT NULL DEFAULT now(),
  PRIMARY KEY (user_id, idem_key)
);
```

Request hashing (the part people skip, then regret)

If the same key is reused with a different payload, you should reject it. Otherwise you might return the wrong cached response for a new intent.

api/payments.ts

```ts
import crypto from 'crypto'

// Note: JSON.stringify is key-order sensitive, so { a, b } and { b, a }
// hash differently. Canonicalize the body (e.g. sort keys) before hashing
// in production.
function hashRequest(body: unknown) {
  return crypto.createHash('sha256').update(JSON.stringify(body)).digest('hex')
}
```

Control flow (how duplicates become boring)

  1. Attempt to insert (userId, key, requestHash, IN_PROGRESS).
  2. If insert succeeds → you “own” the key, do the work.
  3. If insert fails (already exists):
    • if COMPLETED → return the stored response (same status + body)
    • if IN_PROGRESS → either wait/poll, or return 409 so the client retries later
    • if FAILED → return the stored failure (or allow a retry, depending on your domain)

In Postgres, this is clean with INSERT ... ON CONFLICT.
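The whole flow can be sketched in-memory (the Map stands in for the idempotency_keys table, the charge is simulated, and a real implementation would claim the key atomically with INSERT ... ON CONFLICT DO NOTHING rather than a check-then-set):

```typescript
import { createHash } from 'crypto'

type Outcome = {
  requestHash: string
  status: 'IN_PROGRESS' | 'COMPLETED' | 'FAILED'
  responseCode?: number
  responseBody?: unknown
}

// Stands in for the idempotency_keys table; the key is (userId, idemKey).
const table = new Map<string, Outcome>()
let charges = 0 // counts real side effects, for illustration

function hashRequest(body: unknown) {
  return createHash('sha256').update(JSON.stringify(body)).digest('hex')
}

function handlePayment(userId: string, key: string, body: { amountCents: number }) {
  const pk = `${userId}:${key}`
  const requestHash = hashRequest(body)
  const existing = table.get(pk)

  if (!existing) {
    // Steps 1-2: the insert succeeded, so we own the key and do the work.
    table.set(pk, { requestHash, status: 'IN_PROGRESS' })
    charges++ // the one real side effect
    const response = { code: 201, body: { charged: body.amountCents } }
    table.set(pk, {
      requestHash,
      status: 'COMPLETED',
      responseCode: response.code,
      responseBody: response.body,
    })
    return response
  }

  // Step 3: the key already exists.
  if (existing.requestHash !== requestHash) {
    return { code: 422, body: { error: 'key reused with different payload' } }
  }
  if (existing.status === 'IN_PROGRESS') {
    return { code: 409, body: { error: 'original request still running' } }
  }
  return { code: existing.responseCode!, body: existing.responseBody }
}
```

The second call with the same key and body never reaches the charge; it replays the stored response.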


The Myth to Kill: “Exactly Once”

People hear “idempotency” and think it means your code runs exactly once.

It doesn’t.

  • your handler might run twice
  • your worker might process twice
  • your webhook might arrive twice

Idempotency is not preventing duplicates.

It’s you saying:

“Even if I see this twice, the user-visible outcome stays correct.”

That’s what reliability looks like in the real world.


Idempotency in Job Queues (At-Least-Once Delivery)

Most queues are at-least-once. If your worker crashes after doing the work but before acking, the message comes back like a sequel.

So jobs that cause side effects need a dedupe story.

A simple pattern: store a “processed” marker

worker/emailWorker.ts

```ts
// Pseudocode: msg, db, sendEmail, and ack are stand-ins for your queue SDK.
const jobId = msg.id // stable message id, or an id derived from the payload

if (await db.processedJobs.exists(jobId)) {
  return ack() // already handled, drop the duplicate
}

await sendEmail(msg.payload)
await db.processedJobs.insert(jobId)
return ack()
```

The catch

If you do:

  1. send email
  2. record processed

…and crash in between, you’ll still send twice.

So in serious systems you either:

  • make the side effect idempotent at the provider level (dedupe keys)
  • or use a transactional approach like an outbox
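The first option might look like this sketch, with a hypothetical provider that accepts a caller-supplied dedupe key (real providers expose this differently, often as a header or request parameter):

```typescript
// Hypothetical provider that dedupes on a caller-supplied key.
const providerSeen = new Set<string>()
let emailsSent = 0

function providerSend(dedupeKey: string, payload: { to: string; subject: string }) {
  if (providerSeen.has(dedupeKey)) return { status: 'duplicate' as const }
  providerSeen.add(dedupeKey)
  emailsSent++
  return { status: 'sent' as const }
}

// Worker side: derive the dedupe key from the stable job id, so a
// crash-and-redeliver replays with the same key and the provider drops it.
function processEmailJob(job: { id: string; to: string; subject: string }) {
  return providerSend(`email:${job.id}`, { to: job.to, subject: job.subject })
}
```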

The Outbox Pattern (Idempotency for Events)

The classic failure mode:

  • DB update succeeds
  • event publish fails or times out
  • retry publishes twice
  • downstream runs twice

The outbox pattern fixes this by recording the event in the DB in the same transaction as your state change, then publishing from a relay.

```sql
BEGIN;

UPDATE orders SET status = 'PAID' WHERE id = $1;

INSERT INTO outbox_events (id, type, payload)
VALUES ($2, 'OrderPaid', $3);

COMMIT;
```

Then a relay reads outbox_events and publishes with dedupe on event.id.
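That relay-plus-dedupe loop can be sketched like this (in-memory stand-ins for the outbox table, the broker, and the consumer's seen-set; names are illustrative):

```typescript
type OutboxEvent = { id: string; type: string; payload: unknown; publishedAt?: Date }

const outbox: OutboxEvent[] = []          // stands in for outbox_events
const seenByConsumer = new Set<string>()  // downstream dedupe on event.id
let downstreamRuns = 0

// At-least-once broker: delivery may repeat, so the consumer dedupes.
function publish(event: OutboxEvent) {
  if (!seenByConsumer.has(event.id)) {
    seenByConsumer.add(event.id)
    downstreamRuns++
  }
}

// The relay: read unpublished rows, publish, then mark published.
// If it crashes after publishing but before marking, the next pass
// re-publishes, and the consumer-side dedupe absorbs the duplicate.
function relayPass() {
  for (const event of outbox) {
    if (!event.publishedAt) {
      publish(event)
      event.publishedAt = new Date()
    }
  }
}
```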

This is one of those patterns that feels “extra”… right up until the first incident.


The Rules I Actually Follow

  1. Every side-effecting endpoint gets an idempotency story

    • payments, signups, provisioning, webhooks, email
  2. Prefer natural idempotency when you can

    • PUT/PATCH that sets state beats ambiguous POST
  3. Scope idempotency keys

    • typically (userId, key) or (accountId, key)
  4. Reject key reuse with different payloads

    • store a request_hash
  5. Return the exact previous response

    • same status + body (this makes retries invisible)
  6. Expire keys intentionally

    • keep for 24h/7d depending on your domain
  7. Handle IN_PROGRESS

    • racing requests should not both run the side effect

Why This Is Actually a Product Feature

Idempotency isn’t just backend hygiene. It’s UX.

When retries are safe:

  • users can mash buttons without fear
  • mobile apps recover cleanly from spotty networks
  • you can add aggressive client retries to improve perceived performance

It buys reliability and trust.


Closing: Put the Seatbelt On Before the Crash

Incidents rarely happen because your code ran once.

They happen because it ran twice.

Design for duplicates. Make retries boring. Ship with a seatbelt.