
Why Every Financial Flow Deserves a State Machine

The pattern that shows up in every payment system worth trusting: explicit states, atomic transitions, and a clear answer to 'where are we?' — applied to both a real-time QR payment flow and a daily financial settlement system.

Two different systems. Two different teams. Two different problem domains — one a real-time payment execution flow, one a daily financial settlement process. Both ended up with the same architectural pattern at their core.

That’s not a coincidence.

The Problem With Sequential Scripts

The naive way to build a financial flow is a sequential script: do step 1, then step 2, then step 3, return the result. This works until:

  • Step 2 fails halfway through
  • The same script gets triggered twice due to a retry
  • Step 3 times out and the caller doesn’t know if the operation completed
  • Someone needs to resume a flow that stopped three days ago

Sequential scripts make it hard to answer the fundamental operational question: where are we? At any point in time, a transaction or compensation record might be in an indeterminate state — past step 1, possibly past step 2, destination unknown. Debugging requires reading logs. Resuming requires understanding exactly what the script had and hadn’t done.

State machines make that question trivially answerable. A record is always in exactly one well-defined state. Every state name is meaningful. The transition that got it there is logged.

Pattern 1: XState in a Real-Time Payment Service

In mach-p2m-service, every QR payment transaction is governed by a formal state machine defined with XState:

REGISTERED
  └─→ DELIVERY_REQUESTED
        ├─→ DELIVERY_REQUEST_FAILED        [terminal]
        ├─→ DELIVERY_REQUEST_RECEIVED
        │     ├─→ AUTHORIZED               [terminal*]
        │     ├─→ REJECTED                 [terminal]
        │     ├─→ REVERSED                 [terminal]
        │     ├─→ NULLIFIED                [terminal]
        │     └─→ PARTIALLY_AUTHORIZED
        │           ├─→ AUTHORIZED         [terminal*]
        │           ├─→ REVERSED           [terminal]
        │           └─→ NULLIFIED          [terminal]
        └─→ AUTHORIZED
              ├─→ AUTHORIZED               ← idempotent
              ├─→ PARTIALLY_AUTHORIZED
              ├─→ REVERSED
              └─→ NULLIFIED

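The asterisk marks AUTHORIZED as only nominally terminal. As the last branch shows, an authorized transaction can still transition to itself, to PARTIALLY_AUTHORIZED, or to REVERSED or NULLIFIED. Every state change goes through verifyTransition() before being applied, and invalid transitions throw. A transaction in DELIVERY_REQUEST_FAILED can’t accidentally become AUTHORIZED, because the machine won’t allow it.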
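The post doesn’t show verifyTransition() itself. As a rough sketch only, here is how a table-driven check over the diagram’s transitions might look in plain TypeScript. The service defines its machine with XState, so TxState, TRANSITIONS, and this verifyTransition body are hypothetical stand-ins, not its actual code.

```typescript
// Hypothetical plain-TypeScript mirror of the diagram; not XState's API.
type TxState =
  | "REGISTERED"
  | "DELIVERY_REQUESTED"
  | "DELIVERY_REQUEST_FAILED"
  | "DELIVERY_REQUEST_RECEIVED"
  | "AUTHORIZED"
  | "PARTIALLY_AUTHORIZED"
  | "REJECTED"
  | "REVERSED"
  | "NULLIFIED";

// Allowed target states for each source state. An empty list means terminal.
const TRANSITIONS: Record<TxState, readonly TxState[]> = {
  REGISTERED: ["DELIVERY_REQUESTED"],
  DELIVERY_REQUESTED: ["DELIVERY_REQUEST_FAILED", "DELIVERY_REQUEST_RECEIVED", "AUTHORIZED"],
  DELIVERY_REQUEST_RECEIVED: ["AUTHORIZED", "REJECTED", "REVERSED", "NULLIFIED", "PARTIALLY_AUTHORIZED"],
  PARTIALLY_AUTHORIZED: ["AUTHORIZED", "REVERSED", "NULLIFIED"],
  // Self-loop included: a repeated AUTHORIZED is accepted, not rejected.
  AUTHORIZED: ["AUTHORIZED", "PARTIALLY_AUTHORIZED", "REVERSED", "NULLIFIED"],
  DELIVERY_REQUEST_FAILED: [],
  REJECTED: [],
  REVERSED: [],
  NULLIFIED: [],
};

// Throws unless `to` is listed as a valid target of `from`.
function verifyTransition(from: TxState, to: TxState): void {
  if (!TRANSITIONS[from].includes(to)) {
    throw new Error(`Invalid transition ${from} -> ${to}`);
  }
}
```

Keeping every allowed move in one table means a reviewer can check the code against the diagram line by line.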

The interesting edge: AUTHORIZED → AUTHORIZED is explicitly allowed. This is deliberate. Transbank sends payment webhooks, and networks are unreliable. If a webhook is processed but the delivery still looks failed from Transbank’s side (for example, the acknowledgment times out), it is retried, and the second delivery finds the transaction already AUTHORIZED and tries to apply the same transition again. Without idempotent handling, this throws an error and the retry logic treats the payment as failed. With it, the second delivery is silently accepted: the state doesn’t change, no error is raised, and the system moves on.

Idempotency as a state machine property, not an ad-hoc check.
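To make the idempotent edge concrete, here is a minimal sketch assuming a webhook handler that pushes every incoming status through a transition table. applyWebhook, ALLOWED, and the Transaction shape are hypothetical, and only a slice of the diagram is reproduced.

```typescript
// Hypothetical webhook handler over a slice of the payment state machine.
type Status = "DELIVERY_REQUESTED" | "AUTHORIZED" | "REVERSED" | "NULLIFIED";

interface Transaction {
  id: string;
  status: Status;
}

const ALLOWED: Record<Status, readonly Status[]> = {
  DELIVERY_REQUESTED: ["AUTHORIZED"],
  // The self-loop is what makes a redelivered AUTHORIZED webhook harmless.
  AUTHORIZED: ["AUTHORIZED", "REVERSED", "NULLIFIED"],
  REVERSED: [],
  NULLIFIED: [],
};

function applyWebhook(tx: Transaction, next: Status): Transaction {
  if (!ALLOWED[tx.status].includes(next)) {
    throw new Error(`Invalid transition ${tx.status} -> ${next}`);
  }
  // A self-transition is accepted but leaves the record untouched.
  return tx.status === next ? tx : { ...tx, status: next };
}

const first = applyWebhook({ id: "tx-1", status: "DELIVERY_REQUESTED" }, "AUTHORIZED");
const second = applyWebhook(first, "AUTHORIZED"); // redelivery: no error, no change
```

Remove "AUTHORIZED" from its own list and the second call throws, which is exactly the failure mode the retry logic would misread as a failed payment.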

Both payment transactions and payment cards maintain an immutable status history — every state change appended with a timestamp. This is the compliance audit trail, and it comes for free from the state machine discipline.
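A minimal sketch of an append-only history, assuming each record keeps its current status plus an array of { status, at } entries. The field and function names are hypothetical. Each transition returns a new record with one more frozen entry; nothing already recorded is rewritten.

```typescript
// Hypothetical shapes for a record with an append-only status history.
interface StatusEntry<S extends string> {
  readonly status: S;
  readonly at: Date;
}

interface Tracked<S extends string> {
  readonly status: S;
  readonly history: readonly StatusEntry<S>[];
}

// Returns a new record with the status updated and one entry appended.
// Earlier entries are copied as-is, never modified or removed.
function recordTransition<S extends string>(
  record: Tracked<S>,
  next: S,
  at: Date = new Date(),
): Tracked<S> {
  const entry: StatusEntry<S> = Object.freeze({ status: next, at });
  return {
    status: next,
    history: Object.freeze([...record.history, entry]),
  };
}
```

Object.freeze only protects the in-memory copy. In a MongoDB-backed service the equivalent would typically be a single update that uses $set for the new status and $push for the history entry, so the two can’t drift apart.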

Pattern 2: Manual State Machine in a Settlement Service

The compensation flow in mach-prefunded-payments-service uses a different implementation — no XState, just a MongoDB field and explicit validation in each handler — but the same conceptual model:

CREATED → INTERNAL_TRANSFERED → AMOUNT_RECEIVED → INTERNAL_RETRIEVED → COMPLETED
                                                                       └─→ OMITTED

Each of the five handlers owns exactly one state transition. Before doing anything, it validates that the record is in the expected state:

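// Guard for the handler that expects a CREATED record; any other state means
// the step already ran, was skipped, or is stuck.
function validateCompensationStatus(compensationStatus: CompensationState): void {
  const validStates = new Set([CompensationState.CREATED]);
  if (!validStates.has(compensationStatus)) {
    // Fail with a business error before touching any money.
    throw new BusinessError(errors.invalidStatus.name, errors.invalidStatus.message);
  }
}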

If the state is wrong (already processed, skipped ahead, or stuck), the handler throws a BusinessError and the message is dequeued without executing. No money moves, nothing is requeued for retry, and the record is left for a human to examine.

This is the property that makes the system safe to re-run: trigger any handler twice and the second execution finds the record in the wrong state, fails validation, and exits cleanly. No double transfers.
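The re-run property can be sketched in memory. The enum values and the guard follow the validateCompensationStatus snippet (with BusinessError simplified to a single message argument); the Compensation shape, the transfers array standing in for the banking call, and handleInternalTransfer are hypothetical.

```typescript
// Hypothetical in-memory sketch of a re-runnable compensation handler.
enum CompensationState {
  CREATED = "CREATED",
  INTERNAL_TRANSFERED = "INTERNAL_TRANSFERED",
}

class BusinessError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "BusinessError";
  }
}

interface Compensation {
  id: string;
  status: CompensationState;
}

const transfers: string[] = []; // stands in for calls to the banking service

function validateCompensationStatus(status: CompensationState): void {
  const validStates = new Set([CompensationState.CREATED]);
  if (!validStates.has(status)) {
    throw new BusinessError(`Invalid status: ${status}`);
  }
}

// Handler for CREATED -> INTERNAL_TRANSFERED: guard first, then act, then advance.
function handleInternalTransfer(record: Compensation): void {
  validateCompensationStatus(record.status); // throws before any money moves
  transfers.push(record.id);
  record.status = CompensationState.INTERNAL_TRANSFERED;
}

const record: Compensation = { id: "comp-1", status: CompensationState.CREATED };
handleInternalTransfer(record);
try {
  handleInternalTransfer(record); // duplicate trigger
} catch (e) {
  console.log((e as Error).message); // prints "Invalid status: INTERNAL_TRANSFERED"
}
console.log(transfers.length); // prints 1
```

This covers sequential re-runs. The check and the status write are separate steps here, so two copies of a message processed at the same instant could both pass validation; closing that gap would need an atomic conditional update (for example, a MongoDB update whose filter includes the expected status) so that only one copy can claim the record.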

The Retry Decision

Both systems share the same approach to distinguishing retryable from non-retryable failures. In the settlement service:

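// Decide whether a failed message goes back on the queue.
export function shouldRequeueError(error: Error): boolean {
  // Dropped database connections and timeouts are transient: retry later.
  if (isMongoConnectionOrTimeoutError(error)) return true;
  // Known transient failures from the banking service, matched by message.
  const requeueable = new Set(['The request was not successful']);
  // Everything else (wrong state, not found, validation) is dequeued.
  return requeueable.has(error.message);
}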

Infrastructure failures — database connection dropped, banking service temporarily unreachable — requeue. The message goes back on the queue and the handler tries again when things recover.

Business logic failures — wrong state, record not found, validation error — dequeue. The failure is logged to New Relic. A human reviews. Retrying wouldn’t help and might cause harm.
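Here is one way the requeue rule might sit inside a queue consumer. processMessage, the Outcome type, and the logFailure callback (where a New Relic call would go) are hypothetical, and isMongoConnectionOrTimeoutError is stubbed with an assumed error-name check because the post doesn’t show it.

```typescript
type Outcome = "ack" | "requeue" | "dequeue";

// Stub for the service's isMongoConnectionOrTimeoutError; the name check is an assumption.
function isMongoConnectionOrTimeoutError(error: Error): boolean {
  return error.name === "MongoNetworkError";
}

function shouldRequeueError(error: Error): boolean {
  if (isMongoConnectionOrTimeoutError(error)) return true;
  const requeueable = new Set(["The request was not successful"]);
  return requeueable.has(error.message);
}

// Runs one message through its handler and decides what happens to the message.
async function processMessage(
  handler: () => Promise<void>,
  logFailure: (error: Error) => void,
): Promise<Outcome> {
  try {
    await handler();
    return "ack";
  } catch (err) {
    const error = err instanceof Error ? err : new Error(String(err));
    if (shouldRequeueError(error)) return "requeue"; // transient: try again later
    logFailure(error); // business failure: record it for a human
    return "dequeue";
  }
}
```

Only business failures reach logFailure; transient ones go back on the queue silently, since the next attempt may well succeed.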

This distinction is the second pillar of the pattern. The state machine prevents double execution when retries happen. The error classification controls when retries happen. Together they give you a system that heals automatically from transient failures and stops cleanly on real problems.
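The two pillars can be seen working together in a small simulation. Everything here is hypothetical: a banking call that fails once, a state guard, and a requeue rule reduced to a single message check. The first delivery fails transiently and is requeued, the second succeeds, and a duplicate third delivery is stopped by the state guard.

```typescript
// Hypothetical simulation of retry classification plus a state guard.
type State = "CREATED" | "INTERNAL_TRANSFERED";
type Outcome = "ack" | "requeue" | "dequeue";

const record = { id: "comp-1", status: "CREATED" as State };
let transfers = 0;
let bankCalls = 0;

function transferFunds(): void {
  bankCalls += 1;
  // The first call fails the way a temporarily unreachable bank might.
  if (bankCalls === 1) throw new Error("The request was not successful");
  transfers += 1;
}

function handler(): void {
  if (record.status !== "CREATED") throw new Error("Invalid status"); // state guard
  transferFunds();
  record.status = "INTERNAL_TRANSFERED";
}

function deliver(): Outcome {
  try {
    handler();
    return "ack";
  } catch (err) {
    const transient = (err as Error).message === "The request was not successful";
    return transient ? "requeue" : "dequeue";
  }
}

// Attempt 1 fails transiently, attempt 2 succeeds, attempt 3 is a duplicate.
const outcomes = [deliver(), deliver(), deliver()];
console.log(outcomes, transfers); // prints [ 'requeue', 'ack', 'dequeue' ] 1
```

The requeue is safe only because the failure happened before any money moved; the state guard is what keeps the duplicate delivery from transferring twice.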

What Both Systems Share

Superficially, a real-time payment flow and a batch settlement process don’t look alike. One processes thousands of transactions per hour with sub-second latency requirements. The other runs once per business day and waits for human input between steps.

But at the level of correctness guarantees, they need the same things:

  • A record that always knows exactly where it is
  • Transitions that only happen from valid source states
  • Protection against executing the same step twice
  • A clear distinction between “retry this” and “stop and alert”

The state machine is the pattern that delivers all four. Whether you reach for a formal library like XState or implement it with a field and a Set, the discipline is the same: every transition is named, every name is meaningful, and no step executes without first asking “is this the right moment?”

In systems where the output is money moving between accounts, that question is not optional.