The Problem
Every SDK API call at MACHBANK passed through two authentication layers before reaching any business logic: an AWS Lambda authorizer at the API Gateway level, and an OAuth session service behind it. Both layers had structural performance problems — and they compounded each other.
The Lambda authorizer: a network call disguised as middleware
The Lambda authorizer (mach-maas-authorizer-lambda) sits in front of API Gateway and runs on every single inbound request. Its job is to validate the JWT and decide whether to allow or deny the request.
The old implementation delegated this work: instead of validating the JWT itself, it made a synchronous HTTP call to mach-auth-service — a separate microservice — to perform the check. Every request paid this cost: Lambda invocation → network round-trip → microservice → response → Lambda → API Gateway forward.
This added latency on every request and created a hard dependency between the authorizer and the auth service’s availability. If mach-auth-service was slow or degraded, every endpoint on the platform was slow.
The session service: a MongoDB-backed OAuth flow with no fast path
Behind the authorizer, mach-auth-service managed partner sessions via a full OAuth 2.0 PKCE flow with MongoDB-backed state. Every phase of the flow required database operations, and there was no path through the system that didn’t touch the database multiple times.
The token refresh path — executed on every SDK API call requiring a new access token — made sequential calls to Eolian (feature flags), MongoDB (token lookup), the Account Service (machId resolution), and then multiple DB writes:
1. Feature flag check (Eolian network call) ~50ms
2. DB lookup: find refresh token by jti ~30ms
3. Account Service: resolve machId from documentNumber ~80ms
4. DB writes (new tokens) ~60ms
5. DB updateMany: invalidate previous sessions ~50–200ms
Combined: 270–420ms per token refresh, before any business logic ran. Under load, the updateMany — which scans and invalidates all prior sessions for the device — scaled poorly with accumulated tokens and could push the total well past 400ms.
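The arithmetic of the sequential steps above can be sketched directly (step names are illustrative, latencies are the figures quoted above with the updateMany taken mid-range):

```javascript
// Back-of-the-envelope model of the old refresh path: each step is a
// sequential await, so total latency is the sum of the individual steps.
const steps = {
  featureFlagCheck: 50,    // Eolian network call
  findRefreshToken: 30,    // MongoDB lookup by jti
  resolveMachId: 80,       // Account Service call
  writeNewTokens: 60,      // MongoDB inserts
  invalidateSessions: 125, // updateMany, 50–200ms depending on accumulated tokens
}

const totalMs = Object.values(steps).reduce((a, b) => a + b, 0)
console.log(`~${totalMs}ms before any business logic`) // ~345ms
```

Because nothing in the chain overlaps, every millisecond added to any one dependency lands directly on the total.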
The OAuth flow structure
The channel auth flow supported two channel types with different requirements:
- Ally channels (e.g., MACH): full PKCE required. The authorization code is generated with codeChallenge + codeChallengeMethod stored in MongoDB; token exchange validates codeVerifier via SHA256 hash. Also requires machId resolution.
- Partner channels (e.g., BCI, SSFFQR): PKCE optional, machId optional. Simpler bootstrap path.
Six phases per session:
- Device registration → basic access token
- Authorization code generation (store PKCE challenge)
- Token exchange (validate PKCE verifier, issue session tokens)
- Token refresh (new access token, same refresh token)
- Acknowledge refresh token (invalidate all prior sessions via updateMany)
- Access token verification (scope check + fraud blacklist lookup)
Every phase touched MongoDB. Every phase added latency.
The Redesign
Two parallel changes eliminated both performance bottlenecks.
Fix 1: Move JWT validation into the Lambda
The new Lambda authorizer (mach-maas-authorizer-lambda) validates JWTs locally — no microservice call. It fetches the JWT secret from AWS Secrets Manager once per Lambda lifecycle and caches it in memory:
```javascript
const { SecretsManager } = require('@aws-sdk/client-secrets-manager')

const secretsManager = new SecretsManager({})
const secretCache = {}

// Cached across warm invocations: only a cold start pays the network call.
async function getSecret(secretName) {
  if (secretCache[secretName]) return secretCache[secretName]
  const { SecretString } = await secretsManager.getSecretValue({ SecretId: secretName })
  secretCache[secretName] = SecretString
  return SecretString
}
```
Lambda’s execution model makes this highly effective: a cold start pays the Secrets Manager latency once (~50–100ms). Every subsequent warm invocation retrieves the secret from secretCache in microseconds — zero network calls. Then jwt.verify() validates the token cryptographically in-process.
The result: the authorizer went from a network call to the auth service on every request to an in-memory map lookup plus ~1ms of cryptography. This change alone eliminated a network hop on every single request across the entire SDK platform.
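The allow/deny decision itself is a small document handed back to API Gateway. A hedged sketch (field names follow the AWS Lambda authorizer response contract; the helper name and claim fields are assumptions, not the production code):

```javascript
// Build the IAM policy document an API Gateway Lambda authorizer returns
// after a local jwt.verify() succeeds or fails.
function buildPolicy(effect, methodArn, claims = {}) {
  return {
    principalId: claims.sessionId || 'anonymous',
    policyDocument: {
      Version: '2012-10-17',
      Statement: [{ Action: 'execute-api:Invoke', Effect: effect, Resource: methodArn }],
    },
    // Context entries (like deviceOs) are forwarded downstream to services.
    context: { deviceOs: claims.deviceOs || '' },
  }
}
```

On a verification failure the same helper is called with 'Deny', so the decision path never leaves the Lambda process.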
Additional changes to the Lambda:
- Runtime upgraded to Node.js 22
- Added deviceOs to the token context forwarded downstream to services
- Extended coverage to new authorization endpoints
Fix 2: Replace the session service with stateless JWT
mach-sdk-partner-auth-service replaced the MongoDB session model entirely. The insight: if all session state is encoded in a signed token, there is nothing to look up.
Session tokens carry all downstream-needed state — channelId, deviceId, deviceOs, channelCode, documentNumber, machId, sessionId — signed with HMAC-SHA256. Validation is a single in-process operation:
```javascript
const decoded = jwt.verify(token, secrets.sdkSecret)
// microseconds. no network. no database.
```
Two-phase authentication replaces the six-phase OAuth flow:
B2B tokens (2-minute TTL) — channel partners (BCI, MACH, SSFFQR) each hold a channel-specific secret. The partner backend generates a short-lived JWT and passes it to the mobile client. The client exchanges it at /auth/session for session tokens. The B2B token is never stored.
Session tokens — issued after B2B validation:
- Access token: 20-minute TTL, signed with SDK_SECRET
- Refresh token: 30-day TTL, signed with SDK_REFRESH_SECRET (separate key, independently rotatable)
The sessionId field embedded in session tokens distinguishes them from B2B bootstrap tokens — the refresh endpoint rejects any token without sessionId, making type discrimination explicit and cryptographic rather than conditional.
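As a sketch, the guard is one claim check after signature verification (function name is an assumption):

```javascript
// B2B bootstrap tokens carry no sessionId claim, so the refresh endpoint
// can reject them before any refresh logic runs.
function assertSessionToken(claims) {
  if (!claims.sessionId) throw new Error('bootstrap token cannot refresh a session')
  return claims
}
```

Since sessionId sits inside the signed payload, a client cannot promote a bootstrap token to a session token without breaking the signature.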
Two secrets for independent rotation:
Using separate secrets for access and refresh tokens means rotating SDK_SECRET to invalidate compromised access tokens doesn’t force every user to re-authenticate. The 20-minute TTL self-heals quickly. Refresh token rotation is independent.
Async device registration:
In the old system, device registration was synchronous on the session creation hot path. In the new service, it’s fire-and-forget:
```javascript
registerDevice(payload).catch((error) => {
  logger.error({ error }, 'Device registration failed')
})
```
The session response is returned immediately. If registration fails, it’s logged — not surfaced to the client.
Backward-compatible secret rotation:
Old tokens (signed with SDK_SECRET) continue working after the new SDK_REFRESH_SECRET is introduced. The refresh endpoint tries the new secret first, falls back to the old one. After the 30-day refresh TTL cycles out, no valid old-format tokens remain.
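The fallback can be sketched as a small wrapper (names are illustrative; `verify` stands in for any JWT verifier that throws on a bad signature, such as jsonwebtoken's jwt.verify):

```javascript
// Try the new refresh secret first; fall back to the legacy secret so tokens
// issued before the rotation keep working.
function verifyWithFallback(token, newSecret, oldSecret, verify) {
  try {
    return verify(token, newSecret)
  } catch {
    // Legacy tokens stay valid until the 30-day refresh TTL cycles them out.
    return verify(token, oldSecret)
  }
}
```

The fallback branch is dead code after one full refresh-TTL window, at which point the old secret can be retired.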
Impact
The combined changes eliminated two compounding bottlenecks:
| | Before | After |
|---|---|---|
| Lambda authorizer | Network call to auth service on every request | In-memory secret cache + ~1ms JWT verify |
| Per-request auth | 270–420ms+ per token refresh | <5ms |
| Session creation | Every request | Once per session (30-day refresh TTL) |
A 20x latency reduction that lifted the performance ceiling across every SDK payment flow.