A single slow API response can bring everything down. One downstream service starts responding in 8 seconds instead of 200 milliseconds. Within minutes, functions queue up, concurrency limits max out, unrelated endpoints buckle, and your users see errors on pages with no connection to the original fault.
Serverless architectures are uniquely vulnerable to this kind of cascade failure.
Why Serverless Amplifies Cascades
Traditional servers fail in predictable ways. When a thread pool fills up, new requests get rejected, and the blast radius stays confined to a single machine. Serverless functions scale automatically, which sounds like a feature until auto-scaling becomes the root cause.
The cascade sequence
- Service B slows down. Response time jumps from 200ms to 8 seconds.
- Functions calling Service B pile up. Each invocation holds a connection for 8 seconds instead of 200ms, demanding 40x more concurrent executions for the same throughput.
- Concurrency limits are hit. Your account caps at 1,000 concurrent Lambda executions. Calls to Service B consume most of that pool.
- Unrelated functions get throttled. Your checkout endpoint, search API, and webhook handlers share the same pool and begin returning 429 errors.
- Retries amplify the damage. Throttled invocations retry, burning more concurrency. Upstream services also reattempt their calls, piling on extra load.
- Total system failure. Every endpoint returns errors. One slow dependency has taken down the entire platform.
```
Service B slows down (200ms -> 8s)
└─ Functions pile up (40x concurrency)
   └─ Account concurrency limit hit
      ├─ Checkout API throttled (429)
      ├─ Search API throttled (429)
      ├─ Webhook handlers throttled (429)
      └─ Retries amplify load
         └─ Full system outage
```

Auto-scaling, the defining feature of serverless, is the very mechanism that turns a localized slowdown into a system-wide outage.
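The 40x jump above is just Little's law: required concurrency equals arrival rate times latency. A quick sketch (the 100 req/s traffic figure is an assumption for illustration):

```javascript
// Little's law: concurrency = arrival rate (req/s) x latency (s).
// Latency grows 40x, so required concurrency grows 40x at the same traffic.
function requiredConcurrency(requestsPerSecond, latencyMs) {
  return requestsPerSecond * (latencyMs / 1000);
}

const healthy = requiredConcurrency(100, 200);   // 100 req/s at 200ms -> 20
const degraded = requiredConcurrency(100, 8000); // 100 req/s at 8s    -> 800

console.log(healthy, degraded, degraded / healthy); // 20 800 40
```

At 100 req/s, the degraded service alone needs 800 concurrent executions, most of a default 1,000-execution account limit.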
Real-World Cascade Patterns
The shared dependency
Multiple functions depend on the same database or API. When that resource slows down, every caller grinds to a halt at once.
```javascript
// These three functions share the same database.
// When the DB is slow, all three consume excessive concurrency.

// Function A: User profile
const user = await db.query('SELECT * FROM users WHERE id = ?', [userId]);

// Function B: Order history
const orders = await db.query('SELECT * FROM orders WHERE user_id = ?', [userId]);

// Function C: Analytics
const events = await db.query('SELECT * FROM events WHERE user_id = ?', [userId]);
```

The retry storm
A function fails, its caller retries, the reattempt also fails, and the next caller upstream retries too. Each layer multiplies the total request volume.
```
API Gateway -> Function A (retries 3x) -> Function B (retries 3x) -> Service C (down)

Total calls to Service C: 3 x 3 = 9 per original request
With 100 concurrent users: 900 calls to an already-failing service
```

The synchronous chain
A request passes through five functions in sequence. The total wait equals the sum of all individual timeouts, yet each function sets its own limit independently.
```javascript
// Function A calls Function B, which calls Function C.
// A, B, and C each set a 30s timeout independently.
// If C takes 28s, B spends 28s waiting - and A has been waiting
// that entire time too. Add network and processing overhead and
// the end-to-end latency exceeds API Gateway's hard 29s limit.
// Result: the client gets a 504, even though C technically succeeded.
```

Pattern 1: Bulkhead Isolation
The bulkhead pattern stops one failing dependency from consuming all available concurrency. In serverless, this means setting reserved concurrency per function.
```yaml
functions:
  processPayment:
    handler: payment.handler
    reservedConcurrency: 100  # Can never use more than 100 concurrent executions

  sendNotification:
    handler: notification.handler
    reservedConcurrency: 50   # Isolated from payment function's concurrency

  syncToCRM:
    handler: crm.handler
    reservedConcurrency: 30   # If CRM is slow, only 30 executions are affected
```

If the CRM (Customer Relationship Management) system goes down, syncToCRM consumes at most 30 concurrent executions. Payment and notification functions stay healthy because they draw from their own reserved pools.
The trade-off: reserved concurrency shrinks the pool available to other functions. Plan your allocation budget across every function.
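A quick sanity check on that allocation budget can be scripted. This sketch uses the 1,000-execution account limit from earlier and the reservation figures from the config above; note that AWS requires leaving at least 100 executions unreserved:

```javascript
// Budget check for reserved concurrency allocations.
// Figures assume the 1,000-execution account limit mentioned earlier.
const ACCOUNT_LIMIT = 1000;
const MIN_UNRESERVED = 100; // AWS-enforced minimum unreserved pool

const reservations = { processPayment: 100, sendNotification: 50, syncToCRM: 30 };

const reserved = Object.values(reservations).reduce((a, b) => a + b, 0);
const unreserved = ACCOUNT_LIMIT - reserved;

if (unreserved < MIN_UNRESERVED) {
  throw new Error(`Over budget: only ${unreserved} executions left unreserved`);
}
console.log(`Reserved ${reserved}, ${unreserved} left for everything else`); // Reserved 180, 820 left for everything else
```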
Pattern 2: Circuit Breakers
A circuit breaker stops calling a failing service after a threshold of errors, saving your function from burning time and concurrency on requests destined to fail.
```javascript
// Simple circuit breaker for serverless.
// Store state in a shared cache (Redis, DynamoDB) since Lambda
// instances do not share memory.

const CIRCUIT_KEY = 'circuit:service-b';
const FAILURE_THRESHOLD = 5;
const RESET_TIMEOUT_MS = 30000; // 30 seconds

async function callWithCircuitBreaker(serviceFn) {
  const circuit = await redis.get(CIRCUIT_KEY);

  if (circuit) {
    const state = JSON.parse(circuit);

    if (state.status === 'open') {
      const elapsed = Date.now() - state.openedAt;

      if (elapsed < RESET_TIMEOUT_MS) {
        // Circuit is open - fail fast without calling the service
        throw new Error('Circuit breaker open: service-b is unavailable');
      }

      // Try a single request to see if the service recovered
      state.status = 'half-open';
      await redis.set(CIRCUIT_KEY, JSON.stringify(state));
    }
  }

  try {
    const result = await serviceFn();

    // Success - reset the circuit
    await redis.del(CIRCUIT_KEY);
    return result;
  } catch (err) {
    const state = circuit ? JSON.parse(circuit) : { failures: 0 };
    state.failures = (state.failures || 0) + 1;

    if (state.failures >= FAILURE_THRESHOLD) {
      state.status = 'open';
      state.openedAt = Date.now();
    }

    await redis.set(CIRCUIT_KEY, JSON.stringify(state));
    throw err;
  }
}
```

When the circuit opens, your function fails in milliseconds instead of waiting seconds for a timeout, freeing concurrency for healthy requests.
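To see the state machine in action locally, here is the same closed/open/half-open logic with in-memory state standing in for Redis. This is for illustration only; as noted above, real Lambda instances need a shared store:

```javascript
// In-memory circuit breaker demo (instance state stands in for Redis;
// for local illustration only - Lambda instances need a shared cache).
class CircuitBreaker {
  constructor({ failureThreshold = 5, resetTimeoutMs = 30000 } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.failures = 0;
    this.status = 'closed';
    this.openedAt = 0;
  }

  async call(serviceFn) {
    if (this.status === 'open') {
      if (Date.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error('Circuit breaker open'); // fail fast, no network call
      }
      this.status = 'half-open'; // probe with a single request
    }
    try {
      const result = await serviceFn();
      this.failures = 0;
      this.status = 'closed';
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.failureThreshold) {
        this.status = 'open';
        this.openedAt = Date.now();
      }
      throw err;
    }
  }
}
```

Once the threshold of consecutive failures is reached, the next call throws immediately rather than spending seconds waiting on a timeout.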
Pattern 3: Async Decoupling
The strongest defense against cascades: do not call downstream services synchronously. Queue the work and return right away.
```javascript
// Before: synchronous chain, cascade-prone
export const handler = async (event) => {
  const order = JSON.parse(event.body);
  const payment = await chargePayment(order);      // If slow, blocks here
  const shipment = await createShipment(payment);  // And here
  const notification = await sendEmail(shipment);  // And here
  return { statusCode: 200, body: JSON.stringify(payment) };
};
```

```javascript
// After: async decoupling, cascade-proof
export const handler = async (event) => {
  const order = JSON.parse(event.body);
  const orderId = crypto.randomUUID();

  await aq.tasks.create({
    targetUrl: 'https://your-app.com/api/process-order',
    payload: { orderId, ...order },
    webhookUrl: 'https://your-app.com/api/order-orchestrator',
    retries: 3,
    timeout: 60,
  });

  return {
    statusCode: 202,
    body: JSON.stringify({ orderId, status: 'accepted' }),
  };
};
```

With async decoupling:
- Your Lambda finishes in milliseconds, releasing concurrency right away
- Each downstream call runs as a separate task with its own retry and timeout
- If one service stalls, only its tasks queue up while other functions stay healthy
- The task queue acts as a natural backpressure mechanism, processing work at a sustainable rate
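The backpressure point can be demonstrated with a minimal in-memory queue that caps concurrent work. This is a stand-in for the managed task queue in the example above, not a production implementation:

```javascript
// Minimal in-memory queue with a concurrency cap, to illustrate
// backpressure. A real system would use a managed queue (SQS, etc.).
class TaskQueue {
  constructor(maxConcurrent) {
    this.maxConcurrent = maxConcurrent;
    this.active = 0;    // tasks currently running
    this.pending = [];  // tasks waiting their turn
  }

  // Accept work immediately; it runs when capacity frees up.
  enqueue(task) {
    return new Promise((resolve, reject) => {
      this.pending.push({ task, resolve, reject });
      this.drain();
    });
  }

  drain() {
    while (this.active < this.maxConcurrent && this.pending.length > 0) {
      const { task, resolve, reject } = this.pending.shift();
      this.active++;
      Promise.resolve()
        .then(task)
        .then(resolve, reject)
        .finally(() => { this.active--; this.drain(); });
    }
  }
}
```

However many tasks arrive, `enqueue` returns right away and at most `maxConcurrent` run at once; a burst simply lengthens the pending list instead of consuming more concurrency.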
Pattern 4: Timeout Budgets
In synchronous chains, derive timeouts from the total budget rather than setting each one per service:
```javascript
export const handler = async (event, context) => {
  const totalBudget = context.getRemainingTimeInMillis();

  // Allocate time proportionally
  const step1Timeout = Math.min(5000, totalBudget * 0.3);
  const step2Timeout = Math.min(10000, totalBudget * 0.5);
  const step3Timeout = Math.min(3000, totalBudget * 0.2);

  const user = await fetchWithTimeout(getUserUrl, step1Timeout);
  const order = await fetchWithTimeout(createOrderUrl, step2Timeout);
  const email = await fetchWithTimeout(sendEmailUrl, step3Timeout);

  return { statusCode: 200, body: JSON.stringify(order) };
};

async function fetchWithTimeout(url, timeoutMs) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);

  try {
    const res = await fetch(url, { signal: controller.signal });
    return res.json();
  } finally {
    clearTimeout(timer);
  }
}
```

This prevents a slow first step from devouring the entire budget, leaving later steps with no remaining time.
Pattern 5: Shed Load Early
When your system is under stress, rejecting requests fast beats accepting them and failing slowly. Add admission control at the edge:
```javascript
export const handler = async (event, context) => {
  // Check system health before accepting work
  const health = await redis.get('system:health');

  if (health === 'degraded') {
    // Return 503 immediately instead of attempting work that will fail
    return {
      statusCode: 503,
      headers: { 'Retry-After': '30' },
      body: JSON.stringify({
        error: 'Service temporarily unavailable',
        retryAfter: 30,
      }),
    };
  }

  // System is healthy - proceed normally
  return processRequest(event);
};
```

A fast 503 beats a slow 500. It frees concurrency, signals the client to back off, and keeps the request from burning resources in a system that cannot serve it.
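One way to produce that `system:health` flag is a rolling error-rate tracker. This is a sketch under assumptions: the window size and 50% threshold are illustrative, and in production the computed status would be written to the shared cache rather than held in memory:

```javascript
// Rolling error-rate tracker that decides when the system is 'degraded'.
// Window size and threshold are illustrative assumptions.
class HealthTracker {
  constructor({ windowSize = 100, errorRateThreshold = 0.5 } = {}) {
    this.windowSize = windowSize;
    this.errorRateThreshold = errorRateThreshold;
    this.outcomes = []; // true = success, false = error
  }

  record(success) {
    this.outcomes.push(success);
    if (this.outcomes.length > this.windowSize) this.outcomes.shift();
  }

  status() {
    if (this.outcomes.length === 0) return 'healthy';
    const errors = this.outcomes.filter((ok) => !ok).length;
    return errors / this.outcomes.length >= this.errorRateThreshold
      ? 'degraded'
      : 'healthy';
  }
}
```

A scheduled function (or a wrapper around each handler) would call `record()` per request and periodically write `status()` to Redis under `system:health` for the admission check above to read.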
Prevention Checklist
Before your next deployment, verify these safeguards:
- Critical functions have reserved concurrency to prevent noisy-neighbor throttling
- External service calls use circuit breakers that fail fast on repeated errors
- Long-running operations are offloaded to a task queue instead of running inline
- Timeout budgets account for the full call chain, not just individual steps
- Retry policies use exponential backoff with a maximum attempt cap
- Health checks and load shedding support degraded-mode operation
- Monitoring alerts fire on concurrency utilization, not just error rates
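The retry-policy item above, exponential backoff with a maximum attempt cap, can be sketched as follows; the base delay, cap, and attempt limit are illustrative parameters, and the full-jitter variant is one common choice:

```javascript
// Capped exponential backoff with full jitter.
// Parameter values are illustrative defaults.
function backoffDelay(attempt, baseMs = 100, capMs = 10000) {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * exp; // full jitter: uniform in [0, exp)
}

async function retryWithBackoff(fn, maxAttempts = 4) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err; // cap reached: give up
      await new Promise((r) => setTimeout(r, backoffDelay(attempt)));
    }
  }
}
```

The jitter spreads retries out in time so that a fleet of throttled callers does not hammer the recovering service in synchronized waves, and the attempt cap bounds total amplification.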
The Core Principle
Cascade failures happen when one broken component devours shared resources - concurrency, connections, memory - that other components need. Every pattern above works by capping how much of a shared resource any single fault can claim.
The simplest fix: make every downstream call asynchronous. When your function never waits for a response, a slow service cannot drain your concurrency. A task queue handles the waiting, retrying, and failure isolation on your behalf.
Further Reading
- Lambda Error Types - understanding every way a Lambda can fail
- Cloud Function Timeout Issues - timeout limits across platforms
- Handling Failures in API Chains - compensation and dead letter patterns
- How API Orchestration Works - coordinating multi-service workflows
- Benefits of Async Orchestration - why async prevents most cascade scenarios
