
Managing State in API Workflows

Multi-step API workflows produce state at every step: intermediate results, progress markers, error context, and data that downstream steps require. Losing this state destroys your ability to resume, debug, or compensate a failed workflow.

This guide covers practical patterns for tracking workflow state across async steps. The goal is straightforward: at any point, you should be able to answer “what happened, what is happening, and what happens next” for every active workflow.

Prerequisites: This guide builds on How API Orchestration Works and How to Chain Multiple API Calls. Read those first if you are new to async workflows.

Step 1: Design Your Workflow State Schema

A workflow state record needs six components:

const workflowState = {
  // Identity
  id: 'wf-abc-123',
  type: 'order-processing',

  // Status
  status: 'running', // pending | running | completed | failed | cancelled

  // Step tracking (step status: pending | dispatched | running | completed)
  steps: {
    validate: { status: 'completed', startedAt: '...', completedAt: '...' },
    charge: { status: 'completed', startedAt: '...', completedAt: '...' },
    ship: { status: 'running', startedAt: '...' },
    notify: { status: 'pending' },
  },

  // Results from completed steps
  results: {
    validate: { valid: true, riskScore: 12 },
    charge: { transactionId: 'tx-456', amount: 99.00 },
  },

  // Original input (needed for retries and compensation)
  context: {
    userId: 'user-789',
    items: [{ sku: 'WIDGET-1', qty: 2 }],
    address: { street: '123 Main St', city: 'Austin' },
  },

  // Metadata
  createdAt: '2026-03-19T10:00:00Z',
  updatedAt: '2026-03-19T10:00:12Z',
  error: null,
};

Key design decisions:

  • Store results per step, not in a flat object. This makes it clear which step produced which data.
  • Keep the original context. You need it for retries, compensation, and debugging.
  • Track timing per step. This reveals bottlenecks without external monitoring tools.
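
Per-step durations fall straight out of those timestamps. A minimal sketch against the schema above (the function name and the example output are illustrative):

// Compute each completed step's duration in milliseconds from its stored timestamps
function stepDurations(workflow) {
  return Object.fromEntries(
    Object.entries(workflow.steps)
      .filter(([, s]) => s.startedAt && s.completedAt)
      .map(([name, s]) => [name, new Date(s.completedAt) - new Date(s.startedAt)])
  );
}

// e.g. stepDurations(workflowState) -> { validate: 2100, charge: 4800 }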

Choosing a storage backend

Backend            Good for                                   Trade-off
Redis              High throughput, short-lived workflows     Data loss risk without persistence config
PostgreSQL/MySQL   Durability, complex queries, reporting     Higher latency per write
MongoDB            Flexible schema, moderate throughput       Less strict consistency
DynamoDB           Serverless, auto-scaling                   Cost at high write volume

For most teams, a relational database with a JSON column for results and context offers the simplest starting point. Reserve Redis for high-volume workflows that complete within a few minutes.
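
As a concrete sketch of that starting point, a single Postgres table with JSONB columns covers the whole schema from Step 1. This is a hypothetical layout using the node-postgres (pg) client; the table and column names, and the DATABASE_URL environment variable, are illustrative:

// Minimal sketch: one row per workflow, JSONB for the flexible parts
import pg from 'pg';

const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });

await pool.query(`
  CREATE TABLE IF NOT EXISTS workflows (
    id         TEXT PRIMARY KEY,
    type       TEXT NOT NULL,
    status     TEXT NOT NULL,  -- pending | running | completed | failed | cancelled
    steps      JSONB NOT NULL DEFAULT '{}',
    results    JSONB NOT NULL DEFAULT '{}',
    context    JSONB NOT NULL DEFAULT '{}',
    error      JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
  );
`);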

Step 2: Persist State at Every Transition

The critical rule: write state before dispatching the next task. If you dispatch first and crash before writing, you lose all record of what was dispatched.

app.post('/api/orchestrator', async (req, res) => {
  const { workflowId, step: completedStep } = req.body.payload;
  const result = req.body.result;

  // 1. Write the completed step's result FIRST
  await db.workflows.update(workflowId, {
    [`steps.${completedStep}.status`]: 'completed',
    [`steps.${completedStep}.completedAt`]: new Date().toISOString(),
    [`results.${completedStep}`]: result,
    updatedAt: new Date().toISOString(),
  });

  // 2. Determine the next step
  const nextStep = getNextStep(completedStep);

  if (nextStep) {
    // 3. Mark the next step as dispatched before sending it
    await db.workflows.update(workflowId, {
      [`steps.${nextStep}.status`]: 'dispatched',
      [`steps.${nextStep}.startedAt`]: new Date().toISOString(),
    });

    // 4. Now dispatch
    await aq.tasks.create({
      targetUrl: STEP_CONFIG[nextStep].targetUrl,
      payload: { workflowId, step: nextStep, ...result },
      webhookUrl: 'https://your-app.com/api/orchestrator',
      retries: STEP_CONFIG[nextStep].retries,
      timeout: STEP_CONFIG[nextStep].timeout,
    });
  } else {
    await db.workflows.update(workflowId, {
      status: 'completed',
      updatedAt: new Date().toISOString(),
    });
  }

  res.status(200).json({ received: true });
});

If the process crashes after step 3 but before step 4, the recovery sweep (Step 4 below) detects any step stuck in dispatched status and re-dispatches the stalled task.
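
The handler above leans on a getNextStep helper and a STEP_CONFIG map that are assumed rather than shown. A minimal sketch for a linear four-step workflow, with a dispatchStep helper in the same shape (used again later in this guide); the URLs and retry/timeout values are illustrative:

// Assumed helpers for a linear workflow: validate -> charge -> ship -> notify
const STEP_ORDER = ['validate', 'charge', 'ship', 'notify'];

const STEP_CONFIG = {
  validate: { targetUrl: 'https://your-app.com/api/steps/validate', retries: 3, timeout: 30 },
  charge:   { targetUrl: 'https://your-app.com/api/steps/charge',   retries: 3, timeout: 60 },
  ship:     { targetUrl: 'https://your-app.com/api/steps/ship',     retries: 5, timeout: 120 },
  notify:   { targetUrl: 'https://your-app.com/api/steps/notify',   retries: 2, timeout: 30 },
};

// Returns the step after completedStep, or null at the end of the workflow
function getNextStep(completedStep) {
  const i = STEP_ORDER.indexOf(completedStep);
  return i >= 0 && i < STEP_ORDER.length - 1 ? STEP_ORDER[i + 1] : null;
}

// Marks a step dispatched, then enqueues it - the same write-before-dispatch
// pattern as steps 3-4 in the handler above
async function dispatchStep(workflowId, step, payload) {
  await db.workflows.update(workflowId, {
    [`steps.${step}.status`]: 'dispatched',
    [`steps.${step}.startedAt`]: new Date().toISOString(),
  });
  await aq.tasks.create({
    targetUrl: STEP_CONFIG[step].targetUrl,
    payload: { workflowId, step, ...payload },
    webhookUrl: 'https://your-app.com/api/orchestrator',
    retries: STEP_CONFIG[step].retries,
    timeout: STEP_CONFIG[step].timeout,
  });
}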

Step 3: Handle Concurrent State Updates

When parallel steps finish at the same time, two webhook handlers may try to update the same workflow record. Without protection, one update can overwrite the other.

Option A: Atomic field updates

If your database supports partial updates (MongoDB, Redis, DynamoDB), modify only the specific fields each step touches:

// Step "charge" completes
await db.workflows.update(workflowId, {
'steps.charge.status': 'completed',
'steps.charge.completedAt': new Date().toISOString(),
'results.charge': { transactionId: 'tx-456' },
});
// Step "reserve" completes at the same time - no conflict
await db.workflows.update(workflowId, {
'steps.reserve.status': 'completed',
'steps.reserve.completedAt': new Date().toISOString(),
'results.reserve': { reservationId: 'res-789' },
});

Each update targets different fields, so they never conflict.

Option B: Optimistic locking

For relational databases, add a version column and reject outdated writes:

async function updateWorkflow(workflowId, updates, attempt = 0) {
  const workflow = await db.workflows.findById(workflowId);

  // The write only succeeds if the version has not changed since the read
  const result = await db.workflows.updateOne(
    { id: workflowId, version: workflow.version },
    { ...updates, version: workflow.version + 1 }
  );

  if (result.modifiedCount === 0) {
    // Another process updated first - reload and retry, bounded so heavy
    // contention cannot spin forever
    if (attempt >= 5) {
      throw new Error(`Persistent version conflict on workflow ${workflowId}`);
    }
    return updateWorkflow(workflowId, updates, attempt + 1);
  }
}

Fan-in synchronization

When parallel branches must converge before the next step, verify completion atomically:

// After updating a parallel step, check if all parallel steps are done
const workflow = await db.workflows.findById(workflowId);
const parallelSteps = ['charge', 'reserve', 'fraud-check'];
const allComplete = parallelSteps.every(
s => workflow.steps[s]?.status === 'completed'
);
if (allComplete) {
await dispatchStep(workflowId, 'fulfill', {
...workflow.results.charge,
...workflow.results.reserve,
});
}

To prevent duplicate dispatches when concurrent handlers both see “all complete”, use an atomic compare-and-set:

const result = await db.workflows.updateOne(
  { id: workflowId, 'steps.fulfill.status': 'pending' },
  { 'steps.fulfill.status': 'dispatched' }
);

// Only the handler that wins the update dispatches the next step
if (result.modifiedCount === 1) {
  await dispatchStep(workflowId, 'fulfill', payload);
}

Step 4: Build a Recovery Mechanism

Tasks can be dispatched but never report back because of network failures, process crashes, or bugs. A sweep process detects and recovers stalled workflows:

async function recoverStalledWorkflows() {
  const staleThreshold = new Date(Date.now() - 10 * 60 * 1000); // 10 minutes

  const stalled = await db.workflows.find({
    status: 'running',
    updatedAt: { $lt: staleThreshold },
  });

  for (const workflow of stalled) {
    // Find the step that is stuck
    const stuckStep = Object.entries(workflow.steps).find(
      ([, step]) => step.status === 'dispatched' || step.status === 'running'
    );
    if (!stuckStep) continue;

    const [stepName] = stuckStep;
    console.log(`Recovering workflow ${workflow.id}, re-dispatching step: ${stepName}`);

    // Build payload from stored context and previous results
    const payload = {
      workflowId: workflow.id,
      step: stepName,
      ...workflow.context,
      previousResults: workflow.results,
    };

    await aq.tasks.create({
      targetUrl: STEP_CONFIG[stepName].targetUrl,
      payload,
      webhookUrl: 'https://your-app.com/api/orchestrator',
      retries: STEP_CONFIG[stepName].retries,
      timeout: STEP_CONFIG[stepName].timeout,
    });

    await db.workflows.update(workflow.id, {
      updatedAt: new Date().toISOString(),
    });
  }
}

// Run every 5 minutes
setInterval(recoverStalledWorkflows, 5 * 60 * 1000);

Recovery only works because you stored context and results in Step 1. Without those fields, you cannot rebuild the payload for a re-dispatched step.

Step 5: Expire and Archive Old Workflows

Completed workflows accumulate over time. Without cleanup, your state store grows without bound, degrading query speed and raising storage costs.

async function archiveCompletedWorkflows() {
  const archiveThreshold = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000); // 7 days

  const old = await db.workflows.find({
    status: { $in: ['completed', 'failed', 'cancelled'] },
    updatedAt: { $lt: archiveThreshold },
  });

  if (old.length === 0) return;

  // Move to cold storage
  await db.workflowArchive.insertMany(old);

  // Remove from active store
  const ids = old.map(w => w.id);
  await db.workflows.deleteMany({ id: { $in: ids } });

  console.log(`Archived ${old.length} workflows`);
}

Guidelines:

  • Keep active workflows in a fast store (Redis, primary database table)
  • Archive to cold storage (S3, separate table, data warehouse) after 7-30 days
  • Retain failed workflows longer for debugging - 30-90 days is reasonable
  • Index by status and updatedAt so cleanup queries are efficient
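
For the last point, a compound index on those two fields keeps both the recovery sweep and the archive query off collection scans. A sketch using the official MongoDB Node.js driver (the database name, collection name, and MONGO_URL environment variable are illustrative; a relational store would use an equivalent two-column index):

// Compound index so queries filtering on status + updatedAt stay fast
import { MongoClient } from 'mongodb';

const client = new MongoClient(process.env.MONGO_URL);
await client.connect();
await client.db('app').collection('workflows').createIndex({ status: 1, updatedAt: 1 });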

Checklist

Before shipping a stateful workflow to production, confirm the following:

  • State is written before every task dispatch
  • Parallel step completions do not overwrite each other
  • A recovery sweep detects and re-dispatches stuck steps
  • The original context is stored so steps can be retried with correct input
  • Completed workflows are archived on a schedule
  • You can query workflow status by ID for debugging and user-facing progress
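
For the last item, a minimal read endpoint is usually enough. A sketch assuming the same Express app and db layer used throughout this guide:

// Returns workflow status plus a per-step summary for debugging or UI progress
app.get('/api/workflows/:id', async (req, res) => {
  const workflow = await db.workflows.findById(req.params.id);
  if (!workflow) return res.status(404).json({ error: 'not found' });

  res.json({
    id: workflow.id,
    status: workflow.status,
    steps: Object.fromEntries(
      Object.entries(workflow.steps).map(([name, s]) => [name, s.status])
    ),
    error: workflow.error,
    updatedAt: workflow.updatedAt,
  });
});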