AWS Lambda powers most serverless architectures. It also generates more silent failures than any other component in distributed systems. A Lambda function can break in at least six distinct ways, and each demands a different handling strategy. Treating them all the same - or ignoring them - leads to lost events, phantom retries, and 3 AM pages.
This post catalogs every Lambda error type, explains when each surfaces, and shows how to handle them in production. If you build workflows spanning multiple Lambda functions or external services, these patterns will spare you some painful lessons.
The Error Taxonomy
Lambda errors fall into two broad categories: faults the Lambda service itself generates (invocation errors) and exceptions your code throws (function errors). The distinction matters because they surface differently and demand different responses.
1. Invocation Errors
These occur before your code runs. The Lambda service rejects or aborts the invocation.
Throttling (429 TooManyRequestsException)
Your account or function hit the concurrency limit. The invocation never ran.
```json
{ "errorType": "TooManyRequestsException", "errorMessage": "Rate exceeded" }
```

When this happens:
- You exceeded the account-level concurrent execution limit (default: 1,000)
- You exceeded the function’s reserved concurrency setting
- A burst of invocations hit the per-region burst limit
How to handle it:
- Synchronous invocations: The caller gets a 429. Retry with exponential backoff.
- Async invocations: Lambda retries automatically (twice, with delays). After that, the event goes to the dead letter queue if configured.
- Event source mappings (Simple Queue Service (SQS), Kinesis): Lambda pauses polling and retries the batch.
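For the synchronous case, the backoff logic can live in a small wrapper around the invoke call. A minimal sketch, where `withBackoff` and `isThrottle` are hypothetical names (not an AWS SDK API) and the caller decides which errors count as throttling:

```javascript
// Hypothetical retry wrapper: retries `fn` with exponential backoff plus
// jitter whenever `isThrottle` says the error is retryable (e.g. a 429).
async function withBackoff(fn, { retries = 5, baseMs = 100, isThrottle = () => true } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries || !isThrottle(err)) throw err;
      const delayMs = baseMs * 2 ** attempt + Math.random() * baseMs; // ~100, 200, 400...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Wrapping a synchronous invoke with `isThrottle: (err) => err.name === 'TooManyRequestsException'` turns a hard 429 into a bounded wait instead of an immediate failure.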
The lasting fix is to offload burst-heavy workloads to a task queue that controls concurrency for you, instead of relying on Lambda’s built-in limits.
Service errors (500 ServiceException)
The Lambda service itself hit an internal fault. Rare, but unavoidable.
```json
{ "errorType": "ServiceException", "errorMessage": "Internal error" }
```

Your only option is to retry. These faults are transient by definition.
Invalid request errors (400)
Your invocation payload is malformed, exceeds size limits (6 MB for sync, 256 KB for async), or targets a nonexistent function.
```json
{ "errorType": "InvalidRequestContentException", "errorMessage": "Could not parse request body into json" }
```

Never retry these. Fix the caller.
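Because these errors are deterministic, the cheapest fix is to validate before invoking. A sketch of a size pre-check using the documented limits above (`checkPayloadSize` is a hypothetical helper, not an SDK call):

```javascript
// Reject oversized payloads locally instead of letting Lambda return a 400.
// Limits per the Lambda quotas: 6 MB synchronous, 256 KB asynchronous.
const PAYLOAD_LIMITS = { sync: 6 * 1024 * 1024, async: 256 * 1024 };

function checkPayloadSize(payload, mode) {
  const bytes = Buffer.byteLength(JSON.stringify(payload), 'utf8');
  if (bytes > PAYLOAD_LIMITS[mode]) {
    throw new Error(`Payload is ${bytes} bytes, over the ${mode} limit of ${PAYLOAD_LIMITS[mode]}`);
  }
  return bytes;
}
```

Catching this in the caller keeps a malformed request from ever reaching the wire, where it would fail on every retry anyway.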
2. Runtime Errors
Your function launched and the runtime started, but your code threw an unhandled exception.
```javascript
export const handler = async (event) => {
  const data = JSON.parse(event.body); // Throws if body is not valid JSON
  const result = await processOrder(data);
  return { statusCode: 200, body: JSON.stringify(result) };
};
```

Lambda traps the exception and returns:

```json
{
  "errorType": "SyntaxError",
  "errorMessage": "Unexpected token u in JSON at position 0",
  "trace": [
    "SyntaxError: Unexpected token u in JSON at position 0",
    "at JSON.parse (<anonymous>)"
  ]
}
```

For synchronous invocations, the caller receives this error with a 200 status code but an `X-Amz-Function-Error` header set to `Unhandled` (the SDK exposes it as `FunctionError` on the response). This surprises most developers: Lambda returns HTTP 200 even when your function throws.
```javascript
// Calling code must check for function errors explicitly
const response = await lambda.invoke({ FunctionName: 'my-func', Payload: '...' });

if (response.FunctionError) {
  // The function threw - do not treat this as success
  const error = JSON.parse(response.Payload);
  console.error('Function error:', error.errorMessage);
}
```

3. Timeout Errors
Your function exceeded its configured timeout (max 15 minutes). Lambda terminates the process mid-execution.
```json
{ "errorType": "Runtime.ExitError", "errorMessage": "RequestId: abc-123 Error: Runtime exited with error: signal: killed" }
```

Timeouts are the most dangerous error type because:
- Partial work may have completed. You charged the customer but never recorded the transaction.
- No cleanup runs. `finally` blocks, shutdown hooks, and graceful termination never execute.
- The failure is ambiguous. You cannot tell how far the function progressed before termination.
How to handle timeouts:
- Check remaining time before starting expensive operations:
```javascript
export const handler = async (event, context) => {
  const remainingMs = context.getRemainingTimeInMillis();

  if (remainingMs < 5000) {
    // Less than 5 seconds left - bail out instead of starting work
    // that will be killed mid-execution
    throw new Error('Insufficient time remaining');
  }

  await longRunningOperation();
};
```

- Make operations idempotent so retries after a timeout do not cause duplicates.
- Offload long work to a task queue. If your function regularly approaches its timeout, the work does not belong in a Lambda. Use AsyncQueue to run it as a background job with proper timeout handling and retries.
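The idempotency point deserves a sketch. Assuming a durable store keyed by a stable request ID (a `Map` stands in here; in production this would be something like DynamoDB), a retry after a timeout can skip work that already finished:

```javascript
// Hypothetical idempotency guard: record each completed unit of work under a
// stable key, so a retry after a timeout returns the prior result instead of,
// say, charging the customer twice.
const completed = new Map(); // stand-in for a durable store

async function runOnce(key, work) {
  if (completed.has(key)) return completed.get(key); // retry path: no side effects
  const result = await work();
  completed.set(key, result); // only recorded after the work fully succeeds
  return result;
}
```

The key property: the record is written only after the work succeeds, so a timeout mid-`work` leaves the key absent and the retry runs the work again from scratch.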
4. Out of Memory Errors
Your function exceeded its configured memory allocation. Lambda terminates the process.
```
REPORT RequestId: abc-123  Duration: 3450.23 ms  Memory Size: 128 MB  Max Memory Used: 129 MB
```

The function log shows `Runtime.ExitError` or stops with no response. As with timeouts, partial work may have completed.
Signs you are hitting memory limits:
- Sporadic `Runtime.ExitError` with no stack trace
- `Max Memory Used` in CloudWatch approaching `Memory Size`
- Functions that pass in dev but fail in production (larger payloads)
Fix this by raising the memory ceiling, streaming large files instead of loading them whole, or splitting bulk operations into smaller batches via a task queue.
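The batching fix is mechanical: bound how much data any single invocation touches. A minimal sketch of the splitting step (`chunk` is a generic helper, not a Lambda API):

```javascript
// Split a bulk workload into fixed-size batches so each invocation processes
// a bounded amount of data and stays under its memory ceiling.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch then becomes its own queued task rather than one giant in-memory pass over the full dataset.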
5. Cold Start Failures
A cold start initializes a new execution environment. This adds latency (100ms to 10+ seconds depending on runtime and dependencies), but it can also fail outright:
- Dependency initialization errors: A database connection fails during module load
- Package size issues: Deployment package exceeds limits (50 MB zipped, 250 MB unzipped)
- VPC (Virtual Private Cloud) attachment delays: Functions in a VPC may take 10+ seconds to attach an Elastic Network Interface (ENI), triggering upstream timeouts
Cold start failures are transient. The next invocation gets a fresh environment that may succeed. But if your initialization code contains a deterministic bug (wrong connection string, missing environment variable), every cold start will break.
```javascript
// This runs once per cold start - if it fails, every invocation in this
// environment fails
import { createPool } from './db';

const pool = createPool(process.env.DATABASE_URL); // Throws if URL is missing

export const handler = async (event) => {
  const conn = await pool.getConnection();
  // ...
};
```

6. Downstream Service Errors
Your function ran, but an external service it depends on failed. This is not technically a Lambda error, but it is the most common source of failures in distributed Lambda workflows.
```javascript
export const handler = async (event) => {
  // Any of these can fail independently
  const user = await fetch('https://api.crm.com/users/123').then((r) => r.json()); // CRM is down
  const charge = await stripe.charges.create({ amount: 1000 });                    // Stripe timeout
  const email = await ses.sendEmail({ to: user.email });                           // SES throttled

  return { statusCode: 200 };
};
```

Your function succeeds or fails as a unit. If the CRM call completes but Stripe times out, you have a partially finished workflow with no automatic recovery.
This is the core argument for async orchestration: each downstream call should be a separate task with its own retry strategy, not a chain of calls inside a single Lambda.
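What that orchestration looks like in miniature: each step carries its own retry budget, and completed steps are recorded so a recovery run resumes where the workflow stopped. A sketch with hypothetical names (not any specific queue's API):

```javascript
// Run workflow steps in order; each step retries independently, and finished
// steps are recorded so a re-run resumes instead of repeating them.
async function runWorkflow(steps, state = { done: [] }) {
  for (const step of steps) {
    if (state.done.includes(step.name)) continue; // already succeeded earlier
    let lastError = null;
    for (let attempt = 0; attempt <= step.retries; attempt++) {
      try {
        await step.run();
        lastError = null;
        break;
      } catch (err) {
        lastError = err; // exhaust this step's retry budget before giving up
      }
    }
    if (lastError) return { state, failedAt: step.name, error: lastError };
    state.done.push(step.name);
  }
  return { state, failedAt: null, error: null };
}
```

A task queue gives you this structure for free across invocations; the sketch only shows why per-step retries beat retrying the whole chain, which would re-run the CRM call and the Stripe charge just because SES throttled.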
Error Handling Matrix
| Error Type | Retryable? | Partial Work? | Compensation Needed? |
|---|---|---|---|
| Throttling (429) | Yes, with backoff | No | No |
| Service error (500) | Yes | No | No |
| Invalid request (400) | No, fix the caller | No | No |
| Runtime exception | Depends on cause | Possible | Depends |
| Timeout | Yes, but risky | Likely | Yes |
| Out of memory | Yes, after fix | Likely | Yes |
| Cold start failure | Usually yes | No | No |
| Downstream failure | Depends on service | Likely | Yes |
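The matrix collapses into a first-pass triage function on the caller side. A sketch (field names are illustrative; the SDK surfaces the status code and `FunctionError` on the invoke response):

```javascript
// First-pass triage for a failed synchronous invocation, following the matrix
// above: throttles and service faults retry, caller errors never do, and a
// function error needs inspection before deciding.
function triage({ statusCode, functionError }) {
  if (statusCode === 429) return 'retry-with-backoff'; // throttled
  if (statusCode >= 500) return 'retry';               // transient service fault
  if (statusCode >= 400) return 'fix-caller';          // invalid request: never retry
  if (functionError) return 'inspect';                 // runtime exception: depends on cause
  return 'success';
}
```

Timeouts, OOM, and downstream failures all land in the `inspect` bucket, which is exactly why they need the compensation column: a blind retry may duplicate partial work.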
The Pattern That Prevents Most Lambda Errors
Most Lambda failures in distributed systems stem from two sources: timeouts on long-running work and cascading downstream failures. Both yield to the same pattern: do not do the work inside the Lambda.
Instead:
- Accept the request in your Lambda (validate input, generate an ID)
- Queue the work to AsyncQueue with appropriate retries and timeouts
- Return immediately with a tracking ID
```javascript
export const handler = async (event) => {
  const body = JSON.parse(event.body);
  const jobId = crypto.randomUUID();

  await aq.tasks.create({
    targetUrl: 'https://your-app.com/api/process',
    payload: { jobId, ...body },
    webhookUrl: 'https://your-app.com/api/on-complete',
    retries: 3,
    timeout: 120,
  });

  return {
    statusCode: 202,
    body: JSON.stringify({ jobId, status: 'accepted' }),
  };
};
```

Your Lambda now does one thing (queue a task) in under 100ms. No timeouts, no memory pressure, no downstream failures. The task queue handles the unreliable parts with proper retries, timeout management, and failure isolation.
Further Reading
- Cloud Function Timeout Issues - timeout limits across all major platforms
- How to Handle Long-Running API Calls - offloading slow work to a queue
- Handling Failures in API Chains - compensation and dead letter patterns
- How to Run Background Tasks on Vercel - similar patterns for Vercel’s serverless functions
