AWS Lambda powers most serverless architectures. It also generates more silent failures than any other component in distributed systems. A Lambda function can break in at least six distinct ways, and each demands a different handling strategy. Treating them all the same - or ignoring them - leads to lost events, phantom retries, and 3 AM pages.
This post catalogs every Lambda error type, explains when each surfaces, and shows how to handle them in production. If you build workflows spanning multiple Lambda functions or external services, these patterns will spare you some painful lessons.
The Error Taxonomy
Lambda errors fall into two broad categories: faults the Lambda service itself generates (invocation errors) and exceptions your code throws (function errors). The distinction matters because they surface differently and demand different responses.
1. Invocation Errors
These occur before your code runs. The Lambda service rejects or aborts the invocation.
Throttling (429 TooManyRequestsException)
Your account or function hit the concurrency limit. The invocation never ran.
```json
{ "errorType": "TooManyRequestsException", "errorMessage": "Rate exceeded" }
```

When this happens:
- You exceeded the account-level concurrent execution limit (default: 1,000)
- You exceeded the function’s reserved concurrency setting
- A burst of invocations hit the per-region burst limit
How to handle it:
- Synchronous invocations: The caller gets a 429. Retry with exponential backoff.
- Async invocations: Lambda retries automatically (twice, with delays). After that, the event goes to the dead letter queue if configured.
- Event source mappings (Simple Queue Service (SQS), Kinesis): Lambda pauses polling and retries the batch.
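For the synchronous case, the backoff logic can live in a small wrapper around the invoke call. A minimal sketch, where `withBackoff` and `isThrottle` are hypothetical names (not an AWS SDK API) and the caller decides which errors count as throttling:

```javascript
// Hypothetical retry wrapper: retries `fn` with exponential backoff plus
// jitter whenever `isThrottle` says the error is retryable (e.g. a 429).
async function withBackoff(fn, { retries = 5, baseMs = 100, isThrottle = () => true } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries || !isThrottle(err)) throw err;
      const delayMs = baseMs * 2 ** attempt + Math.random() * baseMs; // ~100, 200, 400...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Wrapping a synchronous invoke with `isThrottle: (err) => err.name === 'TooManyRequestsException'` turns a hard 429 into a bounded wait instead of an immediate failure.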
The lasting fix is to offload burst-heavy workloads to a task queue that controls concurrency for you, instead of relying on Lambda’s built-in limits.
Service errors (500 ServiceException)
The Lambda service itself hit an internal fault. Rare, but unavoidable.
```json
{ "errorType": "ServiceException", "errorMessage": "Internal error" }
```

Your only option is to retry. These faults are transient by definition.
Invalid request errors (400)
Your invocation payload is malformed, exceeds size limits (6 MB for sync, 256 KB for async), or targets a nonexistent function.
```json
{ "errorType": "InvalidRequestContentException", "errorMessage": "Could not parse request body into json" }
```

Never retry these. Fix the caller.
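Because these errors are deterministic, the cheapest fix is to validate before invoking. A sketch of a size pre-check using the documented limits above (`checkPayloadSize` is a hypothetical helper, not an SDK call):

```javascript
// Reject oversized payloads locally instead of letting Lambda return a 400.
// Limits per the Lambda quotas: 6 MB synchronous, 256 KB asynchronous.
const PAYLOAD_LIMITS = { sync: 6 * 1024 * 1024, async: 256 * 1024 };

function checkPayloadSize(payload, mode) {
  const bytes = Buffer.byteLength(JSON.stringify(payload), 'utf8');
  if (bytes > PAYLOAD_LIMITS[mode]) {
    throw new Error(`Payload is ${bytes} bytes, over the ${mode} limit of ${PAYLOAD_LIMITS[mode]}`);
  }
  return bytes;
}
```

Catching this in the caller keeps a malformed request from ever reaching the wire, where it would fail on every retry anyway.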
2. Runtime Errors
Your function launched and the runtime started, but your code threw an unhandled exception.
```javascript
export const handler = async (event) => {
  const data = JSON.parse(event.body); // Throws if body is not valid JSON
  const result = await processOrder(data);
  return { statusCode: 200, body: JSON.stringify(result) };
};
```

Lambda traps the exception and returns:

```json
{
  "errorType": "SyntaxError",
  "errorMessage": "Unexpected token u in JSON at position 0",
  "trace": [
    "SyntaxError: Unexpected token u in JSON at position 0",
    "at JSON.parse (<anonymous>)"
  ]
}
```

For synchronous invocations, the caller receives this error with a 200 status code but an `X-Amz-Function-Error` header set to `Unhandled` (the SDK exposes it as `FunctionError` on the response). This surprises most developers: Lambda returns HTTP 200 even when your function throws.
```javascript
// Calling code must check for function errors explicitly
const response = await lambda.invoke({ FunctionName: 'my-func', Payload: '...' });

if (response.FunctionError) {
  // The function threw - do not treat this as success
  const error = JSON.parse(response.Payload);
  console.error('Function error:', error.errorMessage);
}
```

3. Timeout Errors
Your function exceeded its configured timeout (max 15 minutes). Lambda terminates the process mid-execution.
```json
{ "errorType": "Runtime.ExitError", "errorMessage": "RequestId: abc-123 Error: Runtime exited with error: signal: killed" }
```

Timeouts are the most dangerous error type because:
- Partial work may have completed. You charged the customer but never recorded the transaction.
- No cleanup runs. `finally` blocks, shutdown hooks, and graceful termination never execute.
- The failure is ambiguous. You cannot tell how far the function progressed before termination.
How to handle timeouts:
- Check remaining time before starting expensive operations:
```javascript
export const handler = async (event, context) => {
  const remainingMs = context.getRemainingTimeInMillis();

  if (remainingMs < 5000) {
    // Less than 5 seconds left - bail out instead of starting work
    // that will be killed mid-execution
    throw new Error('Insufficient time remaining');
  }

  await longRunningOperation();
};
```

- Make operations idempotent so retries after a timeout do not cause duplicates.
- Offload long work to a task queue. If your function regularly approaches its timeout, the work does not belong in a Lambda. Use AsyncQueue to run it as a background job with proper timeout handling and retries.
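The idempotency point deserves a sketch. Assuming a durable store keyed by a stable request ID (a `Map` stands in here; in production this would be something like DynamoDB), a retry after a timeout can skip work that already finished:

```javascript
// Hypothetical idempotency guard: record each completed unit of work under a
// stable key, so a retry after a timeout returns the prior result instead of,
// say, charging the customer twice.
const completed = new Map(); // stand-in for a durable store

async function runOnce(key, work) {
  if (completed.has(key)) return completed.get(key); // retry path: no side effects
  const result = await work();
  completed.set(key, result); // only recorded after the work fully succeeds
  return result;
}
```

The key property: the record is written only after the work succeeds, so a timeout mid-`work` leaves the key absent and the retry runs the work again from scratch.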
4. Out of Memory Errors
Your function exceeded its configured memory allocation. Lambda terminates the process.
```
REPORT RequestId: abc-123  Duration: 3450.23 ms  Memory Size: 128 MB  Max Memory Used: 129 MB
```

The function log shows `Runtime.ExitError` or stops with no response. As with timeouts, partial work may have completed.
Signs you are hitting memory limits:
- Sporadic `Runtime.ExitError` with no stack trace
- `Max Memory Used` in CloudWatch approaching `Memory Size`
- Functions that pass in dev but fail in production (larger payloads)
Fix this by raising the memory ceiling, streaming large files instead of loading them whole, or splitting bulk operations into smaller batches via a task queue.
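The batching fix is mechanical: bound how much data any single invocation touches. A minimal sketch of the splitting step (`chunk` is a generic helper, not a Lambda API):

```javascript
// Split a bulk workload into fixed-size batches so each invocation processes
// a bounded amount of data and stays under its memory ceiling.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch then becomes its own queued task rather than one giant in-memory pass over the full dataset.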
5. Cold Start Failures
A cold start initializes a new execution environment. This adds latency (100ms to 10+ seconds depending on runtime and dependencies), but it can also fail outright:
- Dependency initialization errors: A database connection fails during module load
- Package size issues: Deployment package exceeds limits (50 MB zipped, 250 MB unzipped)
- VPC (Virtual Private Cloud) attachment delays: Functions in a VPC may take 10+ seconds to attach an Elastic Network Interface (ENI), triggering upstream timeouts
Cold start failures are transient. The next invocation gets a fresh environment that may succeed. But if your initialization code contains a deterministic bug (wrong connection string, missing environment variable), every cold start will break.
```javascript
// This runs once per cold start - if it fails, every invocation in this
// environment fails
import { createPool } from './db';

const pool = createPool(process.env.DATABASE_URL); // Throws if URL is missing

export const handler = async (event) => {
  const conn = await pool.getConnection();
  // ...
};
```

6. Downstream Service Errors
Your function ran, but an external service it depends on failed. This is not technically a Lambda error, but it is the most common source of failures in distributed Lambda workflows.
```javascript
export const handler = async (event) => {
  // Any of these can fail independently
  const user = await fetch('https://api.crm.com/users/123').then((r) => r.json()); // CRM is down
  const charge = await stripe.charges.create({ amount: 1000 });                    // Stripe timeout
  const email = await ses.sendEmail({ to: user.email });                           // SES throttled

  return { statusCode: 200 };
};
```

Your function succeeds or fails as a unit. If the CRM call completes but Stripe times out, you have a partially finished workflow with no automatic recovery.
This is the core argument for async orchestration: each downstream call should be a separate task with its own retry strategy, not a chain of calls inside a single Lambda.
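What that orchestration looks like in miniature: each step carries its own retry budget, and completed steps are recorded so a recovery run resumes where the workflow stopped. A sketch with hypothetical names (not any specific queue's API):

```javascript
// Run workflow steps in order; each step retries independently, and finished
// steps are recorded so a re-run resumes instead of repeating them.
async function runWorkflow(steps, state = { done: [] }) {
  for (const step of steps) {
    if (state.done.includes(step.name)) continue; // already succeeded earlier
    let lastError = null;
    for (let attempt = 0; attempt <= step.retries; attempt++) {
      try {
        await step.run();
        lastError = null;
        break;
      } catch (err) {
        lastError = err; // exhaust this step's retry budget before giving up
      }
    }
    if (lastError) return { state, failedAt: step.name, error: lastError };
    state.done.push(step.name);
  }
  return { state, failedAt: null, error: null };
}
```

A task queue gives you this structure for free across invocations; the sketch only shows why per-step retries beat retrying the whole chain, which would re-run the CRM call and the Stripe charge just because SES throttled.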
Error Handling Matrix
| Error Type | Retryable? | Partial Work? | Compensation Needed? |
|---|---|---|---|
| Throttling (429) | Yes, with backoff | No | No |
| Service error (500) | Yes | No | No |
| Invalid request (400) | No, fix the caller | No | No |
| Runtime exception | Depends on cause | Possible | Depends |
| Timeout | Yes, but risky | Likely | Yes |
| Out of memory | Yes, after fix | Likely | Yes |
| Cold start failure | Usually yes | No | No |
| Downstream failure | Depends on service | Likely | Yes |
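The matrix collapses into a first-pass triage function on the caller side. A sketch (field names are illustrative; the SDK surfaces the status code and `FunctionError` on the invoke response):

```javascript
// First-pass triage for a failed synchronous invocation, following the matrix
// above: throttles and service faults retry, caller errors never do, and a
// function error needs inspection before deciding.
function triage({ statusCode, functionError }) {
  if (statusCode === 429) return 'retry-with-backoff'; // throttled
  if (statusCode >= 500) return 'retry';               // transient service fault
  if (statusCode >= 400) return 'fix-caller';          // invalid request: never retry
  if (functionError) return 'inspect';                 // runtime exception: depends on cause
  return 'success';
}
```

Timeouts, OOM, and downstream failures all land in the `inspect` bucket, which is exactly why they need the compensation column: a blind retry may duplicate partial work.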
The Pattern That Prevents Most Lambda Errors
Most Lambda failures in distributed systems stem from two sources: timeouts on long-running work and cascading downstream failures. Both yield to the same pattern: do not do the work inside the Lambda.
Instead:
- Accept the request in your Lambda (validate input, generate an ID)
- Queue the work to AsyncQueue with appropriate retries and timeouts
- Return immediately with a tracking ID
```javascript
export const handler = async (event) => {
  const body = JSON.parse(event.body);
  const jobId = crypto.randomUUID();

  await aq.tasks.create({
    targetUrl: 'https://your-app.com/api/process',
    payload: { jobId, ...body },
    webhookUrl: 'https://your-app.com/api/on-complete',
    retries: 3,
    timeout: 120,
  });

  return {
    statusCode: 202,
    body: JSON.stringify({ jobId, status: 'accepted' }),
  };
};
```

Your Lambda now does one thing (queue a task) in under 100ms. No timeouts, no memory pressure, no downstream failures. The task queue handles the unreliable parts with proper retries, timeout management, and failure isolation.
Further Reading
- Cloud Function Timeout Issues - timeout limits across all major platforms
- How to Handle Long-Running API Calls - offloading slow work to a queue
- Handling Failures in API Chains - compensation and dead letter patterns
- How to Run Background Tasks on Vercel - similar patterns for Vercel’s serverless functions
