Cloud functions excel at most tasks. They scale to zero, cost nothing when idle, and handle traffic spikes on their own. But they carry one critical limitation that catches every team eventually: timeout limits.
When a cloud function exceeds its timeout, the platform kills the process. No graceful shutdown, no partial response, no indication to the user. The work vanishes.
## Timeout Limits by Platform
Every serverless platform enforces different limits, and the numbers aren’t always intuitive:
| Platform | Default | Maximum | Billed Per |
|---|---|---|---|
| Vercel Serverless (Hobby) | 10s | 10s | 1ms |
| Vercel Serverless (Pro) | 60s | 300s | 1ms |
| Netlify Functions | 10s | 26s | — |
| AWS Lambda | 3s | 900s (15 min) | 1ms |
| AWS API Gateway + Lambda | 30s | 30s | — |
| Google Cloud Functions (1st gen) | 60s | 540s (9 min) | 100ms |
| Google Cloud Functions (2nd gen) | 60s | 3600s (60 min) | 100ms |
| Azure Functions (Consumption) | 5 min | 10 min | — |
| Cloudflare Workers | — | 30s CPU time | — |
| Supabase Edge Functions | 60s | 150s | — |
The critical trap: API gateways impose their own limits. AWS Lambda supports 15 minutes, but behind API Gateway, the effective ceiling is 30 seconds. Vercel’s Pro plan allows 300 seconds for background functions, but standard serverless routes still cap at 60 seconds.
## What Actually Happens When a Function Times Out
Understanding the failure mode matters because it affects data consistency:
### 1. The process is killed immediately
There’s no SIGTERM, no chance to clean up, no finally block that runs. The platform terminates the runtime.
```javascript
export async function POST(req) {
  const order = await createOrder(req.body);     // ✓ Completed
  const payment = await chargeCard(order);       // ✓ Completed
  await sendConfirmationEmail(order, payment);   // ✗ Killed mid-execution
  await updateInventory(order);                  // ✗ Never started
  return Response.json({ success: true });       // ✗ Never sent
}
```
The order was created and the card was charged, but the confirmation email never sent and inventory never updated. The client receives a timeout error with no idea the payment succeeded.
### 2. The client receives a generic error
Most platforms return a 504 [Gateway Timeout](/glossary/gateway-timeout/) with no useful information:
```json
{
  "error": "FUNCTION_INVOCATION_TIMEOUT",
  "message": "Task timed out after 10.00 seconds"
}
```
### 3. Retries can cause duplicates
If the client retries the request, every step that completed before the timeout runs again — potentially charging the card a second time.
## Operations That Commonly Time Out
Some tasks are inherently incompatible with serverless timeout limits:
**Video and image processing**
Transcoding a 5-minute video takes 2–10 minutes. Resizing a batch of 100 high-resolution images takes 30–60 seconds. Neither fits within a 10-second window.
**PDF generation**
Complex reports with charts, tables, and hundreds of pages can take 30 seconds to several minutes to render.
**AI and ML inference**
Calling AI APIs for image generation, document analysis, or natural language processing takes 15–60 seconds — and can spike to several minutes due to cold starts or heavy load.
**Data imports and exports**
Parsing a 50MB CSV file, validating every row, and inserting into a database takes minutes. Generating an export from millions of rows takes even longer.
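The usual escape for pageable imports like this is to process a fixed-size batch per invocation and persist a cursor so the next invocation resumes where the last one stopped. A minimal sketch; the parser, batch size, and cursor shape here are illustrative, not a real API:

```javascript
// Illustrative: tiny CSV parser plus batch-at-a-time processing with a
// resumable cursor. In production the cursor would live in a database,
// not a return value.
const BATCH_SIZE = 2;

function parseCsv(text) {
  const [header, ...rows] = text.trim().split('\n');
  const keys = header.split(',');
  return rows.map((row) => {
    const values = row.split(',');
    return Object.fromEntries(keys.map((k, i) => [k, values[i]]));
  });
}

// Processes one batch starting at `cursor`; returns the next cursor,
// or null when every row has been handled.
function processBatch(rows, cursor, handleRow) {
  const batch = rows.slice(cursor, cursor + BATCH_SIZE);
  batch.forEach(handleRow);
  const next = cursor + batch.length;
  return next < rows.length ? next : null;
}
```

Each invocation handles one batch, stores the returned cursor, and re-enqueues itself until the cursor comes back null.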
**Third-party API orchestration**
Calling a chain of external APIs — payment processor, fraud check, shipping provider, notification service — compounds latency at every step.
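When the downstream calls don't depend on each other's results, issuing them concurrently bounds total latency by the slowest call rather than the sum. A sketch with simulated service calls; the delays and return values are stand-ins:

```javascript
// delay() simulates an external service call that resolves after `ms`.
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

// Sequential: two 30ms calls take ~60ms total
async function checkoutSequential() {
  const fraud = await delay(30, 'fraud-ok');     // e.g. fraud check
  const quote = await delay(30, 'quote-9.99');   // e.g. shipping quote
  return [fraud, quote];
}

// Concurrent: both calls in flight at once, ~30ms total
async function checkoutConcurrent() {
  return Promise.all([delay(30, 'fraud-ok'), delay(30, 'quote-9.99')]);
}
```

Calls that feed each other (charge after fraud check) still have to run in sequence, which is when a task queue becomes the better fit.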
**Database migrations and bulk operations**
Updating millions of rows, rebuilding indexes, or backfilling data requires sustained execution that serverless functions cannot provide.
## Patterns for Working Around Timeouts

### Pattern 1: Break work into smaller chunks
Split large operations into individual function invocations:
```javascript
// Instead of processing 10,000 items in one function...
export async function POST(req) {
  const items = await getItems(); // 10,000 items
  for (const item of items) {
    await processItem(item); // Timeout!
  }
}

// ...process in batches of 100
export async function POST(req) {
  const batch = await getNextBatch(100);
  for (const item of batch) {
    await processItem(item);
  }
  if (hasMoreItems()) {
    await triggerNextBatch(); // Call yourself again
  }
}
```
This works for straightforward cases but adds complexity: you must track progress, handle partial failures, and manage the recursion.
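For the partial-failure half of that bookkeeping, one approach is to record per-item status so a retried batch skips items that already succeeded. A sketch, with an in-memory map standing in for a database table; the names are illustrative:

```javascript
// Illustrative: itemId -> 'done' | 'failed'. In production this would
// be a table keyed by item ID, shared across invocations.
const status = new Map();

async function processBatchSafely(items, processItem) {
  for (const item of items) {
    if (status.get(item.id) === 'done') continue; // already handled, skip
    try {
      await processItem(item);
      status.set(item.id, 'done');
    } catch {
      status.set(item.id, 'failed'); // picked up again on the next run
    }
  }
}
```

A retried invocation then re-attempts only the failed items instead of redoing (and duplicating) completed work.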
### Pattern 2: Use platform-specific background features
Some platforms offer extended execution for background tasks:
```javascript
// Vercel / Next.js: use after() for fire-and-forget work
import { after } from 'next/server';

export async function POST(req) {
  const order = await createOrder(req.body);
  after(async () => {
    // Runs after the response is sent,
    // but still subject to the function's timeout limits
    await sendConfirmationEmail(order);
  });
  return Response.json({ orderId: order.id });
}
```
The limitation: `after`, `waitUntil`, and similar APIs still run within the function's timeout. They don't extend the execution window; they only let you do work after sending the response.
### Pattern 3: Offload to a task queue
The most reliable pattern. Your serverless function accepts the request, creates a task in the task queue, and responds immediately:
```javascript
// `asyncqueue` is an initialized AsyncQueue SDK client
export async function POST(req) {
  const data = await req.json();

  const task = await asyncqueue.tasks.create({
    callbackUrl: 'https://video-service.example.com/transcode',
    payload: {
      videoUrl: data.url,
      format: 'mp4',
      resolution: '1080p',
    },
    webhookUrl: `${process.env.APP_URL}/api/on-transcode-complete`,
    retries: 3,
    backoff: 'exponential',
  });

  // Responds in ~50ms, regardless of how long transcoding takes
  return Response.json({
    taskId: task.id,
    status: 'processing',
  });
}
```
Your function finishes in 50ms instead of 5 minutes. AsyncQueue handles the long-running work with:
- No timeout limit on task execution
- Automatic retries with exponential backoff if the service fails
- Result storage accessible via API
- Full logging of every attempt
### Pattern 4: Use streaming responses
For tasks that produce incremental output, streaming keeps the connection alive:
```javascript
export async function GET(req) {
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      for (const chunk of await getDataChunks()) {
        controller.enqueue(encoder.encode(JSON.stringify(chunk) + '\n'));
      }
      controller.close();
    },
  });

  return new Response(stream, {
    headers: { 'Content-Type': 'application/x-ndjson' },
  });
}
```
On platforms that time out idle connections, each chunk sent resets the clock. But this pattern doesn't help operations that can't produce incremental output, and platforms with hard total-duration limits will still cut the stream off.
## Choosing the Right Pattern
| Situation | Best Pattern |
|---|---|
| Batch processing (pageable) | Break into chunks |
| Non-critical follow-up work | Platform background APIs |
| Long-running external calls | Task queue |
| Large data generation | Streaming or task queue |
| Critical operations (payments) | Task queue with retries |
| Real-time incremental data | Streaming |
## Prevention: Design for Timeouts From the Start

### 1. Make operations idempotent
Every API handler should be safe to retry:
```javascript
// Use a unique idempotency key supplied by the client
const key = req.headers.get('idempotency-key');
const existing = await db.payments.findByIdempotencyKey(key);
if (existing) return Response.json(existing);
// Otherwise process the payment and store the result under the same key
```
### 2. Separate fast and slow paths
Don’t mix instant operations with potentially slow ones in the same endpoint. Accept the request fast and process it asynchronously.
### 3. Set client-side timeouts shorter than server limits
If your platform times out at 10 seconds, set your client timeout to 8 seconds. This lets you return a meaningful error instead of a generic gateway timeout.
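One way to enforce that budget on your own outbound calls is an `AbortController` wired to a timer, so the call fails inside your window and the handler can respond with a real error. A sketch; the injectable fetch implementation exists only to make the helper testable:

```javascript
// Aborts the outbound request if it exceeds `budgetMs`, leaving time
// to build a meaningful error response before the platform timeout.
async function fetchWithTimeout(url, budgetMs, fetchImpl = fetch) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), budgetMs);
  try {
    // Rejects with an abort error if the budget elapses first
    return await fetchImpl(url, { signal: controller.signal });
  } finally {
    clearTimeout(timer);
  }
}
```

With a 10-second platform limit, `fetchWithTimeout(url, 8000)` leaves roughly two seconds to map the abort into a 503 or a retry hint.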
### 4. Monitor execution duration
Track p95 and p99 function execution times. When they approach your timeout limit, refactor — don’t wait until users start seeing errors.
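A nearest-rank percentile over recorded durations is enough to drive that check. A minimal sketch; where you store durations and how you alert depends on your stack, and the 80% threshold is an assumption:

```javascript
// Nearest-rank percentile: p95 of N samples is the sorted value at
// rank ceil(0.95 * N). Durations are in milliseconds.
function percentile(durations, p) {
  if (durations.length === 0) return 0;
  const sorted = [...durations].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Flag for refactoring once p95 crosses 80% of the timeout limit
const shouldRefactor = (durations, limitMs) =>
  percentile(durations, 95) > 0.8 * limitMs;
```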
## Conclusion
Cloud function timeouts aren’t a bug — they’re a design constraint. Every serverless platform enforces them to maintain stability and fair resource allocation.
The teams that succeed with serverless design around this constraint from day one: keep functions fast, offload slow work to task queues, and make everything idempotent. Your functions should do one thing quickly — accept work — and let infrastructure like AsyncQueue handle the rest.