When a webhook delivery fails, the event is not gone yet, but it soon will be if you lack a recovery plan. A 500 error, a network timeout, or a crashed server can each cause a delivery to fail. Without proper handling, those events are retried a few times and then dropped for good.

This guide shows you how to build webhook handling that never loses an event, even when things break.

Step 1: Understand How Webhook Deliveries Fail

Webhook delivery can fail at multiple points:

Sender ──HTTP POST──> Your Server ──Process──> Database
  │                        │                      │
  ▼                        ▼                      ▼
Network error         Server crash             DB timeout
DNS failure           500 error                Constraint violation
TLS error             Memory limit             Connection refused

What happens after a failure:

Most webhook senders (including AsyncQueue) retry failed deliveries with exponential backoff. But retries have limits: after a fixed number of attempts, commonly 3-5 spread over a few hours, the sender gives up.

Failure Type                            | Retried?     | Data Lost?
----------------------------------------|--------------|------------------------------
Your server returns 5xx                 | Yes          | Only if all retries fail
Your server returns 4xx                 | No           | Yes (sender assumes invalid)
Network timeout                         | Yes          | Only if all retries fail
Your server is down for hours           | Maybe        | Likely - retries may exhaust
Your handler throws unhandled exception | Yes (if 5xx) | Only if all retries fail
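
To see how quickly a retry budget runs out, the schedule can be computed directly. The sketch below assumes a hypothetical policy (first retry after 1 minute, each later delay 4 times longer). Real senders publish their own schedules, so treat the numbers as illustrative only.

```javascript
// Hypothetical retry policy: these values are assumptions for illustration,
// not the schedule of any particular sender.
const BASE_DELAY_MS = 60 * 1000; // first retry after 1 minute
const FACTOR = 4;                // each delay is 4x the previous one

// Returns the delay before each retry, in milliseconds.
function backoffDelays(maxRetries, baseMs = BASE_DELAY_MS, factor = FACTOR) {
  const delays = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(baseMs * factor ** attempt);
  }
  return delays;
}

// Total time between the first failed delivery and the final retry.
function retryWindowMs(maxRetries, baseMs = BASE_DELAY_MS, factor = FACTOR) {
  return backoffDelays(maxRetries, baseMs, factor).reduce((sum, d) => sum + d, 0);
}

const delaysInMinutes = backoffDelays(5).map((ms) => ms / 60000);
console.log(delaysInMinutes);            // [ 1, 4, 16, 64, 256 ]
console.log(retryWindowMs(5) / 3600000); // about 5.68 (hours)
```

Under these assumed values, an outage longer than roughly 5.7 hours outlasts every retry, and the event is dropped.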

The critical insight: you must not rely on the sender’s retry mechanism as your sole safety net. You need your own.

Step 2: Respond Fast to Prevent False Failures

The most common cause of “failed” webhook deliveries is slow response time. If your handler takes 10 seconds to process and the sender has a 5-second timeout, the delivery gets marked as failed even though your handler finishes eventually.

// BAD - slow response triggers false failure
app.post('/api/webhook', async (req, res) => {
  await validateSignature(req);          // 50ms
  await lookupOrder(req.body.orderId);   // 200ms
  await updateInventory(req.body);       // 3000ms
  await sendConfirmation(req.body);      // 2000ms
  await updateAnalytics(req.body);       // 1000ms
  // Total: 6.25 seconds - sender may have already timed out
  res.json({ received: true });
});

// GOOD - respond instantly, process later
app.post('/api/webhook', async (req, res) => {
  await validateSignature(req); // 50ms - must do this synchronously

  // Queue for reliable background processing
  await aq.tasks.create({
    targetUrl: 'https://your-app.com/api/process-webhook-event',
    payload: req.body,
    maxRetries: 5,
    retryBackoff: 'exponential',
  });

  res.json({ received: true }); // 100ms total
});

Rule of thumb: Your webhook endpoint should respond in under 1 second. Anything slower belongs in a background task.
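
The handlers above call validateSignature without showing it. Below is a minimal sketch of one common scheme, HMAC-SHA256 over the raw request body, using Node's built-in crypto module. The secret, the hex encoding, and the function names are assumptions; use whatever algorithm and header your sender documents.

```javascript
const crypto = require('node:crypto');

// Computes the hex HMAC-SHA256 of the raw body. The scheme is an assumption;
// check your sender's documentation for the real algorithm and encoding.
function computeSignature(rawBody, secret) {
  return crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
}

// Compares signatures in constant time so the check does not leak
// how much of the signature matched.
function isValidSignature(rawBody, receivedSignature, secret) {
  if (typeof receivedSignature !== 'string') return false;
  const expected = Buffer.from(computeSignature(rawBody, secret), 'utf8');
  const received = Buffer.from(receivedSignature, 'utf8');
  // timingSafeEqual throws on a length mismatch, so check length first
  if (expected.length !== received.length) return false;
  return crypto.timingSafeEqual(expected, received);
}

// Example with a hypothetical secret and payload
const secret = 'whsec_example_secret';
const rawBody = JSON.stringify({ id: 'evt_1', type: 'order.paid' });
const goodSignature = computeSignature(rawBody, secret);

console.log(isValidSignature(rawBody, goodSignature, secret));       // true
console.log(isValidSignature(rawBody + ' ', goodSignature, secret)); // false
```

The signature has to be computed over the exact bytes received, so capture the raw body before any JSON parsing rewrites it. The check is pure CPU work, which is why it can stay in the synchronous part of the endpoint.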

Step 3: Use a Task Queue as a Reliability Buffer

By routing incoming webhooks through a task queue, you gain automatic retries, persistence, and observability with little extra effort.

// Webhook receiver - minimal, fast, reliable
app.post('/api/webhook', async (req, res) => {
  if (!verifySignature(req)) {
    return res.status(401).json({ error: 'Invalid signature' });
  }

  // Build the payload once so the queued task and the audit record match
  const payload = {
    eventId: req.body.id,
    eventType: req.body.type,
    data: req.body.data,
    receivedAt: new Date().toISOString(),
  };

  const { task } = await aq.tasks.create({
    targetUrl: 'https://your-app.com/api/handle-event',
    payload,
    maxRetries: 5,
    retryBackoff: 'exponential',
    timeout: 30,
  });

  // Store the event and its payload for auditing and later replay
  await db.webhookEvents.insert({
    eventId: payload.eventId,
    taskId: task.id,
    eventType: payload.eventType,
    payload,
    receivedAt: new Date(),
    status: 'queued',
  });

  res.json({ received: true });
});

What this gives you:

  • Webhook sender sees a fast 200 response (no retries on their end)
  • If your handler fails, the task queue retries up to 5 times with exponential backoff
  • Every event lands in your database for auditing
  • You can inspect failed events in the task dashboard
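
One case the receiver above leaves open is the queue itself being unreachable. Since senders generally retry on 5xx, a reasonable approach is to answer with a 5xx whenever the event could not be queued, so the sender's retries cover the gap. The sketch below models that decision as a plain function with injected dependencies; verify, enqueue, and record are hypothetical stand-ins for verifySignature, aq.tasks.create, and db.webhookEvents.insert.

```javascript
// Decides the HTTP status for an incoming webhook. All dependencies are
// injected, so the function can be exercised without a server or a queue.
async function receiveWebhook(event, { verify, enqueue, record }) {
  if (!verify(event)) {
    return { status: 401, body: { error: 'Invalid signature' } };
  }
  try {
    const taskId = await enqueue(event);
    await record({ eventId: event.id, taskId, status: 'queued' });
    return { status: 200, body: { received: true } };
  } catch (error) {
    // The event is not safely stored yet: ask the sender to retry
    return { status: 503, body: { error: 'Temporarily unavailable' } };
  }
}

// Usage with in-memory fakes (assumptions for illustration only)
async function demo() {
  const audit = [];
  const healthy = {
    verify: () => true,
    enqueue: async () => 'task_1',
    record: async (row) => { audit.push(row); },
  };
  const queueDown = {
    ...healthy,
    enqueue: async () => { throw new Error('queue unreachable'); },
  };

  const ok = await receiveWebhook({ id: 'evt_1' }, healthy);
  const failed = await receiveWebhook({ id: 'evt_2' }, queueDown);
  return { ok, failed, audit };
}

demo().then(({ ok, failed }) => {
  console.log(ok.status, failed.status); // 200 503
});
```

If the audit insert fails after the task was queued, the sender's retry creates a second task for the same event, which is one more reason the downstream handler should be idempotent.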

Step 4: Store Failed Events in a Dead Letter Queue

Even with retries, some events will fail for good. A bug in your handler, a schema change you missed, or corrupted data can cause persistent failures. These events need a durable home where you can find and fix them.

// Event handler with dead letter fallback
app.post('/api/handle-event', async (req, res) => {
  try {
    await processEvent(req.body);
    await db.webhookEvents.update(req.body.eventId, { status: 'processed' });
    res.json({ received: true });
  } catch (error) {
    // Check if this is likely a permanent failure
    if (isPermanentError(error)) {
      // Store in dead letter queue instead of retrying
      await db.deadLetterEvents.insert({
        eventId: req.body.eventId,
        eventType: req.body.eventType,
        payload: JSON.stringify(req.body.data),
        error: error.message,
        failedAt: new Date(),
      });
      await db.webhookEvents.update(req.body.eventId, { status: 'dead_letter' });
      // Return 200 to prevent further retries
      return res.json({ received: true, deadLettered: true });
    }
    // Transient error - let the task queue retry
    res.status(500).json({ error: 'Processing failed' });
  }
});

function isPermanentError(error) {
  // Validation errors, missing data, schema mismatches
  return error.name === 'ValidationError'
    || error.message.includes('not found')
    || error.message.includes('invalid format');
}
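
Matching on message text is easy to start with, but any transient error whose message happens to contain "not found" would be dead-lettered too. An alternative sketch, which is not the approach used above, marks errors with an explicit class at the point where they are thrown. PermanentError, parseOrderEvent, and classify are hypothetical names introduced for illustration.

```javascript
// Hypothetical marker class: throw it wherever retrying cannot help
class PermanentError extends Error {
  constructor(message) {
    super(message);
    this.name = 'PermanentError';
  }
}

// Permanent if explicitly marked, or if it is a validation error
function isPermanentError(error) {
  return error instanceof PermanentError || error.name === 'ValidationError';
}

// Example validation step that raises a permanent failure
function parseOrderEvent(payload) {
  if (!payload || typeof payload.orderId !== 'string') {
    throw new PermanentError('orderId is missing or not a string');
  }
  return { orderId: payload.orderId };
}

// Runs a step and reports which path the handler would take
function classify(step) {
  try {
    step();
    return 'ok';
  } catch (error) {
    return isPermanentError(error) ? 'dead_letter' : 'retry';
  }
}

console.log(classify(() => parseOrderEvent({ orderId: 'ord_1' }))); // ok
console.log(classify(() => parseOrderEvent({})));                   // dead_letter
console.log(classify(() => { throw new Error('ECONNRESET'); }));    // retry
```

With this design, unknown errors default to the retry path, so a misclassification costs a few wasted retries rather than a prematurely dead-lettered event.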

Review dead letter events on a regular schedule:

// Admin endpoint to list dead letter events from the last 7 days
app.get('/api/admin/dead-letters', async (req, res) => {
  const sevenDaysAgo = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000);
  const events = await db.deadLetterEvents.find({
    failedAt: { gte: sevenDaysAgo },
  });
  res.json({ events, count: events.length });
});
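
A raw list gets hard to review once it grows. As a rough sketch, the rows can be grouped by event type and error message so that the most frequent failure stands out. The function name and the sample rows are made up for illustration; the eventType and error fields match what the handler above stores.

```javascript
// Groups dead letter rows by event type and error message and returns
// the groups sorted from most to least frequent.
function summarizeDeadLetters(events) {
  const counts = new Map();
  for (const { eventType, error } of events) {
    const key = `${eventType} | ${error}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([key, count]) => ({ key, count }))
    .sort((a, b) => b.count - a.count);
}

// Sample rows (made up for illustration)
const sample = [
  { eventType: 'user.created', error: 'user not found' },
  { eventType: 'order.paid', error: 'invalid format' },
  { eventType: 'order.paid', error: 'invalid format' },
];

const summary = summarizeDeadLetters(sample);
console.log(summary[0]); // { key: 'order.paid | invalid format', count: 2 }
```

A single group dominating the summary usually points at one bug or one schema change rather than many unrelated problems.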

Step 5: Build a Replay Mechanism

The ultimate safety net is the ability to reprocess any historical event. This proves invaluable when you fix a bug and need to reprocess every event that failed because of it.

// Replay a single event
app.post('/api/admin/replay-event', async (req, res) => {
  const { eventId } = req.body;

  // Find the original event
  const event = await db.webhookEvents.findOne({ eventId });
  if (!event) {
    return res.status(404).json({ error: 'Event not found' });
  }

  // Create a new task to reprocess.
  // Assumes the payload was stored with the event when it was received.
  const { task } = await aq.tasks.create({
    targetUrl: 'https://your-app.com/api/handle-event',
    payload: event.payload,
    maxRetries: 3,
  });

  await db.webhookEvents.update(eventId, {
    status: 'replayed',
    replayTaskId: task.id,
  });

  res.json({ taskId: task.id, status: 'replayed' });
});

// Replay all dead letter events from a date range
app.post('/api/admin/replay-dead-letters', async (req, res) => {
  const { since, until } = req.body;
  const events = await db.deadLetterEvents.find({
    failedAt: { gte: since, lte: until },
  });

  const tasks = [];
  for (const event of events) {
    const { task } = await aq.tasks.create({
      targetUrl: 'https://your-app.com/api/handle-event',
      // Rebuild the shape the handler expects: the dead letter row
      // stores only the event data, as a JSON string
      payload: {
        eventId: event.eventId,
        eventType: event.eventType,
        data: JSON.parse(event.payload),
      },
      maxRetries: 3,
    });
    tasks.push(task.id);
  }

  res.json({ replayed: tasks.length, taskIds: tasks });
});

Best practices for replay:

  • Always replay through the same handler, not a special path
  • Log replayed events distinctly so you can trace their origin
  • Ensure your handler is idempotent so replays run safely
  • Test replay in a staging environment before running it against production
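
Idempotency is what makes the other practices safe, so here is a minimal sketch of one way to get it: remember which event IDs have been processed and skip repeats. The in-memory Set is a stand-in for a database table with a unique constraint on eventId, and createOnceProcessor and the stock example are hypothetical.

```javascript
// Creates a wrapper that applies each eventId at most once.
// The Set lives in process memory here; a real system would persist it.
function createOnceProcessor() {
  const processedEventIds = new Set();

  return async function processOnce(event, processEvent) {
    if (processedEventIds.has(event.eventId)) {
      return { skipped: true };
    }
    await processEvent(event);
    // Mark as processed only after success, so a failed attempt can be retried
    processedEventIds.add(event.eventId);
    return { skipped: false };
  };
}

// Usage: replaying the same event does not apply its effect twice
async function demo() {
  const processOnce = createOnceProcessor();
  let stock = 10;
  const decrementStock = async (event) => { stock -= event.data.quantity; };
  const event = { eventId: 'evt_42', eventType: 'order.paid', data: { quantity: 3 } };

  const first = await processOnce(event, decrementStock);
  const replay = await processOnce(event, decrementStock);
  return { stock, first, replay };
}

demo().then(({ stock }) => console.log(stock)); // 7
```

This sketch does not guard against two deliveries of the same event being processed at the same moment. With a database, inserting the event ID under a unique constraint in the same transaction as the side effect is the usual way to close that gap.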