Latency

Latency is the time a request takes to travel from client to server, get processed, and return a response. In API development, latency directly shapes user experience — every additional millisecond of delay means slower interfaces, lower conversion rates, and frustrated users.

Components of API Latency

Total latency is the sum of several stages:

Stage                                      Typical duration
DNS resolution                             1–50ms
TCP connection                             10–100ms
Transport Layer Security (TLS) handshake   30–100ms
Server processing                          1ms–minutes
Data transfer                              1–500ms

For most APIs, server processing time dominates. A database query taking 500ms dwarfs the 50ms of network overhead.
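Since server processing usually dominates, it pays to measure it per request. Below is a minimal sketch of an Express-style timing middleware; the middleware shape (`req`, `res`, `next`) matches the handlers shown later in this article, but the log format and `timingMiddleware` name are illustrative, not from any particular library.

// Logs how long the server spent on each request, from the moment the
// middleware runs until the response is flushed to the client.
function timingMiddleware(req, res, next) {
  const start = process.hrtime.bigint(); // monotonic, nanosecond clock
  res.on('finish', () => {
    const ms = Number(process.hrtime.bigint() - start) / 1e6;
    console.log(`${req.method} ${req.url} took ${ms.toFixed(1)}ms`);
  });
  next();
}

Registered early (e.g. `app.use(timingMiddleware)`), it times every route without touching individual handlers.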

Measuring Latency: Percentiles

Averages hide problems. Use percentiles:

  • p50 (median): Half of requests are faster than this
  • p95: 95% of requests are faster — this is what most users experience
  • p99: 1 in 100 requests is slower — this catches tail latency

An API with a 100ms average might have a p99 of 3 seconds — meaning 1% of users wait 30x longer than expected.
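To make this concrete, here is a small sketch that computes percentiles from raw latency samples using the nearest-rank method (one of several common definitions; monitoring systems may interpolate instead). The sample data is synthetic: 98 fast requests and 2 slow ones, which keeps the average low while the p99 exposes the tail.

// Nearest-rank percentile: the smallest sample such that at least
// p% of all samples are less than or equal to it.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}

// 98 requests at 100ms, 2 outliers at 3,000ms.
const latencies = Array(98).fill(100).concat([3000, 3000]);

const avg = latencies.reduce((a, b) => a + b, 0) / latencies.length;
console.log(avg);                      // 158 — looks healthy
console.log(percentile(latencies, 50)); // 100
console.log(percentile(latencies, 95)); // 100
console.log(percentile(latencies, 99)); // 3000 — the tail the average hid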

How Async Processing Reduces Latency

The fastest API response is one that skips slow operations entirely:

// High latency: ~9 seconds (waits for everything)
app.post('/onboard', async (req, res) => {
  const user = await createUser(req.body); // 50ms
  await sendEmail(user);                   // 2,000ms
  await syncCRM(user);                     // 3,000ms
  await generateAvatar(user);              // 4,000ms
  res.json({ user });                      // Total: ~9,050ms
});

// Low latency: ~80ms (offloads slow work)
app.post('/onboard', async (req, res) => {
  const user = await createUser(req.body); // 50ms
  await queue.add('onboard-tasks', user);  // 30ms
  res.json({ user });                      // Total: ~80ms
});

By offloading slow operations to a task queue, the user-facing latency drops from seconds to milliseconds.
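The `queue.add` call above assumes a job-queue library (in production, typically a durable one such as BullMQ backed by Redis, so jobs survive restarts). To show the shape of the pattern without any external dependency, here is a deliberately simplified in-memory sketch; the `queue`, `handlers`, and `drain` names are illustrative.

// A toy task queue: enqueue a named job, drain it after the current
// tick so the HTTP response is not blocked by the slow work.
const handlers = {};
const jobs = [];

const queue = {
  add(name, payload) {
    jobs.push({ name, payload });
    setImmediate(drain); // run background work after the response is sent
  },
};

function drain() {
  while (jobs.length) {
    const { name, payload } = jobs.shift();
    handlers[name]?.(payload);
  }
}

// The slow onboarding steps from the example become a background handler.
handlers['onboard-tasks'] = (user) => {
  // sendEmail(user); syncCRM(user); generateAvatar(user);
  console.log(`background onboarding for user ${user.id}`);
};

queue.add('onboard-tasks', { id: 42 });

The key property is durability in a real system: the in-memory version loses jobs on a crash, which is exactly why production queues persist them.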