Latency

Latency is the time a request takes to travel from client to server, get processed, and return a response. In API development, latency directly shapes user experience — every additional millisecond of delay means slower interfaces, lower conversion rates, and frustrated users.

Components of API Latency

Total latency is the sum of several stages:

Stage                                     Typical duration
DNS resolution                            1–50ms
TCP connection                            10–100ms
Transport Layer Security (TLS) handshake  30–100ms
Server processing                         1ms–minutes
Data transfer                             1–500ms

For most APIs, server processing time dominates. A database query taking 500ms dwarfs the 50ms of network overhead.

Measuring Latency: Percentiles

Averages hide problems. Use percentiles:

  • p50 (median): Half of requests are faster than this
  • p95: 95% of requests are faster; 1 in 20 is slower, so users making several requests hit this tail regularly
  • p99: 1 in 100 requests is slower — this catches tail latency

An API with a 100ms average might have a p99 of 3 seconds — meaning 1% of users wait 30x longer than expected.
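Percentiles are straightforward to compute from a sample of request durations. The following sketch uses the nearest-rank method; the sample latencies are invented to mirror the scenario above, where one slow outlier drags p99 far above the median.

```javascript
// Nearest-rank percentile over a sample of latencies in milliseconds
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, Math.min(sorted.length - 1, idx))];
}

// Hypothetical sample: mostly fast requests, one 3-second outlier
const latencies = [80, 90, 95, 100, 110, 120, 150, 200, 800, 3000];

percentile(latencies, 50); // 110  — the typical request
percentile(latencies, 99); // 3000 — the tail a plain average hides
```

Note how the average of this sample (~474ms) sits between the median and the tail, describing almost no real request.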

How Async Processing Reduces Latency

The fastest API response is one that skips slow operations entirely:

// High latency: ~9 seconds (waits for everything)
app.post('/onboard', async (req, res) => {
  const user = await createUser(req.body); // 50ms
  await sendEmail(user);                   // 2,000ms
  await syncCRM(user);                     // 3,000ms
  await generateAvatar(user);              // 4,000ms
  res.json({ user });                      // Total: ~9,050ms
});

// Low latency: 80ms (offloads slow work)
app.post('/onboard', async (req, res) => {
  const user = await createUser(req.body);   // 50ms
  await queue.add('onboard-tasks', user);     // 30ms
  res.json({ user });                         // Total: ~80ms
});

By offloading slow operations to a task queue, the user-facing latency drops from seconds to milliseconds.
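The queued work still has to run somewhere: a worker process consumes jobs outside the request/response cycle. Production systems typically use a Redis-backed queue such as BullMQ; the in-memory sketch below only illustrates the shape of the split — `queue`, `drainQueue`, and the job name are stand-ins, not a real library API.

```javascript
// In-memory stand-in for a task queue (real systems persist jobs externally)
const jobs = [];
const queue = {
  add: async (name, payload) => {
    jobs.push({ name, payload });
  },
};

// Worker loop: pulls jobs and dispatches them to handlers, independently
// of any HTTP request — its latency never reaches the user
async function drainQueue(handlers) {
  while (jobs.length > 0) {
    const job = jobs.shift();
    await handlers[job.name](job.payload);
  }
}
```

A worker might register handlers like `{ 'onboard-tasks': async (user) => { await sendEmail(user); /* … */ } }` and call `drainQueue` on a schedule or in response to new jobs; failures can then be retried without the user ever waiting on them.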