Throttling in API Systems

Throttling is the practice of rejecting or delaying requests that exceed a defined rate. When a client sends requests faster than the configured limit allows, the throttle kicks in and, in the most common case, rejects the excess with an error response. This protects the service from overload while giving the client a clear signal to slow down.

How It Works

The service defines a rate limit (e.g., 100 requests per second per client)
Each incoming request is checked against that client's request count for the current time window
If the count is within the limit, the request proceeds normally
If the count exceeds the limit, the service returns a 429 Too Many Requests response
The response typically includes a Retry-After header telling the client when to try again

Client sends 150 req/s (limit: 100 req/s)

  ├── First 100 requests  → 200 OK (processed)
  └── Remaining 50 requests → 429 Too Many Requests
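The steps above can be sketched as a fixed-window counter. This is a minimal illustration, not a production implementation: the article does not say which algorithm a service uses, so the fixed window is an assumption, and the names FixedWindowLimiter and check are hypothetical. Time is passed in explicitly so the example is deterministic.

```python
import math
from dataclasses import dataclass, field


@dataclass
class FixedWindowLimiter:
    """Per-client fixed-window counter (hypothetical helper, not a library API)."""
    limit: int = 100              # allowed requests per window (assumed value)
    window_seconds: float = 1.0   # window length (assumed value)
    _counts: dict = field(default_factory=dict)  # client_id -> (window_index, count)

    def check(self, client_id: str, now: float):
        """Return (status_code, headers) for one request arriving at time `now`."""
        window = int(now // self.window_seconds)
        seen_window, count = self._counts.get(client_id, (window, 0))
        if seen_window != window:
            count = 0  # a new window has begun, so the counter starts over
        if count >= self.limit:
            # Over the limit: tell the client how long until the window resets.
            retry_after = math.ceil((window + 1) * self.window_seconds - now)
            return 429, {"Retry-After": str(max(retry_after, 1))}
        self._counts[client_id] = (window, count + 1)
        return 200, {}


# Reproduce the scenario in the diagram: 150 requests inside one second.
limiter = FixedWindowLimiter(limit=100, window_seconds=1.0)
statuses = [limiter.check("client-a", now=0.0)[0] for _ in range(150)]
print(statuses.count(200), statuses.count(429))  # prints "100 50"
```

A fixed window is the simplest scheme to reason about, but it lets a client send up to twice the limit across a window boundary. Sliding-window and token-bucket variants smooth that out at the cost of slightly more state.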

Concurrency Limits vs. Rate Limits

Rate limits cap the number of requests over a time window. Concurrency limits cap the number of requests being processed at the same moment. Both are forms of throttling, but they protect against different problems. Rate limits prevent sustained overload. Concurrency limits prevent resource exhaustion from long-running operations.
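To make the distinction concrete, a concurrency limit can be sketched with a semaphore that counts in-flight requests instead of requests per window. The class ConcurrencyLimiter and the function handle are hypothetical names, and returning 429 on rejection is an assumption (some services use 503 for this case).

```python
import threading


class ConcurrencyLimiter:
    """Caps in-flight requests (hypothetical helper for illustration)."""

    def __init__(self, max_in_flight: int):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_acquire(self) -> bool:
        # Non-blocking: returns False immediately when every slot is taken.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()


def handle(limiter: ConcurrencyLimiter, work) -> int:
    """Run `work` if a slot is free, otherwise reject the request."""
    if not limiter.try_acquire():
        return 429  # assumed status code for a concurrency rejection
    try:
        work()
        return 200
    finally:
        limiter.release()  # free the slot even if `work` raises


conc = ConcurrencyLimiter(max_in_flight=2)
conc.try_acquire()                       # simulate two long-running requests
conc.try_acquire()                       # that are still being processed
rejected = handle(conc, lambda: None)    # no free slot, so this is rejected
conc.release()                           # one long-running request finishes
accepted = handle(conc, lambda: None)    # a slot is free again
print(rejected, accepted)                # prints "429 200"
```

Note that the rejection here does not depend on how fast requests arrive, only on how many are still running, which is why slow operations can trip a concurrency limit well below the rate limit.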

How Task Queues Help

When clients hit throttle limits, requests are lost unless the client retries. A task queue sits between the client and the service, absorbing bursts of traffic. Instead of rejecting excess requests, the queue buffers them and delivers them at a pace the service can handle. This turns bursty traffic into a smooth, predictable stream.
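The buffering behaviour can be illustrated with a small simulation. This is a sketch under simplifying assumptions: the queue is an in-memory deque rather than a real task queue, one "tick" stands for one second, and drain is a hypothetical helper.

```python
from collections import deque


def drain(queue: deque, per_tick: int) -> list:
    """Remove and return up to `per_tick` items, oldest first."""
    batch = []
    while queue and len(batch) < per_tick:
        batch.append(queue.popleft())
    return batch


# A burst of 150 requests arrives at once; the service handles 100 per tick.
pending = deque(range(150))
delivered_per_tick = []
while pending:
    delivered_per_tick.append(len(drain(pending, per_tick=100)))

print(delivered_per_tick)  # prints "[100, 50]"
```

Compared with the throttling diagram, the last 50 requests are processed one tick later instead of being rejected. The trade-off is added latency for the buffered requests, and a real queue still needs a size bound so that a sustained overload does not grow the backlog without limit.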

When to Use Throttling

Public APIs that must protect backend resources from unpredictable traffic
Multi-tenant systems where one noisy client could starve others
Upstream integrations where a third-party API enforces strict rate limits, so outbound calls must be throttled to stay within them
Cost-sensitive services where every processed request incurs expense

Considerations

Return clear 429 responses with Retry-After headers so clients can back off gracefully
Log throttled requests to identify clients that consistently exceed limits
Combine throttling with backpressure, so that overload signals propagate back to callers, for end-to-end flow control
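On the client side, backing off gracefully might look like the following sketch. The function retry_delay and its base and cap defaults are hypothetical. It only handles a Retry-After value given in whole seconds; the header may also carry an HTTP date, which this sketch does not parse and instead treats like a missing header.

```python
def retry_delay(status: int, headers: dict, attempt: int,
                base: float = 0.5, cap: float = 30.0):
    """Return seconds to wait before retrying, or None if no retry is needed."""
    if status != 429:
        return None
    value = headers.get("Retry-After", "")
    if value.isdigit():
        return float(value)  # the server told us exactly how long to wait
    # No usable header: fall back to capped exponential backoff,
    # doubling the wait on each attempt (attempt 0 waits `base` seconds).
    return min(cap, base * 2 ** attempt)


print(retry_delay(429, {"Retry-After": "3"}, attempt=0))  # prints "3.0"
print(retry_delay(429, {}, attempt=2))                    # prints "2.0"
print(retry_delay(200, {}, attempt=0))                    # prints "None"
```

Preferring the server's value over a locally computed backoff keeps clients from retrying before the limit has actually reset, which would only produce more 429 responses.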