
Throttling in API Systems

Throttling is the practice of rejecting or delaying requests that exceed a defined rate. When a client sends more requests than the limit allows, the throttle kicks in and either delays the excess requests or rejects them with an error response. This protects the service from overload while giving the client a clear signal to slow down.

How It Works

  1. The service defines a rate limit (e.g., 100 requests per second per client)
  2. Each incoming request is checked against the current count for that client
  3. If the count is within the limit, the request proceeds normally
  4. If the count exceeds the limit, the service returns a 429 Too Many Requests response
  5. The response typically includes a Retry-After header telling the client when to try again

Client sends 150 req/s (limit: 100 req/s)
├── First 100 requests → 200 OK (processed)
└── Remaining 50 requests → 429 Too Many Requests
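
The steps above can be sketched as a fixed-window counter. This is a minimal illustration rather than a production limiter: the class name FixedWindowLimiter, its check method, and the choice of a fixed window (as opposed to a sliding window or token bucket) are assumptions made for the example. Time is passed in explicitly so the behavior is easy to follow.

```python
import math
from dataclasses import dataclass, field


@dataclass
class FixedWindowLimiter:
    """Hypothetical limiter: counts requests per client in fixed time windows."""
    limit: int = 100          # max requests per window per client
    window_s: float = 1.0     # window length in seconds
    _counts: dict = field(default_factory=dict)  # client_id -> (window_index, count)

    def check(self, client_id: str, now: float) -> tuple[int, dict]:
        """Return (status_code, headers) for one request arriving at time `now`."""
        window = int(now // self.window_s)
        seen_window, count = self._counts.get(client_id, (window, 0))
        if seen_window != window:
            count = 0  # a new window has started, so the count resets
        if count >= self.limit:
            # Seconds until the next window opens, rounded up to a whole second.
            retry_after = math.ceil((window + 1) * self.window_s - now)
            return 429, {"Retry-After": str(max(retry_after, 1))}
        self._counts[client_id] = (window, count + 1)
        return 200, {}


# Reproduce the scenario above: 150 requests inside one window, limit of 100.
limiter = FixedWindowLimiter(limit=100, window_s=1.0)
statuses = [limiter.check("client-a", now=0.5)[0] for _ in range(150)]
print(statuses.count(200), statuses.count(429))
```

A fixed window is the simplest scheme to reason about, but it allows short bursts at window boundaries. Real services often use a sliding window or token bucket instead.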

Concurrency Limits vs. Rate Limits

Rate limits cap the number of requests over a time window. Concurrency limits cap the number of requests being processed at the same moment. Both are forms of throttling, but they protect against different problems. Rate limits prevent sustained overload. Concurrency limits prevent resource exhaustion from long-running operations.
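
To make the contrast concrete, here is a rough sketch of a concurrency limit built on a semaphore. The names ConcurrencyLimiter and handle are hypothetical, and rejecting with 429 when no slot is free is an assumption for the example (a service could instead queue the request or return a different status).

```python
import threading


class ConcurrencyLimiter:
    """Hypothetical limiter: caps requests in flight, regardless of arrival rate."""

    def __init__(self, max_in_flight: int):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_acquire(self) -> bool:
        # Non-blocking: returns False immediately when every slot is taken.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()


def handle(limiter: ConcurrencyLimiter, work) -> int:
    """Run `work` if a slot is free; otherwise reject (assumed here to be a 429)."""
    if not limiter.try_acquire():
        return 429
    try:
        work()
        return 200
    finally:
        limiter.release()  # free the slot even if `work` raises
```

Unlike the rate limit, nothing here depends on a clock. A slow operation holds its slot for as long as it runs, which is what protects the service from long-running requests piling up.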

How Task Queues Help

When a client hits a throttle limit, the rejected requests are lost unless the client retries them. A task queue sits between the client and the service and absorbs bursts of traffic. Instead of rejecting excess requests, the queue buffers them and delivers them at a pace the service can handle. This turns bursty traffic into a smooth, predictable stream.
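
The buffering idea can be illustrated with an in-process queue. This is only a sketch under simplifying assumptions: the function drain and its parameters are hypothetical, the queue lives in memory rather than in a durable broker, and a single worker paces delivery by sleeping between tasks.

```python
import queue
import time


def drain(tasks: queue.Queue, rate_per_s: float, handler, sleep=time.sleep) -> int:
    """Deliver queued tasks to `handler` at most `rate_per_s` per second."""
    interval = 1.0 / rate_per_s
    delivered = 0
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return delivered  # buffer is empty, so the burst has been absorbed
        handler(task)
        delivered += 1
        sleep(interval)  # pace deliveries so the service sees a steady stream


# A burst of 150 tasks arrives at once and is buffered instead of rejected.
buffer = queue.Queue()
for i in range(150):
    buffer.put(i)

processed = []
# `sleep` is stubbed out here so the example finishes instantly.
drain(buffer, rate_per_s=100, handler=processed.append, sleep=lambda s: None)
print(len(processed))
```

With the real time.sleep, the 150 buffered tasks would reach the handler over roughly 1.5 seconds rather than all at once.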

When to Use Throttling

  • Public APIs that must protect backend resources from unpredictable traffic
  • Multi-tenant systems where one noisy client could starve others
  • Upstream integrations where a third-party API enforces strict rate limits
  • Cost-sensitive services where every processed request incurs expense

Considerations

  • Return clear 429 responses with Retry-After headers so clients can back off gracefully
  • Log throttled requests to identify clients that consistently exceed limits
  • Combine throttling with backpressure for end-to-end flow control
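
On the client side, backing off gracefully might look like the following sketch. The function call_with_backoff and the send callable (assumed to return a status code and a headers dict) are hypothetical. The example also assumes Retry-After carries a number of seconds; the header may instead hold an HTTP date, which this sketch does not handle.

```python
import time


def call_with_backoff(send, max_attempts: int = 5, sleep=time.sleep) -> int:
    """Call `send()` until it stops returning 429 or attempts run out."""
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429:
            return status
        # Prefer the server's hint; otherwise fall back to exponential backoff.
        delay = float(headers.get("Retry-After", 2 ** attempt))
        sleep(delay)
    return 429  # still throttled after the final attempt
```

Honoring the server's hint keeps retries from arriving before the limit resets, and the exponential fallback covers responses that omit the header.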