Load Shedding for System Protection

Load shedding is the practice of intentionally rejecting requests when a system is overwhelmed. Rather than letting every request degrade in quality or crash the entire service, load shedding sacrifices some traffic to preserve stability for the rest.

How It Works

When system resources approach critical thresholds, a load shedding layer begins rejecting incoming requests. The goal is to keep the system operating within its capacity so that accepted requests complete successfully.

Incoming traffic: 10,000 req/sec
System capacity:   6,000 req/sec

Without load shedding:
  → All 10,000 slow down → timeouts → cascading failure

With load shedding:
  → 6,000 accepted and processed normally
  → 4,000 rejected immediately with HTTP 503
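The arithmetic above can be sketched as a simple concurrency-limit shedder. This is an illustrative sketch, not a production implementation; `makeShedder`, `tryAcquire`, and `release` are hypothetical names:

```javascript
// Minimal concurrency-based load shedder (illustrative sketch).
// Requests beyond maxInFlight are rejected immediately instead of queuing.
function makeShedder(maxInFlight) {
  let inFlight = 0;
  return {
    tryAcquire() {
      if (inFlight >= maxInFlight) return false; // shed: caller returns 503
      inFlight++;
      return true;
    },
    release() {
      inFlight--; // call when the accepted request finishes
    },
  };
}

// Demo: capacity for 6 concurrent requests, 10 simultaneous arrivals.
const shedder = makeShedder(6);
const results = Array.from({ length: 10 }, () => shedder.tryAcquire());
const accepted = results.filter(Boolean).length;
console.log(`accepted=${accepted}, rejected=${10 - accepted}`); // accepted=6, rejected=4
```

Accepted requests run with full resources; the rejected four fail fast rather than dragging everything into timeouts.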

Load Shedding vs. Rate Limiting

These two mechanisms serve different purposes:

Aspect    Rate Limiting                        Load Shedding
Trigger   Per-client request count             Overall system resource pressure
Goal      Fair usage enforcement               System survival
Timing    Proactive - applied before overload  Reactive - kicks in during overload
Scope     Per user, per API key, per IP        Global across all traffic

Rate limiting prevents any single client from consuming too many resources. Load shedding protects the entire system when aggregate demand exceeds capacity, regardless of how many clients contribute to that demand.
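The distinction shows up directly in code: a rate limiter keys its decision on who is asking, while a load shedder keys it on how loaded the system is. A minimal sketch, with hypothetical function names:

```javascript
// Rate limiting: a per-client counter checked against a fixed quota.
// Each client is tracked independently.
function rateLimited(counts, clientId, quota) {
  const next = (counts.get(clientId) || 0) + 1;
  counts.set(clientId, next);
  return next > quota;
}

// Load shedding: a single global decision driven by measured pressure.
// Which client sent the request does not matter.
function shed(systemLoad, threshold) {
  return systemLoad > threshold;
}

const counts = new Map();
rateLimited(counts, 'client-a', 100); // false: client-a is within quota
shed(0.92, 0.85);                     // true: every request is a shed candidate
```

A service under real overload often needs both: rate limiting keeps individual clients honest, and load shedding catches the case where many well-behaved clients still add up to more than the system can handle.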

Strategies for Choosing What to Drop

Not all requests carry equal importance. Effective load shedding prioritizes what matters most:

  • Priority-based: Assign priority levels to request types and drop the lowest tier first
  • Random sampling: Reject a percentage of all requests evenly for simple implementation
  • Age-based: Drop requests that have already waited too long, since callers may have given up
  • Cost-based: Shed expensive operations first to free the most resources per rejection

A sketch of a priority-based check:

function shouldShed(request, systemLoad) {
  if (systemLoad < 0.8) return false; // comfortably under capacity: no shedding
  const priority = request.headers['x-priority'] || 'low';
  if (systemLoad > 0.95) return priority !== 'critical'; // severe: keep only critical requests
  if (systemLoad > 0.85) return priority === 'low'; // moderate: drop low-priority requests
  return false;
}
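The priority-based check covers one strategy from the list above. Age-based shedding can be sketched just as briefly: drop queued requests that have waited past a deadline, since their callers have likely already timed out. The names `drainQueue` and `enqueuedAt` are hypothetical:

```javascript
// Age-based shedding sketch: partition a queue into requests still worth
// serving and requests that have waited too long to matter.
function drainQueue(queue, maxWaitMs, now = Date.now()) {
  const fresh = [];
  let dropped = 0;
  for (const req of queue) {
    if (now - req.enqueuedAt > maxWaitMs) {
      dropped++; // caller has probably given up; doing the work wastes capacity
    } else {
      fresh.push(req);
    }
  }
  return { fresh, dropped };
}

const now = 10000;
const queue = [
  { id: 1, enqueuedAt: now - 6000 }, // waited 6s: dropped
  { id: 2, enqueuedAt: now - 1000 }, // waited 1s: kept
];
const { fresh, dropped } = drainQueue(queue, 5000, now);
// fresh contains only id 2; dropped === 1
```

Serving a request whose caller has already disconnected burns capacity for zero benefit, which is why age-based shedding pairs well with the other strategies.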

Implementing Load Shedding

Key decisions when adding load shedding to a service:

  1. Pick your signal: CPU usage, memory pressure, queue depth, or request latency
  2. Set thresholds: Define when shedding begins and when it stops
  3. Return fast: Rejected requests should get an immediate 503 response with a Retry-After header
  4. Monitor the shed rate: Track how often shedding activates to guide capacity planning
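Two of these decisions, thresholds and fast rejection, benefit from a concrete sketch. A single threshold tends to flap as load oscillates around it, so a common refinement is hysteresis: start shedding above a high-water mark and stop only below a low-water mark. The function names here are hypothetical:

```javascript
// Hysteresis sketch: shedding turns on above `high` and turns off only
// below `low`, so the shedder does not flap around a single threshold.
function makeHysteresisShedder(low, high) {
  let shedding = false;
  return function update(load) {
    if (load > high) shedding = true;
    else if (load < low) shedding = false;
    return shedding;
  };
}

// Rejected requests get an immediate 503 plus a Retry-After hint,
// so clients back off instead of retrying instantly.
function rejectResponse(retryAfterSeconds) {
  return {
    status: 503,
    headers: { 'Retry-After': String(retryAfterSeconds) },
    body: 'Service overloaded, please retry later',
  };
}

const update = makeHysteresisShedder(0.7, 0.9);
update(0.95); // true: crossed the high-water mark
update(0.8);  // still true: between low and high, state is sticky
update(0.6);  // false: dropped below the low-water mark
```

The gap between the two thresholds is a tuning knob: wider gaps mean fewer on/off transitions but longer shedding episodes after load subsides.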

When to Use Load Shedding

  • Services that experience sudden traffic spikes beyond normal scaling speed
  • Systems where partial availability beats total unavailability
  • APIs serving mixed workloads with clear priority distinctions
  • Infrastructure with hard resource ceilings that cannot auto-scale quickly