# Load Shedding for System Protection
Load shedding is the practice of intentionally rejecting requests when a system is overwhelmed. Rather than letting every request degrade in quality or crash the entire service, load shedding sacrifices some traffic to preserve stability for the rest.
## How It Works
When system resources approach critical thresholds, a load shedding layer begins rejecting incoming requests. The goal is to keep the system operating within its capacity so that accepted requests complete successfully.
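As a minimal sketch of that idea, the handler below rejects new work once an in-flight counter reaches an assumed capacity; `CAPACITY` and `handleRequest` are illustrative placeholders, not part of any particular framework:

```javascript
const http = require('http');

const CAPACITY = 6000; // assumed concurrency ceiling; tune to your service
let inFlight = 0;      // live load signal: requests currently being processed

async function handleRequest(req, res) {
  // Placeholder for the real request handling work
  res.writeHead(200);
  res.end('ok');
}

const server = http.createServer((req, res) => {
  if (inFlight >= CAPACITY) {
    // At capacity: shed immediately rather than letting every request slow down
    res.writeHead(503, { 'Retry-After': '1' });
    res.end('overloaded');
    return;
  }
  inFlight++;
  handleRequest(req, res).finally(() => inFlight--);
});

server.listen(8080);
```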
```
Incoming traffic: 10,000 req/sec
System capacity:   6,000 req/sec

Without load shedding:
  → All 10,000 slow down → timeouts → cascading failure

With load shedding:
  → 6,000 accepted and processed normally
  → 4,000 rejected immediately with HTTP 503
```

## Load Shedding vs. Rate Limiting
These two mechanisms serve different purposes:
| Aspect | Rate Limiting | Load Shedding |
|---|---|---|
| Trigger | Per-client request count | Overall system resource pressure |
| Goal | Fair usage enforcement | System survival |
| Timing | Proactive - applied before overload | Reactive - kicks in during overload |
| Scope | Per user, per API key, per IP | Global across all traffic |
Rate limiting prevents any single client from consuming too many resources. Load shedding protects the entire system when aggregate demand exceeds capacity, regardless of how many clients contribute to that demand.
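The distinction is easy to see in code. A hedged sketch, using an assumed in-memory per-client counter and a global load gauge as stand-ins for real infrastructure:

```javascript
// Illustrative only: `requestCounts` and `systemLoad` are assumed stand-ins,
// not a real library's API.
const requestCounts = new Map(); // per-client counts (window reset omitted)
let systemLoad = 0.0;            // global utilization signal, 0.0 to 1.0

// Rate limiting: a per-client decision, applied regardless of overall load
function isRateLimited(clientId, limitPerWindow = 100) {
  const count = (requestCounts.get(clientId) || 0) + 1;
  requestCounts.set(clientId, count);
  return count > limitPerWindow;
}

// Load shedding: a global decision, applied regardless of which client asks
function isShed() {
  return systemLoad > 0.9;
}
```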
## Strategies for Choosing What to Drop
Not all requests carry equal importance. Effective load shedding prioritizes what matters most:
- Priority-based: Assign priority levels to request types and drop the lowest tier first
- Random sampling: Reject a percentage of all requests evenly for simple implementation
- Age-based: Drop requests that have already waited too long, since callers may have given up
- Cost-based: Shed expensive operations first to free the most resources per rejection
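For example, the snippet below sketches the priority-based approach: below 80% load nothing is shed, above 95% everything except critical traffic is dropped, and in between only low-priority requests are rejected.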
```javascript
function shouldShed(request, systemLoad) {
  if (systemLoad < 0.8) return false; // No shedding needed

  const priority = request.headers['x-priority'] || 'low';

  if (systemLoad > 0.95) return priority !== 'critical';
  if (systemLoad > 0.85) return priority === 'low';

  return false;
}
```

## Implementing Load Shedding
Key decisions when adding load shedding to a service (a sketch tying them together follows the list):
- Pick your signal: CPU usage, memory pressure, queue depth, or request latency
- Set thresholds: Define when shedding begins and when it stops
- Return fast: Rejected requests should get an immediate 503 response with a `Retry-After` header
- Monitor the shed rate: Track how often shedding activates to guide capacity planning
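A minimal sketch of those decisions wired together, assuming a hypothetical `getQueueDepth()` signal and hand-picked thresholds; the gap between the start threshold (1,000) and the stop threshold (600) adds hysteresis so the shedder does not flap on and off:

```javascript
const SHED_START = 1000; // assumed queue depth at which shedding begins
const SHED_STOP  = 600;  // lower stop threshold provides hysteresis

let shedding = false;
let shedCount = 0; // monitor: total requests rejected, for capacity planning

function getQueueDepth() {
  // Placeholder signal; a real service would read its actual queue depth,
  // CPU usage, or request latency here.
  return 0;
}

function maybeShed(req, res) {
  const depth = getQueueDepth();

  // Start shedding above SHED_START, stop only once below SHED_STOP
  if (!shedding && depth > SHED_START) shedding = true;
  if (shedding && depth < SHED_STOP) shedding = false;

  if (shedding) {
    shedCount++; // feeds the shed-rate metric
    res.writeHead(503, { 'Retry-After': '2' }); // return fast with a retry hint
    res.end('overloaded');
    return true; // request was shed
  }
  return false; // caller proceeds with normal handling
}
```

Queue depth is only one possible signal; the same structure works with any of the signals from the first bullet.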
## When to Use Load Shedding
- Services that experience sudden traffic spikes arriving faster than normal scaling can respond
- Systems where partial availability beats total unavailability
- APIs serving mixed workloads with clear priority distinctions
- Infrastructure with hard resource ceilings that cannot auto-scale quickly