Kubernetes CPU Limits: To Throttle or Not?

The great debate: Should you set CPU limits in Kubernetes? We dive into CFS quotas, throttling latency, and why 'Requests Only' might be the better FinOps move.

Jesus Paz

In the Kubernetes world, there’s a piece of advice that gets repeated like gospel: “Always set Requests and Limits for everything.”

It sounds logical. Requests guarantee resources; Limits prevent a rogue pod from eating the whole node. Safety first, right?

But for CPU, setting limits can actually hurt your performance and waste money.

The CFS Quota Problem

Kubernetes implements CPU limits using the Linux kernel’s CFS (Completely Fair Scheduler) Bandwidth Control.

Here is how it works:

  1. You set a limit of 1000m (1 core).
  2. The kernel defines a “period” (usually 100ms).
  3. Your container's quota is limit × period: 100ms of CPU time per 100ms period, shared across all of its threads.

The Trap: Quota is consumed across every thread in the container. If a burst of traffic wakes up 4 worker threads and they all run flat out, the container burns through its 100ms quota in just 25ms of wall-clock time. The kernel then throttles it for the remaining 75ms of the period, even if the host machine is completely idle!

This is called Micro-bursting.

  • Symptom: Your CPU usage graphs look low (averages hide bursts), but your p99 latency spikes.
  • Cause: The kernel puts your process to sleep until the next 100ms period begins.
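The arithmetic above can be sketched in a toy model of one CFS enforcement period. This is an illustration, not the kernel's actual accounting; it assumes the container's threads run perfectly in parallel and that the quota is limit × period, which is how Kubernetes configures CFS Bandwidth Control.

```python
# Toy model of CFS Bandwidth Control for a single enforcement period.
# Assumes `threads` threads run flat out in parallel and the quota
# is limit_cores * period_ms (how Kubernetes derives cfs_quota_us).

def throttled_ms(limit_cores: float, threads: int, period_ms: float = 100.0) -> float:
    """How long the container sleeps in one period if its threads
    run continuously until the quota is exhausted."""
    quota_ms = limit_cores * period_ms           # CPU-time budget per period
    burn_rate = threads                          # CPU-ms consumed per wall-clock ms
    wall_ms_until_empty = quota_ms / burn_rate   # wall time until quota runs dry
    if wall_ms_until_empty >= period_ms:
        return 0.0                               # quota outlasts the period
    return period_ms - wall_ms_until_empty       # forced sleep until next period

# A "1 core" limit (1000m) with 4 busy threads: quota gone after 25ms,
# throttled for the remaining 75ms of the 100ms period.
print(throttled_ms(limit_cores=1.0, threads=4))  # 75.0
```

Note that the single-threaded case never throttles at a 1000m limit; it is parallelism that makes the quota run out early.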

The “Requests Only” Strategy

Many Kubernetes experts (including folks at Zalando and Buffer) now advocate for removing CPU limits entirely for latency-sensitive workloads.

The Setup:

  • Set CPU Requests accurately: this is what the scheduler uses to place pods, and what the kernel uses to divide CPU under contention.
  • Remove CPU Limits: let the pod burst when the node has spare cycles.
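As a minimal sketch, a container's resources block under this strategy might look like the following (values are illustrative; memory keeps a limit because it is incompressible, as discussed below):

```yaml
resources:
  requests:
    cpu: "500m"        # guides scheduling and sets the fair-share weight
    memory: "256Mi"
  limits:
    memory: "256Mi"    # memory is incompressible: keep limit = request
    # no cpu limit: the container may burst into idle capacity
```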

Why it works:

  1. Lower Latency: No artificial throttling during micro-bursts. Your app runs as fast as the hardware allows.
  2. Better Utilization: You paid for the whole node; why let cycles go to waste if they’re available?
  3. Safety Net: If the node is under contention, the kernel’s fair scheduler ensures every pod gets at least its Request share.
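The safety net in point 3 comes from cgroup CPU shares: Kubernetes converts each container's CPU request into a proportional weight (on cgroup v1, shares = milliCPU × 1024 / 1000), and under contention CFS divides the node's cores in proportion to those weights. A rough sketch of that division (function name is illustrative):

```python
# Under full contention, CFS splits CPU proportionally to cgroup shares.
# Kubernetes (cgroup v1) derives shares from the CPU request:
#     shares = milli_cpu * 1024 / 1000

def cpu_under_contention(requests_m: dict[str, int], node_cores: float) -> dict[str, float]:
    """Cores each pod receives when every pod wants CPU at once."""
    shares = {pod: m * 1024 // 1000 for pod, m in requests_m.items()}
    total = sum(shares.values())
    return {pod: node_cores * s / total for pod, s in shares.items()}

# Pod A requests 1 core; rogue Pod B also requests 1 core but tries
# to use everything. On a 4-core node the split is still 50/50,
# so Pod A's 1-core request is comfortably protected.
print(cpu_under_contention({"pod-a": 1000, "pod-b": 1000}, node_cores=4.0))
```

The key property: a rogue pod can only grow its slice of the idle capacity, never shrink another pod's requested share.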

The Risk: The Noisy Neighbor

The fear is that one runaway process will starve everyone else.

But remember: Requests provide a guarantee. Even without a limit, if Pod A requests 1 core and Pod B goes rogue, the kernel ensures Pod A still gets its 1 core. Pod B only eats the spare capacity.

Note: This logic applies to CPU, a compressible resource: the kernel can safely slow a pod down. Memory is incompressible; it cannot be taken back once allocated. Set memory limits (ideally equal to requests) so an over-consuming pod is killed predictably, instead of pushing the node into memory pressure and triggering OOM kills of its neighbors.

Conclusion

Stop blindly setting CPU limits.

  1. Monitor: Check the container_cpu_cfs_throttled_seconds_total metric.
  2. Experiment: If you see throttling on critical services, delete the limit.
  3. Measure: Watch your p99 latency drop.
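For step 1, one hedged sketch of a Prometheus query using cAdvisor's CFS counters: the ratio of throttled periods to total periods is often easier to reason about and alert on than raw throttled seconds.

```promql
# Fraction of CFS periods in which each pod was throttled (last 5m)
sum by (namespace, pod) (rate(container_cpu_cfs_throttled_periods_total[5m]))
/
sum by (namespace, pod) (rate(container_cpu_cfs_periods_total[5m]))
```

Anything persistently above a few percent on a latency-sensitive service is a candidate for step 2.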

Your users don’t care about your “fairness” policy. They care about speed. Give your apps the breathing room they need.

Jesus Paz
Founder & CEO
