Jesus Paz · 1 min read

Understanding Kubernetes Resource Requests vs Limits (and Why They Affect Your Bill)

Requests keep the cluster stable, limits prevent noisy neighbors, and both drive cost—here’s how to tune them.

Requests and limits look simple, but they control scheduling, reliability, and ultimately how much you pay AWS. Let’s demystify them with a cost lens.

Requests = reserved capacity

  • Kubernetes guarantees the requested CPU/memory for each pod.
  • Cluster autoscaler scales nodes based on aggregate requests, not usage.
  • Translation: high requests → larger node footprint → higher cost.
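To see why that matters, here's a rough back-of-the-envelope sketch in Python. The pod count, node size, and hourly price below are made-up illustration numbers, not a benchmark:

```python
import math

# Hypothetical numbers for illustration only.
pods = 40
cpu_request_per_pod = 1.0    # cores requested per pod
cpu_usage_per_pod = 0.2      # cores actually used (P95)
node_allocatable_cpu = 7.5   # schedulable cores on an 8-vCPU node
node_hourly_price = 0.384    # example on-demand price

# The scheduler and cluster autoscaler size the cluster on requests...
nodes_by_requests = math.ceil(pods * cpu_request_per_pod / node_allocatable_cpu)
# ...even though actual usage would fit on far fewer nodes.
nodes_by_usage = math.ceil(pods * cpu_usage_per_pod / node_allocatable_cpu)

print(f"Nodes needed by requests: {nodes_by_requests}")  # 6
print(f"Nodes needed by usage:    {nodes_by_usage}")     # 2
print(f"Monthly cost gap: ${(nodes_by_requests - nodes_by_usage) * node_hourly_price * 730:.0f}")
```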

Limits = safety rails

  • CPU limits throttle workloads when they exceed the boundary.
  • Memory limits cause OOM kills if breached.
  • Translation: overly tight limits crash apps; no limits create noisy neighbors.
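For reference, here's where those values live: a minimal container-spec sketch using the official Kubernetes Python client, with placeholder numbers rather than recommendations:

```python
from kubernetes import client

# Placeholder values; tune per workload.
resources = client.V1ResourceRequirements(
    requests={"cpu": "250m", "memory": "256Mi"},  # reserved capacity, used for scheduling
    limits={"cpu": "300m", "memory": "320Mi"},    # throttle point (CPU) / OOM-kill point (memory)
)

container = client.V1Container(
    name="api",
    image="example.com/api:1.2.3",  # hypothetical image
    resources=resources,
)
```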

Cost implications

| Scenario | Result |
| --- | --- |
| Requests >> usage, limits equal requests | Wasteful spend, thrashing autoscalers |
| Requests = usage, limits slightly higher | Balanced utilization |
| No limits | Potential runaway pods impacting other workloads |

ClusterCost highlights pods whose requests run at 2× or more of actual usage so you can trim them safely.
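The underlying check is simple enough to sketch yourself. This illustrative snippet assumes you already have per-pod CPU requests and P95 usage in millicores; the pod names, numbers, and 2× threshold are placeholders, not ClusterCost's actual implementation:

```python
# Illustrative per-pod data: (pod, cpu_request_mcores, cpu_usage_p95_mcores)
pods = [
    ("checkout-7d9f", 1000, 180),
    ("payments-5c2a", 500, 430),
    ("search-b41e", 2000, 950),
]

OVERPROVISION_RATIO = 2.0  # flag pods requesting >= 2x what they use

for name, request, usage in pods:
    ratio = request / max(usage, 1)
    if ratio >= OVERPROVISION_RATIO:
        print(f"{name}: request {request}m vs P95 usage {usage}m "
              f"({ratio:.1f}x) -- candidate for trimming")
```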

Tuning workflow

  1. Gather P95 usage per pod over 7–14 days.
  2. Set requests = P95 (rounded up).
  3. Set limits = requests × 1.2 (or more for bursty workloads).
  4. Automate PRs to apply the new values.

For latency-sensitive services, add SLO data before cutting requests.
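Here's a minimal sketch of steps 1–3, assuming you've already exported per-pod CPU samples in millicores from your metrics store; the sample data, rounding increment, and headroom factor are illustrative:

```python
import math

def recommend_resources(cpu_samples_mcores, headroom=1.2, round_to=50):
    """Turn 7-14 days of per-pod CPU samples into request/limit suggestions."""
    samples = sorted(cpu_samples_mcores)
    # Nearest-rank P95: the value 95% of samples fall at or below.
    p95 = samples[min(len(samples) - 1, math.ceil(0.95 * len(samples)) - 1)]
    # Step 2: requests = P95, rounded up to a tidy increment.
    request = math.ceil(p95 / round_to) * round_to
    # Step 3: limits = requests x headroom (raise headroom for bursty workloads).
    limit = math.ceil(request * headroom / round_to) * round_to
    return request, limit

# Synthetic samples standing in for 7-14 days of real data.
samples = [180, 190, 210, 220, 230, 240, 250, 260, 300, 310]
request, limit = recommend_resources(samples)
print(f"requests: {request}m, limits: {limit}m")  # requests: 350m, limits: 450m
```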

Monitor continuously

  • Track CPU throttling and OOM events.
  • Watch cluster utilization: aim for 70–80% to leave failover headroom.
  • Use ClusterCost to alert when request-to-usage ratio drifts above thresholds.
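For the throttling and OOM checks, the usual signals are cAdvisor's CFS throttling counters and kube-state-metrics' termination reasons. A rough sketch against a Prometheus HTTP API (the endpoint URL is a placeholder):

```python
import requests

PROM = "http://prometheus.example.internal:9090"  # placeholder endpoint

QUERIES = {
    # Fraction of CPU periods in which each pod was throttled over the last 5m.
    "cpu_throttle_ratio": (
        "sum by (namespace, pod) (rate(container_cpu_cfs_throttled_periods_total[5m]))"
        " / sum by (namespace, pod) (rate(container_cpu_cfs_periods_total[5m]))"
    ),
    # Containers whose most recent termination was an OOM kill.
    "oom_killed": 'kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}',
}

for name, query in QUERIES.items():
    resp = requests.get(f"{PROM}/api/v1/query", params={"query": query}, timeout=10)
    resp.raise_for_status()
    for series in resp.json()["data"]["result"]:
        labels, (_, value) = series["metric"], series["value"]
        print(name, labels.get("namespace"), labels.get("pod"), value)
```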

Requests and limits are the knobs that turn Kubernetes from an expensive science project into a predictable platform. Tune them weekly, and your bill will reward you.
