Jesus Paz · 1 min read

Understanding Kubernetes Resource Requests vs Limits (and Why They Affect Your Bill)

Requests keep the cluster stable, limits prevent noisy neighbors, and both drive cost—here’s how to tune them.

Requests and limits look simple, but they control scheduling, reliability, and ultimately how much you pay AWS. Let’s demystify them with a cost lens.

Requests = reserved capacity

  • Kubernetes guarantees the requested CPU/memory for each pod.
  • Cluster autoscaler scales nodes based on aggregate requests, not usage.
  • Translation: high requests → larger node footprint → higher cost.
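To see why that matters, here's a rough back-of-the-envelope sketch in Python. The pod count, node size, and hourly price below are made-up illustration numbers, not a benchmark:

```python
import math

# Hypothetical numbers for illustration only.
pods = 40
cpu_request_per_pod = 1.0    # cores requested per pod
cpu_usage_per_pod = 0.2      # cores actually used (P95)
node_allocatable_cpu = 7.5   # schedulable cores on an 8-vCPU node
node_hourly_price = 0.384    # example on-demand price

# The scheduler and cluster autoscaler size the cluster on requests...
nodes_by_requests = math.ceil(pods * cpu_request_per_pod / node_allocatable_cpu)
# ...even though actual usage would fit on far fewer nodes.
nodes_by_usage = math.ceil(pods * cpu_usage_per_pod / node_allocatable_cpu)

print(f"Nodes needed by requests: {nodes_by_requests}")  # 6
print(f"Nodes needed by usage:    {nodes_by_usage}")     # 2
print(f"Monthly cost gap: ${(nodes_by_requests - nodes_by_usage) * node_hourly_price * 730:.0f}")
```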

Limits = safety rails

  • CPU limits throttle workloads when they exceed the boundary.
  • Memory limits cause OOM kills if breached.
  • Translation: overly tight limits crash apps; no limits create noisy neighbors.
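For reference, here's where those values live: a minimal container-spec sketch using the official Kubernetes Python client, with placeholder numbers rather than recommendations:

```python
from kubernetes import client

# Placeholder values; tune per workload.
resources = client.V1ResourceRequirements(
    requests={"cpu": "250m", "memory": "256Mi"},  # reserved capacity, used for scheduling
    limits={"cpu": "300m", "memory": "320Mi"},    # throttle point (CPU) / OOM-kill point (memory)
)

container = client.V1Container(
    name="api",
    image="example.com/api:1.2.3",  # hypothetical image
    resources=resources,
)
```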

Cost implications

| Scenario | Result |
| --- | --- |
| Requests >> usage, limits equal requests | Wasteful spend, thrashing autoscalers |
| Requests = usage, limits slightly higher | Balanced utilization |
| No limits | Potential runaway pods impacting other workloads |

ClusterCost highlights pods whose requests run at 2× or more of actual usage so you can trim them safely.
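The underlying check is simple enough to sketch yourself. This illustrative snippet assumes you already have per-pod CPU requests and P95 usage in millicores; the pod names, numbers, and 2× threshold are placeholders, not ClusterCost's actual implementation:

```python
# Illustrative per-pod data: (pod, cpu_request_mcores, cpu_usage_p95_mcores)
pods = [
    ("checkout-7d9f", 1000, 180),
    ("payments-5c2a", 500, 430),
    ("search-b41e", 2000, 950),
]

OVERPROVISION_RATIO = 2.0  # flag pods requesting >= 2x what they use

for name, request, usage in pods:
    ratio = request / max(usage, 1)
    if ratio >= OVERPROVISION_RATIO:
        print(f"{name}: request {request}m vs P95 usage {usage}m "
              f"({ratio:.1f}x) -- candidate for trimming")
```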

Tuning workflow

  1. Gather P95 usage per pod over 7–14 days.
  2. Set requests = P95 (rounded up).
  3. Set limits = requests × 1.2 (or more for bursty workloads).
  4. Automate PRs to apply the new values.

For latency-sensitive services, add SLO data before cutting requests.
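Here's a minimal sketch of steps 1–3, assuming you've already exported per-pod CPU samples in millicores from your metrics store; the sample data, rounding increment, and headroom factor are illustrative:

```python
import math

def recommend_resources(cpu_samples_mcores, headroom=1.2, round_to=50):
    """Turn 7-14 days of per-pod CPU samples into request/limit suggestions."""
    samples = sorted(cpu_samples_mcores)
    # Nearest-rank P95: the value 95% of samples fall at or below.
    p95 = samples[min(len(samples) - 1, math.ceil(0.95 * len(samples)) - 1)]
    # Step 2: requests = P95, rounded up to a tidy increment.
    request = math.ceil(p95 / round_to) * round_to
    # Step 3: limits = requests x headroom (raise headroom for bursty workloads).
    limit = math.ceil(request * headroom / round_to) * round_to
    return request, limit

# Synthetic samples standing in for 7-14 days of real data.
samples = [180, 190, 210, 220, 230, 240, 250, 260, 300, 310]
request, limit = recommend_resources(samples)
print(f"requests: {request}m, limits: {limit}m")  # requests: 350m, limits: 450m
```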

Monitor continuously

  • Track CPU throttling and OOM events.
  • Watch cluster utilization: aim for 70–80% to leave failover headroom.
  • Use ClusterCost to alert when request-to-usage ratio drifts above thresholds.
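For the throttling and OOM checks, the usual signals are cAdvisor's CFS throttling counters and kube-state-metrics' termination reasons. A rough sketch against a Prometheus HTTP API (the endpoint URL is a placeholder):

```python
import requests

PROM = "http://prometheus.example.internal:9090"  # placeholder endpoint

QUERIES = {
    # Fraction of CPU periods in which each pod was throttled over the last 5m.
    "cpu_throttle_ratio": (
        "sum by (namespace, pod) (rate(container_cpu_cfs_throttled_periods_total[5m]))"
        " / sum by (namespace, pod) (rate(container_cpu_cfs_periods_total[5m]))"
    ),
    # Containers whose most recent termination was an OOM kill.
    "oom_killed": 'kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}',
}

for name, query in QUERIES.items():
    resp = requests.get(f"{PROM}/api/v1/query", params={"query": query}, timeout=10)
    resp.raise_for_status()
    for series in resp.json()["data"]["result"]:
        labels, (_, value) = series["metric"], series["value"]
        print(name, labels.get("namespace"), labels.get("pod"), value)
```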

Requests and limits are the knobs that turn Kubernetes from an expensive science project into a predictable platform. Tune them weekly, and your bill will reward you.
