The Cost of High Availability: Multi-AZ NAT Gateways
Is 99.99% uptime worth tripling your network bill? We do the math.
Spot Instances are cheap until they break your app. Here is the boilerplate code you need to handle SIGTERM and preStop hooks correctly.
In my post on Spot Instances, I warned about the “Interruption Tax.” The only way to pay that tax without going bankrupt is Graceful Shutdowns.
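When AWS reclaims a Spot node, it gives you a 2-minute warning. A termination handler (such as the AWS Node Termination Handler or Karpenter) picks up that warning and drains the node, and Kubernetes sends a SIGTERM signal to each pod on it.
If your app ignores SIGTERM, it gets SIGKILLed 30 seconds later (the default terminationGracePeriodSeconds). In-flight requests fail. Database connections leak. Customers get 500 errors.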
Here is how to fix it.
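During termination, Kubernetes removes the pod from the Service endpoints (so the load balancer stops routing to it) and sends SIGTERM to your app. These two steps happen at the same time, not one after the other. This is the problem.
Your app might receive the SIGTERM and shut down before the load balancer stops sending it traffic. Result: dropped requests.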
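You need to hold off the SIGTERM until the load balancer has updated. The simplest way is a preStop hook, which Kubernetes runs before it sends SIGTERM to the container:
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 10"]
This keeps the pod serving for 10 seconds after termination starts, giving the load balancer time to propagate the change. The sleep counts against terminationGracePeriodSeconds (30 seconds by default), so your app only gets whatever remains of that window to drain. The container image also needs /bin/sh for this exec command to run.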
In your application code (Go example), you must catch the signal and drain connections.
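// Imports: context, log, os, os/signal, syscall, time
// server is your *http.Server; waitForJobsToFinish is your own helper.
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)

<-sigChan // Block until the signal arrives
log.Println("Shutting down...")

// Stop accepting NEW requests and let in-flight HTTP requests finish.
// The timeout must fit inside the grace period left after the preStop sleep.
ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
defer cancel()
if err := server.Shutdown(ctx); err != nil {
	log.Printf("shutdown: %v", err)
}

// Finish OLD background work that is not tied to an HTTP request
waitForJobsToFinish()

Spot instances are only “production ready” if your app is “interruption ready.”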
To recap:
1. Add a preStop sleep (10-15s).
2. Catch SIGTERM and drain connections.
Do this, and you can save up to 90% on compute without waking up at 3 AM.
Founder & CEO