Real mistakes, real incident post-mortems, and the fixes that actually work.
Tap to start →
1
Gotcha #1
imagePullPolicy: Always slows every deploy
With Always, every container start goes back to your registry. On spot instances with high churn this adds 30-60s per restart and can get you rate-limited by your registry.
imagePullPolicy: IfNotPresent  # cached image is fine
# Always is only needed for :latest or other mutable tags
# Pin your tags (my-app:1.4.0); then IfNotPresent is safe
2
Gotcha #2
No resource limits = noisy neighbour OOMKills your app
Without limits, one runaway Pod can exhaust node memory and get every other Pod on the node OOMKilled. Without requests, the scheduler can't plan capacity and overcommits nodes.
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    memory: "256Mi"
# No CPU limit on purpose: excess CPU is throttled, not OOMKilled
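To keep unbounded Pods from slipping into a namespace at all, a LimitRange can apply defaults like these automatically. A sketch; the object name and values here are illustrative:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults  # illustrative name
spec:
  limits:
  - type: Container
    defaultRequest:         # used when a container sets no requests
      cpu: "100m"
      memory: "128Mi"
    default:                # used when a container sets no limits
      memory: "256Mi"
```

Containers that declare their own requests/limits are unaffected; only missing fields get the defaults.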
3
Gotcha #3
livenessProbe kills app before it finishes starting
livenessProbe starts checking as soon as the container starts. JVM apps, apps loading ML models, apps running migrations: all get killed mid-startup and end up in CrashLoopBackOff.
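The fix is a startupProbe: the kubelet holds off liveness (and readiness) checks until it succeeds. A sketch assuming an HTTP health endpoint at /healthz on port 8080; adjust for your app:

```yaml
startupProbe:
  httpGet:
    path: /healthz   # assumed endpoint
    port: 8080
  failureThreshold: 30  # 30 × 10s = up to 5 min to start
  periodSeconds: 10
livenessProbe:          # only runs after startupProbe passes
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
```

failureThreshold × periodSeconds is your startup budget; once the startup probe succeeds, the normal liveness cadence takes over.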
4
Gotcha #4
PVC provisioned in the wrong AZ can never be mounted
Cloud block disks (EBS, Azure Disk) are zone-specific. If the PVC is provisioned before the Pod is scheduled, the disk may land in a different AZ and the Pod can never mount it.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata: {name: gp3}
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
# Waits until Pod is scheduled, then creates disk in same AZ
⚡
Track every Kubernetes release
releaserun.com monitors Kubernetes, Node.js, Go, Python, PostgreSQL, and 13+ more technologies. Get the story behind every version bump.