Helm Upgrade Failed: History Before Rollback
When helm upgrade fails, the release is telling you a story — read history and revision manifests before you uninstall or rollback.
Your pipeline reports success — until someone checks the cluster:
helm status api -n prod
STATUS: failed
REVISION: 14
DESCRIPTION: Upgrade "api" failed: timed out waiting for condition

The reflex is immediate: rollback, uninstall, delete the Deployment. All three feel like progress.
They often make recovery harder.
**Helm stores every revision.** History is your timeline — not a footnote.
What failed status actually means
failed means revision 14 never reached a healthy deployed state. Pods might still run on revision 13. Hooks might be stuck. The chart might have rendered but readiness never passed.
Before any destructive command, answer:
- Which revision is actually serving traffic?
- What changed between the last good revision and this one?
- Is the release pending on a hook Job?
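As a rough sketch of the first question, the helper below picks the newest revision that Helm recorded as `deployed` or `superseded` from the `helm history` table. The function name `last_good_revision` is hypothetical, and the parsing assumes the default table output with tab-separated, space-padded columns. A `superseded` status only means the revision was deployed successfully at some point, so treat the result as a candidate to inspect, not a verdict.

```shell
# Print the highest revision whose STATUS is "deployed" or "superseded".
# Reads the `helm history` table on stdin.
# Assumption: columns are tab-separated and padded with spaces.
last_good_revision() {
  awk -F'\t' '
    NR == 1 { next }                  # skip the header row
    {
      rev = $1; status = $3
      gsub(/[ \t]+/, "", rev)         # strip column padding
      gsub(/[ \t]+/, "", status)
      if ((status == "deployed" || status == "superseded") && rev + 0 > best + 0)
        best = rev
    }
    END { if (best != "") print best; else exit 1 }
  '
}
```

It could be fed with `helm history api -n prod | last_good_revision`; a non-zero exit status means no such revision was found in the history.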
helm history api -n prod
helm get manifest api -n prod --revision 14
kubectl get pods -n prod -l app.kubernetes.io/instance=api
kubectl get jobs -n prod

The decision order that works
| Symptom | First step | Why |
|---|---|---|
| `failed` after upgrade | `helm history` | Pick the right rollback target |
| `pending-upgrade` | Check hook Jobs + `helm status` | Second upgrade races the first |
| Values changed, image unchanged | `helm get manifest` vs live Deployment | Values path / subchart alias issues |
| CrashLoop after bump | Diff rev N vs N-1 manifests | Rollback only after you know N-1 was good |
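The `pending-upgrade` row in the table can be sketched as a small guard that reads the status before anything else runs. The helper names `release_status` and `is_pending` are hypothetical, and the parsing assumes the plain-text `helm status` output with a `STATUS:` line like the one shown earlier.

```shell
# Extract the STATUS value from `helm status` text output on stdin.
release_status() {
  awk -F': *' '$1 == "STATUS" { print $2; exit }'
}

# Succeed (exit 0) when the given status is one of Helm's pending states.
is_pending() {
  case "$1" in
    pending-install|pending-upgrade|pending-rollback) return 0 ;;
    *) return 1 ;;
  esac
}
```

One way to use it: capture `helm status api -n prod | release_status`, and if `is_pending` succeeds on that value, look at `kubectl get jobs -n prod` before starting another upgrade.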
Rollback is recovery, not diagnosis. Running `helm rollback` without reading revision 14 leaves the cause unknown, so the next upgrade repeats the same failure blind.
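A minimal sketch of the "diff rev N vs N-1" step, assuming `helm`, `diff`, and `mktemp` are on the PATH. The function name `diff_revisions` and its argument order are made up for illustration.

```shell
# Show what changed between two revisions of a release as a unified diff.
# Usage: diff_revisions RELEASE NAMESPACE OLD_REV NEW_REV
# Exit status follows diff: 0 = identical, 1 = different, 2 = trouble.
# The body runs in a subshell so its variables do not leak into the caller.
diff_revisions() (
  release=$1 namespace=$2 old=$3 new=$4
  tmp=$(mktemp -d) || exit 2

  # Fetch the rendered manifests Helm stored for each revision.
  helm get manifest "$release" -n "$namespace" --revision "$old" > "$tmp/old.yaml" \
    || { rm -rf "$tmp"; exit 2; }
  helm get manifest "$release" -n "$namespace" --revision "$new" > "$tmp/new.yaml" \
    || { rm -rf "$tmp"; exit 2; }

  rc=0
  diff -u "$tmp/old.yaml" "$tmp/new.yaml" || rc=$?
  rm -rf "$tmp"
  exit "$rc"
)
```

For the release in this post that would be `diff_revisions api prod 13 14`. Reading that output first tells you whether revision 13 is a sensible rollback target or merely the previous number.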
Traps that waste on-call time
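`helm uninstall` first: removes release metadata, including the revision history unless `--keep-history` is set, and complicates ownership on re-install.
Deleting the Deployment: Helm still thinks it owns the resource, so the release record no longer matches the cluster and the next upgrade has to reconcile the difference.
`helm upgrade --force` on pending releases: `--force` changes how resources are replaced; it does not resolve a pending operation, and a second upgrade competes with hooks that are still running.
Trusting `helm get notes`: notes are rendered template documentation; Ingress hosts and Service names come from the rendered manifests and values.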
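To read hosts or images from the rendered manifest instead of the notes, a small filter like the one below may be enough. The name `manifest_values` is hypothetical, and the matching is deliberately naive: it looks for simple `key: value` lines and does not parse YAML.

```shell
# Print the unique values of a simple YAML key (for example "image" or
# "host") found in a rendered manifest on stdin.
# Assumption: values sit on the same line as the key; quotes are stripped.
manifest_values() (
  key=$1
  sed -n "s/^[[:space:]-]*${key}:[[:space:]]*//p" | tr -d "\"'" | sort -u
)
```

For example, `helm get manifest api -n prod | manifest_values image` lists what Helm rendered, which can then be compared with `kubectl get deployment api -n prod -o jsonpath='{.spec.template.spec.containers[*].image}'`. The Deployment name `api` is an assumption here; charts often name it differently from the release.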
Practice the pattern
The Helm Releases path in the Decision Trainer walks through failed upgrades, pending hooks, values drift, and rollback decisions — graded on your first step, not chart trivia.
If you manage releases in CI, pair this with the Platform Pack (Helm, Kustomize, Kyverno, Argo CD) for GitOps-adjacent incidents end to end.