Zero-Downtime Deployments with Kubernetes Rolling Updates
Deploying new versions of an application without dropping traffic is one of the core responsibilities of an SRE. Kubernetes rolling updates make this achievable out of the box, but a few misconfigurations can still cause brief outages. This guide walks through the key settings and why they matter.
How Rolling Updates Work
When you update a Deployment, Kubernetes gradually replaces old Pods with new ones. At no point does it terminate all running Pods first — instead it respects two key parameters:
- maxSurge: how many extra Pods can be created above the desired replica count during the update
- maxUnavailable: how many Pods can be unavailable at any given time during the update
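By default both values are set to 25%. Kubernetes rounds maxSurge up and maxUnavailable down, so for a 4-replica Deployment each resolves to 1: the rollout may run up to 5 Pods in total and may let the available count drop to 3 while old Pods are being replaced.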
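The rounding can be checked with a few lines of arithmetic. The sketch below is only an illustration: `resolveRollingUpdateBounds` is a hypothetical helper, not part of any Kubernetes client. It assumes values are either absolute counts or whole-number percentage strings such as "25%".

```javascript
// Sketch: turn maxSurge / maxUnavailable into absolute Pod counts.
// resolveValue and resolveRollingUpdateBounds are hypothetical helpers.
function resolveValue(value, replicas, roundUp) {
  if (typeof value === 'number') return value;   // already an absolute count
  const percent = parseInt(value, 10);           // "25%" -> 25
  const raw = (percent * replicas) / 100;
  return roundUp ? Math.ceil(raw) : Math.floor(raw);
}

function resolveRollingUpdateBounds(replicas, maxSurge = '25%', maxUnavailable = '25%') {
  const surge = resolveValue(maxSurge, replicas, true);               // rounded up
  const unavailable = resolveValue(maxUnavailable, replicas, false);  // rounded down
  return {
    surge,
    unavailable,
    maxTotalPods: replicas + surge,
    minAvailablePods: replicas - unavailable,
  };
}

console.log(resolveRollingUpdateBounds(4));
// { surge: 1, unavailable: 1, maxTotalPods: 5, minAvailablePods: 3 }
console.log(resolveRollingUpdateBounds(10));
// { surge: 3, unavailable: 2, maxTotalPods: 13, minAvailablePods: 8 }
```

The asymmetric rounding is why the same 25% yields a surge of 3 but an unavailability budget of only 2 for a 10-replica Deployment.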
Basic Deployment Configuration
Here is a minimal Deployment that demonstrates the key fields:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api
  namespace: production
spec:
  replicas: 4
  selector:
    matchLabels:
      app: my-api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: my-api
    spec:
      containers:
        - name: my-api
          image: ghcr.io/allistera/my-api:1.2.0
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 3
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
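Setting maxUnavailable: 0 tells Kubernetes never to let the number of available Pods fall below the desired replica count during a rollout. Combined with maxSurge: 1, it creates one new Pod, waits for it to become ready, and only then terminates an old one. The trade-off is a slower rollout, and the cluster needs spare capacity for the one extra Pod.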
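To see the effect of these two settings, here is a deliberately simplified simulation. It is a toy model, not the Deployment controller's actual algorithm: it assumes every new Pod becomes ready exactly one tick after it is created, and `simulateRollout` is a hypothetical name.

```javascript
// Toy model of a rolling update; not the real controller logic.
function simulateRollout(replicas, maxSurge, maxUnavailable) {
  if (maxSurge === 0 && maxUnavailable === 0) {
    throw new RangeError('maxSurge and maxUnavailable cannot both be 0');
  }
  let oldPods = replicas;       // ready Pods from the old ReplicaSet
  let newReady = 0;             // new Pods passing their readiness probe
  let newStarting = 0;          // new Pods created but not yet ready
  let minAvailable = replicas;  // lowest available count seen so far
  let ticks = 0;

  while (newReady < replicas) {
    // Scale up: total Pods may not exceed replicas + maxSurge.
    const total = oldPods + newReady + newStarting;
    const stillNeeded = replicas - newReady - newStarting;
    newStarting += Math.max(0, Math.min(replicas + maxSurge - total, stillNeeded));

    // Scale down: available Pods may not drop below replicas - maxUnavailable.
    const available = oldPods + newReady;
    oldPods -= Math.max(0, Math.min(oldPods, available - (replicas - maxUnavailable)));

    minAvailable = Math.min(minAvailable, oldPods + newReady);

    // Next tick: Pods that were starting now count as ready.
    newReady += newStarting;
    newStarting = 0;
    ticks += 1;
  }
  return { ticks, minAvailable };
}

console.log(simulateRollout(4, 1, 0)); // minAvailable stays at 4
console.log(simulateRollout(4, 1, 1)); // minAvailable drops to 3, fewer ticks
```

In this model the maxUnavailable: 0 rollout never dips below four available Pods but needs more ticks to finish, which mirrors the trade-off described above.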
Why Readiness Probes Are Non-Negotiable
Without a readiness probe, Kubernetes has no way to know whether your new container is actually able to serve traffic. It marks the Pod as Ready as soon as its containers have started, so the rollout can send traffic to it and terminate an old Pod before your application has finished booting.
A minimal HTTP readiness probe looks like this in practice:
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3
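Kubernetes treats any HTTP status from 200 to 399 as a successful probe, so your /healthz endpoint should return a status outside that range (503 is a common choice) whenever the application is not ready to serve traffic, for example while a database migration is still running.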
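A minimal sketch of such an endpoint using Node's built-in http module is shown below. The `ready` flag, `markReady`, and `healthzResponse` are hypothetical names chosen for illustration; how and when your application decides it is ready is up to you.

```javascript
const http = require('http');

// Hypothetical readiness flag: flip it once startup work has finished.
let ready = false;
function markReady() {
  ready = true;
}

// Pure helper so the status logic is easy to test in isolation.
function healthzResponse(isReady) {
  return isReady
    ? { status: 200, body: 'ok' }
    : { status: 503, body: 'not ready' };
}

const server = http.createServer((req, res) => {
  if (req.url === '/healthz') {
    const { status, body } = healthzResponse(ready);
    res.writeHead(status, { 'Content-Type': 'text/plain' });
    res.end(body);
    return;
  }
  // Placeholder for the application's real routes.
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('hello');
});
```

To use it, call `server.listen(8080)` at startup and `markReady()` once migrations and warm-up have completed. Until then the probe fails and the Pod receives no traffic from the Service.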
Triggering and Watching a Rollout
Once your Deployment is configured, triggering an update is as simple as changing the image tag:
kubectl set image deployment/my-api \
  my-api=ghcr.io/allistera/my-api:1.3.0 \
  -n production
You can watch the progress in real time:
kubectl rollout status deployment/my-api -n production
If something looks wrong, roll back immediately:
kubectl rollout undo deployment/my-api -n production
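Kubernetes keeps previous ReplicaSets around (how many is controlled by revisionHistoryLimit), so a rollback does not need to rebuild anything: it scales the previous ReplicaSet back up using the same rolling update strategy. Nodes that ran the old version recently will often still have its image cached, which keeps the rollback fast, although that is not guaranteed.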
Pod Disruption Budgets
For critical services, pair your rolling update strategy with a PodDisruptionBudget. This protects your Pods from being evicted simultaneously during a node drain or cluster upgrade:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-api-pdb
  namespace: production
spec:
  minAvailable: 3
  selector:
    matchLabels:
      app: my-api
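This ensures that voluntary disruptions, such as a node drain for maintenance, are only allowed while at least 3 of the 4 replicas remain available. A PodDisruptionBudget does not protect against involuntary disruptions such as a node crash.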
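The budget boils down to simple arithmetic: the number of evictions allowed at any moment is the count of healthy Pods minus minAvailable, floored at zero. The sketch below illustrates that calculation; `allowedDisruptions` is a hypothetical helper and only models the minAvailable form of a budget.

```javascript
// Sketch: how many Pods may be evicted right now under a minAvailable budget.
function allowedDisruptions(healthyPods, minAvailable) {
  return Math.max(0, healthyPods - minAvailable);
}

console.log(allowedDisruptions(4, 3)); // 1: one Pod may be evicted during a drain
console.log(allowedDisruptions(3, 3)); // 0: further evictions are refused
console.log(allowedDisruptions(4, 4)); // 0: this budget would block every drain
```

The last case is worth noting: setting minAvailable equal to the replica count leaves no room for evictions at all, so node drains will stall until the budget is relaxed.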
Graceful Shutdown
The final piece of the puzzle is making sure your application handles SIGTERM correctly. When Kubernetes terminates a Pod it sends SIGTERM and then waits for terminationGracePeriodSeconds (default 30s) before force-killing with SIGKILL.
A Node.js application that drains in-flight requests before exiting:
// `server` is assumed to be the http.Server your application is listening on
process.on('SIGTERM', () => {
  console.log('Received SIGTERM, shutting down gracefully');

  // Stop accepting new connections and wait for in-flight requests to finish
  server.close(() => {
    console.log('All connections closed, exiting');
    process.exit(0);
  });

  // Force exit if graceful shutdown takes too long
  setTimeout(() => {
    console.error('Graceful shutdown timed out, forcing exit');
    process.exit(1);
  }, 25000);
});
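Set terminationGracePeriodSeconds on the Pod spec to at least the maximum time your application needs to drain. It should be a little longer than the application's own forced-exit timeout (25 seconds in the handler above), so that the process exits by itself before Kubernetes sends SIGKILL: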
# Pod spec (spec.template.spec in a Deployment)
spec:
  terminationGracePeriodSeconds: 30
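One way to keep the two timeouts in step is to derive the application's forced-exit timeout from the grace period instead of hard-coding it. This is a sketch under stated assumptions: `forceExitTimeoutMs` is a hypothetical helper, and TERMINATION_GRACE_PERIOD_SECONDS is a hypothetical environment variable that you would have to set yourself in the Pod spec, since Kubernetes does not inject it automatically.

```javascript
// Sketch: derive the forced-exit timeout from the Pod's grace period,
// leaving a safety margin before Kubernetes sends SIGKILL.
function forceExitTimeoutMs(gracePeriodSeconds, safetyMarginSeconds = 5) {
  if (!(gracePeriodSeconds > safetyMarginSeconds)) {
    throw new RangeError('grace period must be longer than the safety margin');
  }
  return (gracePeriodSeconds - safetyMarginSeconds) * 1000;
}

// Hypothetical env var; falls back to the Kubernetes default of 30 seconds.
const gracePeriod = Number(process.env.TERMINATION_GRACE_PERIOD_SECONDS || 30);
console.log(forceExitTimeoutMs(gracePeriod));
```

With a 30-second grace period and the default 5-second margin, this yields the same 25000 ms used in the SIGTERM handler above.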
Summary
Achieving zero-downtime deployments with Kubernetes requires four things working together:
- Rolling update strategy: maxUnavailable: 0 with maxSurge: 1 keeps the full replica count available at all times
- Readiness probes: prevent traffic from reaching Pods that are not yet ready
- Pod Disruption Budgets: protect against simultaneous evictions during cluster operations
- Graceful shutdown: drain in-flight requests before the process exits
Get these right and your deployments become a non-event for your users.