Zero-Downtime Deployments with Kubernetes Rolling Updates

Deploying new versions of an application without dropping traffic is one of the core responsibilities of an SRE. Kubernetes rolling updates make this achievable out of the box, but a few misconfigurations can still cause brief outages. This guide walks through the key settings and why they matter.

How Rolling Updates Work

When you update a Deployment, Kubernetes gradually replaces old Pods with new ones. At no point does it terminate all running Pods first — instead it respects two key parameters:

maxSurge — how many extra Pods can be created above the desired replica count during the update
maxUnavailable — how many Pods can be unavailable at any given time during the update

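By default both values are 25%. maxSurge rounds up and maxUnavailable rounds down, so for a 4-replica Deployment each resolves to 1: Kubernetes may run up to 5 Pods in total and may let availability dip to 3 Pods while the update is in progress. The defaults therefore do not guarantee that a new Pod is ready before an old one is terminated.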
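
The arithmetic behind percentage values can be sketched in a few lines of JavaScript. This is only an illustration, assuming the documented rounding rules (up for maxSurge, down for maxUnavailable); the function names resolveBound and rollingUpdateBounds are hypothetical and not part of any Kubernetes API.

```javascript
// Resolve a maxSurge / maxUnavailable setting (absolute number or "N%")
// into an absolute Pod count for a given replica count.
function resolveBound(value, replicas, roundUp) {
  if (typeof value === 'number') return value;
  const percent = parseFloat(value); // "25%" -> 25
  const exact = (percent * replicas) / 100;
  return roundUp ? Math.ceil(exact) : Math.floor(exact);
}

// maxSurge rounds up, maxUnavailable rounds down.
function rollingUpdateBounds(replicas, maxSurge = '25%', maxUnavailable = '25%') {
  const surge = resolveBound(maxSurge, replicas, true);
  const unavailable = resolveBound(maxUnavailable, replicas, false);
  return {
    surge,
    unavailable,
    maxTotalPods: replicas + surge,           // upper bound on Pods during the update
    minAvailablePods: replicas - unavailable, // lower bound on available Pods
  };
}

console.log(rollingUpdateBounds(4));
console.log(rollingUpdateBounds(10));
```

With the defaults, 4 replicas resolve to a surge of 1 and 1 unavailable Pod, while 10 replicas resolve to a surge of 3 and 2 unavailable Pods, because 2.5 rounds in opposite directions for the two settings.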

Basic Deployment Configuration

Here is a minimal Deployment that demonstrates the key fields:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api
  namespace: production
spec:
  replicas: 4
  selector:
    matchLabels:
      app: my-api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: my-api
    spec:
      containers:
        - name: my-api
          image: ghcr.io/allistera/my-api:1.2.0
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 3
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi

Setting maxUnavailable: 0 means the rollout never reduces the number of available Pods below the desired replica count, while maxSurge: 1 lets Kubernetes add one extra Pod at a time. The trade-off is a slower rollout, because each new Pod must become ready before an old one is terminated.
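
To see why maxUnavailable: 0 trades speed for availability, here is a toy simulation. It is a deliberately simplified model, not the Deployment controller's actual algorithm: it assumes every new Pod becomes Ready exactly one step after it is created, and simulateRollout is a hypothetical name.

```javascript
// Toy model of a rolling update. Each step scales up new Pods and scales
// down old Pods within the maxSurge / maxUnavailable limits.
function simulateRollout(replicas, maxSurge, maxUnavailable) {
  if (maxSurge === 0 && maxUnavailable === 0) {
    throw new Error('maxSurge and maxUnavailable cannot both be 0');
  }
  let oldReady = replicas; // Ready Pods from the old version
  let newReady = 0;        // Ready Pods from the new version
  let minReady = replicas; // lowest Ready count seen during the rollout
  let steps = 0;

  while (oldReady > 0 || newReady < replicas) {
    // Scale up: stay at or under replicas + maxSurge total Pods.
    const room = replicas + maxSurge - (oldReady + newReady);
    const created = Math.max(0, Math.min(room, replicas - newReady));

    // Scale down: keep at least replicas - maxUnavailable Ready Pods.
    const floor = replicas - maxUnavailable;
    const removed = Math.max(0, Math.min(oldReady, oldReady + newReady - floor));
    oldReady -= removed;

    minReady = Math.min(minReady, oldReady + newReady);
    newReady += created; // Pods created this step are Ready for the next one
    steps += 1;
  }
  return { steps, minReady, newReady };
}

console.log(simulateRollout(4, 1, 0)); // the configuration shown above
console.log(simulateRollout(4, 1, 1)); // the resolved defaults for 4 replicas
```

In this model the first configuration never reports fewer than 4 Ready Pods but needs twice as many steps, while the second finishes sooner and dips to 3 Ready Pods along the way.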

Why Readiness Probes Are Non-Negotiable

Without a readiness probe, Kubernetes has no way to know whether your new container is actually able to serve traffic. It treats the Pod as Ready as soon as its containers have started, so traffic is routed to it and an old Pod is terminated, possibly before your application has finished booting.

A minimal HTTP readiness probe looks like this in practice:

readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3

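Your /healthz endpoint should return a status code outside the 200-399 range (for example, 503) whenever the application is not ready to serve traffic, such as while a database migration is still running. Kubernetes treats any HTTP response from 200 to 399 as a probe success.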
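
On the application side, such an endpoint might look like the following sketch. The state object, the healthzHandler name, and the runMigrations function mentioned in the usage comment are all hypothetical; adapt them to however your application tracks startup.

```javascript
// Hypothetical readiness state; flip `ready` to true once startup work is done.
const state = { ready: false };

// Request handler for /healthz: 200 when ready, 503 otherwise.
function healthzHandler(req, res) {
  if (req.url !== '/healthz') {
    res.statusCode = 404;
    res.end('not found');
    return;
  }
  res.statusCode = state.ready ? 200 : 503;
  res.end(state.ready ? 'ok' : 'not ready');
}

// Usage (not executed here):
//   require('node:http').createServer(healthzHandler).listen(8080);
//   runMigrations().then(() => { state.ready = true; }); // runMigrations is hypothetical
```

Until state.ready is set, the probe fails and the Pod receives no traffic from its Service.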

Triggering and Watching a Rollout

Once your Deployment is configured, triggering an update is as simple as changing the image tag:

kubectl set image deployment/my-api \
  my-api=ghcr.io/allistera/my-api:1.3.0 \
  -n production

You can watch the progress in real time:

kubectl rollout status deployment/my-api -n production

If something looks wrong, roll back immediately:

kubectl rollout undo deployment/my-api -n production

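Kubernetes keeps previous ReplicaSets around (how many is controlled by revisionHistoryLimit, which defaults to 10), so a rollback only needs to scale the old ReplicaSet back up. The rollback is itself a rolling update, so it still takes time for Pods to start and pass their readiness probes, and a node pulls the old image again if it no longer has it cached.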

Pod Disruption Budgets

For critical services, pair your rolling update strategy with a PodDisruptionBudget. This protects your Pods from being evicted simultaneously during a node drain or cluster upgrade:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-api-pdb
  namespace: production
spec:
  minAvailable: 3
  selector:
    matchLabels:
      app: my-api

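With this budget, eviction requests (for example from kubectl drain) are refused whenever they would leave fewer than 3 of the 4 replicas available. A PodDisruptionBudget only covers voluntary disruptions: it cannot prevent Pods from being lost when a node crashes, and the Deployment's own rollout is governed by maxUnavailable rather than by the budget.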
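
The budget boils down to simple arithmetic, sketched below for an absolute minAvailable value. The function name allowedDisruptions is illustrative, and the sketch ignores percentage values and the maxUnavailable form of a budget.

```javascript
// How many Pods may be evicted right now without violating minAvailable.
function allowedDisruptions(healthyPods, minAvailable) {
  return Math.max(0, healthyPods - minAvailable);
}

console.log(allowedDisruptions(4, 3)); // all replicas healthy
console.log(allowedDisruptions(3, 3)); // one replica already down
```

With all 4 replicas healthy, one eviction is allowed; once a replica is already down, further evictions are refused until it recovers.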

Graceful Shutdown

The final piece of the puzzle is making sure your application handles SIGTERM correctly. When Kubernetes terminates a Pod it sends SIGTERM and then waits for terminationGracePeriodSeconds (default 30s) before force-killing with SIGKILL.

A Node.js application that drains in-flight requests before exiting:

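const http = require('http');

// Placeholder server; substitute your application's existing HTTP server
const server = http.createServer((req, res) => res.end('ok'));
server.listen(8080);

process.on('SIGTERM', () => {
  console.log('Received SIGTERM, shutting down gracefully');

  // Stop accepting new connections; the callback runs once existing ones have closed
  server.close(() => {
    console.log('All connections closed, exiting');
    process.exit(0);
  });

  // Force exit if graceful shutdown takes too long
  // (keep this shorter than terminationGracePeriodSeconds)
  setTimeout(() => {
    console.error('Graceful shutdown timed out, forcing exit');
    process.exit(1);
  }, 25000).unref(); // unref so the timer alone does not keep the process alive
});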

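Set terminationGracePeriodSeconds on the Pod spec (spec.template.spec in a Deployment) to at least the maximum time your application needs to drain. It must be longer than the application's own force-exit timeout, 25 seconds in the example above, or Kubernetes sends SIGKILL first: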

spec:
  terminationGracePeriodSeconds: 30
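
The relationship between the two timeouts can be checked with a one-line calculation. This is a sketch with a hypothetical helper name, and it assumes the only thing consuming the grace period is the application's own drain.

```javascript
// Milliseconds left between the app's force-exit timer and SIGKILL.
// A negative result means Kubernetes would kill the process first.
function shutdownHeadroomMs(terminationGracePeriodSeconds, forceExitTimeoutMs) {
  return terminationGracePeriodSeconds * 1000 - forceExitTimeoutMs;
}

console.log(shutdownHeadroomMs(30, 25000));
```

For the values used in this guide, the application gives up 5 seconds before Kubernetes would force-kill it, which leaves room for the final log lines and process exit.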

Summary

Achieving zero-downtime deployments with Kubernetes requires four things working together:

Rolling update strategy — maxUnavailable: 0 with maxSurge: 1 keeps the full replica count available at all times
Readiness probes — prevent traffic from reaching Pods that are not yet ready
Pod Disruption Budgets — protect against simultaneous evictions during cluster operations
Graceful shutdown — drain in-flight requests before the process exits

Get these right and your deployments become a non-event for your users.