The Kubernetes pod lifecycle is the complete journey of a pod from creation to deletion.

In simple words:

Pod created → scheduled to a node → containers start → app becomes ready → pod serves traffic → pod gets deleted, restarted, or crashes

A pod is not just one container. A pod is the smallest deployable unit in Kubernetes. It can contain one or more containers.

Simple Flow

pod lifecycle — complete flow

kubectl apply
↓ Pod object created in etcd
↓ Scheduler selects a node
↓ kubelet starts the pod
↓ Init containers run (one by one)
↓ Main container starts
↓ Startup probe passes
↓ Readiness probe passes
↓ Pod receives traffic
↓ Liveness probe runs continuously
↓ Pod is deleted / crashes / restarted
↓ Kubernetes handles termination or restart

Most important line in this entire article

Running does not always mean Ready.

A pod can show STATUS: Running with READY: 0/1. The container is running, but the pod is not ready to receive traffic. Missing this distinction is a common cause of production incidents.
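The distinction can be checked mechanically. Here is a minimal Python sketch, assuming the pod was fetched as JSON (for example with kubectl get pod <pod-name> -o json) and parsed into a dict. The helper name is_ready and the sample pod are illustrative and not part of any Kubernetes client library.

```python
def is_ready(pod: dict) -> bool:
    """Return True only if the pod's Ready condition is True."""
    for cond in pod.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    # No Ready condition reported yet: treat the pod as not ready.
    return False


# Hypothetical pod: the container is running, but readiness has not passed yet.
sample_pod = {
    "status": {
        "phase": "Running",
        "conditions": [
            {"type": "PodScheduled", "status": "True"},
            {"type": "Ready", "status": "False"},
        ],
    }
}

print(sample_pod["status"]["phase"], is_ready(sample_pod))
```

The phase says Running, yet is_ready returns False, which is the READY: 0/1 case described above.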

Lifecycle at a Glance

The full lifecycle in one visual — from kubectl apply to pod deletion. Each step is handled by a specific Kubernetes component.

kubectl apply → Pod object written to etcd

Scheduler assigns node → nodeName set, PodScheduled=True

pause container starts → Network namespace created (sandbox)

Init containers run → Sequential, each must exit 0

postStart hook → Fires immediately after container starts

Startup Probe → Blocks liveness + readiness until it passes

Readiness Probe → On pass, pod added to endpoints

Serving Traffic → Ready=True, phase=Running

Liveness Probe → Continuous, failure triggers a restart

──── delete / HPA ──── → deletionTimestamp set, grace period begins (default 30s)

Endpoints removed → Traffic stops arriving (runs in parallel with preStop)

preStop hook → Runs before SIGTERM

SIGTERM sent → App gets the rest of the grace period to exit

SIGKILL (if needed) → If container still alive after grace period

Pod deleted from etcd → Object gone, lifecycle complete

Who Does What

Every step in the lifecycle is handled by a specific component. Most engineers use kubectl without knowing which internal component actually does the work.

Component	What it does in the pod lifecycle
API Server	Accepts kubectl apply. Validates the pod spec. Writes the pod object to etcd. The source of truth for all pod state.
Scheduler	Watches for pods with no nodeName set. Selects a node based on resources, affinity, taints, and topology. Writes nodeName to the pod spec.
kubelet	The node agent. Reads the pod spec, creates the sandbox (pause container), starts init containers, then main containers. Runs probes, executes lifecycle hooks, sends SIGTERM/SIGKILL. Reports status back to the API Server.
Container Runtime (CRI)	Actually starts and stops containers. kubelet talks to it via gRPC. Common runtimes: containerd, CRI-O. Pulls images, manages container lifecycle at the OS level.
Endpoint Controller	Watches pod Ready conditions. When a pod becomes Ready, adds its IP to Service endpoints. When a pod is deleted or becomes not-Ready, removes it. This runs in parallel with kubelet's shutdown steps, so traffic usually stops around the time SIGTERM is sent, but the order is not guaranteed.
ReplicaSet Controller	Maintains the desired pod count. When HPA lowers the replica count, ReplicaSet selects which pods to delete using a ranking that removes unscheduled, pending, and not-ready pods first, and among ready pods generally prefers newer ones.
HPA	Reads metrics (CPU, memory, custom). Calculates desired replica count. Updates Deployment spec.replicas. Never directly kills pods — that is ReplicaSet's job.

The most important one: kubelet. When someone says “Kubernetes starts the container”, that is kubelet. When they say “Kubernetes sends SIGTERM”, that is kubelet. Almost everything that happens to a container on the node goes through kubelet.

Pod Phases

Kubernetes has exactly 5 real pod phases. Not more.

pod phases

1. Pending
2. Running
3. Succeeded
4. Failed
5. Unknown

The phase is stored in pod.status.phase. It is a high-level summary — not a detailed health status. Use conditions and container states for the real picture.

Pod Lifecycle — The 5 Phases and Transitions

Phase	What It Means	Common Causes
Pending	Pod accepted by API Server but not yet running	Scheduler finding a node, images being pulled, init containers running
Running	≥1 container is running, starting, or restarting	Normal running state — does NOT mean the pod is healthy or ready
Succeeded	All containers exited with code 0, will not restart	Completed Jobs, batch tasks, one-shot operations
Failed	All containers exited, at least one exited non-zero	App crash, OOMKill, exit code ≠ 0, restartPolicy: Never
Unknown	API Server cannot get pod status from the node	Network partition between API Server and kubelet, node failure

1. Pending

The pod is created but has not started running yet.

No node available (not enough CPU or memory)

Image is still being pulled

Init container is still running

PVC is not bound, or its volume is not attached yet

Taint/toleration mismatch

2. Running

At least one container inside the pod has started.

Running does not mean healthy

Running does not mean ready

Running does not mean traffic is going to it

READY: 0/1 means Running but not ready

3. Succeeded

All containers completed successfully with exit code 0.

Usually seen in Jobs, CronJobs, one-time scripts, migration jobs

4. Failed

All containers have terminated, and at least one exited with a non-zero exit code and will not be restarted.

App crashed

Wrong command or entrypoint

Missing env variable or config

OOMKilled (exit code 137)

Permission issue

5. Unknown

Kubernetes cannot get the pod status from the node.

Node is down or unreachable

kubelet is not responding

Network issue between API server and node

Important: Terminating is NOT a phase

Many people think Terminating is a pod phase. It is not. There are exactly 5 phases. Terminating means Kubernetes has started deleting the pod — the deletionTimestamp is set. The phase field still says Running while the pod is terminating. Naming six phases in an interview is an immediate red flag.
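To make the point concrete, here is a small Python sketch of how a STATUS value like Terminating can be derived without being a phase. It assumes a pod dict parsed from kubectl get pod -o json. The function display_status is a hypothetical simplification; the real kubectl logic also looks at container states such as CrashLoopBackOff.

```python
def display_status(pod: dict) -> str:
    """Simplified STATUS column: deletionTimestamp wins over the phase."""
    if pod.get("metadata", {}).get("deletionTimestamp"):
        return "Terminating"
    return pod.get("status", {}).get("phase", "Unknown")


# Hypothetical pod that is being deleted while its phase is still Running.
deleting_pod = {
    "metadata": {"deletionTimestamp": "2024-01-01T00:00:00Z"},
    "status": {"phase": "Running"},
}

print(display_status(deleting_pod), "/ phase:", deleting_pod["status"]["phase"])
```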

check pod phase

# Check pod status and phases
kubectl get pods -n <namespace>

# Describe pod for detailed info
kubectl describe pod <pod-name> -n <namespace>

# Check exact phase
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.phase}'

Pod Conditions

Pod phase tells you the high-level status. Pod conditions tell you whether the pod is actually usable.

There are 4 main pod conditions:

pod conditions

PodScheduled      → node has been assigned
Initialized       → all init containers completed
ContainersReady   → all containers are ready
Ready             → pod can receive traffic

The most important one is Ready=True. Only a ready pod receives traffic from a Service.

Pod Conditions — The 4 Health Gates

Check conditions with:

check pod conditions

kubectl describe pod <pod-name> -n <namespace>

# Look for:
# Conditions:
#   Initialized       True
#   Ready             False    ← not receiving traffic
#   ContainersReady   False
#   PodScheduled      True

# If Ready=False, the pod is not receiving traffic.
# Check why the readiness probe is failing.
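The four conditions behave like gates that open in order. As a rough Python sketch, assuming the conditions list comes from .status.conditions in the pod JSON, the hypothetical helper first_failing_gate below reports the first gate that is not True, which is usually the right place to start debugging.

```python
from typing import Dict, List, Optional

# Order in which the conditions normally become True.
GATES = ["PodScheduled", "Initialized", "ContainersReady", "Ready"]


def first_failing_gate(conditions: List[Dict[str, str]]) -> Optional[str]:
    """Return the first condition that is not True, or None if all passed."""
    status = {c["type"]: c["status"] for c in conditions}
    for gate in GATES:
        if status.get(gate) != "True":
            return gate
    return None


# Same values as the describe output shown above.
example_conditions = [
    {"type": "Initialized", "status": "True"},
    {"type": "Ready", "status": "False"},
    {"type": "ContainersReady", "status": "False"},
    {"type": "PodScheduled", "status": "True"},
]

print(first_failing_gate(example_conditions))
```

Here the answer is ContainersReady: the pod is scheduled and initialized, so the next thing to inspect is the container's readiness probe.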

Container States

Inside a pod, each container has its own state. The pod phase is a summary of container states — not the other way around.

Container states are:

container states

Waiting     → container has not started yet
Running     → container process is running
Terminated  → container has exited

Container State	What it means	Common reasons
Waiting	Container is not yet running. kubelet is preparing it.	ContainerCreating (image pulling, volume mounting), CrashLoopBackOff (restart throttle), ImagePullBackOff (pull failed), PodInitializing (init containers running)
Running	Container is executing. startedAt timestamp is set.	Normal operation. Check readiness condition separately — Running does not mean Ready.
Terminated	Container has exited. exitCode and reason are set in Last State.	Completed (exit 0), OOMKilled (exit 137), Error (exit 1), Signal (killed by a Unix signal)

Waiting — common reasons

ContainerCreating — image is being pulled or volume mounting

ImagePullBackOff — registry error or wrong image tag

CrashLoopBackOff — container keeps crashing, kubelet is throttling restarts

PodInitializing — init containers are still running

Running — important note

Container process is running. But remember: container Running does not mean pod Ready. Check the Ready condition separately.

Terminated — check these fields

Reason — Completed, Error, OOMKilled, Signal

Exit Code — 0=success, 1=application error, 127=command not found, 137=killed by SIGKILL (128+9, usually OOMKilled)

Last State — previous container run details

Restart Count — how many times it has restarted

kubectl describe pod <pod-name> -n <namespace>
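Exit codes above 128 follow the common shell convention of 128 plus the signal number. The Python sketch below decodes them on that assumption. The function name describe_exit_code and its wording are illustrative, and an application is free to use these codes differently.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a container exit code into a short human-readable hint."""
    if code == 0:
        return "success"
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a known signal number on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    if code == 127:
        return "command not found"
    return "application error"


for code in (0, 1, 127, 137, 143):
    print(code, "->", describe_exit_code(code))
```

Exit code 137 maps to SIGKILL, which is what the kernel OOM killer sends, and 143 maps to SIGTERM.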

🚨 Interview Trap

Init containers run sequentially — not in parallel. Each must exit 0 before the next starts. While an init container is running, the pod phase is Pending and Initialized=False. A stuck init container (Init:CrashLoopBackOff) means the main container never starts. Engineers often debug the wrong container entirely.

Init Containers

Init containers run before the main container starts. They run one at a time, in order. Each must exit with code 0 before the next one starts. The main container only starts after all init containers have completed successfully.

init container lifecycle

Pod lifecycle with init containers:
1. Init container 1 runs → must exit 0
2. Init container 2 runs → must exit 0
3. Main container starts

If any init container fails:
→ Pod stays in Pending
→ Initialized = False
→ Main container never starts
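The sequential rule above can be written as a few lines of Python. This is only a model of the ordering, not kubelet code. The helper first_blocking_init is hypothetical and takes one exit code per init container, with None meaning "still running or not started".

```python
from typing import List, Optional


def first_blocking_init(exit_codes: List[Optional[int]]) -> Optional[int]:
    """Return the index of the init container that blocks startup, if any."""
    for index, code in enumerate(exit_codes):
        if code != 0:
            # Either still running (None) or failed (non-zero): nothing after it runs.
            return index
    return None


print(first_blocking_init([0, 1]))     # second init container failed
print(first_blocking_init([0, 0]))     # all passed, main container can start
```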

Common use cases for init containers:

Wait for a database to be ready before the app starts

Pull secrets or TLS certificates from a vault

Copy config files into a shared volume

Run database migrations (schema changes)

Check if a dependency service is healthy

debug init containers

# Check init container status
kubectl describe pod <pod-name> -n <namespace>

# Look for:
# Init Containers:
#   wait-for-db:
#     State: Terminated (exit 0 = passed)
#     or
#     State: Waiting, Reason: CrashLoopBackOff (it keeps failing)

# Read init container logs
kubectl logs <pod-name> -n <namespace> -c <init-container-name>

# If init container is failing:
# Status shows: Init:CrashLoopBackOff
# OR:           Init:0/2  (0 of 2 completed)

Common mistake

Engineers see kubectl logs <pod> returning nothing (or a "waiting to start: PodInitializing" error) and assume the pod has no logs. But the pod is still running its init containers, and the main container has not started yet. Always specify -c <container-name> to target the init container.

example: wait for DB init container

# Example init container in YAML
initContainers:
- name: wait-for-db
  image: busybox
  command: ['sh', '-c', 'until nc -z postgres-svc 5432; do sleep 2; done']

Sidecar Containers

A sidecar container runs alongside the main container in the same pod. They share the same network namespace and volumes. They talk to each other via localhost.

sidecar architecture

Pod
├── main container     (your API server)
├── sidecar: envoy     (handles traffic encryption)
└── sidecar: fluentd   (ships logs to Elasticsearch)

All three containers:
→ share the same IP address
→ communicate via localhost
→ can read/write the same mounted volumes

Common sidecar patterns:

Pattern	Examples	What it does
Service mesh proxy	Envoy (Istio), Linkerd	mTLS, traffic routing, observability
Log collector	Fluentd, Filebeat, Promtail	reads logs from shared volume, ships to storage
Secret injector	Vault Agent	pulls secrets/certs, refreshes them automatically
Metrics exporter	Prometheus exporter	exposes /metrics endpoint for Prometheus to scrape

Important: all containers affect pod readiness

If a sidecar has a readiness probe and it fails, the pod Ready condition becomes False — even if the main container is healthy. The pod will not receive traffic. Always check kubectl describe pod to see which container is not ready.

k8s 1.29+ native sidecar

# Kubernetes 1.29+ native sidecar support
# Add restartPolicy: Always to an init container = it becomes a sidecar
# It starts before the main container and keeps running
# For Jobs: it does NOT block the Job from completing

initContainers:
- name: log-collector
  image: fluentd:v1.16
  restartPolicy: Always   # ← this makes it a native sidecar

Probes

Kubernetes uses probes to check container health. There are 3 probe types:

probe types

1. Startup Probe   → Is the app done starting?
2. Readiness Probe → Can the app receive traffic?
3. Liveness Probe  → Is the app still alive (not stuck)?

Simple way to remember the difference:

what each probe failure does

Readiness failure  → removes traffic. Does NOT kill the container.
Liveness failure   → kills and restarts the container.
Startup failure    → kills and restarts. Blocks readiness + liveness.

Probe Types — Three Different Jobs

🚀 Startup Probe

Only during container startup

On FAIL

Container is killed and restarted

On PASS

Liveness + readiness probes take over

Use for

Slow-starting apps (JVM, ML models)

⚠ Without this, liveness can kill a healthy-but-slow container before it finishes initializing.

🚦 Readiness Probe

Continuously, while container is Running

On FAIL

Pod removed from Service endpoints — traffic stops

On PASS

Pod added back to endpoints — traffic resumes

Use for

Signal: "I am ready to accept traffic"

⚠ Failing readiness does NOT kill the container. It only removes it from routing.

❤️ Liveness Probe

Continuously, while container is Running

On FAIL

Container is killed and restarted by kubelet

On PASS

Nothing — the container keeps running

Use for

Detect deadlocks and hung processes

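⚠ Set thresholds about 3× higher than readiness. With identical thresholds, a briefly unhealthy container is restarted at the same moment it is pulled from traffic, which can turn a short blip into a restart loop.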

All three probe types support: exec (run a command), httpGet (HTTP check), tcpSocket (TCP check), and grpc.

Probe	Failure Action	Pod Gets Traffic?	Container Killed?
Startup	Container killed and restarted	No — blocked until startup passes	Yes
Readiness	Pod removed from Service endpoints	No — removed until probe passes	No
Liveness	Container killed and restarted	Not if readiness also fails	Yes

🔥 Production Reality

Setting the same failureThreshold on liveness and readiness is like installing a sprinkler system that triggers from the smoke detector. The smoke detector (readiness) just removed the pod from traffic. The sprinkler (liveness) then kills it and forces a restart — which takes 30+ seconds and adds to the incident. The readiness probe already handled the situation. Liveness thresholds should be 3–5× higher than readiness so they only fire on genuinely stuck processes that readiness alone cannot recover from.

deployment.yaml — complete lifecycle configuration

# Production-style manifest: all three probes + lifecycle hooks
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  selector:
    matchLabels: { app: api }      # must match the template labels below
  template:
    metadata:
      labels: { app: api }
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: api
          image: my-api:2.0.0
          startupProbe:              # Runs first. Blocks readiness + liveness until it passes.
            httpGet: { path: /healthz/startup, port: 8080 }
            failureThreshold: 30     # 30 × 10s = up to 5 min to start
            periodSeconds: 10
          readinessProbe:            # Gates traffic. Removes from Service endpoints on failure.
            httpGet: { path: /healthz/ready, port: 8080 }
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 3      # 15s of bad readiness before removed from endpoints
          livenessProbe:             # Kills and restarts on failure. Set threshold MUCH higher.
            httpGet: { path: /healthz/live, port: 8080 }
            periodSeconds: 10
            failureThreshold: 9      # 9 × 10s = 90s, only fires on truly stuck processes
          lifecycle:
            postStart:
              exec:
                command: ["/bin/sh", "-c", "curl -s -X PUT http://consul:8500/v1/agent/service/register -d @/config/service.json"]
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 5"]  # closes iptables propagation window
          resources:
            requests: { memory: "256Mi", cpu: "100m" }
            limits:   { memory: "512Mi", cpu: "500m" }  # set -Xmx to 75% of limit for JVM
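The comments in the manifest rely on simple arithmetic: a probe acts after roughly periodSeconds × failureThreshold of consecutive failures. A minimal Python sketch of that calculation follows. It ignores timeoutSeconds and probe jitter, and the function name failure_window_seconds is just for illustration.

```python
def failure_window_seconds(period_seconds: int, failure_threshold: int) -> int:
    """Approximate time of consecutive failures before the probe acts."""
    return period_seconds * failure_threshold


# Values taken from the manifest above.
startup_budget = failure_window_seconds(10, 30)    # time allowed for startup
readiness_window = failure_window_seconds(5, 3)    # before removal from endpoints
liveness_window = failure_window_seconds(10, 9)    # before the container is restarted

print(startup_budget, readiness_window, liveness_window)
```

With these settings readiness reacts after about 15 seconds, while liveness waits about 90 seconds, so a pod is pulled from traffic long before it is restarted.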

What If Probe Is Not Configured?

Many developers skip probes. Here is what happens in each case:

No Startup Probe

What happens: Readiness and liveness probes start immediately when the container starts.

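Risk: Liveness can kill slow-starting apps before they finish initializing. If your JVM app takes 45 seconds to start and liveness uses the defaults (periodSeconds: 10, failureThreshold: 3, no initialDelaySeconds), the container is killed after roughly 30 seconds, before it ever becomes healthy.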

Fix: Add a startup probe for apps that take > 30 seconds to start.

No Readiness Probe

What happens: Kubernetes assumes the pod is Ready immediately when the container starts.

Risk: Pod gets traffic before the app is ready to serve. This causes 502 errors during rolling deployments — the new pod gets traffic while the app is still loading config, warming cache, or connecting to the database.

Fix: Always add a readiness probe. It is the most important probe for production.

No Liveness Probe

What happens: Kubernetes never restarts a stuck or deadlocked container.

Risk: Pod stays Running indefinitely even if the app is completely frozen. Memory leak, deadlock, goroutine leak — the pod shows healthy but serves no requests. Without liveness, you need manual intervention.

Fix: Add a liveness probe with a high failureThreshold (9+). It should only fire for truly stuck processes.

Termination Hooks — preStop and SIGTERM

When a pod is deleted, Kubernetes does not kill it immediately. It follows a graceful shutdown process.

pod termination sequence

kubectl delete pod
↓ deletionTimestamp is set (grace period countdown starts, default: 30 seconds)
↓ Pod is removed from Service endpoints (in parallel with the next step)
↓ preStop hook runs (if configured)
↓ SIGTERM is sent to the container
↓ App uses the remaining grace period to shut down
↓ SIGKILL is sent if app does not stop
↓ Pod object is removed

Traffic normally stops before SIGTERM. Endpoint removal and the preStop hook start at the same time, so Kubernetes does not strictly guarantee that routing is updated before the app is told to shut down. A short preStop delay covers the gap.

SIGTERM — polite shutdown

Kubernetes is saying: “Please stop gracefully. Finish current work. Close connections. Exit cleanly.”

Your app should handle SIGTERM. Default grace period is 30 seconds.

SIGKILL — force kill

Sent when the app does not stop within the grace period. No cleanup possible. App is terminated immediately.

preStop hook runs before SIGTERM. Common use: add a small sleep so iptables rules propagate before the app shuts down.

postStart + preStop in a deployment

# Both hooks in a single container spec
containers:
  - name: api
    image: my-api:2.0.0
    lifecycle:
      postStart:
        exec:
          # Runs right after container starts — in parallel with the app
          # Pod will NOT be marked Ready until this exits
          command: ["/bin/sh", "-c", "curl -s -X PUT http://consul:8500/v1/agent/service/register -d @/config/service.json"]
      preStop:
        exec:
          # Runs BEFORE SIGTERM — pod stays alive until this exits
          # Use for cleanup: drain connections, deregister, flush queue
          command: ["/bin/sh", "-c", "sleep 5 && curl -s -X PUT http://consul:8500/v1/agent/service/deregister/my-api"]

# postStart failure kills and restarts the container
# preStop failure is logged as an event and does not stop termination: SIGTERM is still sent

⚡ Pro Tip

The most useful preStop pattern for most services is simply sleep 5. It adds a 5-second pause before SIGTERM fires — enough time for iptables rules to propagate across all nodes after the Endpoint Controller removes the pod from the Service. Without it, you get a ~1 second window of 502s during rolling updates even with perfect readiness probes. This one line closes that window.

What If a Request Takes Longer Than the Grace Period?

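Default grace period is 30 seconds, counted from the delete request. Any preStop hook runs first and uses part of that budget, then SIGTERM is sent. The app has whatever time remains to finish and exit. If it does not, SIGKILL fires.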

What actually happens when SIGKILL fires mid-request

HTTP request in flight → Client gets 502 or connection reset. Request is lost.

Database write in progress → Transaction is rolled back. Data may be partially written if not atomic.

Payment being processed → Money may be charged but order not created. Requires manual reconciliation.

File upload in progress → Partial file written. Storage corruption risk.

Message being published to queue → Message may be lost or duplicated depending on ACK status.

How to fix it

Increase grace period

Set terminationGracePeriodSeconds to match your slowest request. For APIs: 60s. For batch jobs: match the job duration.

terminationGracePeriodSeconds: 60

Handle SIGTERM in the app

Stop accepting new requests immediately on SIGTERM. Wait for in-flight requests to complete. Then exit.

process.on("SIGTERM", () => server.close(() => process.exit(0)))
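The same pattern in Python, as a rough sketch: the handler only sets a flag, and the main loop notices it, stops taking new work, and exits cleanly. The names shutdown_requested and main_loop are illustrative, and the "work" here is a placeholder wait rather than a real server.

```python
import os
import signal
import threading

# Set by the signal handler, checked by the main loop.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the shutdown request; keep the handler itself tiny."""
    shutdown_requested.set()


# Must be registered from the main thread.
signal.signal(signal.SIGTERM, handle_sigterm)


def main_loop() -> None:
    """Serve until SIGTERM arrives, then finish up and return."""
    while not shutdown_requested.is_set():
        # Placeholder for handling one unit of work.
        shutdown_requested.wait(timeout=0.1)
    # Placeholder for cleanup: finish in-flight work, close connections.
    print("shutting down cleanly")
```

Calling main_loop() blocks until the process receives SIGTERM, for example from kubelet during pod termination. Whatever cleanup follows the loop has to fit inside the remaining grace period.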

Add preStop sleep

preStop sleep 5 gives iptables time to stop routing new requests before SIGTERM fires. Reduces new requests arriving during shutdown.

preStop:
  exec:
    command: ["sleep", "5"]

Set request timeout

Never let requests run longer than the grace period. If a request can take 5 minutes, the grace period must be > 5 minutes.

Set max request timeout < terminationGracePeriodSeconds
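Since the preStop delay and the slowest request both have to fit inside the grace period, the rule can be checked with one line of arithmetic. A small Python sketch, where shutdown_budget_ok is a hypothetical helper and all values are in seconds:

```python
def shutdown_budget_ok(grace_period_s: float, pre_stop_s: float, max_request_s: float) -> bool:
    """True if preStop time plus the slowest request fits in the grace period."""
    return pre_stop_s + max_request_s < grace_period_s


print(shutdown_budget_ok(30, 5, 10))    # short request, default grace period
print(shutdown_budget_ok(30, 5, 120))   # 2-minute upload, default grace period
print(shutdown_budget_ok(180, 5, 120))  # same upload with a longer grace period
```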

RestartPolicy — Always, OnFailure, Never

restartPolicy controls what Kubernetes does when a container exits. It applies to all containers in the pod.

Always

Container restarts on any exit, whether exit 0 (success) or non-zero (failure). This is the pod default and the only value a Deployment accepts.

Use for: Long-running services: APIs, workers, databases

If the container keeps crashing, this leads to CrashLoopBackOff.

OnFailure

Container restarts only if exit code is non-zero. If exit 0 (success), container stays terminated.

Use for: Batch jobs, scripts that run once and finish

Common choice for Jobs, which must set OnFailure or Never explicitly. Pod goes to Succeeded when all containers exit 0.

Never

Container is never restarted. Pod goes to Failed if any container exits non-zero. Pod goes to Succeeded if all containers exit 0.

Use for: One-shot tasks, Jobs where the Job controller handles retries

Use with Jobs that have backoffLimit to control retry behavior at the Job level.

restart policy commands

# Check a pod's restart policy
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.restartPolicy}'

# Deployment pod template (Always is the default and the only allowed value)
spec:
  restartPolicy: Always   # always restart on any exit

# Job pod template (must be set to OnFailure or Never)
spec:
  restartPolicy: OnFailure  # restart only on non-zero exit

🚨 Interview Trap

restartPolicy: Always does not mean the pod restarts forever instantly. Each restart is throttled by the CrashLoopBackOff back-off timer (10s → 20s → 40s → up to 5 minutes). The restart policy just defines when to restart — the back-off timer defines how fast.

CrashLoopBackOff Explained

CrashLoopBackOff is a Kubernetes pod status. It means: the container starts, crashes or exits, Kubernetes restarts it — and after repeated failures, Kubernetes waits longer and longer before each retry.

The name is three words joined together. Each word tells you exactly what is happening:

Crash

Your container or process failed and exited.

Loop

Kubernetes keeps restarting it — over and over. It is not stuck. It is in a loop.

BackOff

Kubernetes delays the next restart to avoid retrying too fast. Each wait is longer than the last.

What actually happens, step by step:

1. Pod starts
2. App crashes after a few seconds
3. Kubernetes restarts it immediately
4. App crashes again
5. Kubernetes restarts again, but now waits 10 seconds first
6. App crashes again → waits 20 seconds
7. App crashes again → waits 40 seconds, then 80s, then 160s, then 300s (cap)
8. Status becomes CrashLoopBackOff during each wait window

Crash #	Wait Before Next Restart	Status You See
1st	10 seconds	CrashLoopBackOff
2nd	20 seconds	CrashLoopBackOff
3rd	40 seconds	CrashLoopBackOff
4th	80 seconds	CrashLoopBackOff
5th	160 seconds	CrashLoopBackOff
6th+	300 seconds (max)	CrashLoopBackOff — stays here

The back-off resets after the container stays running for 10 consecutive minutes. If it crashes again after that, the timer starts fresh from 10 seconds.
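The delay sequence is plain exponential back-off with a cap. A minimal Python sketch that reproduces the numbers in the table, assuming a 10-second start, doubling, and a 300-second ceiling as described above (backoff_delays is an illustrative name, not a kubelet function):

```python
from itertools import islice
from typing import Iterator


def backoff_delays(initial: int = 10, cap: int = 300) -> Iterator[int]:
    """Yield restart delays in seconds: double each time, never above cap."""
    delay = initial
    while True:
        yield min(delay, cap)
        delay = min(delay * 2, cap)


print(list(islice(backoff_delays(), 7)))  # [10, 20, 40, 80, 160, 300, 300]
```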

Common reasons for CrashLoopBackOff

1. App code error — unhandled exception, panic, or crash on startup
2. Missing environment variable or secret — app checks for it and exits when not found
3. Wrong command or entrypoint — process runs to completion and exits with code 0
4. Database / Redis / API not reachable — app fails to connect on startup and exits
5. Permission issue — can't write to a file, can't bind to a port
6. App exits normally because it has no long-running process (script instead of server)
7. Out of memory — Linux kernel kills the process (OOMKilled, exit code 137)
8. Bad config after deployment — wrong value in ConfigMap or Secret causes startup failure

Commands to find the reason

Check pod status and restart count:

kubectl get pods -n <namespace>

See events and detailed state:

kubectl describe pod <pod-name> -n <namespace>

Check current logs (if the container is still running):

kubectl logs <pod-name> -n <namespace>

Most important — check the previous crashed container's logs:

kubectl logs <pod-name> -n <namespace> --previous

This shows what the container printed before it crashed. Most of the time, the error is here.

Check the exact exit code and reason:

kubectl describe pod <pod-name> -n <namespace> | grep -A20 "Last State"

Exit code 137 = killed by SIGKILL, usually OOMKilled. Exit code 1 = app error. Exit code 0 = the process finished on its own, often because of a wrong entrypoint. Exit code 127 = binary not found.

When you see RESTARTS: 1

It means the container crashed or was killed once, and Kubernetes restarted it. To know the exact reason, run:

kubectl logs <pod-name> -n <namespace> --previous
kubectl describe pod <pod-name> -n <namespace>

The key thing to remember

CrashLoopBackOff is not the actual error. It is Kubernetes telling you: “I keep trying to run this container, but it keeps failing.” The real error is inside the container. kubectl logs --previous is where you find it.

🚨 Interview Trap

“To fix CrashLoopBackOff, delete the pod and let it restart fresh.” The ReplicaSet Controller creates an identical replacement with the same broken spec. The new pod runs the same broken container — CrashLoopBackOff starts again, this time with a fresh back-off timer. You bought yourself 10 seconds of watching it fail faster. Fix: kubectl logs --previous, find the root cause, fix the image or config, redeploy.

OOMKilled

OOMKilled means the container used more memory than its allowed limit. The Linux kernel killed the process. The exit code is 137 (128 + 9, the SIGKILL signal).

what OOMKilled means

# Example:
Memory limit: 512Mi
App tries to use: 700Mi
Linux kernel kills the process → Kubernetes shows: OOMKilled
Exit code: 137

Check it with:

diagnosing OOMKilled

kubectl describe pod <pod-name> -n <namespace>

# Look for:
# Last State:
#   Reason:     OOMKilled
#   Exit Code:  137

check memory usage

# Check current memory usage
kubectl top pod <pod-name> -n <namespace>

Fix options

Increase memory limit in the Deployment

Fix the memory leak in the application

For JVM: set -Xmx to ~75% of the container limit

Reduce app memory usage

ImagePullBackOff

ImagePullBackOff means Kubernetes cannot pull the container image.

Common reasons

Wrong image name or tag

Image does not exist in the registry

Private registry secret missing or wrong

Docker Hub rate limit hit

Registry network issue

diagnosing ImagePullBackOff

# Check exact error in Events section
kubectl describe pod <pod-name> -n <namespace>

# Look at the Events: section at the bottom
# Failed to pull image "my-app:v2.0": rpc error...

Resource Requests vs Limits — CPU Throttling vs OOMKilled

Every container has two resource settings. They do completely different things.

Setting	What it does	What happens if exceeded
requests	Minimum guaranteed resources. Scheduler uses this to pick a node with enough capacity.	Nothing. Requests are a promise to the scheduler, not a runtime enforcement.
limits (CPU)	Maximum CPU allowed. Enforced by the kernel cgroup.	Container is CPU-throttled — slows down but keeps running. No kill. No restart.
limits (Memory)	Maximum memory allowed. Enforced by the Linux OOM killer.	Container is OOMKilled (exit code 137) and restarted according to restartPolicy.

resource requests and limits

resources:
  requests:
    memory: "256Mi"   # scheduler: find a node with 256Mi free
    cpu: "100m"       # scheduler: find a node with 100 millicores free
  limits:
    memory: "512Mi"   # OOMKilled if app uses more than 512Mi
    cpu: "500m"       # throttled (slowed) if app uses more than 500 millicores

CPU throttling is silent

CPU throttling does not show up in kubectl get pods. Pod shows Running. Logs look normal. But the app is running slow because the kernel is pausing the container to enforce the CPU limit. You see it as increased latency — not as errors.

JVM rule of thumb

Set -Xmx to 75% of the container memory limit. The JVM needs ~25% overhead for non-heap memory (GC metadata, JIT compiled code, thread stacks, class data). Setting -Xmx equal to the limit = OOMKilled under GC pressure.
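The 75% rule is easy to get wrong by hand, so here is a small Python sketch that turns a Kubernetes memory limit into a heap flag. It assumes binary suffixes (Ki, Mi, Gi) only. The function xmx_flag is hypothetical, and the fraction is the rule of thumb from this section rather than a JVM requirement.

```python
# Multipliers for the binary suffixes this sketch understands.
UNITS = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def xmx_flag(memory_limit: str, heap_fraction: float = 0.75) -> str:
    """Convert a limit such as '512Mi' into a JVM flag such as '-Xmx384m'."""
    suffix = memory_limit[-2:]
    if suffix not in UNITS:
        raise ValueError(f"unsupported memory quantity: {memory_limit}")
    limit_bytes = int(memory_limit[:-2]) * UNITS[suffix]
    heap_mib = int(limit_bytes * heap_fraction) // (1024 ** 2)
    return f"-Xmx{heap_mib}m"


print(xmx_flag("512Mi"))  # -Xmx384m
print(xmx_flag("2Gi"))    # -Xmx1536m
```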

diagnosing resource issues

# Check current resource usage
kubectl top pod <pod-name> -n <namespace>
kubectl top pod <pod-name> -n <namespace> --containers

# Check if CPU is throttled (on the node directly)
# cat /sys/fs/cgroup/cpu/kubepods/.../cpu.stat
# Look for: nr_throttled > 0

QoS Classes — Who Gets Evicted First

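Kubernetes assigns every pod a Quality of Service (QoS) class based on its resource configuration. When a node runs out of memory, kubelet evicts pods roughly in this order: BestEffort first, Guaranteed last. The exact ranking also considers pod priority and how far each pod's usage exceeds its requests.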

BestEffort

When assigned: No requests or limits set on any container.

Risk: Evicted FIRST during node memory pressure.

Advice: Never use in production for critical workloads.

Burstable

When assigned: At least one container has a request or limit, but the pod does not meet the Guaranteed criteria (for example, requests ≠ limits).

Risk: Evicted SECOND — after BestEffort, before Guaranteed.

Advice: Most production pods fall into this class.

Guaranteed

When assigned: ALL containers have both CPU and memory requests AND limits set. requests == limits exactly.

Risk: Evicted LAST. Most protected during node pressure.

Advice: Use for critical services: databases, primary API servers.

QoS class commands

# Check a pod's QoS class
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.qosClass}'
# Output: BestEffort | Burstable | Guaranteed

# Guaranteed example (requests == limits for ALL containers)
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"   # same as requests
    cpu: "500m"       # same as requests
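The three rules can be expressed as a short classifier. This Python sketch works on the containers list from a pod spec and is a simplification: it compares quantity strings literally (so "500m" and "0.5" would not match) and it ignores init containers. The function name qos_class is illustrative.

```python
from typing import Dict, List


def qos_class(containers: List[Dict]) -> str:
    """Classify a pod as BestEffort, Burstable, or Guaranteed."""
    any_set = False
    guaranteed = True
    for container in containers:
        resources = container.get("resources", {})
        requests = resources.get("requests", {})
        limits = resources.get("limits", {})
        if requests or limits:
            any_set = True
        for name in ("cpu", "memory"):
            limit = limits.get(name)
            # An omitted request defaults to the limit.
            request = requests.get(name, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


burstable = [{"resources": {"requests": {"memory": "256Mi", "cpu": "100m"},
                            "limits": {"memory": "512Mi", "cpu": "500m"}}}]
guaranteed = [{"resources": {"requests": {"memory": "512Mi", "cpu": "500m"},
                             "limits": {"memory": "512Mi", "cpu": "500m"}}}]

print(qos_class([{}]), qos_class(burstable), qos_class(guaranteed))
```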

Pod Eviction — When the Node Runs Out of Resources

Eviction happens when a node runs low on resources. kubelet watches node pressure and evicts pods to free up capacity.

eviction triggers and order

Eviction triggers:
  MemoryPressure  → node memory is running low
  DiskPressure    → node disk is filling up
  PIDPressure     → too many processes on the node

Eviction order:
  1. BestEffort pods (no requests/limits) → evicted first
  2. Burstable pods (requests < limits) → evicted second
  3. Guaranteed pods (requests == limits) → evicted last

What happens when a pod is evicted:

eviction sequence

1. kubelet terminates the pod immediately (hard eviction = no grace period)
2. Pod object stays in etcd with STATUS: Evicted
3. ReplicaSet notices the pod is gone
4. ReplicaSet creates a new pod, usually scheduled on a DIFFERENT node
5. The evicted pod object stays until you delete it or the pod garbage collector removes it

Evicted pods do NOT restart on the same node

An evicted pod is gone. The ReplicaSet creates a replacement pod, but it is a new pod, placed on whichever node the scheduler picks (usually a different one). The evicted pod object stays in the cluster with STATUS: Evicted until it is cleaned up.

diagnosing and cleaning up eviction

# Check if a pod was evicted
kubectl describe pod <pod-name> -n <namespace>
# Look for: Reason: Evicted
#           Message: The node was low on resource: memory

# Check node pressure conditions
kubectl describe node <node-name>
# Look for Conditions: MemoryPressure=True, DiskPressure=True

# Find all evicted pods
kubectl get pods -n <namespace> --field-selector=status.phase==Failed

# Clean up Failed pods (evicted pods are in phase Failed; this also removes other Failed pods)
kubectl delete pods -n <namespace> --field-selector=status.phase==Failed
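The field selector above matches every Failed pod, not only evicted ones. To narrow it to evictions, one option is to post-process the JSON from kubectl get pods -o json. The helper below is a sketch: the function name `evicted_pod_names` is an assumption, and it relies only on the status.phase and status.reason fields.

```python
import json


def evicted_pod_names(pod_list_json):
    """Return names of pods whose phase is Failed and reason is Evicted.

    `pod_list_json` is the text produced by `kubectl get pods -o json`.
    """
    items = json.loads(pod_list_json).get("items", [])
    names = []
    for pod in items:
        status = pod.get("status", {})
        if status.get("phase") == "Failed" and status.get("reason") == "Evicted":
            names.append(pod["metadata"]["name"])
    return names
```

The returned names can then be passed to kubectl delete pod one by one or in a single call.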

Pod Termination — The Complete Sequence

The sequence matters. Step 2 (the Endpoint Controller removes the pod from Service routing) happens in parallel with step 3 (the preStop hook). By the time SIGTERM is sent in step 4, new traffic has normally stopped arriving at the pod, provided the preStop hook ran long enough for the endpoint change to propagate. In-flight requests that started before step 2 can still be completing, which is why the grace period (step 5) exists: it is the window for those requests to finish. Note that the terminationGracePeriodSeconds countdown starts when deletion begins, so time spent in the preStop hook counts against it.

🔥 Production Reality

There is a race condition that every team hits exactly once. You add a readiness probe, you test zero-downtime deploys, everything works. Then at peak traffic you see a brief spike of 502s during every rolling update. The cause: iptables rules on nodes are eventually consistent. After the Endpoint Controller removes the pod, there is a ~1–2 second window where old iptables rules on some nodes still route traffic to the terminating pod. The preStop hook sleep 5 closes this window by adding a delay before SIGTERM — giving iptables rules time to propagate everywhere first.

🚨 Interview Trap

“Kubernetes removes a pod from the Service and then sends SIGTERM. So existing requests are safe: they finish before the pod is deleted.” Partially correct. Existing requests can finish, but only if they complete before terminationGracePeriodSeconds expires. A payment taking 10 seconds on a 30-second grace period is fine. A video upload taking 2 minutes on a 30-second grace period gets SIGKILL at second 30, and the client gets a 502. “Zero-downtime” only works when request duration is less than the grace period.
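The arithmetic behind that trap is small enough to write down. The sketch below assumes the request starts at the moment deletion begins and that the grace period countdown also starts then, so any preStop sleep is subtracted from the budget. The function name `shutdown_outcome` and its parameters are hypothetical.

```python
def shutdown_outcome(request_seconds, grace_period=30, pre_stop_sleep=0):
    """Say whether an in-flight request finishes before SIGKILL.

    Assumption: the grace period clock starts when deletion begins,
    so time spent in the preStop hook reduces the remaining budget.
    """
    budget = grace_period - pre_stop_sleep
    if request_seconds <= budget:
        return "completes"
    return "killed by SIGKILL"
```

For long requests the usual levers are a larger terminationGracePeriodSeconds or shorter requests; the preStop sleep itself is not free.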

HPA Scale-Down — Which Pod Dies First?

HPA means Horizontal Pod Autoscaler. It automatically scales the number of pods based on CPU, memory, or custom metrics.

Important: HPA does not directly kill pods. HPA only changes the desired replica count.

HPA scale-down flow

HPA detects: CPU usage is low
→ HPA updates spec.replicas from 10 to 6
→ ReplicaSet Controller decides which 4 pods to delete
→ kubelet terminates them

Scale-down priority — who dies first:

HPA scale-down order

1. Unscheduled pod (never ran)     → dies first
2. Not Ready pod (probe failing)   → dies second
3. Newest Running pod              → dies third
4. Older Running pod               → dies later
5. Oldest Running pod              → survives longest

Common wrong answers: “the pod with highest CPU”, “the oldest pod”, “random”. None are correct. Among healthy running pods, the newest pod dies first.

Why? Older pods have warmer caches and stable connections. They have proven they can handle load. Newer pods are less proven. Kubernetes keeps the more stable ones.

HPA Scale-Down — Which Pod Gets Terminated First

The key insight: among healthy Running pods, the newest pod dies first. Not the oldest. Not the busiest. The one that has been running for the shortest time.

The reasoning: older pods have warmer caches, more established connections, and have proven they can survive under load. A pod that has been running for 2 hours has passed every health check, survived every traffic spike, and built up a warm state. A pod that has been running for 3 minutes is untested by comparison. When you have to kill one, you kill the least proven one.

🚨 Interview Trap

“HPA kills pods based on CPU usage — it removes the most resource-intensive ones first.” No. HPA makes the scale-down decision based on metrics (CPU, memory, custom). But the ReplicaSet Controller makes the termination-order decision based on pod age and health — not on live metrics at the moment of deletion. The pod selected for deletion is determined by the ranking algorithm above, not by which pod is using the most CPU. This distinction trips up even senior engineers in interviews.

⚡ Pro Tip

You can override the default termination order using the controller.kubernetes.io/pod-deletion-cost annotation. A pod with a lower deletion cost is preferred for termination. Set a high deletion cost on pods you want to protect during scale-down — for example, a pod that holds a long-running batch job or a warm ML model in memory.
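The ranking described in this section, including the deletion-cost override, can be approximated with a sort key. This is a simplified sketch: the `Pod` type and its fields are hypothetical, and the real controller considers additional tie-breakers that are left out here.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    scheduled: bool
    ready: bool
    age_seconds: int
    deletion_cost: int = 0  # controller.kubernetes.io/pod-deletion-cost


def deletion_order(pods):
    """Return pod names, first-to-delete first."""
    def key(pod):
        # False sorts before True: unscheduled first, then not-Ready,
        # then lower deletion cost, then the youngest pod.
        return (pod.scheduled, pod.ready, pod.deletion_cost, pod.age_seconds)

    return [pod.name for pod in sorted(pods, key=key)]


def pick_victims(pods, count):
    """Names of the `count` pods a scale-down would remove first."""
    return deletion_order(pods)[:count]
```

Raising deletion_cost on one pod moves it behind every healthy pod with a lower cost, regardless of age.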

HPA vs StatefulSet vs DaemonSet — who dies first?

Scale-down termination order is workload-specific. The “newest dies first” rule applies only to Deployments managed by a ReplicaSet. Other workloads have their own rules:

Workload	Scale-down order	Key implication
Deployment (via ReplicaSet)	Unscheduled → not-Ready → newest Running	Oldest pod is safest — handle SIGTERM in all pods
StatefulSet	Highest ordinal first (pod-2, pod-1, pod-0)	Predictable — pod-0 is the last to go, useful for primary replicas
DaemonSet	Pod removed when node is removed	No scale-down; one pod per node, always
Job / CronJob	Runs to completion — pods not scaled down	HPA cannot target Jobs directly

🚨 Interview Trap

“Can HPA violate a PodDisruptionBudget?” Yes, in effect. A PodDisruptionBudget (PDB) limits voluntary disruptions that go through the Eviction API, such as kubectl drain or the Cluster Autoscaler removing a node. It is not consulted when a controller deletes its own pods: HPA lowers spec.replicas and the ReplicaSet Controller deletes the surplus pods directly, and rolling updates are governed by maxSurge and maxUnavailable, not by the PDB. If scale-down must never go below a floor, set minReplicas on the HPA. The PDB is the circuit breaker for node drains, not for scale-down.

Rolling Update Mechanics — maxSurge and maxUnavailable

When you update a Deployment, Kubernetes does not restart all pods at once. It rolls them over in batches. Two fields control how aggressive or how safe that rollout is.

Field	What it controls	Default
maxSurge	How many extra pods above the desired count can exist during the rollout	25%
maxUnavailable	How many pods can be unavailable (not Ready) during the rollout	25%

rolling update sequence

# Example: 4 replicas, maxSurge: 1, maxUnavailable: 1
# At most 5 pods can exist (4 + 1 surge)
# At most 1 pod can be unavailable — so at least 3 are always Ready

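Rolling update sequence (simplified: with maxUnavailable: 1 the controller may also terminate one old pod immediately instead of waiting for the new pod to become Ready):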
1. Create 1 new pod (surge) → total: 5 pods (4 old + 1 new)
2. Wait for new pod to pass readiness probe → it becomes Ready
3. Terminate 1 old pod → total: 4 pods (3 old + 1 new)
4. Create 1 new pod → total: 5 pods (3 old + 2 new)
5. Wait for new pod to be Ready
6. Terminate 1 old pod → total: 4 pods (2 old + 2 new)
7. Repeat until all old pods are replaced
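The two bounds used in the example above can be computed directly. The sketch below assumes the documented rounding rules for percentages (maxSurge rounds up, maxUnavailable rounds down); the helper names `resolve` and `rollout_bounds` are made up for this illustration.

```python
def resolve(value, replicas, round_up):
    """Turn an absolute number or a percentage string into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = int(value[:-1]) * replicas
        # Integer arithmetic avoids floating-point rounding surprises.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas, max_surge="25%", max_unavailable="25%"):
    """Return (max total pods, min available pods) during a rollout."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return replicas + surge, replicas - unavailable
```

With 4 replicas, maxSurge: 1 and maxUnavailable: 1 this gives at most 5 pods and at least 3 available, matching the comments at the top of the example.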

Safe rollout (zero downtime)

maxSurge: 1

maxUnavailable: 0

Always create before destroying. Rollout is slower but never reduces capacity.

Fast rollout (accepts brief capacity reduction)

maxSurge: 0

maxUnavailable: 1

Destroy first, then create. Uses fewer resources but briefly reduces capacity.

rolling update config and commands

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0

# Monitor rollout progress
kubectl rollout status deployment/<name> -n <namespace>

# Rollback to previous version
kubectl rollout undo deployment/<name> -n <namespace>

# Rollback to a specific revision
kubectl rollout history deployment/<name> -n <namespace>
kubectl rollout undo deployment/<name> --to-revision=2 -n <namespace>

🚨 Interview Trap

A rolling update only succeeds in achieving zero downtime if the readiness probe is configured correctly. If the new pod never passes its readiness probe, the rollout stalls — old pods stay alive (because maxUnavailable prevents destroying them) and the new pods pile up (up to maxSurge limit). The rollout hangs until the probe passes or you roll back. Readiness probes are not optional for zero-downtime deployments.

⚡ Pro Tip

Stuck rollout? kubectl rollout status keeps waiting and only exits with an error once progressDeadlineSeconds (default 600s) is exceeded. Check kubectl get pods: if new pods show Running but the READY column shows 0/1, the readiness probe is failing. Run kubectl describe pod <new-pod> and look at Conditions and Events to find why.
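The same check can be scripted against the output of kubectl get pods -o json. The sketch below takes the already-parsed items list; the function name `running_but_not_ready` is an assumption, and only status.phase and the Ready condition are inspected.

```python
def running_but_not_ready(pods):
    """Return names of pods in phase Running whose Ready condition is not True.

    `pods` is the `items` list from `kubectl get pods -o json`, already parsed.
    """
    names = []
    for pod in pods:
        status = pod.get("status", {})
        if status.get("phase") != "Running":
            continue
        conditions = {c["type"]: c["status"] for c in status.get("conditions", [])}
        if conditions.get("Ready") != "True":
            names.append(pod["metadata"]["name"])
    return names
```

An empty result during a rollout means every Running pod is also Ready, so the cause of a stall lies elsewhere.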

Deployment vs StatefulSet vs DaemonSet — What Is the Difference?

These are three different Kubernetes workload types. Each one manages pods differently, and each one has different behavior when pods are deleted or the cluster scales.

Deployment (Stateless)

What it is: Pods are identical and interchangeable. Any pod can handle any request. They have no individual identity.

Use for: API servers, web apps, background workers, microservices

Scale-down order: Newest pod dies first. Oldest pod survives longest.

Examples: nginx, express, fastapi, rails, spring boot

Key behavior: If a pod crashes, Deployment creates a new identical one anywhere.

StatefulSet (Stateful)

What it is: Each pod has a unique identity: pod-0, pod-1, pod-2. They have stable network names, stable storage, and a defined startup order.

Use for: Databases, message queues, search engines, anything with state

Scale-down order: Highest ordinal dies first: pod-2 → pod-1 → pod-0. Pod-0 is the last to go (usually the primary).

Examples: MySQL, PostgreSQL, MongoDB, Kafka, Elasticsearch, Redis Cluster

Key behavior: If a pod crashes, StatefulSet recreates it with the SAME name and same storage.

DaemonSet

What it is: One pod per node, always. When a new node joins the cluster, DaemonSet automatically creates a pod on it. When a node leaves, the pod is gone.

Use for: Node-level agents that must run everywhere

Scale-down order: Cannot be scaled manually. Pod count = node count. Remove the node, the pod disappears.

Examples: Log collectors (Fluentd, Filebeat), monitoring agents (Datadog, Prometheus node-exporter), CNI plugins, kube-proxy

Key behavior: No HPA. No replica count. It just follows the nodes.

workload type comparison

Deployment  → pods are identical. Any pod handles any request.
StatefulSet → pods have identity (pod-0, pod-1). Order matters.
DaemonSet   → one pod per node. Follows nodes, not replica count.

Scale-down:
  Deployment  → newest pod dies first
  StatefulSet → pod-2 dies first, pod-0 dies last
  DaemonSet   → no scale-down. Remove node = remove pod.

Running but Not Ready — Why Traffic Stops

SaaS company, Tuesday 10 AM

A new version of the API deployed at 9:45 AM. Rolling update. 6 replicas. By 10:00 AM, 4 new pods were Running. The team marked the release as successful and closed the PR.

At 10:12 AM, a customer filed a support ticket. Their requests were intermittently failing — roughly 1 in 3 requests. Not all of them. One in three.

kubectl get pods showed 6 Running pods. No CrashLoopBackOff. No OOMKilled. Clean.

At 10:28 AM, someone noticed the readiness probe. The new version had a dependency on a feature flag service that was responding slowly. The readiness probe used a 1-second timeout. The feature flag service was responding in 1.4 seconds. Two of the 6 new pods were Running but not Ready — their readiness probe was consistently timing out.

The kube-proxy rules had not fully updated yet. The Endpoint Controller had removed the two not-Ready pods from the Service endpoints — but the iptables rules on some nodes still referenced them due to eventual consistency. Those stale rules were routing 1 in 3 requests to pods that were not accepting traffic.

Fix: increase the probe timeout. 4 minutes to deploy, 43 minutes of degraded service, because kubectl get pods said Running but did not say not-Ready. Phase hid the condition. The team was looking at the wrong signal.

Troubleshooting Flowchart

When a pod is broken, this is the decision tree. Start at the top.

kubectl get pods → what is the STATUS?

Pending

→ kubectl describe pod → check Events section

ImagePullBackOff / ErrImagePull

→ wrong image tag, private registry, rate limit. Fix: check imagePullSecrets.

Unschedulable

→ no node has enough CPU/memory, or taints/affinity mismatch. Fix: check node capacity.

Init:CrashLoopBackOff

→ init container failing. Fix: kubectl logs <pod> -c <init-container-name>

Running — READY column shows 0/N

→ readiness probe failing

→ kubectl describe pod → Conditions: Ready=False, ContainersReady=False

→ curl the probe endpoint manually from inside the pod

dependency unreachable / timeout too short / app still initializing

Running — pod keeps restarting (CrashLoopBackOff)

→ kubectl logs <pod> --previous → read the crash reason

Exit code 137 = SIGKILL (128 + 9), usually OOMKilled → confirm the Reason in kubectl describe pod, then increase memory limit

Exit code 1 = app error → read logs, fix the app

Exit code 143 = SIGTERM received → app is not handling shutdown

Terminating (stuck)

→ preStop hook or finalizer is blocking

kubectl delete pod --grace-period=0 --force (skips graceful shutdown)

Why is my pod stuck in Terminating? — Finalizers

A pod stuck in Terminating forever (even after the grace period expires) is almost always caused by a finalizer.

A finalizer is a key in metadata.finalizers. Kubernetes will not delete the pod object from etcd until every finalizer in the list is removed. The controller that owns the finalizer is responsible for removing it after doing its cleanup work. If that controller is broken, gone, or the cleanup never completes — the pod hangs in Terminating forever.

diagnose and fix stuck Terminating pod

# Check for finalizers
kubectl get pod <pod-name> -n <namespace> -o yaml | grep -A5 finalizers

# Example output:
# metadata:
#   finalizers:
#   - example.com/my-finalizer   ← pod cannot delete until this is removed

# Force-remove all finalizers (use when the controller is gone)
kubectl patch pod <pod-name> -n <namespace> --type=merge -p '{"metadata":{"finalizers":[]}}'

# Last resort: force delete without grace period
kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force

Warning: force delete can leave orphaned resources

--grace-period=0 --force removes the pod object from Kubernetes but does NOT guarantee the actual process on the node has stopped. If the node is unreachable, the container may still be running. It also does not clear finalizers; a pod that still has finalizers stays until they are removed. Only force-delete when you are certain the container is no longer running (or the node is permanently gone) and the controller that owns the finalizer is permanently gone.
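Before patching anything, it helps to confirm that finalizers are really what is holding the pod. The sketch below inspects a parsed pod object (for example from kubectl get pod -o json); the function name `termination_blockers` is hypothetical.

```python
def termination_blockers(pod):
    """Return the finalizers holding up deletion, or [] if none apply.

    A pod is only considered blocked when deletion has been requested
    (metadata.deletionTimestamp is set) and finalizers remain.
    """
    metadata = pod.get("metadata", {})
    if "deletionTimestamp" not in metadata:
        return []
    return list(metadata.get("finalizers", []))
```

An empty list for a pod that is stuck in Terminating points away from finalizers and toward a hung preStop hook or an unreachable node.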

The #1 question: pod keeps restarting but logs show nothing

This is the most common production interview scenario. The container is in CrashLoopBackOff. kubectl logs --previous returns nothing or a blank screen. Most engineers are stuck here. Here is the actual debugging path:

pod restarting with no logs — debugging flow

# Step 1: read the exit reason, not the logs
kubectl describe pod my-pod

# Look for this block under Containers → Last State:
#   Last State:     Terminated
#     Reason:       OOMKilled   ← or: Error, Signal, StartError
#     Exit Code:    137         ← 137=SIGKILL(OOM), 1=app error, 143=SIGTERM, 126/127=bad command
#     Started:      Wed, 24 Jun 2026 04:01:00 +0000
#     Finished:     Wed, 24 Jun 2026 04:01:00 +0000  ← same second = died instantly on startup

# If Started and Finished are the same second:
#   → container never ran your app — check the ENTRYPOINT/CMD
#   → check: is the binary missing? wrong file permissions? SecurityContext blocking exec?

# Step 2: check if a startupProbe is killing it before it writes logs
#   startupProbe with failureThreshold too low = container killed before app initializes

# Step 3: check SecurityContext
#   runAsNonRoot: true + image running as root → kubelet refuses to start the container (CreateContainerConfigError), so there are no logs

# Step 4: check node-level events
kubectl get events --sort-by='.lastTimestamp' -n my-namespace

🚨 Interview Trap

“If there are no logs, the app did not crash.” The opposite is true. No logs means the container died before the app wrote anything. The crash happened at the OS or runtime level — OOMKilled, bad entrypoint, missing binary, SecurityContext violation — before the application's logging system initialized. Last State: Exit Code in kubectl describe pod tells you what actually killed it. Start there, not with logs.
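The exit codes quoted in the debugging flow follow a simple convention: values above 128 mean "killed by signal (code minus 128)". A small decoder makes that explicit. This is a sketch assuming a Linux host for the signal names; the function name `explain_exit_code` and the wording of the messages are made up for the example.

```python
import signal


def explain_exit_code(code):
    """Translate a container exit code into a short explanation."""
    if code == 0:
        return "exited cleanly (check CMD if the container should keep running)"
    if code == 126:
        return "command found but not executable"
    if code == 127:
        return "command not found"
    if code > 128:
        number = code - 128
        try:
            name = signal.Signals(number).name
        except ValueError:
            name = f"signal {number}"
        return f"terminated by {name}"
    return "application error"
```

Exit code 137 decodes to SIGKILL, which is what the OOM killer sends, but the Reason field is still needed to confirm that it was an OOM kill and not some other SIGKILL.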

ImagePullBackOff vs ErrImagePull

These look like two different errors. They are the same error at different points in time.

Status	What it means
ErrImagePull	kubelet tried to pull the image — first attempt failed
ImagePullBackOff	kubelet is retrying with exponential back-off. Same root cause, later retry.

Root causes, in order of frequency: wrong image tag (typo or non-existent version), private registry with no imagePullSecret, Docker Hub rate limit (anonymous pulls have historically been capped at 100 per 6 hours per IP; check Docker's current limits), network issue between node and registry. Fix: kubectl describe pod → Events section → exact error message.

⚡ Pro Tip

When the production pod has no shell (distroless or scratch image) and you need to debug it live, use ephemeral containers: kubectl debug -it my-pod --image=busybox:1.28 --target=api. This attaches a temporary debugging container to the running pod without restarting it. It shares the pod's network namespace and, with --target, the target container's process namespace (if the container runtime supports it). No kubectl exec, no rebuild, no redeployment. Ephemeral containers have been beta since Kubernetes v1.23 and stable since v1.25.

The Wall of Shame

Six mistakes. All extremely common. All made by engineers who understood the theory but missed the detail that matters at 3 AM.

1. Using the same probe endpoint for readiness and liveness

“A smoke detector that also burns the building down when it triggers. If the app is slow and readiness removes it from traffic — exactly as designed — the liveness probe then kills the container, triggering a restart that causes 30+ seconds of downtime. The readiness probe was handling it. The liveness probe panicked.”

What happens: Unnecessary container restarts during transient load spikes. Readiness removes the pod from traffic correctly — liveness then kills it before it recovers.

Fix: Liveness failureThreshold must be 3–5× higher than readiness. Different endpoints, different tolerances, different jobs.

2. Not handling SIGTERM in the application

“Kubernetes sends a polite knock at the door — SIGTERM. The apartment ignores it. After 30 seconds, the building manager kicks the door down — SIGKILL. Whatever was happening inside gets destroyed mid-sentence. Every database write, every in-flight request, every open connection: gone.”

What happens: In-flight requests get 502s. Database transactions roll back. Payment writes are half-committed. The fintech disaster above.

Fix: Add a SIGTERM handler to every service. Drain connections, finish writes, exit cleanly within the grace period.
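A minimal shape for such a handler, sketched in Python. The class `GracefulShutdown` and the `serve` loop are illustrative names, not part of any framework, and a real service would plug in its own request handling and draining logic.

```python
import signal
import threading


class GracefulShutdown:
    """Records that SIGTERM arrived so the main loop can stop cleanly."""

    def __init__(self):
        self.stop_requested = threading.Event()

    def handle(self, signum, frame):
        # Keep the handler tiny: only set a flag, do the real work elsewhere.
        self.stop_requested.set()

    def install(self):
        # Must be called from the main thread of the process.
        signal.signal(signal.SIGTERM, self.handle)


def serve(shutdown, handle_one_request, drain):
    """Handle requests until shutdown is requested, then drain and return."""
    handled = 0
    while not shutdown.stop_requested.is_set():
        handle_one_request()
        handled += 1
    drain()  # finish writes, close connections, flush buffers
    return handled
```

In a real service, call install() once at startup and make sure drain() finishes comfortably inside terminationGracePeriodSeconds.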

3. No preStop hook during rolling updates

“Removing the pod from the address book and kicking it out happen at the same time, but iptables rules update on a 1–2 second delay across nodes. Like cancelling your hotel reservation but the front desk on the second floor still has the old key list for 2 seconds after checkout. A few unlucky guests get directed to your old room.”

What happens: 1–2 second window of 502s during rolling updates even with readiness probes correctly configured.

Fix: Add preStop: exec: command: ["sleep", "5"] to give iptables rules time to propagate before SIGTERM is sent.

4. Assuming kubectl get pods shows the full health picture

“Reading only the STATUS column of kubectl get pods is like checking whether the restaurant is open without looking at the menu, the wait time, or the health inspection score. Running/Running/Running — everything is fine. Except two of them have been not-Ready for 20 minutes and are not serving a single request.”

What happens: Running pods that are not Ready silently drop all traffic routed to them. The STATUS column does not show conditions.

Fix: Check the READY column of kubectl get pods (it is in the default output; -o wide adds node and IP). Or run kubectl describe pod and read the Conditions section.

5. Setting memory limit equal to JVM heap size

“The JVM is not a hotel room. It is a hotel room plus a lobby, two stairwells, and a service elevator. Setting -Xmx to the container limit and wondering why you get OOMKilled is asking the guest to occupy every square metre of the building — while expecting the building to still have hallways.”

What happens: OOMKilled immediately or under load. Exit code 137. Restart. OOMKilled again. CrashLoopBackOff.

Fix: Set -Xmx to 75% of the container memory limit. JVM needs 25% overhead for non-heap memory.
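The 75% rule is easy to turn into a number. The sketch below handles only the Ki, Mi and Gi suffixes and treats 0.75 as a starting rule of thumb, not a guarantee; the helper names `to_bytes` and `suggested_xmx` are hypothetical.

```python
UNITS = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def to_bytes(quantity):
    """Convert a memory quantity such as '512Mi' to bytes."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count


def suggested_xmx(memory_limit, heap_fraction=0.75):
    """Return a -Xmx flag sized to a fraction of the container memory limit."""
    heap_mib = int(to_bytes(memory_limit) * heap_fraction) // 1024 ** 2
    return f"-Xmx{heap_mib}m"
```

For a 512Mi limit this yields -Xmx384m, leaving 128Mi for metaspace, thread stacks, and other non-heap memory.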

6. Expecting HPA to kill the oldest pod to free the most resources

“HPA is not a gardener pruning the oldest branch. It is a building manager following tenant seniority rules — the newest, least-established tenant goes first. Your oldest pod, the one with the warm cache and 500 open connections, is the most protected one. The pod you deployed 3 minutes ago is the first to go. Build your SIGTERM handlers accordingly.”

What happens: Newest pods die. If they hold any in-flight state with no SIGTERM handler, that state is lost. The opening incident.

Fix: Handle SIGTERM in every service. Do not assume any pod will stay alive during scale-down — assume any pod can die at any time.

🔒 Pod Security

Every Kubernetes security field exists to break one specific link in an attack chain, from RCE vulnerability to root shell to API token theft to full cluster compromise. The key fields: runAsNonRoot: true, readOnlyRootFilesystem: true, allowPrivilegeEscalation: false, capabilities: drop: [ALL], and automountServiceAccountToken: false. For the full breakdown (hardened manifests, NetworkPolicy, and the complete attack chain) see Kubernetes Security Best Practices.

Production Troubleshooting Scenarios

Every scenario below has the same first move: kubectl describe pod or kubectl describe node. The Events section tells you what Kubernetes actually tried to do — it is almost always more informative than the STATUS column.

Scenario	STATUS you see	First command	Common root cause
Pod not scheduled	Pending	kubectl describe pod	Insufficient CPU/memory, taint mismatch, PVC unbound
Image won't pull	ImagePullBackOff	kubectl describe pod → Events	Wrong tag, missing imagePullSecret, registry rate limit
Volume not mounting	ContainerCreating	kubectl describe pod → Events	PVC not bound, secret/configmap missing, CNI issue
Startup crash loop	CrashLoopBackOff	kubectl logs --previous	OOMKilled (exit 137), config error, bad entrypoint
Crash — no logs	CrashLoopBackOff	kubectl describe pod → Last State	Container dies before writing logs — check exit code + reason
App healthy, no traffic	Running (0/N ready)	kubectl describe pod → Conditions	Readiness probe failing: dependency down, timeout too short
Service skipping pods	Running (N/N ready)	kubectl get endpoints <svc>	Selector mismatch, port/targetPort wrong
Memory exhausted	OOMKilled → restart	kubectl describe pod → Last State	limit too low, JVM -Xmx at 100% of limit, memory leak
Node pressure eviction	Evicted	kubectl describe pod	Node disk/memory pressure — pod never comes back unless rescheduled
Pod stuck deleting	Terminating	kubectl get pod -o yaml	Finalizer not cleared, preStop hook hung, node unreachable
Node unreachable	Unknown	kubectl describe node	kubelet lost contact — pods stay Unknown until eviction timeout

Interview Corner

Questions You Should Be Able to Answer at Any Level

Q: A pod shows Running but is not receiving traffic. What do you check first?

kubectl describe pod <name> → Conditions section. If Ready=False, the readiness probe is failing. Check the probe endpoint, timeout, and any dependencies (DB, cache, feature flags) the probe hits.

Q: What is the exact sequence of events when you run kubectl delete pod?

deletionTimestamp set → Endpoint Controller removes pod from Service (in parallel with the next step) → preStop hook → SIGTERM → wait for the remainder of terminationGracePeriodSeconds → SIGKILL if still running → pod removed from etcd. Key: with a preStop delay in place, traffic has stopped before SIGTERM is sent.

Q: HPA scales from 10 to 6 pods. All pods are healthy. Which 4 die?

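The 4 newest. The ReplicaSet Controller ranks healthy pods by age, youngest first. Not CPU, not memory, not request count. (Scheduling state, readiness, and the pod-deletion-cost annotation are considered before age, but here all pods are healthy.) Older pods have warmer state and established connections.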

Q: What is the difference between CrashLoopBackOff and OOMKilled?

OOMKilled is the kill reason (memory limit exceeded, exit code 137). CrashLoopBackOff is the restart throttle that follows — back-off starts at 10s, doubles each time, caps at 5 minutes. OOMKilled is the cause; CrashLoopBackOff is the symptom.
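The back-off schedule from that answer (start at 10s, double each time, cap at 5 minutes) can be written out as a one-liner. The function name `crashloop_delays` is an assumption for this sketch.

```python
def crashloop_delays(restarts, base=10, cap=300):
    """Return the wait in seconds before each of the first `restarts` restarts."""
    return [min(base * 2 ** attempt, cap) for attempt in range(restarts)]
```

The cap is reached on the sixth restart, which is why a pod in a long crash loop appears to retry about every five minutes.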

Q: How do you protect a pod from HPA scale-down?

Set controller.kubernetes.io/pod-deletion-cost annotation. Lower cost = deleted first. Higher cost = survives longer. Useful for pods holding warm caches, batch jobs, or ML models in memory.

🔴 Staff-Level Scenario — The One That Separates Senior from Principal

Q: A pod is restarting every 3 seconds. kubectl logs --previous returns nothing. kubectl describe pod shows no useful events. How do you debug it?

The answer is a 6-level escalating investigation. Stop at the level where you find the cause.

Level 1 — Read exit clues, not logs

No logs does not mean no data. kubectl get pod my-pod -o json | jq '.status.containerStatuses[].lastState' gives you: exitCode (137=OOMKilled, 0=exited cleanly/wrong CMD, 1=runtime error, 126/127=bad entrypoint), reason, and the startedAt/finishedAt timestamps. If they are the same second — the container died before writing a single byte of output. The crash happened at the OS level, not the application level.

Level 2 — The 3-second timing is a clue

CrashLoopBackOff back-off starts at 10 seconds. If it is restarting every 3 seconds, either the back-off has not escalated yet (first few crashes) or a probe is killing it on a 3-second cycle. Check: kubectl get pod my-pod -o yaml | grep -A 10 startupProbe. A startupProbe with periodSeconds: 3 and failureThreshold: 1 kills the container after one failed check — before it writes any logs.

Level 3 — SecurityContext as a silent killer

Four SecurityContext settings stop the container with zero logs: (1) runAsNonRoot: true when the image runs as root → the kubelet refuses to start the container (CreateContainerConfigError); (2) readOnlyRootFilesystem: true when the app writes to /tmp on startup → crash before first log line; (3) capabilities: drop: [ALL] when the app needs NET_BIND_SERVICE to listen on port 80; (4) a seccomp profile blocking a syscall the app uses → SIGSYS, silent kill. Check: kubectl get pod my-pod -o yaml | grep -A 15 securityContext.

Level 4 — Check the kernel OOM killer on the node

The Linux kernel kills processes when it runs out of memory. This happens at the OS level, before kubelet detects it. It does not always appear in kubectl describe — but it always appears in dmesg. SSH to the node and run: dmesg | grep -i "killed process\|out of memory\|oom". You will see the exact process name, PID, and the memory state at the moment of death. This is the kill that kubelet calls OOMKilled — but the actual event is in the kernel ring buffer, not Kubernetes.

Level 5 — Container runtime direct inspection

kubelet gets container status from the container runtime (containerd/CRI-O) via gRPC. Sometimes the runtime has more detail than kubelet exposed. On the node: crictl ps -a | grep <pod-name> to find previous container IDs. Then: crictl logs <container-id> (may contain output kubelet did not capture) and crictl inspect <container-id> (shows OOM details, mount failures, seccomp violations). Also check: journalctl -u kubelet --since "10 minutes ago" | grep <pod-name> and journalctl -u containerd --since "10 minutes ago".

Level 6 — Override the entrypoint to isolate the image vs. the environment

Deploy the same image with command: ["sleep", "3600"] to bypass the application entirely. If the pod stays Running, the image starts correctly — the problem is in the application startup code, environment variables, or config. If it still crashes, the problem is infrastructure (security context, volume mount, network). Once the sleep pod is running, exec in and test the actual binary manually: run /app/server --config=/etc/app/config.yaml and watch stderr directly. This is the fastest way to get a terminal on the exact environment the crashing container sees.

Why this question matters in interviews

Junior engineers stop at Level 1. Mid-level engineers reach Level 3. Senior engineers know Level 4 (dmesg). Staff engineers know Level 5 (crictl) and always jump straight to Level 6 first to isolate the variable. The question tests not just knowledge — it tests the systematic thinking that distinguishes engineers who can debug blindly at 3 AM from engineers who need a log to know where to look.

Q: A pod is stuck in ContainerCreating for 10 minutes. What do you check?

kubectl describe pod → Events section. The most common causes in order: PVC not bound (check kubectl get pvc), Secret or ConfigMap referenced in the spec does not exist, image pull is failing silently, CNI plugin has not assigned the pod an IP yet. If Events are empty, check kubectl describe node <node-name> — disk pressure or kubelet issues appear there.

Q: A node becomes NotReady. What happens to the pods running on it?

Immediately: containers may keep running on the node (the kubelet might only have lost contact with the control plane), but the Scheduler stops placing new pods there. The node controller taints the node (node.kubernetes.io/not-ready or node.kubernetes.io/unreachable), and pods tolerate those taints for 300 seconds by default. After those 5 minutes, the pods are marked for deletion, and ReplicaSet and Deployment pods are recreated on other nodes. StatefulSet pods are NOT replaced automatically while the old pod cannot be confirmed stopped (to prevent split-brain). DaemonSet pods tolerate these taints and are not evicted. Pods with local persistent volumes may lose data if the node does not recover.

Q: The pod is not OOMKilled and not crashing, but the service is very slow under load. What could it be?

CPU throttling. When a container hits its CPU limit, the Linux kernel throttles it using CFS (Completely Fair Scheduler): the process does not crash, it just runs slower. Check: kubectl top pod my-pod to see current CPU usage, and compare it with the limit in the pod spec. If usage is at or near the limit, throttling is likely. Fix: increase the CPU limit, or remove it (and rely on requests for scheduling). You can also check cpu.stat inside the container (/sys/fs/cgroup/cpu/cpu.stat on cgroup v1, /sys/fs/cgroup/cpu.stat on cgroup v2) and watch whether nr_throttled keeps growing.
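To put a number on throttling, the contents of cpu.stat can be parsed into a ratio of throttled periods to total periods. This sketch assumes the file has one "key value" pair per line and contains nr_periods and nr_throttled; the function name `throttle_ratio` is hypothetical.

```python
def throttle_ratio(cpu_stat_text):
    """Return nr_throttled / nr_periods from the text of a cpu.stat file."""
    stats = {}
    for line in cpu_stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods
```

The counters are cumulative since the container started, so sampling twice and comparing the difference gives a better picture of current throttling than a single reading.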

Q: What are QoS classes and how do they affect which pod gets evicted first?

Kubernetes assigns one of three QoS classes based on resource declarations. Guaranteed: requests = limits for all containers, last to be evicted. Burstable: at least one request or limit is set, but the pod does not meet the Guaranteed criteria, evicted after BestEffort. BestEffort: no requests or limits set, first to be evicted when the node is under memory pressure. OOMKilled order follows the same priority. A BestEffort pod will be killed before a Burstable pod, which is killed before a Guaranteed pod. Always set requests and limits in production.

Q: A pod has STATUS: Evicted. Why? Will it restart?

The node evicted the pod due to resource pressure, typically disk pressure (DiskPressure), memory pressure (MemoryPressure), or PID pressure. Unlike CrashLoopBackOff, evicted pods are NOT automatically restarted in place. The ReplicaSet Controller notices the pod has failed and creates a replacement, normally on a different node. The evicted pod object remains in the namespace (STATUS: Evicted) until you clean it up: kubectl get pods --field-selector=status.phase==Failed -o name | xargs kubectl delete.

Q: Your service returns intermittent 502s/503s during rolling updates, even with a readiness probe. What is the cause?

The iptables propagation race. When a pod is removed from Service endpoints, kube-proxy updates iptables rules — but this propagation takes 1–2 seconds and happens independently on each node. During this window, some nodes still route traffic to the terminating pod. Fix: add preStop: exec: command: ["sleep", "5"]. This creates a 5-second delay before SIGTERM, giving iptables rules time to propagate across all nodes before the pod starts shutting down. This is why the preStop sleep is not optional for zero-downtime deployments.

Q: How do you debug a production pod that has no shell installed (distroless image)?

Use ephemeral containers: kubectl debug -it my-pod --image=busybox:1.28 --target=my-container. This attaches a temporary busybox container to the running pod without restarting it. It shares the same network namespace, so you can curl internal services, check DNS, inspect /proc, and run network diagnostics. Ephemeral containers are stable since Kubernetes v1.25 (beta in v1.23). No image rebuild, no redeployment, no downtime.

Q: How do you view events for a pod that has already been deleted?

Pod events are stored as separate Event objects in the namespace, not inside the pod, so they survive pod deletion. Each event is kept for about 1 hour after it was recorded (controlled by --event-ttl on the API server, default 1h), not for 1 hour after the pod is deleted. Check: kubectl get events --sort-by='.lastTimestamp' -n my-namespace. Filter by pod name: kubectl get events -n my-namespace --field-selector involvedObject.name=my-pod. For longer retention, export events to a dedicated backend (OpenSearch, Loki) with a tool such as event-exporter.

Useful Debugging Commands

kubectl debugging commands

# Get pods in namespace
kubectl get pods -n <namespace>

# Describe pod (most useful command)
kubectl describe pod <pod-name> -n <namespace>

# Current container logs
kubectl logs <pod-name> -n <namespace>

# Previous crashed container logs (very important)
kubectl logs <pod-name> -n <namespace> --previous

# Check events sorted by time
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

# Check resource usage
kubectl top pod <pod-name> -n <namespace>

# Check exact pod phase
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.phase}'

# Check last container state (exit code + reason)
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.containerStatuses[*].lastState}'

# Exec into a running container
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh

# Debug distroless or minimal containers (attaches an ephemeral container)
kubectl debug -it <pod-name> -n <namespace> --image=busybox:1.28 --target=<container-name>

Simple Troubleshooting Flow

If pod is Pending

kubectl describe pod <pod-name> -n <namespace>

• No node available (not enough CPU or memory)

• PVC not bound

• Taint or toleration issue

• Image still being pulled

If pod is ImagePullBackOff

kubectl describe pod <pod-name> -n <namespace> # check Events

• Wrong image name or tag

• Registry secret missing

• Registry access denied

If pod is CrashLoopBackOff

kubectl logs <pod-name> -n <namespace> --previous

• Check exit code in Last State

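• Exit 137 = killed by SIGKILL (128 + 9), usually OOMKilled; confirm with Reason in Last State

• Exit 1 = app error

• Exit 0 = process finished successfully; the command is probably not a long-running process (wrong CMD)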

• Exit 127 = binary not found
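The exit codes in the list above follow the usual shell convention: a code above 128 means the process was killed by a signal, and the signal number is the code minus 128 (so 137 is signal 9, SIGKILL). A minimal sketch of a decoder, with the hypothetical function name explain_exit_code:

```shell
# Hypothetical helper: turn a container exit code into a short hint.
# Usage: explain_exit_code <code>
explain_exit_code() {
  ec_code=$1
  case "$ec_code" in
    0)   echo "exit 0: process finished successfully" ;;
    126) echo "exit 126: command found but not executable" ;;
    127) echo "exit 127: command not found" ;;
    137) echo "exit 137: killed by SIGKILL (signal 9), check for OOMKilled" ;;
    143) echo "exit 143: terminated by SIGTERM (signal 15)" ;;
    *)
      if [ "$ec_code" -gt 128 ] && [ "$ec_code" -le 255 ]; then
        # Codes above 128 encode the signal number as code - 128.
        echo "exit $ec_code: killed by signal $((ec_code - 128))"
      else
        echo "exit $ec_code: application error"
      fi
      ;;
  esac
}

explain_exit_code 137
explain_exit_code 139
explain_exit_code 1
```

Treat the output as a hint only. An exit code of 137 can also come from a SIGKILL after the grace period expires, so check the Reason field in Last State before concluding it was an OOM kill.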

If pod is Running but READY is 0/1

kubectl describe pod <pod-name> -n <namespace> # check Conditions

• Readiness probe endpoint failing

• App dependency is down

• App still starting

• Wrong port or health check path
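To see quickly which condition is holding the pod back, the Conditions block can be reduced to the entries that are not True. This is a sketch: the function name failing_conditions is hypothetical, and it expects one Type=Status pair per line on stdin.

```shell
# Hypothetical helper: read "Type=Status" lines on stdin and print the
# pod conditions whose status is not True. Blank lines are ignored.
failing_conditions() {
  awk -F= 'NF && $2 != "True" { print $1 }'
}

# Intended use on a cluster (not run here):
#   kubectl get pod <pod-name> -n <namespace> \
#     -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#     | failing_conditions

# Demo with sample conditions for a Running pod that is not Ready.
printf '%s\n' \
  'PodScheduled=True' \
  'Initialized=True' \
  'ContainersReady=False' \
  'Ready=False' | failing_conditions
```

If only ContainersReady and Ready are listed, the pod was scheduled and initialized and the readiness probe is the place to look next.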

If pod is stuck Terminating

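kubectl get pod <pod-name> -n <namespace> -o yaml # check metadata.finalizers and deletionTimestamp first

• preStop hook is stuck

• Finalizer blocking deletion

• Node is unreachable

kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force # last resort

• Force delete only removes the API object; it does not confirm the container has stopped and does not clear finalizers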

Most Important Points to Remember

1. Running does not mean Ready.

2. Readiness failure stops traffic but does NOT kill the container.

3. Liveness failure kills and restarts the container.

4. CrashLoopBackOff = app keeps crashing. Check --previous logs.

5. OOMKilled = container exceeded its memory limit (or the node ran out of memory). Exit code 137.

6. Terminating is NOT a real pod phase. 5 phases only.

7. Endpoint removal and container shutdown start in parallel during pod deletion. A preStop sleep is what makes traffic stop BEFORE SIGTERM.

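8. HPA only lowers the replica count; the ReplicaSet controller picks the victims. Unscheduled, Pending, and not-ready pods go first, then generally the newest healthy pod.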

9. If an in-flight request outlasts the grace period → SIGKILL fires → the client typically sees a 502 error.

10. Always kubectl describe pod — not just kubectl get pods.

One-Line Summary

The Kubernetes pod lifecycle is the complete journey of a pod: creation, scheduling, startup, readiness, traffic serving, crashes and restarts, and graceful deletion.

About the author

Ravi Kapoor

Senior DevOps Engineer & Technical Writer

CKA & AWS SA-Pro Certified · 9 yrs (Atlassian & Fintech) · Kubernetes open-source contributor

Ravi is a senior DevOps engineer with 9 years of experience building cloud-native infrastructure at Atlassian and multiple fintech companies. CKA and AWS Solutions Architect Professional certified, he has managed Kubernetes clusters serving millions of daily users and contributes to open-source tooling.

