Kubernetes · 80+ Questions · Beginner to Expert
Kubernetes Interview Questions & Answers (2026)
80+ Kubernetes interview questions with detailed, expert answers. Covers pods, services, deployments, RBAC, networking, operators, multi-cluster, and real-world troubleshooting scenarios.
How to Use This Guide
Start from the Beginner section and work down. For CKA exam prep, focus on Intermediate and Advanced. For senior SRE / platform engineering interviews, master the Expert section. Each answer includes the reasoning interviewers expect — not just the definition.
Beginner
Q: What is Kubernetes and why is it used?
Kubernetes (K8s) is an open-source container orchestration platform originally developed by Google. It automates the deployment, scaling, scheduling, and management of containerized applications across a cluster of machines.
**Why it is used:**
- **Automated scaling:** Kubernetes can automatically scale applications up or down based on CPU utilization, memory, or custom metrics using Horizontal Pod Autoscaler (HPA).
- **Self-healing:** If a container crashes, Kubernetes automatically restarts it. If a node fails, workloads are rescheduled onto healthy nodes.
- **Service discovery and load balancing:** Kubernetes provides built-in DNS for service discovery and load balances traffic across pod replicas.
- **Rolling deployments and rollbacks:** Kubernetes enables zero-downtime deployments and can roll back to a previous version if a deployment fails.
- **Declarative configuration:** You describe the desired state (via YAML manifests), and Kubernetes continuously works to achieve and maintain that state.
In production environments, Kubernetes is the de facto standard for container orchestration at scale.
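As a minimal sketch of the declarative, autoscaling behaviour described above, the manifest below asks Kubernetes to keep a Deployment between 2 and 10 replicas based on average CPU utilization. The Deployment name `my-app` and the 70% target are assumptions chosen for illustration, and the cluster is assumed to have a metrics source such as metrics-server installed.

```yaml
# Sketch only: "my-app" and the thresholds are illustrative assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:              # the workload whose replica count is managed
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percentage of the Pods' CPU requests
```

Utilization is measured against the containers' CPU requests, so the target Pods need `resources.requests.cpu` set for this to work.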
Q: What is a Pod in Kubernetes?
A Pod is the smallest deployable unit in Kubernetes. A Pod represents one or more containers that share the same network namespace, storage volumes, and lifecycle.
**Key characteristics:**
- Containers within a Pod share the same IP address and port space — they communicate via localhost.
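- Pods are ephemeral: the kubelet can restart a crashed container inside the same Pod, but a Pod that is deleted or evicted, or whose node fails, is never resurrected. Its controller creates a replacement Pod with a new name and, typically, a new IP.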
- Pods can have multiple containers (sidecar pattern), but usually contain one main container.
- Init containers run and complete before the main containers start, used for setup tasks.
**Pod lifecycle phases:** Pending → Running → Succeeded or Failed. `Unknown` is reported when the Pod's state cannot be determined, typically because its node is unreachable.
In practice, you almost never create Pods directly — you create Deployments, StatefulSets, or DaemonSets that manage Pods for you.
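The characteristics above can be sketched in a single manifest: an init container that prepares content, a main container that serves it, and a sidecar that reaches the main container over `localhost`. All names and image tags here are illustrative assumptions, and a bare Pod is used only to keep the example short.

```yaml
# Sketch only: names, images, and commands are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar
spec:
  initContainers:
    - name: prepare-content          # must exit successfully before the main containers start
      image: busybox:1.36
      command: ["sh", "-c", "echo hello > /work/index.html"]
      volumeMounts:
        - name: shared
          mountPath: /work
  containers:
    - name: web                      # main container
      image: nginx:1.27
      ports:
        - containerPort: 80
      volumeMounts:
        - name: shared
          mountPath: /usr/share/nginx/html
    - name: probe-sidecar            # same network namespace, so localhost reaches nginx
      image: busybox:1.36
      command: ["sh", "-c", "while true; do wget -qO- http://localhost:80 >/dev/null; sleep 30; done"]
  volumes:
    - name: shared                   # emptyDir lives and dies with the Pod
      emptyDir: {}
```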
Q: What is the difference between a Deployment and a StatefulSet?
**Deployment:**
- Manages stateless applications.
- Pods are interchangeable — they have random names (e.g., `nginx-7d4b9c-xyz`).
- Pods can be restarted, replaced, or scaled without concern for identity.
- Ideal for web servers, API services, and any application that does not need persistent storage tied to a specific instance.
**StatefulSet:**
- Manages stateful applications (databases, distributed systems like Kafka, Zookeeper, Elasticsearch).
- Pods have stable, unique identities with predictable names (e.g., `postgres-0`, `postgres-1`).
- Pods are deployed and terminated in order (0, 1, 2...) by default.
- Each Pod gets its own PersistentVolumeClaim (PVC), which persists across Pod restarts.
- Uses a headless Service for stable DNS names (`postgres-0.postgres.namespace.svc.cluster.local`).
**When to use each:** Use Deployments for web applications, microservices, and APIs. Use StatefulSets for databases (PostgreSQL, MySQL), message brokers (Kafka, RabbitMQ), and any system where Pod identity and persistent storage matter.
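A minimal sketch of the StatefulSet pieces described above: a headless Service for stable DNS names plus a `volumeClaimTemplates` entry that gives each replica its own PVC. The Secret `postgres-credentials`, the storage size, and the image tag are assumptions. This manifest also starts two independent PostgreSQL instances; replication between them would need extra configuration or an operator.

```yaml
# Sketch only: Secret name, sizes, and image tag are illustrative assumptions.
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  clusterIP: None                    # headless: DNS resolves to individual Pod IPs
  selector:
    app: postgres
  ports:
    - name: sql
      port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres              # ties Pod DNS names to the headless Service
  replicas: 2                        # creates postgres-0, then postgres-1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials   # assumed to exist already
                  key: password
            - name: PGDATA                     # subdirectory avoids the volume's lost+found
              value: /var/lib/postgresql/data/pgdata
          ports:
            - name: sql
              containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:              # one PVC per Pod: data-postgres-0, data-postgres-1
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```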
Q: What is a Kubernetes Service and what types exist?
A Kubernetes Service provides a stable network endpoint (IP address and DNS name) for accessing a set of Pods. Since Pods are ephemeral and their IPs change, Services provide consistent addressing through label selectors.
**Service Types:**
1. **ClusterIP (default):** Exposes the service on a cluster-internal IP. Only accessible within the cluster. Used for internal microservice communication.
2. **NodePort:** Exposes the service on a static port on every node's IP (range: 30000–32767). Accessible from outside the cluster via `NodeIP:NodePort`. Useful for development but not recommended for production.
3. **LoadBalancer:** Creates an external cloud load balancer (AWS ELB, GCP LB) that routes traffic to the service. The standard approach for exposing services externally in cloud environments.
4. **ExternalName:** Maps a service to a DNS name (e.g., an external database). Returns a CNAME record. Useful for integrating external services into the cluster DNS namespace.
**Headless Service:** A ClusterIP service with `clusterIP: None`. Instead of a virtual IP, DNS returns the IPs of individual Pods directly. Required for StatefulSets.
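The Service types above differ mostly in a few fields of the spec. The sketch below shows a ClusterIP, a NodePort, and an ExternalName Service side by side; the label `app: api`, the ports, and the hostname `db.example.com` are illustrative assumptions.

```yaml
# Sketch only: labels, ports, and hostnames are illustrative assumptions.
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  type: ClusterIP                    # default; reachable only inside the cluster
  selector:
    app: api                         # traffic goes to Pods carrying this label
  ports:
    - port: 80                       # port exposed by the Service
      targetPort: 8080               # port the container listens on
---
apiVersion: v1
kind: Service
metadata:
  name: api-nodeport
spec:
  type: NodePort
  selector:
    app: api
  ports:
    - port: 80
      targetPort: 8080
      nodePort: 30080                # must fall inside the node port range
---
apiVersion: v1
kind: Service
metadata:
  name: external-db
spec:
  type: ExternalName                 # no selector, no proxying: DNS returns a CNAME
  externalName: db.example.com
```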
Q: What is a ConfigMap and a Secret in Kubernetes?
Both are Kubernetes objects for injecting configuration data into Pods, but they serve different purposes.
**ConfigMap:**
- Stores non-sensitive configuration data as key-value pairs.
- Examples: database host, application log level, feature flags, configuration files.
- Stored in plain text in etcd.
- Can be exposed as environment variables or mounted as files in a volume.
**Secret:**
- Stores sensitive data: passwords, API keys, TLS certificates, tokens.
- Values are only base64-encoded, not encrypted, and by default are stored unencrypted in etcd. Encryption at rest requires additional API server configuration (an `EncryptionConfiguration`, optionally backed by a KMS provider).
- Should be accessed via environment variables or volume mounts, never hardcoded in container images.
- Can be of type `Opaque`, `kubernetes.io/tls`, `kubernetes.io/dockerconfigjson`, etc.
**Best practice:** Use Secrets for sensitive data, ConfigMaps for non-sensitive configuration. For production systems, integrate with external secret managers (HashiCorp Vault, AWS Secrets Manager) using solutions like External Secrets Operator.
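To make the two consumption paths concrete, here is a minimal sketch of a ConfigMap and a Secret consumed by one Pod, once through environment variables and once as a mounted file. Every name and value is an illustrative assumption. `stringData` is used so the Secret can be written in plain text; the API server stores it base64-encoded under `data`.

```yaml
# Sketch only: names and values are illustrative assumptions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: info
  app.properties: |
    feature.x.enabled=true
---
apiVersion: v1
kind: Secret
metadata:
  name: app-secret
type: Opaque
stringData:                          # plain text here, base64-encoded on write
  DB_PASSWORD: change-me
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "echo LOG_LEVEL=$LOG_LEVEL; ls /etc/app; sleep 3600"]
      env:
        - name: LOG_LEVEL            # single key from the ConfigMap
          valueFrom:
            configMapKeyRef:
              name: app-config
              key: LOG_LEVEL
        - name: DB_PASSWORD          # single key from the Secret
          valueFrom:
            secretKeyRef:
              name: app-secret
              key: DB_PASSWORD
      volumeMounts:
        - name: config
          mountPath: /etc/app
          readOnly: true
  volumes:
    - name: config                   # mounts app.properties as /etc/app/app.properties
      configMap:
        name: app-config
        items:
          - key: app.properties
            path: app.properties
```

Mounted ConfigMap files are refreshed when the ConfigMap changes, whereas environment variables are fixed at container start.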
Intermediate
Q: How does Kubernetes scheduling work?
The Kubernetes scheduler (kube-scheduler) is responsible for assigning Pods to nodes. The scheduling process happens in two phases:
**Phase 1 — Filtering (Predicates):**
The scheduler filters out nodes that cannot run the Pod based on constraints:
- **Resource requests:** Node must have sufficient CPU and memory available.
- **NodeSelector / nodeAffinity:** Pod must be scheduled on nodes with matching labels.
- **Taints and Tolerations:** A node with a taint rejects Pods that do not have a matching toleration.
- **PodAffinity / PodAntiAffinity:** Pod must/must not be co-located with other Pods.
- **Volume topology:** The node must be in a zone (or other topology domain) where the Pod's volumes can be provisioned or attached.
**Phase 2 — Scoring (Priorities):**
Remaining nodes are scored by multiple functions:
- `LeastAllocated` (the default strategy of the `NodeResourcesFit` plugin): Prefer nodes with more available resources.
- `NodeAffinity`: Prefer nodes matching preferred affinity rules.
- `PodTopologySpread`: Balance Pods across zones/nodes for HA.
The node with the highest score wins. If multiple nodes tie, one is chosen at random.
**Custom scheduling:** You can influence scheduling with node affinity rules (preferred or required), Pod topology spread constraints, and resource requests (the scheduler places Pods based on requests, not limits). For advanced use cases, custom schedulers can be deployed alongside the default scheduler.
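The filtering and scoring inputs above all live in the Pod spec. The sketch below combines resource requests, a required and a preferred node affinity rule, a toleration, and a topology spread constraint. The `node-type` label and the `dedicated=batch` taint are hypothetical; the `kubernetes.io/arch` and `topology.kubernetes.io/zone` labels are well-known node labels.

```yaml
# Sketch only: the node-type label and dedicated taint are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: scheduled-app
  labels:
    app: scheduled-app
spec:
  containers:
    - name: app
      image: nginx:1.27
      resources:
        requests:                    # filtering: node needs this much unreserved capacity
          cpu: 250m
          memory: 256Mi
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:    # hard rule (filtering)
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values: ["amd64"]
      preferredDuringSchedulingIgnoredDuringExecution:   # soft rule (scoring)
        - weight: 50
          preference:
            matchExpressions:
              - key: node-type
                operator: In
                values: ["compute"]
  tolerations:                       # allows, but does not force, placement on tainted nodes
    - key: dedicated
      operator: Equal
      value: batch
      effect: NoSchedule
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway   # soft; DoNotSchedule would make it a filter
      labelSelector:
        matchLabels:
          app: scheduled-app
```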
Q: Explain Kubernetes RBAC (Role-Based Access Control).
RBAC controls who can perform what actions on which Kubernetes resources. It uses four key API objects:
**1. Role (namespace-scoped):** Defines a set of permissions (verbs) on resources within a specific namespace.
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: pod-reader
rules:
  - apiGroups: [""]          # "" is the core API group
    resources: ["pods"]
    verbs: ["get", "watch", "list"]
```
**2. ClusterRole (cluster-scoped):** Same as Role but applies cluster-wide or to non-namespaced resources (Nodes, PersistentVolumes).
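**3. RoleBinding:** Binds a Role to a Subject (User, Group, ServiceAccount) within a namespace. It can also reference a ClusterRole, in which case the permissions apply only inside the binding's namespace.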
**4. ClusterRoleBinding:** Binds a ClusterRole to a Subject across the entire cluster.
**Common verbs:** get, list, watch, create, update, patch, delete, deletecollection.
**Subjects:** Users (humans, authenticated via certs/OIDC), Groups, ServiceAccounts (for Pods to access the API).
**Best practice:** Follow the principle of least privilege. Grant the minimum permissions required. Use namespaced Roles over ClusterRoles where possible. Create dedicated ServiceAccounts per application rather than using the default ServiceAccount.
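As a sketch of the least-privilege practice above, the manifest below creates a dedicated ServiceAccount and binds it to the `pod-reader` Role from the earlier example. The names `pod-reader-sa` and `read-pods` are illustrative assumptions.

```yaml
# Sketch only: ServiceAccount and binding names are illustrative assumptions.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pod-reader-sa
  namespace: production
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: production              # permissions apply only in this namespace
subjects:
  - kind: ServiceAccount
    name: pod-reader-sa
    namespace: production
roleRef:                             # immutable after creation
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

One way to check the result is `kubectl auth can-i list pods -n production --as=system:serviceaccount:production:pod-reader-sa`, which should answer `yes`, while the same check for `delete` should answer `no`.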
Q: What is a PersistentVolume (PV) and PersistentVolumeClaim (PVC)?
Kubernetes separates storage provisioning from storage consumption through a two-resource model.
**PersistentVolume (PV):**
- A piece of storage provisioned by an administrator or dynamically by a StorageClass.
- Represents actual storage (AWS EBS volume, GCP PD, NFS share, local disk).
- Has a lifecycle independent of any Pod — data persists even when the Pod is deleted.
- Defined with capacity, access mode, and reclaim policy.
**PersistentVolumeClaim (PVC):**
- A request for storage by a user/application.
- Specifies size, access mode (ReadWriteOnce, ReadOnlyMany, ReadWriteMany), and optionally a StorageClass.
- The control plane matches a PVC to an available PV based on requirements.
- Pods mount the PVC as a volume.
**Access Modes:**
- `ReadWriteOnce (RWO)`: Can be mounted read-write by a single node. (EBS, GCP PD)
- `ReadOnlyMany (ROX)`: Can be mounted read-only by many nodes.
- `ReadWriteMany (RWX)`: Can be mounted read-write by many nodes. (NFS, EFS, CephFS)
**Reclaim Policies:** `Delete` (the PV and its backing storage are deleted once the PVC is deleted), `Retain` (the PV and its data remain for manual cleanup), `Recycle` (deprecated).
**Dynamic provisioning:** A StorageClass enables automatic PV creation when a PVC is submitted — no manual PV creation required. This is the standard approach in cloud environments.
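Dynamic provisioning can be sketched with three objects: a StorageClass, a PVC that references it, and a Pod that mounts the claim. The example assumes a cluster running the AWS EBS CSI driver (`ebs.csi.aws.com`); on other platforms the `provisioner` and `parameters` would differ. The names `fast-ssd` and `app-data` are illustrative.

```yaml
# Sketch only: assumes the AWS EBS CSI driver; names and sizes are illustrative.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer   # provision in the zone where the Pod lands
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  storageClassName: fast-ssd
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 20Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: nginx:1.27
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: app-data          # the PV is created when this Pod is scheduled
```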
Q: What is a Kubernetes Ingress and when would you use it over a LoadBalancer service?
An Ingress is a Kubernetes API object that manages external HTTP/HTTPS traffic routing to services within the cluster. Unlike a LoadBalancer service (which creates one external load balancer per service), an Ingress uses a single entry point and routes traffic based on rules.
**Ingress capabilities:**
- **Host-based routing:** Route `api.example.com` → service A, `app.example.com` → service B.
- **Path-based routing:** Route `/api/*` → service A, `/static/*` → service B.
- **TLS termination:** Handle HTTPS at the Ingress level, routing HTTP internally.
- **Rewrites and redirects:** Modify request paths before forwarding (typically through controller-specific annotations rather than the core Ingress spec).
**Ingress Controller (required):** The Ingress resource only defines rules; an Ingress Controller implements them. Popular controllers: NGINX Ingress, Traefik, AWS Load Balancer Controller (formerly AWS ALB Ingress Controller), Kong, Istio Gateway.
**LoadBalancer Service vs Ingress:**
- Use a **LoadBalancer Service** when you need TCP/UDP load balancing (non-HTTP traffic), or when simplicity is more important than cost.
- Use **Ingress** when you have multiple HTTP/HTTPS services and want to share a single external load balancer — reducing cloud costs (each LoadBalancer service costs money in AWS/GCP).
In practice, most production Kubernetes clusters use a combination: one LoadBalancer service for the Ingress Controller, and Ingress rules for all HTTP services.
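A minimal sketch of host-based routing, path-based routing, and TLS termination in one Ingress. It assumes an ingress controller registered under the IngressClass `nginx`, a TLS Secret named `example-tls`, and backend Services `api`, `static-assets`, and `app`; all of these are illustrative.

```yaml
# Sketch only: class name, Secret, hosts, and Services are illustrative assumptions.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example
spec:
  ingressClassName: nginx            # selects which controller implements these rules
  tls:
    - hosts: ["api.example.com", "app.example.com"]
      secretName: example-tls        # Secret of type kubernetes.io/tls
  rules:
    - host: api.example.com          # host-based routing
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api
                port:
                  number: 80
    - host: app.example.com
      http:
        paths:
          - path: /static            # path-based routing
            pathType: Prefix
            backend:
              service:
                name: static-assets
                port:
                  number: 80
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app
                port:
                  number: 80
```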
Q: How does Kubernetes handle rolling deployments and rollbacks?
Kubernetes Deployments use a RollingUpdate strategy by default, enabling zero-downtime deployments.
**RollingUpdate parameters:**
- `maxSurge`: Maximum number of extra Pods above the desired count during an update (default: 25%, rounded up). Controls how fast new Pods are created.
- `maxUnavailable`: Maximum number of Pods that can be unavailable during the update (default: 25%, rounded down). Controls how aggressively old Pods are terminated.
**Rolling update process:**
1. Kubernetes creates new Pods with the updated spec (up to `maxSurge` above desired).
2. Once new Pods pass readiness probes, old Pods are terminated (maintaining `maxUnavailable` threshold).
3. This cycle repeats until all old Pods are replaced.
**Readiness Probes are critical:** They prevent Kubernetes from routing traffic to a new Pod until it is actually ready to serve. Without proper readiness probes, rolling updates can cause errors.
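The parameters and probe above fit together as in this sketch: with `maxSurge: 1` and `maxUnavailable: 0`, one new Pod is added at a time and an old Pod is removed only after the new one reports ready. The name `my-app`, the image, and the probe timings are illustrative assumptions.

```yaml
# Sketch only: name, image, and timings are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 4
  revisionHistoryLimit: 10           # old ReplicaSets kept for rollbacks
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                    # at most 5 Pods exist during the rollout
      maxUnavailable: 0              # never drop below 4 ready Pods
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: nginx:1.27
          ports:
            - containerPort: 80
          readinessProbe:            # gates both Service traffic and rollout progress
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 3
```

Setting `maxUnavailable: 0` trades rollout speed for capacity: the update is slower, but serving capacity never dips below the desired replica count.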
**Rollbacks:**
```bash
# Rollback to the previous revision
kubectl rollout undo deployment/my-app
# Rollback to a specific revision
kubectl rollout undo deployment/my-app --to-revision=3
# Check rollout history
kubectl rollout history deployment/my-app
```
Kubernetes stores up to `revisionHistoryLimit` (default: 10) previous ReplicaSets, enabling quick rollbacks.
**Recreate strategy:** An alternative that terminates all existing Pods before creating new ones. Causes downtime but ensures no version mix. Useful when the old and new versions cannot run simultaneously.
Advanced
Q: Explain the Kubernetes control plane components and their roles.
The Kubernetes control plane manages the overall cluster state and typically runs on dedicated control plane nodes (formerly called master nodes) or as a managed service in EKS/GKE/AKS.
**kube-apiserver:**
- The central component that exposes the Kubernetes REST API.
- All requests (from kubectl, controllers, kubelet) go through the API server.
- Responsible for authentication, authorization (RBAC), admission control, and validation.
- The only component that reads from and writes to etcd.
- Horizontally scalable — multiple replicas can run behind a load balancer.
**etcd:**
- The distributed key-value store that is the source of truth for all cluster state.
- Stores all API objects: Pods, Deployments, ConfigMaps, Secrets, etc.
- Uses the Raft consensus algorithm for consistency across multiple etcd members.
- Must be backed up regularly — losing etcd without a backup means losing the entire cluster state.
- Critical for performance: a slow etcd causes widespread cluster issues.
**kube-scheduler:**
- Watches for unscheduled Pods and assigns them to nodes.
- Does not actually start Pods — it only writes the node assignment to etcd via the API server.
- Respects resource requests, taints/tolerations, affinity rules, and topology constraints.
**kube-controller-manager:**
- Runs multiple controllers as a single binary (for efficiency):
  - **ReplicaSet Controller:** Maintains the desired number of Pod replicas.
  - **Deployment Controller:** Manages rolling updates and rollbacks.
  - **Node Controller:** Monitors node health and evicts Pods from failed nodes.
  - **Endpoints / EndpointSlice Controllers:** Populate the Endpoints and EndpointSlice objects that back Services.
  - **CronJob Controller:** Creates Jobs on schedule.
**cloud-controller-manager:**
- Runs cloud-provider-specific logic (optional, present in cloud-managed clusters).
- Manages cloud resources: load balancers, storage volumes, node lifecycle.
Q: What is a Kubernetes Operator and when would you build one?
A Kubernetes Operator is a software extension that uses Custom Resource Definitions (CRDs) and a custom controller to encode operational knowledge about an application into Kubernetes-native automation.
**The Operator pattern:**
1. Define a CRD (e.g., `PostgresCluster`) that extends the Kubernetes API.
2. Users create instances of this CRD (`kubectl apply -f my-postgres.yaml`).
3. The Operator controller watches for these custom resources and performs the complex operational logic needed to fulfill them (creating Deployments, Services, ConfigMaps, running backups, handling failover).
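Steps 1 and 2 can be sketched as follows: a CRD that adds a `PostgresCluster` kind, and one instance of it. The API group `example.com` and the `replicas` and `version` fields are hypothetical; a real operator defines its own schema, and the controller from step 3 is code that is not shown here.

```yaml
# Sketch only: the group, kind, and schema fields are hypothetical.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: postgresclusters.example.com   # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: postgresclusters
    singular: postgrescluster
    kind: PostgresCluster
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:               # the API server validates instances against this
          type: object
          properties:
            spec:
              type: object
              required: ["replicas"]
              properties:
                replicas:
                  type: integer
                  minimum: 1
                version:
                  type: string
---
apiVersion: example.com/v1alpha1
kind: PostgresCluster
metadata:
  name: my-postgres
spec:
  replicas: 3
  version: "16"
```

Without a running controller, the custom resource is only stored data; nothing happens until an operator watches it and reconciles the cluster toward the declared spec.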
**Why Operators exist:**
Kubernetes provides primitives (Deployments, StatefulSets, Services) but does not understand application-specific operational procedures. Operators encode:
- Cluster provisioning and initialization sequences
- Leader election and membership management
- Backup and restore procedures
- Rolling upgrades specific to the application's requirements
- Self-healing logic (e.g., automatically rejoining a split-brain database cluster)
**When to build an Operator:**
- You are managing a stateful distributed system (database cluster, message broker, search engine)
- You need to automate Day 2 operations (upgrades, backups, scaling, failover)
- You are building a platform that needs to offer database-as-a-service or similar primitives to application teams
**Operator Maturity Levels (Operator Framework model):**
1. Basic Install → 2. Seamless Upgrades → 3. Full Lifecycle → 4. Deep Insights → 5. Auto Pilot
**Toolkits:** Operator SDK (Go, Ansible, Helm), Kubebuilder, Metacontroller.
**Examples of real Operators:** Prometheus Operator, Strimzi (Kafka), Postgres Operator (Zalando), MongoDB Community Operator, cert-manager.
Q: How would you troubleshoot a Pod that is stuck in CrashLoopBackOff?
CrashLoopBackOff means a container is starting and then exiting repeatedly (usually with a non-zero exit code), and Kubernetes keeps restarting it with exponential backoff (10s → 20s → 40s → ... → 5m max).
**Diagnostic process:**
**Step 1 — Check Pod events and status:**
```bash
kubectl describe pod <pod-name> -n <namespace>
```
Look at the Events section: OOMKilled, Liveness probe failed, Error, ImagePullBackOff, etc.
**Step 2 — Check container logs:**
```bash
# Current logs
kubectl logs <pod-name> -n <namespace>
# Previous crashed container logs (most useful)
kubectl logs <pod-name> -n <namespace> --previous
```
**Step 3 — Check exit code:**
In `kubectl describe pod` output, the "Last State" shows the exit code:
- Exit 0: Process exited cleanly but shouldn't have — check startup logic
- Exit 1: Application error — check logs
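- Exit 137: Killed by SIGKILL (128 + 9). Most often OOMKilled (out of memory), in which case the container's `Reason` reads `OOMKilled` and the fix is to raise memory limits or reduce usage. It can also mean the container was force-killed after ignoring SIGTERM.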
- Exit 139: Segmentation fault
- Exit 143: SIGTERM received (graceful shutdown signal)
**Common root causes:**
- **OOMKilled:** Container exceeded memory limit. Solution: increase `resources.limits.memory` or fix memory leak.
- **Liveness probe failure:** Probe is too strict or application is slow to start. Solution: adjust `initialDelaySeconds`, `periodSeconds`, or `failureThreshold`.
- **Bad configuration:** Missing required environment variable or config file. Solution: check ConfigMap/Secret mounts.
- **Application startup error:** Bug in application code. Solution: check logs and fix the code.
- **Wrong command:** Container command or args incorrect in the pod spec.
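- **Image issues:** The image itself is broken, for example built for the wrong CPU architecture or missing a binary or shared library the entrypoint needs. Solution: inspect the image and run it locally.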
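For the two most common causes above, the fix usually lands in the container spec. The sketch below raises the memory limit and relaxes the liveness probe. The values are illustrative assumptions and should be sized from observed usage and startup time. The `startupProbe` goes beyond the fields named above: it is an optional way to hold off liveness checks until a slow application has started.

```yaml
# Sketch only: limits and probe timings are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: slow-starter
spec:
  containers:
    - name: app
      image: nginx:1.27
      resources:
        requests:
          memory: 256Mi
        limits:
          memory: 512Mi              # exceeding this gets the container OOMKilled
      startupProbe:                  # liveness checks wait until this succeeds
        httpGet:
          path: /
          port: 80
        periodSeconds: 5
        failureThreshold: 30         # allows up to 150s for startup
      livenessProbe:
        httpGet:
          path: /
          port: 80
        initialDelaySeconds: 10
        periodSeconds: 10
        failureThreshold: 3          # restart after about 30s of consecutive failures
```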
**Step 4 — Debug with a sleep container:**
```bash
# Override the command to keep the container alive and exec in
kubectl run debug --image=<your-image> --command -- sleep infinity
kubectl exec -it debug -- /bin/sh
```
Expert
Q: Describe how you would design a multi-cluster Kubernetes strategy for a global application.
Multi-cluster Kubernetes architecture is a complex domain with several competing approaches. The right design depends on the requirements: global latency, disaster recovery, compliance (data residency), team isolation, or blast radius reduction.
**Why multi-cluster (vs. multi-namespace in a single cluster):**
- True failure domain isolation (a cluster control plane failure affects only one cluster)
- Regulatory data residency requirements (EU data in EU clusters, US data in US clusters)
- Network latency optimization (serve users from the nearest region)
- Team/org isolation with stronger security boundaries than RBAC namespaces
**Architecture patterns:**
**1. Active-Active (Global Load Balancing):**
Multiple clusters in different regions, all serving production traffic. A global load balancer (AWS Route53 latency routing, Cloudflare, Fastly) routes users to the nearest healthy cluster. Best for stateless services. Requires:
- Global DNS management
- Shared or replicated databases with multi-region writes (Aurora Global, Spanner, CockroachDB)
- Cross-cluster service discovery (Istio multi-cluster, Submariner, Skupper)
**2. Active-Passive (Disaster Recovery):**
A primary cluster serves all traffic; a standby cluster in another region is kept warm. Traffic only fails over to standby during a primary failure. Lower cost than active-active. RTO/RPO requirements drive the standby cluster's resource level.
**3. Hub-and-Spoke (Management Plane):**
A central "hub" cluster runs management and observability tooling. "Spoke" clusters run workloads. The hub cluster manages all spoke clusters via fleet management tools (Argo CD ApplicationSets, Fleet, CAPI).
**Fleet management tooling:**
- **Cluster API (CAPI):** Declarative cluster lifecycle management (provision, upgrade, delete) across cloud providers.
- **Argo CD ApplicationSets:** Deploy applications to multiple clusters from a single Argo CD instance using cluster generators.
- **Rancher / Fleet:** Manages hundreds of clusters with policy and application distribution.
- **Anthos / ACM:** Google's multi-cluster management platform.
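As a sketch of the ApplicationSet approach listed above, the manifest below uses the cluster generator to stamp out one Argo CD Application per registered cluster. The repository URL, the `env: production` label, and the paths are hypothetical, and the example assumes Argo CD is installed in the `argocd` namespace with the target clusters already registered.

```yaml
# Sketch only: repo URL, labels, and paths are hypothetical.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: my-app
  namespace: argocd
spec:
  generators:
    - clusters:                      # one Application per matching registered cluster
        selector:
          matchLabels:
            env: production          # label on the Argo CD cluster Secret
  template:
    metadata:
      name: '{{name}}-my-app'        # {{name}} and {{server}} come from the generator
    spec:
      project: default
      source:
        repoURL: https://github.com/example/my-app-config.git
        targetRevision: main
        path: deploy
      destination:
        server: '{{server}}'
        namespace: my-app
      syncPolicy:
        automated:
          prune: true                # delete resources removed from Git
          selfHeal: true             # revert manual drift
```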
**Cross-cluster networking:**
- **Istio multi-primary / primary-remote:** Extends service mesh across clusters with mTLS.
- **Submariner:** CNI-agnostic L3 connectivity between clusters.
- **Cilium Cluster Mesh:** High-performance cross-cluster networking with Cilium CNI.
**Key operational challenges:**
- Configuration drift between clusters (keeping manifests, policies, and versions consistent)
- Cross-cluster secret management (Vault integration, External Secrets Operator)
- Centralized observability (federated Prometheus with Thanos, centralized logging)
- Cluster upgrade coordination to prevent version skew
**Real-world recommendation:** Start with a single cluster and namespaces for team isolation. Move to multi-cluster only when you hit a specific hard requirement — the operational complexity is significant.
Q: How does the Kubernetes networking model work, and how does CNI fit in?
The Kubernetes networking model is built on four fundamental requirements (the "flat network" model):
1. Every Pod gets its own unique IP address.
2. Pods on any node can communicate with Pods on any other node without NAT.
3. Agents (kubelet, kube-proxy) on a node can communicate with all Pods on that node.
4. Pods' self-view of their own IP matches what other nodes see (no IP masquerading between nodes).
This model simplifies application design but requires non-trivial implementation, which is where the CNI comes in.
**Container Network Interface (CNI):**
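CNI is a specification and library for configuring network interfaces in Linux containers. When a Pod is created, the container runtime (for example containerd or CRI-O), acting on the kubelet's request, invokes the configured CNI plugin to:
- Create the Pod's network interface, typically a virtual ethernet pair (veth pair) with one end in the Pod's network namespace and one in the host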
- Assign the Pod's IP address
- Set up routing rules so the Pod can reach other Pods
**CNI Plugin Categories:**
**Overlay networks (VXLAN/GENEVE tunneling):**
- Flannel (simple, VXLAN): Easy to configure. Pod-to-pod traffic is encapsulated in VXLAN packets over the underlying network.
- Weave Net: Similar overlay approach with optional encryption.
- Calico with VXLAN mode.
**Underlay / BGP routing:**
- Calico (BGP mode): Uses BGP to distribute Pod CIDR routes to routers. No encapsulation overhead. Requires BGP-capable network fabric.
- Cilium (native-routing mode): Routes Pod traffic over the underlying network without encapsulation, as an alternative to its tunneling mode.
**eBPF-based (modern generation):**
- Cilium: Implements the datapath with eBPF programs in the kernel instead of iptables, for lower overhead and better observability. Provides network policy enforcement, Hubble observability, Cluster Mesh, and Gateway API support. A common choice for high-performance, security-sensitive environments.
**kube-proxy and Service networking:**
kube-proxy runs on each node and is responsible for implementing Service abstractions. It watches the API server for Service and EndpointSlice changes and programs iptables (or IPVS) rules that redirect traffic to the correct Pod IPs. Cilium can replace kube-proxy entirely with eBPF-based load balancing.
**NetworkPolicies:**
By default, all Pods can communicate with all other Pods. NetworkPolicy objects restrict traffic based on pod selectors, namespace selectors, and port/protocol rules. The CNI plugin is responsible for enforcing NetworkPolicies — not all CNIs support them (e.g., basic Flannel does not enforce them).
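A common starting point is a default-deny policy plus explicit allow rules. The sketch below denies all ingress in a namespace and then allows only `frontend` Pods to reach `api` Pods on TCP 8080. The namespace, labels, and port are illustrative assumptions, and the policies only take effect on a CNI plugin that enforces NetworkPolicy.

```yaml
# Sketch only: namespace, labels, and port are illustrative assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}                    # empty selector matches every Pod in the namespace
  policyTypes: ["Ingress"]           # no ingress rules listed, so all ingress is denied
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api                       # the Pods being protected
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend          # allowed callers, same namespace
      ports:
        - protocol: TCP
          port: 8080
```

Policies are additive: a Pod selected by several policies accepts the union of what they allow, so the allow rule works alongside the default deny.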