Slack: #prod-incidents
It's 2:17 AM.
Pod A can't reach Pod B. You check everything:
✓ Both pods are Running
✓ Both Services exist
✓ DNS resolves from inside the pod
✓ curl from outside the cluster works
curl from inside the cluster times out.
It's a NetworkPolicy. Nobody wrote one intentionally.
It was the default deny-all the security team added 6 months ago.
Everyone forgot about it.
You don't know why yet. You're about to.
Three Months Later. A Different Kind of War Room.
No Slack this time. Fluorescent lights. A system design interview at a company you actually want to work at. The interviewer uncaps a whiteboard marker and writes one question:
"How does pod-to-pod networking work in Kubernetes?"
You know this. You've debugged it at 2 AM. You draw the answer.
The Simple Answer (The Trap)
Most engineers draw this. It is not wrong. It is just catastrophically incomplete.
The interviewer nods. Writes something down. Then looks up.
The interviewer keeps going:
- "Can Pod A reach Pod B directly without a Service?"
- "What is a network namespace and why does each pod get one?"
- "What is a veth pair and how does it connect a pod to the node?"
- "How does traffic reach a pod on a different node?"
- "You applied a NetworkPolicy. It's not working. What's the first thing you check?"
Five questions. The first one eliminates most candidates. The last one reveals whether you have ever actually secured a cluster, or just created the appearance of one. Let's answer all five.
Before we go kernel-deep, here's the mental model that makes everything click. Think of a large office building:
| Office Building World | Kubernetes Networking World |
|---|---|
| An office with its own door and intercom (isolated, private) | Network namespace: each pod has its own routing table, interfaces, and iptables rules |
| The phone cable connecting the office to the building switchboard | veth pair: one end in the pod namespace (eth0), one end on the host |
| The building's internal switchboard on one floor | cni0 bridge: forwards traffic between pods on the same node |
| The building's entire internal wiring system | CNI plugin: creates and manages all veth pairs and bridges |
| A tunnel between two separate buildings (wrapped, not encrypted) | VXLAN (Flannel): wraps packets in UDP for cross-node travel |
| A shared highway between buildings: no tunnels, native routes | BGP (Calico): advertises pod CIDRs, zero encapsulation overhead |
| The company's central switchboard number (stable, routes to whoever is available) | Service: stable ClusterIP that routes to healthy pod endpoints |
| The security access control list (who can call whom) | NetworkPolicy: defines allowed ingress and egress traffic per pod |
| The building's internal directory (name → number lookup) | CoreDNS: resolves service.namespace.svc.cluster.local to ClusterIP |
Hold that analogy. Everything below is the same thing, except the wiring is veth pairs in the Linux kernel, the switchboard is a bridge running in the host network namespace, and the security list is enforced (or silently ignored) by your CNI plugin.
Q1: Can Pod A Reach Pod B Directly Without a Service?
Most people hedge. "I think so, but you'd normally use a Service." That's like asking if you can walk directly to a colleague's office and saying "probably, but you'd normally send an email."
Yes. By default, unconditionally. Every pod has a real, routable IP address.
Kubernetes guarantees flat L3 connectivity across the entire cluster. Pod A at 10.244.1.2 can send a packet directly to Pod B at 10.244.2.5 (on a different node, in a different namespace, from a different deployment) with no Service involved. The packet arrives. No NAT. No masquerading between pods. This is one of the four fundamental guarantees of the Kubernetes networking model.
The Service exists for two reasons that have nothing to do with enabling connectivity:
- Discoverability. Pod IPs change every time a pod restarts, reschedules, or a deployment rolls. The Service ClusterIP is stable. DNS points at the ClusterIP, not the pod.
- Load balancing. Multiple pod replicas, single stable entry point. The Service selects a healthy replica via iptables DNAT.
Skipping the Service adds no security. Anyone with a pod IP can reach that pod directly, bypassing the Service entirely. If you want to restrict access, you need NetworkPolicy, not the absence of a Service.
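As a rough illustration of the load-balancing half of that story: in iptables mode, kube-proxy writes one rule per endpoint and gives each rule a match probability so that the overall split comes out even. The sketch below only reproduces that arithmetic; the function names are made up for this example and nothing here talks to a real cluster.

```python
from fractions import Fraction


def rule_probabilities(n):
    """Per-rule match probability for n endpoints evaluated in order."""
    # Rule i (0-based) is only reached if every earlier rule missed,
    # so it has to match with probability 1/(n - i). The last rule always matches.
    return [Fraction(1, n - i) for i in range(n)]


def effective_share(n):
    """Overall fraction of new connections that each endpoint receives."""
    shares = []
    remaining = Fraction(1)  # probability that no earlier rule has matched yet
    for p in rule_probabilities(n):
        shares.append(remaining * p)
        remaining *= 1 - p
    return shares


print(rule_probabilities(3))  # 1/3, then 1/2, then 1
print(effective_share(3))     # every endpoint ends up with 1/3
```

The point of the sketch is that the Service is a traffic-distribution layer on top of connectivity that already exists; removing it changes who picks the endpoint, not whether the endpoint is reachable.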
Q2: What Is a Network Namespace?
Most people say "it's how Kubernetes isolates pod networking." That is a description of the effect. The interviewer wants the mechanism.
A network namespace is a Linux kernel isolation primitive. It gives a process its own private copy of the network stack (routing table, iptables rules, network interfaces, ARP table, socket table), completely separate from every other namespace on the same machine. Two processes in different network namespaces can both bind port 8080 simultaneously without conflict. From inside the namespace, it looks like a dedicated machine with its own eth0. From the host, it is one namespace among many, all sharing the same kernel.
Every pod gets exactly one network namespace. Here is the part that surprises people: it is not the application container that holds the namespace. It is the pause container (also called the infra container), a tiny process from the pause image (registry.k8s.io/pause on current clusters, k8s.gcr.io/pause on older ones) that does nothing except hold the network namespace open. All application containers in the pod join that namespace. This is why containers in the same pod can communicate via localhost and why they share the same IP address and port space.
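One way to make "same namespace" concrete: on Linux, every process exposes its network namespace as the symlink /proc/<pid>/ns/net, whose target looks like net:[4026531992]. Two processes share a namespace exactly when that inode number matches. The helper names below are illustrative, and the /proc reads only work on a Linux host with permission to inspect the target PID.

```python
import os
import re


def parse_netns_inode(link_target):
    """Extract the inode number from a string like 'net:[4026531992]'."""
    match = re.fullmatch(r"net:\[(\d+)\]", link_target)
    if match is None:
        raise ValueError(f"not a network namespace link: {link_target!r}")
    return int(match.group(1))


def netns_inode(pid="self"):
    """Return the network namespace inode for a PID (Linux only)."""
    return parse_netns_inode(os.readlink(f"/proc/{pid}/ns/net"))


def same_netns(pid_a, pid_b):
    """True when both processes live in the same network namespace."""
    return netns_inode(pid_a) == netns_inode(pid_b)


# On a node, same_netns(<pause pid>, <app container pid>) should be True for
# containers of one pod and False for containers of two different pods.
```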
You can inspect a pod's network namespace directly on the node. Every container in the pod shares the namespace, so the PID of any of its containers will do. Find it via crictl inspect, then run (as root):
# Find a container that belongs to the pod, then its PID
crictl ps | grep <pod-name>
crictl inspect <container-id> | grep pid
# Enter the pod's network namespace and see its interfaces and routes
nsenter --net=/proc/<pid>/ns/net ip addr
nsenter --net=/proc/<pid>/ns/net ip route
# This is the exact same view the container has.
Q3: What Is a veth Pair?
A veth pair is a virtual ethernet cable with two ends. Anything written into one end comes out the other. It is a kernel primitive: no process, no daemon, no userspace involvement. The kernel handles it entirely.
When the CNI plugin starts a pod, it creates a veth pair and does something specific: it moves one end into the pod's network namespace (where it appears as eth0) and leaves the other end in the host network namespace (where it gets a name like veth3a8b12f or califed2a41b). With bridge-based CNIs such as Flannel, the host ends of all veth pairs on a node are attached to a Linux bridge, typically named cni0 or cbr0. Calico skips the bridge and routes to each host-side veth directly, but the veth mechanics are the same.
The packet path for same-node communication through the bridge is pure L2 switching:
Pod A eth0 → veth pair → cni0 bridge → veth pair → Pod B eth0
No routing. No NAT. The bridge forwards frames based on MAC addresses exactly like a physical switch. The whole path is inside the kernel, in memory, at near-wire speed.
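To see why no routing is involved, here is a toy model of what a bridge does with a frame: learn which port the source MAC lives behind, then forward by destination MAC, flooding only when the destination is still unknown. This is a simplified sketch, not the kernel's implementation; the port name vethX0 and the shortened MAC strings are placeholders.

```python
class LearningBridge:
    """Toy model of a Linux bridge: learn source MACs, forward by destination MAC."""

    def __init__(self, ports):
        self.ports = set(ports)  # host-side veth names attached to the bridge
        self.fdb = {}            # forwarding database: MAC -> port

    def receive(self, in_port, src_mac, dst_mac):
        """Return the set of ports the frame is sent out of."""
        self.fdb[src_mac] = in_port          # learn where the sender lives
        out_port = self.fdb.get(dst_mac)
        if out_port is None:                 # unknown destination: flood
            return self.ports - {in_port}
        if out_port == in_port:              # destination sits behind the ingress port
            return set()
        return {out_port}


bridge = LearningBridge(["vethA0", "vethB0", "vethX0"])
first = bridge.receive("vethA0", "aa:aa", "bb:bb")   # Pod B not learned yet: flooded
reply = bridge.receive("vethB0", "bb:bb", "aa:aa")   # Pod A already learned: unicast
later = bridge.receive("vethA0", "aa:aa", "bb:bb")   # now unicast to vethB0 only
```

After the first exchange the bridge delivers frames between the two pods without touching any other port, which is the behaviour the same-node path above relies on.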
Pod A sends a packet to Pod B on the same node
----------------------------------------------
+-------------------------------------------------------------+
| Node A (10.0.0.10)                                          |
|                                                             |
|  +---------------------+      +---------------------+       |
|  | Pod A Namespace     |      | Pod B Namespace     |       |
|  | IP: 10.244.1.2      |      | IP: 10.244.1.3      |       |
|  |   +-------------+   |      |   +-------------+   |       |
|  |   |    eth0     |   |      |   |    eth0     |   |       |
|  |   +------+------+   |      |   +------+------+   |       |
|  +----------|----------+      +----------|----------+       |
|             | veth pair                  | veth pair        |
|           vethA0                       vethB0               |
|             |                            |                  |
|  +----------+----------------------------+---------------+  |
|  |               cni0 bridge (10.244.1.1)                |  |
|  +---------------------------+---------------------------+  |
|                              | node eth0 10.0.0.10          |
+------------------------------|------------------------------+
                               |
                  Physical / Virtual Network
                               |
+------------------------------|------------------------------+
| Node B (10.0.0.11)           |                              |
|                              | node eth0 10.0.0.11          |
|  +---------------------------+---------------------------+  |
|  |               cni0 bridge (10.244.2.1)                |  |
|  +----------+--------------------------------------------+  |
|           vethC0                                            |
|             |                                               |
|  +----------+----------+                                    |
|  | Pod C 10.244.2.2    |                                    |
|  +---------------------+                                    |
+-------------------------------------------------------------+
Same-node: Pod A eth0 → vethA0 → cni0 bridge → vethB0 → Pod B eth0
Cross-node: Pod A eth0 → vethA0 → cni0 → eth0 → [Flannel VXLAN or Calico BGP] → Node B eth0 → cni0 → vethC0 → Pod C eth0
🔥 Production Reality
On a node, run ip link show type veth. Each running pod (except hostNetwork pods) contributes one veth to the list. If the number of veth interfaces does not match the number of running pods, your CNI has a cleanup problem: leaked veth pairs from terminated pods consume kernel resources, and the stale IP allocations that tend to accompany them can exhaust the available IP pool. This is rare but has caused IP exhaustion incidents on high-churn nodes. Add a node-level check for veth count vs pod count to your node health monitoring.
Q4: How Does Traffic Reach a Pod on a Different Node?
When a packet leaves the cni0 bridge headed for a pod on a different node, it exits through the node's physical network interface. From here the CNI plugin's strategy takes over. There are two fundamentally different approaches, and choosing between them is one of the first architectural decisions you make in a cluster.
VXLAN Overlay: Flannel
Flannel wraps the original IP packet inside a UDP datagram destined for the target node. A VXLAN tunnel endpoint (the flannel.1 interface) on each node handles encapsulation and decapsulation. The overhead is approximately 50 bytes per packet: the inner Ethernet header (14), the VXLAN header (8), and the outer UDP (8) and IPv4 (20) headers. On networks with a standard 1500-byte MTU, you must set the pod MTU to 1450 or enable jumbo frames on the underlay, otherwise large packets get fragmented or silently dropped.
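The 50-byte figure and the 1450 pod MTU fall straight out of the header sizes. A minimal sketch of that arithmetic, assuming VXLAN over IPv4 with no VLAN tag and no IP options (an IPv6 underlay would add 20 more bytes):

```python
# Header sizes in bytes for VXLAN over IPv4 (no VLAN tag, no IP options).
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14


def vxlan_overhead():
    """Bytes added to every pod packet, counted against the underlay IP MTU."""
    return OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET


def pod_mtu(underlay_mtu):
    """Largest pod-side MTU that still fits in one underlay packet."""
    return underlay_mtu - vxlan_overhead()


print(vxlan_overhead())  # 50
print(pod_mtu(1500))     # 1450
print(pod_mtu(9000))     # 8950 with jumbo frames
```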
Flannel works on any network that allows the nodes to reach each other on UDP port 8472. It requires no routing configuration at the physical network layer, which makes it simple to deploy. It also enforces exactly zero NetworkPolicies. More on that in Q5.
BGP Native Routing: Calico
Calico runs a BGP daemon (BIRD) on each node that advertises its pod CIDR to its peers. Other nodes install routes for those subnets directly in their kernel routing table. A packet destined for a pod on Node B is forwarded as plain IP: no encapsulation, no overhead, no MTU adjustments needed. The physical network just sees regular IP packets.
The requirement: your underlying network must allow BGP route advertisements between nodes and must be willing to carry packets addressed to pod IPs. In a real data center with BGP-capable ToR switches, Calico in full BGP mode is exceptionally powerful: pod routes become visible to physical network equipment, and you can peer Calico with your ToR switches using BGPPeer resources. In cloud environments (AWS, GCP), the underlying VPC typically does not accept arbitrary BGP route advertisements, so Calico is usually configured with IP-in-IP or VXLAN encapsulation in those environments.
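What "install routes in the kernel routing table" buys you can be sketched as a longest-prefix-match lookup: the most specific route containing the destination wins. The route table below follows the article's example addresses and its simplified bridge model; the default gateway 10.0.0.1 is a hypothetical value added for the example.

```python
import ipaddress


def lookup(routes, dst):
    """Longest-prefix match over (cidr, next_hop) pairs; returns next_hop or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for cidr, next_hop in routes:
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None


node_a_routes = [
    ("0.0.0.0/0", "via 10.0.0.1 dev eth0"),       # hypothetical default gateway
    ("10.244.1.0/24", "dev cni0"),                 # Node A's own pod CIDR (bridge model)
    ("10.244.2.0/24", "via 10.0.0.11 dev eth0"),   # learned via BGP from Node B
]

print(lookup(node_a_routes, "10.244.2.2"))  # via 10.0.0.11 dev eth0
print(lookup(node_a_routes, "10.244.1.3"))  # dev cni0
```

If the BGP session to Node B drops and its route is withdrawn, the same lookup falls through to the default route, which is why a missing pod-CIDR route shows up as traffic disappearing toward the gateway.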
VXLAN (Flannel Overlay): Cross-Node Flow
----------------------------------------
Pod A 10.244.1.2 on Node A 10.0.0.10
  |
  |  original packet: src=10.244.1.2 dst=10.244.2.2
  v
flannel.1 VTEP: VXLAN encapsulation
  outer UDP src=10.0.0.10:12345 dst=10.0.0.11:8472
  VXLAN header VNI=1
  inner payload: original IP packet (~50 bytes overhead)
  v
eth0 (Node A) --- physical network --- eth0 (Node B)
  |
  |  VXLAN decapsulation on flannel.1 VTEP (Node B)
  v
cni0 bridge → vethC0 → Pod C eth0 10.244.2.2
BGP (Calico): Cross-Node Flow
-----------------------------
Pod A 10.244.1.2 on Node A 10.0.0.10
  |
  |  original packet: src=10.244.1.2 dst=10.244.2.2
  v
Node A routing table: 10.244.2.0/24 via 10.0.0.11 dev eth0
  |  No encapsulation. Plain IP routing. No encapsulation overhead.
  |  BGP advertisement: "Node B owns 10.244.2.0/24"
  v
eth0 (Node A) --- physical network --- eth0 (Node B)
  |
  |  Node B routing table: 10.244.2.2 is local, routed to the pod's host-side veth
  v
host-side veth (cali*) → Pod C eth0 10.244.2.2
| CNI | Cross-node strategy | Overhead | NetworkPolicy | Best for |
|---|---|---|---|---|
| Flannel | VXLAN overlay | ~50B/packet | None (enforces nothing) | Simple dev/lab clusters |
| Calico | BGP native routing | Zero | Full L3/L4 | On-prem, policy, data center BGP |
| Cilium | eBPF (replaces kube-proxy) | Near-zero | L3/L4/L7 + DNS + HTTP paths | Scale, observability, zero-trust |
| AWS VPC CNI | Native VPC secondary IPs | None | Via Calico or Security Groups | EKS: native AWS networking |
Q5: You Applied a NetworkPolicy. It's Not Working. What's the First Thing You Check?
Most engineers check the policy YAML. They look for selector mismatches, wrong ports, typos. They find nothing wrong. The policy looks correct. Traffic still flows freely. Three hours later someone asks which CNI plugin is installed.
The first thing to check: whether your CNI plugin enforces NetworkPolicy at all.
This is not a theoretical edge case. Flannel, one of the most widely deployed CNIs, does not implement NetworkPolicy. It has never implemented NetworkPolicy. A NetworkPolicy object with Flannel installed is a door that has no mechanism for locks. The lock looks installed. You can see it in kubectl get networkpolicy. Every packet ignores it.
NetworkPolicy is enforced only by CNIs that implement it, such as Calico, Cilium, and Weave Net (others, including Antrea and kube-router, do too). If your cluster runs Flannel and you have NetworkPolicy objects, you have the appearance of network isolation with none of the substance. Many teams that believe they have micro-segmentation are in this situation. Security audits have found it in clusters that had been "secured" for two years.
NetworkPolicy Enforcement: Who Actually Blocks the Packet
---------------------------------------------------------
Incoming packet → Pod (10.244.1.5:8080)
+--------------------------------------------------------------+
| CNI Plugin (Calico iptables / Cilium eBPF)                   |
|                                                              |
| Is there an ingress NetworkPolicy selecting this pod?        |
|   YES → default deny; check allow rules                      |
|   NO  → allow all ingress (Kubernetes default)               |
|                                                              |
| Does source match an ingress allow rule?                     |
|   YES → ALLOW, deliver packet                                |
|   NO  → DROP (silently: no TCP RST, no log by default)       |
+--------------------------------------------------------------+
+--------------------------------------------------------------+
| Flannel node (no CNI policy engine)                          |
|                                                              |
| NetworkPolicy object exists in API?   YES                    |
| Does Flannel enforce it?              NO                     |
| Does the packet get through?          YES, always            |
|                                                              |
| The policy is a door with no lock mechanism.                 |
| It looks configured. It does nothing.                        |
+--------------------------------------------------------------+
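The decision an enforcing CNI makes for ingress can be written down in a few lines. This is a deliberately reduced sketch: it only handles matchLabels-style selectors, assumes every policy lives in the destination pod's namespace and applies to ingress, and ignores ports, namespaceSelector and ipBlock. The dictionary keys (pod_selector, allow_from) are invented for the example and are not the Kubernetes API field names.

```python
def labels_match(selector, labels):
    """True when every key/value in a matchLabels-style selector is present."""
    return all(labels.get(key) == value for key, value in selector.items())


def ingress_allowed(policies, dst_labels, src_labels):
    """Decide whether traffic from a source pod may enter a destination pod."""
    selecting = [p for p in policies if labels_match(p["pod_selector"], dst_labels)]
    if not selecting:
        return True  # no policy selects the pod: Kubernetes default is allow-all
    # Policies are additive: one matching allow rule in any selecting policy is enough.
    return any(
        labels_match(peer, src_labels)
        for p in selecting
        for peer in p["allow_from"]
    )


default_deny = {"pod_selector": {}, "allow_from": []}  # empty selector: every pod
allow_frontend = {
    "pod_selector": {"app": "payment-api"},
    "allow_from": [{"app": "frontend"}],
}
```

On a cluster whose CNI has no policy engine, nothing ever runs logic like this, so the effective answer is always "allowed" regardless of what the API server stores.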
# Step 1: Check if the CNI enforces NetworkPolicy at all
# (-A because CNI pods may live in kube-system, kube-flannel, or calico-system)
kubectl get pods -A | grep -E 'calico|cilium|weave|flannel'
# Flannel alone → NetworkPolicy is decorative. Add Calico or switch to Cilium.
# Step 2: List every NetworkPolicy in every namespace
kubectl get networkpolicy -A -o wide
# Step 3: Find which policies select a specific pod (check its labels)
kubectl get pod <pod-name> -n <ns> --show-labels
# Step 4: Test connectivity from a debug pod in the same Kubernetes namespace as the app pods,
# so the same namespace-wide policies apply to it
kubectl run debug -n <ns> --image=nicolaka/netshoot --rm -it --restart=Never -- bash
# Inside debug pod:
curl -v http://payment-api.production.svc.cluster.local:8080/health
nc -zv payment-api.production.svc.cluster.local 8080
# Step 5: Check DNS resolution
kubectl exec -it <pod> -n <ns> -- nslookup payment-api.production.svc.cluster.local
kubectl exec -it <pod> -n <ns> -- cat /etc/resolv.conf # check ndots value
# Step 6: Verify pod CIDR routes exist on the node
ip route show | grep 10.244
# Step 7: For Calico, check BGP peer status
kubectl exec -it -n calico-system <calico-node-pod> -- calico-node -bird-ready
calicoctl node status
CoreDNS, Internal DNS, and the ndots:5 Problem
Every pod's /etc/resolv.conf is injected by kubelet at startup. It points at CoreDNS's ClusterIP (typically 10.96.0.10), sets three search domains, and sets options ndots:5. The search domains let you use short names: payment-api instead of payment-api.production.svc.cluster.local. The ndots:5 setting is where things get expensive.
The ndots:5 Trap
ndots:5 means: if a hostname has fewer than five dots, append each search domain before trying it as an absolute name. A short name such as payment-api, queried from the Service's own namespace, resolves on the first attempt because the first search domain completes it. An external name such as api.example.com (two dots) generates four DNS queries: three NXDOMAIN responses and one answer. A query for payment-api.production.svc.cluster.local (four dots) also generates four queries, because four is still less than five.
Pod /etc/resolv.conf (injected by kubelet):
-------------------------------------------
nameserver 10.96.0.10    ← CoreDNS ClusterIP
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

Query: "api.example.com" (2 dots < 5) → search domains appended first
---------------------------------------------------------------------
Attempt 1: api.example.com.default.svc.cluster.local  → NXDOMAIN
Attempt 2: api.example.com.svc.cluster.local          → NXDOMAIN
Attempt 3: api.example.com.cluster.local              → NXDOMAIN
Attempt 4: api.example.com.                           → ANSWER ✓

Query: "payment-api.production.svc.cluster.local" (4 dots < 5)
---------------------------------------------------------------------
Still appends search domains first: 4 dots is still less than 5!
Attempt 1: payment-api.production.svc.cluster.local.default.svc.cluster.local → NXDOMAIN
Attempt 2: payment-api.production.svc.cluster.local.svc.cluster.local         → NXDOMAIN
Attempt 3: payment-api.production.svc.cluster.local.cluster.local             → NXDOMAIN
Attempt 4: payment-api.production.svc.cluster.local.                          → ANSWER ✓

Fix 1: Use FQDN with trailing dot → "payment-api.production.svc.cluster.local." (one query)
Fix 2: Set ndots:2 in pod dnsConfig → names with two or more dots are tried as absolute names first
At 100 RPS with three external API calls per request, and assuming each call triggers a fresh lookup, this is 300 lookups per second and 900 wasted DNS queries per second to CoreDNS (three NXDOMAIN responses per lookup). It is invisible in application metrics. It shows up as elevated CoreDNS CPU, increased p99 on external API calls, and a pattern of three NXDOMAIN responses per useful answer in a DNS packet capture. The fix is two lines of YAML per Deployment.
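The search-list behaviour is mechanical enough to model. The sketch below approximates how a glibc-style resolver orders its attempts for a single lookup and counts the NXDOMAIN queries in front of the one that succeeds. It assumes only the absolute name resolves, no caching, and one query per attempt (real resolvers often send A and AAAA in parallel); api.example.com is a placeholder hostname.

```python
def absolute(name):
    """Fully qualified form of a name, with the trailing dot."""
    return name if name.endswith(".") else name + "."


def candidate_queries(name, search, ndots=5):
    """Names the resolver tries, in order, for one lookup."""
    if name.endswith("."):
        return [name]                           # already absolute: search list is skipped
    searched = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [absolute(name)] + searched      # enough dots: try the name as-is first
    return searched + [absolute(name)]          # too few dots: search list first


def wasted_per_second(rps, lookups_per_request, name, search, ndots=5):
    """NXDOMAIN queries per second when only the absolute name resolves."""
    wasted = candidate_queries(name, search, ndots).index(absolute(name))
    return rps * lookups_per_request * wasted


SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

print(wasted_per_second(100, 3, "api.example.com", SEARCH, ndots=5))  # 900
print(wasted_per_second(100, 3, "api.example.com", SEARCH, ndots=2))  # 0
```

Counting each NXDOMAIN as wasted, the default setting costs three extra queries per external lookup, and lowering ndots to 2 removes all of them for any hostname with at least two dots.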
# Set ndots:2 on any Deployment that makes high-volume external DNS calls.
# The default of 5 causes up to 4 DNS queries per external hostname resolution.
# With ndots:2, any hostname with two or more dots is tried as-is first.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-api
  namespace: production
spec:
  selector:
    matchLabels:
      app: payment-api   # illustrative label, added so the manifest is valid
  template:
    metadata:
      labels:
        app: payment-api
    spec:
      dnsConfig:
        options:
        - name: ndots
          value: "2"
        - name: single-request-reopen # avoids kernel race condition on some distros
      containers:
      - name: payment-api
        image: payment-api:v1.4.0
⚡ Pro Tip
Use service.namespace.svc.cluster.local (the full FQDN) for cross-namespace service calls, and set ndots:2 on Deployments that primarily call external APIs. Short service names within the same namespace work fine with the default, because the first search domain completes them. The problem is exclusively with external hostnames and FQDNs that happen to have fewer than five dots.
CoreDNS Is a Pod and Shares the Same Risks
CoreDNS runs as a Deployment in kube-system. Two replicas by default. It is subject to the same eviction, node drains, and resource pressure as any other workload. If both CoreDNS pods land on the same node and that node goes down, DNS resolution fails cluster-wide. All service discovery stops. Applications begin timing out on DNS queries. From the outside it looks like every service is down.
Production requirements: minimum 4 CoreDNS replicas, PodAntiAffinity to spread them across nodes, PodDisruptionBudget to prevent simultaneous eviction. CoreDNS OOM kills are a particularly bad production failure because they are silent: no application error, just timeouts.
The Production Disasters
Disaster 1: The Forgotten Default-Deny
🔥 Production Reality
Context: 40-service microservices platform. Six months prior, the security team added a default-deny-all NetworkPolicy to every production namespace. It was documented in a Confluence page that nobody reads. A new service was deployed: notification-service.
Symptoms: notification-service pods were Running. Services existed. DNS resolved. Every outbound TCP connection timed out after 30 seconds. The frontend received no notifications. Error logs showed generic timeout errors with no indication of which layer was dropping packets.
What everyone checked: pod logs (nothing), service selectors (correct), node health (fine), firewall rules (unchanged), adjacent service logs (fine), network team ticket (no changes in six months).
Root cause: Three hours in, someone ran kubectl get networkpolicy -A. The default-deny-all policy in the production namespace was blocking all egress from every new pod. notification-service had no explicit allow policies. It could not reach the database, message queue, or any other service.
Fix: Added explicit ingress/egress NetworkPolicy rules for notification-service. Connectivity restored immediately. Total debugging time: 3 hours. Time to fix: 4 minutes.
Prevention: Add kubectl get networkpolicy -A to your runbook as step 2 β after confirming pods are Running and before checking anything else. Every new service deployed into a namespace with a default-deny policy needs its own allow rules. This is not optional. It should be part of your service template.
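For reference, a default-deny policy of the kind described in this incident can be generated as plain JSON, which kubectl apply -f accepts just like YAML. This is a sketch under assumptions: the policy name default-deny-all follows the article, the DNS exception allows port 53 to any destination (tighten it to the CoreDNS pods if needed), and each service still needs its own allow policy on top.

```python
import json


def default_deny_with_dns(namespace):
    """Build a NetworkPolicy that denies all traffic except DNS egress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {},                      # empty selector: every pod in the namespace
            "policyTypes": ["Ingress", "Egress"],   # no ingress rules listed: all ingress denied
            "egress": [
                {
                    # The only egress rule: DNS, so pods can still resolve names.
                    "ports": [
                        {"protocol": "UDP", "port": 53},
                        {"protocol": "TCP", "port": 53},
                    ]
                }
            ],
        },
    }


print(json.dumps(default_deny_with_dns("production"), indent=2))
```

Keeping the generator in the service template makes the dependency explicit: a namespace that gets this policy also needs a per-service allow policy for every new workload, which is exactly the step notification-service was missing.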
Disaster 2: Flannel + NetworkPolicy = False Security
🔥 Production Reality
Context: A 30-node on-prem cluster running Flannel. The team had deployed comprehensive NetworkPolicies over 18 months: default-deny per namespace, explicit allow rules, 3-tier application isolation, database protection. The security documentation proudly noted "micro-segmentation implemented via Kubernetes NetworkPolicy."
The audit: An external security consultant ran a penetration test. From a compromised pod in the frontend namespace, they reached the PostgreSQL database in the data namespace directly, bypassing the NetworkPolicy that was explicitly supposed to prevent this.
Root cause: Flannel enforces nothing. Every NetworkPolicy object in the cluster was a configuration artifact with no runtime effect. The frontend pod could reach the database because Flannel was delivering every packet from every pod to every other pod unconditionally.
Fix: Migrated from Flannel to Calico. Migration required a maintenance window, CNI replacement on every node, and verification that all existing policies worked as expected under actual enforcement. Some policies had bugs that had never been caught because they had never been enforced.
Prevention: After installing any CNI, immediately verify enforcement by testing a connection that should be blocked. The test takes 30 seconds. It is the only way to confirm your security model is real and not theatrical.
The Wall of Shame
Senior Engineer Confession
- Using Flannel and expecting NetworkPolicy to work. Installing a lock on a door that has no mechanism for locks. The lock is very shiny. It appears in kubectl describe networkpolicy with all the correct fields. The door opens freely. Every pod can reach every other pod. Your security model is a carefully formatted YAML file that the kernel has never read. Run Calico or Cilium if you want enforcement.
- ndots:5 causing four DNS queries per external lookup. Asking for directions four times and only getting an answer on the fourth. CoreDNS answers with NXDOMAIN three times and a real answer once, for every single external hostname your application touches. At scale this multiplies CoreDNS load and adds measurable latency to every request that makes external calls. Two lines of YAML in your Deployment spec. That is the entire fix.
- Not using FQDN for cross-namespace service calls. Calling a colleague by first name when there are 14 people with that name in the building. payment-api resolves to payment-api in your current namespace. If there is no payment-api in your namespace, it fails. payment-api.production.svc.cluster.local resolves correctly from any namespace, every time. Use the full name. It is unambiguous. It is self-documenting. The extra characters are not a performance issue.
- Hardcoding pod IPs anywhere they might outlive a deployment. Saving your Uber driver's personal number. Works great. Until they get a new phone, which happens on every pod restart, every node drain, every deployment rollout. Pod IPs are ephemeral by design. This is not a bug. Service DNS exists specifically so you never need to know or record a pod IP. Use it. Today.
- No NetworkPolicy in any namespace. An office building with no access control. Anyone who gets past the front door can walk into any room: the server room, the CEO's office, the data center, the payroll database. The Kubernetes default is wide-open. It is your responsibility to apply a default-deny policy. The default is not a safe starting point. It is a starting point that assumes your cluster is a trusted, private environment. Almost no production cluster is that.
- Using hostNetwork: true for everything because the networking seemed complicated. Moving every employee into the lobby because the lobby has better wifi. Yes, all hostNetwork pods on a node share the node's network namespace, the same IP, the same port space, and, critically, typically fall outside NetworkPolicy enforcement. It solves whatever immediate problem motivated it and creates a set of new problems that are significantly harder to untangle.
- Not understanding that Services have stable DNS names and relying on ClusterIP strings instead. Giving everyone the building's physical address when they could just search the company name. The ClusterIP is an implementation detail that changes when the Service is recreated. The DNS name is the stable contract. payment-api.production.svc.cluster.local survives Service recreation. 10.96.45.12 does not. Use names.
- Running pods with the default ServiceAccount and not thinking about it. Every visitor gets a master badge because creating individual badges seemed like too much work. Since Kubernetes 1.24 no long-lived token Secret is generated automatically for a ServiceAccount, but unless automountServiceAccountToken: false is set, pods still get a short-lived token mounted that can call the Kubernetes API. A compromised pod whose ServiceAccount has been granted RBAC permissions can enumerate Services, Secrets, and ConfigMaps. Disable token mounting unless explicitly needed. Create purpose-specific ServiceAccounts with minimal RBAC.
Production Best Practices
- Choose your CNI based on requirements, not familiarity. Flannel for simple dev/lab clusters only. Calico for production with NetworkPolicy. Cilium for scale, eBPF performance, and L7 observability. AWS VPC CNI for EKS. Migrating CNI on a live cluster is a maintenance window, not a config change.
- Apply default-deny NetworkPolicy before deploying workloads. Namespace security posture starts open. You close it. Do this first. Include DNS egress (port 53 UDP/TCP) or pods cannot resolve any hostname at all.
- Verify NetworkPolicy enforcement immediately after CNI installation. Apply a policy that should block a connection. Test it. If the connection still works, your CNI does not enforce policy and every security assumption in your cluster is wrong.
- Set ndots:2 on Deployments that make high-volume external DNS calls. Default ndots:5 causes 3-4 DNS queries per external hostname. At scale this multiplies CoreDNS load significantly. Two lines of YAML in your pod spec.
- Use FQDNs for cross-namespace service calls. service.namespace.svc.cluster.local is unambiguous, portable, and resolves correctly from any namespace. Short names within the same namespace are fine. Across namespaces, use the full path.
- Run at least 4 CoreDNS replicas with PodAntiAffinity and a PodDisruptionBudget. CoreDNS is cluster-wide DNS infrastructure. Two replicas on the same node is a single point of failure with extra steps.
- Check kubectl get networkpolicy -A within the first 5 minutes of any connectivity incident. Add it to your runbook. The incident in this article cost 3 hours because this command was run last.
- Set pod MTU to 1450 when using VXLAN overlays on 1500-byte MTU networks. Oversized packets get fragmented or silently dropped. This causes intermittent failures under load that are extremely difficult to diagnose without explicitly looking at packet sizes.
FAQ
Why does ping to a ClusterIP time out?
With kube-proxy in iptables mode, a ClusterIP is a virtual IP that exists only in NAT rules. Those rules match the Service's TCP and UDP ports. ICMP (ping) is not handled by the DNAT rules, and no process listens on the ClusterIP. (In IPVS mode the ClusterIP is bound to a dummy interface on each node, so ping may answer there.) Use curl or nc to test Service connectivity, not ping. A ClusterIP that responds to curl and times out on ping is working correctly.
Can I run Flannel and add NetworkPolicy enforcement without replacing it?
Yes. Calico can run in "policy-only" mode alongside Flannel (a combination known as Canal): Flannel handles routing while Calico enforces NetworkPolicy. This is a supported migration path but adds operational complexity (two CNI components, two sets of logs, two upgrade tracks). Most teams choose this as a stepping stone and then fully migrate to Calico.
What is the difference between ingress and egress in NetworkPolicy?
Ingress rules on a pod control what traffic is allowed in. Egress rules control what traffic the pod is allowed to send out. When policies select both ends, both must be satisfied for a connection to succeed: if Pod A has egress permission to reach Pod B, but Pod B is selected by an ingress policy with no rule allowing traffic from Pod A's namespace, the connection is blocked at Pod B. A pod that no policy selects for a given direction stays wide-open in that direction.
What happens to existing connections when a NetworkPolicy is applied mid-flight?
Calico translates new policies into iptables rules immediately. Existing TCP connections that are already established (in conntrack) are generally not terminated right away: conntrack state takes precedence until the connection closes naturally. New connection attempts are evaluated against the new rules immediately. With Cilium eBPF, the behavior is similar: existing connection state is preserved, new connections apply the new policy. There is no "restart connections to apply policy" step required, but do not rely on a new policy to cut off sessions that are already open.
Why does my NetworkPolicy have no effect even though the CNI supports it?
Four common causes: (1) the podSelector does not match the pod's actual labels (run kubectl get pod --show-labels and compare); (2) the policyTypes field is missing, in which case the policy always applies to ingress and applies to egress only if egress rules are present; (3) for cross-namespace rules, the namespaceSelector labels do not match the namespace's actual labels; (4) the policy is in the wrong namespace. NetworkPolicy is namespace-scoped. A policy in staging has no effect on pods in production.
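Causes (1) and (4) are easy to check mechanically once you have the pod's labels and the policy's selector side by side. A small diagnostic sketch, again limited to matchLabels selectors and using invented dictionary keys (namespace, match_labels) rather than the real API structure:

```python
def diagnose(policy, pod_namespace, pod_labels):
    """Explain in one line why a policy does or does not select a pod."""
    if policy["namespace"] != pod_namespace:
        return f"wrong namespace: policy is in {policy['namespace']}"
    for key, wanted in policy["match_labels"].items():
        actual = pod_labels.get(key)
        if actual != wanted:
            return f"label mismatch: {key} should be {wanted}, pod has {actual}"
    return "selects pod"


policy = {"namespace": "production", "match_labels": {"app": "payment-api"}}
print(diagnose(policy, "staging", {"app": "payment-api"}))
print(diagnose(policy, "production", {"app": "payment"}))
print(diagnose(policy, "production", {"app": "payment-api", "tier": "backend"}))
```

Extra labels on the pod are harmless; a selector only requires that the keys it names are present with the expected values.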
🎤 The 60-Second Interview Answer
Back in the interview. All five follow-up questions answered. Here is how you deliver the complete answer, covering the simple path and the kernel detail that gets you the offer:
🎤 Say This Out Loud Until You Own It
"Yes, pods can talk directly without a Service: every pod has a real, routable IP and Kubernetes guarantees flat L3 connectivity across the cluster. The Service exists for discoverability and load balancing, not to enable connectivity.
At the Linux level, each pod runs in its own network namespace, an isolated copy of the network stack with its own routing table, iptables rules, and interfaces. The pause container holds the namespace open; application containers join it.
The CNI plugin connects the pod to the node via a veth pair: one end appears as eth0 inside the pod, the other end on the host connects to a Linux bridge called cni0. Same-node pod-to-pod traffic is pure L2 switching through that bridge. For cross-node traffic, Flannel wraps packets in VXLAN with about 50 bytes of overhead; Calico uses BGP to advertise pod CIDRs and routes packets natively with zero encapsulation.
Critical production point: if your NetworkPolicy isn't working, the first thing to check is whether your CNI enforces NetworkPolicy at all. Flannel does not. An ingress NetworkPolicy with Flannel installed is a door with no lock: it looks configured, it does nothing. You need a policy-capable CNI such as Calico, Cilium, or Weave Net to actually enforce it.
For DNS, kubelet injects a resolv.conf into every pod pointing at CoreDNS with ndots:5 and three search domains. ndots:5 causes up to four DNS queries per hostname with fewer than five dots. For external-heavy workloads, set ndots:2 in your pod's dnsConfig. And for cross-namespace calls, always use the full FQDN: service.namespace.svc.cluster.local."
If you can say that in one breath, you're getting the job.
Key Takeaways
- Pods can communicate directly without a Service. The Service provides discovery and load balancing, not connectivity.
- Each pod has its own Linux network namespace: isolated routing table, iptables, interfaces. The pause container holds it.
- CNI creates veth pairs: one end in the pod namespace (eth0), one end on the host connected to the cni0 bridge.
- Cross-node traffic: Flannel uses VXLAN (~50B overhead), Calico uses BGP native routing (zero overhead).
- Flannel does not enforce NetworkPolicy. Enforcement requires a CNI that implements it, such as Calico, Cilium, or Weave Net.
- ndots:5 causes 4 DNS queries per external hostname. Set ndots:2 on external-heavy Deployments.
- NetworkPolicy default is wide-open. You must explicitly apply default-deny. You must verify the CNI enforces it.