Security | 🔥 Production-Critical | ⚡ Senior Engineer Level

Kubernetes Security Best Practices: RBAC, Secrets, PodSecurity, and the Defaults That Will Get You Breached

The interview question every platform engineering candidate faces, answered at the architecture level. Role vs ClusterRole, why Secrets are not secret, what PodSecurity actually blocks, and the ServiceAccount binding from a tutorial eight months ago that let a cryptominer deploy itself.

“The cluster wasn't attacked. It was already open. Someone just noticed.”

Updated June 11, 2026 | 26 min read | Has prevented 2 cryptominer incidents

Security Alert

It's 2:17 AM.

Your security team sends an alert. A pod in your cluster is mining cryptocurrency.

You check:

✅ Pod is Running in the production namespace

✅ No alerts fired when it was created

✅ API server is healthy

✅ All your team's Deployments are accounted for

The pod wasn't deployed by your team.

It used the default ServiceAccount to create itself via the Kubernetes API.

That ServiceAccount had a cluster-admin ClusterRoleBinding.

From a tutorial. Someone followed it. Eight months ago.

Everything looked fine. Nothing alerted. Until the GPU bill arrived.

You don't know how this happened. You're about to.

Three Months Later. A Different Kind of War Room.

No security alert this time. Fluorescent lights. A conference room at a company you want to work for. The interviewer, calm and unhurried, asks one question:

“How would you secure a Kubernetes cluster?”

You take a breath. You survived an incident. You know this. You start listing controls.

The Answer That Gets 80% of Candidates Eliminated

Most people answer with a checklist. Four boxes. Clean. Sounds complete.

RBAC
▼
NetworkPolicy
▼
Secrets Management
▼
PodSecurity Admission
▼
Done ✓

The interviewer nods. Writes something down. Then looks up.

Interviewer keeps going:

❓ “What is the difference between a Role and a ClusterRole?”

❓ “Are Kubernetes Secrets actually secret?”

❓ “What is a ServiceAccount and why does every pod have one by default?”

❓ “What does PodSecurity admission actually prevent?”

❓ “You set RBAC to read-only for a ServiceAccount. The pod can still read Secrets. Why?”

Five questions. Most candidates have given the checklist answer so many times they never thought past it. Engineers who can answer all five, and explain the counterintuitive edges, walk out with the offer. Let's answer every one.

Before we go architecture-deep, here's the mental model that makes everything click. Think of your Kubernetes cluster as an office building:

Office Building | Kubernetes Security
ID badge permissions (who can access which rooms) | RBAC: controls who can call the Kubernetes API for what
Floor-level badge (one floor only) | Role: grants permissions within one namespace
Master badge (access to all floors) | ClusterRole: grants permissions cluster-wide
An employee's identity badge (each person has one) | ServiceAccount: each pod has an identity the API server recognizes
The safe in the records room (locked, but who has a key?) | Kubernetes Secret: base64-encoded, not encrypted; RBAC is the lock
Internal access control lists (which departments can call which) | NetworkPolicy: controls pod-to-pod traffic by selector
Building code compliance (no one can remove fire exits) | PodSecurity Admission: blocks root, privilege escalation, hostPath
Security checkpoint at the building entrance | Admission Controller: validates every resource before it lands
The CCTV system (records who did what, when) | Audit Logs: every API call logged with subject, verb, resource
Always checking ID even for returning visitors | imagePullPolicy: Always (re-validates the image on every pod start)

Hold that analogy. Everything below is the exact same thing, except the master badge can be bound with a RoleBinding to limit it to one floor, and the safe isn't actually locked until you configure etcd encryption at rest.

Q1: What Is the Difference Between a Role and a ClusterRole?

Most people say “Role is for one namespace, ClusterRole is for the whole cluster.” That is true but incomplete, and the part they miss is the part that trips up clusters in production.

The correct answer: a Role is namespace-scoped. It grants permissions to resources within a single namespace and can only be bound within that namespace. A ClusterRole is cluster-scoped by definition. It can grant permissions across all namespaces, or for cluster-scoped resources that don't belong to any namespace at all, like Nodes, PersistentVolumes, and StorageClasses.

Here is the counterintuitive part that most tutorials skip entirely: the scope of access is determined by the binding, not by the Role type.

A ClusterRole bound with a ClusterRoleBinding grants cluster-wide access. That is the dangerous one. A ClusterRole bound with a RoleBinding (in namespace X) grants that ClusterRole's permissions, but scoped to that namespace only. This lets you define a reusable role template as a ClusterRole and deploy it narrowly into specific namespaces with a RoleBinding. Many teams do this without knowing it.

  Subject (ServiceAccount, User, Group)
         │
         ▼
  ┌─────────────────────────────────────────────┐
  │        RoleBinding (namespace-scoped)       │  ──── binds Role in ONE namespace
  │   or   ClusterRoleBinding (cluster-scoped)  │  ──── binds ClusterRole EVERYWHERE
  └─────────────────────────────────────────────┘
         │
         ▼
  ┌─────────────────────────────────────────────┐
  │           Role (namespace-scoped)           │  ← namespaced resources only
  │   or   ClusterRole (cluster-scoped)         │  ← can reference any resource type
  └─────────────────────────────────────────────┘
         │
         ▼
  ┌─────────────────────────────────────────────┐
  │       Kubernetes API Server: ALLOW/DENY     │
  │  (deny-by-default; first ALLOW wins)        │
  └─────────────────────────────────────────────┘

  Four combinations and what they mean:
  ─────────────────────────────────────────────────────────────────────
  Role           + RoleBinding         → namespace-scoped (standard)
  ClusterRole    + ClusterRoleBinding  → cluster-wide (dangerous if broad)
  ClusterRole    + RoleBinding         → namespace-scoped (reusable template pattern)
  Role           + ClusterRoleBinding  → not valid (Kubernetes rejects this)
  ─────────────────────────────────────────────────────────────────────
Role, ClusterRole, and the binding that limits scope
# ── A Role is namespace-scoped ──────────────────────────────────────────────
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: payments-config-reader
  namespace: payments
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["payments-config"]   # lock to a named resource, not all configmaps
    verbs: ["get"]

---
# ── A ClusterRole is cluster-scoped (definition) ────────────────────────────
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-reader
rules:
  - apiGroups: [""]
    resources: ["nodes"]       # cluster-scoped resource: only a ClusterRole can grant this
    verbs: ["get", "list", "watch"]

---
# ── A reusable ClusterRole template for namespaced resources ────────────────
# (the name configmap-reader is illustrative)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: configmap-reader
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]

---
# ── Counter-intuitive: ClusterRole + RoleBinding = namespace-scoped access ──
# This binds the ClusterRole but limits it to ONE namespace.
# The binding scope, not the Role type, determines the effective scope.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: payments-sa-rb
  namespace: payments
subjects:
  - kind: ServiceAccount
    name: payments-api-sa
    namespace: payments
roleRef:
  kind: ClusterRole
  name: configmap-reader   # must name an existing ClusterRole; access is scoped to this namespace only
  apiGroup: rbac.authorization.k8s.io

🚨 Interview Trap

The trap is “ClusterRole always means cluster-wide access.” It doesn't. The binding determines scope. An interviewer testing RBAC depth will ask: “How can you bind a ClusterRole to give access in only one namespace?” The answer is: use a RoleBinding, not a ClusterRoleBinding. Most candidates who answer the Role vs ClusterRole question correctly still get this wrong. Know the difference between the role definition and the binding scope.

⚡ Pro Tip

Audit every ClusterRoleBinding in your cluster quarterly. The command:
kubectl get clusterrolebindings -o json | jq '.items[] | select(.roleRef.name == "cluster-admin") | .subjects'
In most production clusters, this returns names that will surprise you. Anything non-system in that list is a finding.

Q2: Are Kubernetes Secrets Actually Secret?

No.

Not by default. The name “Secret” is marketing. Or historical accident. Either way, it sets an expectation the implementation does not meet.

Kubernetes Secrets are stored in etcd unencrypted by default, and the API returns their values as base64-encoded strings. Base64 is an encoding scheme. It is not encryption. Anyone with RBAC permission to get the Secret retrieves the value and decodes it in under a second. Anyone with direct access to an etcd backup file reads every secret in the cluster without touching Kubernetes at all.

what a Kubernetes Secret actually is, and what to use instead
# What a Kubernetes Secret looks like when read back from the API:
# base64-encoded. Not encrypted. Decodable in 0.3 seconds.
#
# apiVersion: v1
# kind: Secret
# data:
#   password: cGFzc3dvcmQxMjM=   ← echo -n 'password123' | base64

# Decode any secret:
kubectl get secret payments-db-creds -n payments \
  -o jsonpath='{.data.password}' | base64 -d
# → password123
# That's it. That's the "security."

---
# ── The right approach: External Secrets Operator ───────────────────────────
apiVersion: external-secrets.io/v1beta1   # recent ESO releases serve external-secrets.io/v1; match your installed CRDs
kind: ExternalSecret
metadata:
  name: payments-db-creds
  namespace: payments
spec:
  refreshInterval: "1h"
  secretStoreRef:
    name: aws-secretsmanager
    kind: SecretStore
  target:
    name: payments-db-creds
    creationPolicy: Owner
  data:
    - secretKey: password
      remoteRef:
        key: prod/payments/database
        property: password
# The Kubernetes Secret is created by ESO, rotated from AWS Secrets Manager,
# audited in AWS CloudTrail. The source of truth is the external store.

The name “Secret” means: “this value is intended to be sensitive, so we store it separately from ConfigMaps and we only expose it to the pods and users with RBAC permission to read it.” That is it. RBAC is the lock. base64 is not.

What actually makes Secrets secure

Three controls, in order of impact:

  • etcd encryption at rest. Configure an EncryptionConfiguration on the API server. Use the KMS provider to integrate with AWS KMS, GCP KMS, or Azure Key Vault. Now a stolen etcd backup is useless: the values are encrypted with a key stored outside etcd.
  • RBAC with minimal scope. Only the pods that need a secret should have RBAC to read it. Lock down with resourceNames to restrict access to specific named Secrets. get is not the same as list: list secrets returns all of them, values included.
  • External secret stores. AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager. Use the External Secrets Operator to sync values into Kubernetes Secrets automatically. The source of truth lives outside Kubernetes with its own audit trail, rotation policies, and access controls. The Kubernetes Secret becomes a cache. A breach of the cache exposes one rotation cycle, not your credential history.
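
As a rough illustration of the first control, an EncryptionConfiguration using a KMS v2 provider might look like the sketch below. The plugin name and socket path are placeholders, not values from this article. The file is passed to the API server with --encryption-provider-config; managed offerings such as EKS or GKE usually expose envelope encryption through their own settings instead.

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      # First provider is used for writes: new and updated Secrets are
      # encrypted via the external KMS plugin.
      - kms:
          apiVersion: v2
          name: cloud-kms-plugin                          # hypothetical plugin name
          endpoint: unix:///var/run/kmsplugin/socket.sock # hypothetical socket path
          timeout: 3s
      # Fallback for reads only: lets the API server still read Secrets
      # that were stored before encryption was enabled.
      - identity: {}
```

Secrets written before the change stay unencrypted in etcd until they are rewritten, for example with kubectl get secrets -A -o json | kubectl replace -f -.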

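🔥 Production Reality

Enable audit logging with level: RequestResponse on secrets resources. This records what was returned, not just that a request was made. Be aware that it also writes the returned Secret payloads into the audit log, so the log backend must be protected as tightly as etcd itself; level: Metadata still records who accessed which Secret and when, without the values. When a secret is accessed by an unexpected subject at an unexpected time, the audit log is your evidence. Without it, the cryptominer story at 2:17 AM has no forensic trail.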

The audit config is in this article. Set it up before the incident.

🧠 Memory Trick

“Secret” means “intended to be sensitive.” Security comes from etcd encryption + RBAC. The three questions to ask about any secret in your cluster: Is etcd encrypted? Who has get or list RBAC? Is it mounted in any pod that could be compromised? All three need answers you can give.

Q3: What Is a ServiceAccount and Why Does Every Pod Have One by Default?

A ServiceAccount is a Kubernetes identity for a pod. The API server uses it to authenticate calls from within the cluster, to answer “who is making this API request?” when the caller is a pod, not a human.

Every pod automatically gets the default ServiceAccount for its namespace unless you specify otherwise. That default SA gets a token mounted at /var/run/secrets/kubernetes.io/serviceaccount/token. The token is valid. The token can authenticate to the API server. The API server will then check RBAC to decide what it can do.

Here is the problem: many clusters have, usually via someone following a tutorial, granted the default ServiceAccount broad permissions. Sometimes cluster-admin. Sometimes just get secrets on all namespaces. Either way, every pod in that namespace that doesn't set its own ServiceAccount inherits those permissions, including pods deployed by attackers who exploited a vulnerability in your application.

the right ServiceAccount pattern: opt-out by default, opt-in per pod
# ── Never use the default ServiceAccount ────────────────────────────────────
# This is what happens by default (a disaster waiting to happen):
#
#   spec:
#     # serviceAccountName: default  ← implicit
#     # automountServiceAccountToken: true  ← implicit
#
# The token is mounted at:
# /var/run/secrets/kubernetes.io/serviceaccount/token
# Any code in the container can read it and call the Kubernetes API.

---
# ── What you should do instead ──────────────────────────────────────────────
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payments-api-sa
  namespace: payments
automountServiceAccountToken: false   # opt-out at the SA level: every pod using this SA
                                       # gets NO token unless explicitly opted in

---
# Then in the Pod spec, only opt-in if the pod genuinely needs API access:
spec:
  serviceAccountName: payments-api-sa
  automountServiceAccountToken: false  # belt and suspenders: explicit

# Check what tokens are being mounted right now:
# kubectl get pods -A -o json | jq \
#   '.items[] | select(.spec.automountServiceAccountToken != false) |
#    .metadata.namespace + "/" + .metadata.name'
# The output will surprise you.

The rule is: explicitly set automountServiceAccountToken: false at the ServiceAccount level unless the pod genuinely needs to call the Kubernetes API. Operators, controllers, and CD tooling need API access. Your stateless API server, your batch job, your frontend pod: they almost certainly do not.

😅 Senior Engineer Confession

“I followed the Helm chart quickstart, it said add cluster-admin to get the operator working, I said fine and moved on. That was eighteen months before anyone audited it.” Every experienced engineer has a version of this story. The cluster-admin binding from a tutorial is not a security failure. It is a configuration debt failure. The failure is not reviewing it quarterly. Treat every ClusterRoleBinding to a non-system subject as a quarterly review item. Not a suggestion.
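
For the minority of workloads that do need the API, one way to opt in is at the pod level, where the pod-spec field takes precedence over the ServiceAccount's setting. This is a minimal sketch; the names config-reloader and config-reloader-sa and the image tag are hypothetical.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: config-reloader-sa            # hypothetical dedicated identity
  namespace: payments
automountServiceAccountToken: false   # default for every pod using this SA: no token
---
apiVersion: v1
kind: Pod
metadata:
  name: config-reloader               # hypothetical workload that calls the API
  namespace: payments
spec:
  serviceAccountName: config-reloader-sa
  automountServiceAccountToken: true  # explicit opt-in; overrides the SA-level false
  containers:
    - name: reloader
      image: registry.internal/config-reloader:v1.0.0   # hypothetical image
```

The token this pod receives is only as powerful as the Role bound to config-reloader-sa, so the opt-in should be paired with a narrowly scoped RoleBinding.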

Q4: What Does PodSecurity Admission Actually Prevent?

Most people say “PodSecurity stops privileged containers.” That is correct but it is about a quarter of the answer. The more important question is: what does PodSecurity not prevent, and why does that matter?

PodSecurity Admission (PSA) replaced PodSecurityPolicy, which was removed in Kubernetes 1.25. It is a built-in admission controller, with no installation required, that enforces one of three security profiles at the namespace level via namespace labels.

Three profiles, one clear hierarchy

  • Privileged. Anything goes. Use for system namespaces (kube-system, Cilium DaemonSets, Falco). Not for application workloads.
  • Baseline. Prevents known privilege escalation vectors. Blocks privileged: true, hostNetwork, hostPID, hostIPC, and hostPath volumes. Good for workloads that cannot immediately meet Restricted.
  • Restricted. Hardened. Everything in Baseline, plus: must run as non-root, must set allowPrivilegeEscalation: false, must drop ALL capabilities, and must use seccompProfile: RuntimeDefault or a custom profile. This is what production application workloads should use.
PSA namespace labeling and a compliant Pod spec
# Namespace labeled for PSA Restricted enforcement:
kubectl label namespace payments \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/enforce-version=v1.30 \
  pod-security.kubernetes.io/audit=restricted \
  pod-security.kubernetes.io/warn=restricted

---
# Pod spec that PASSES the Restricted profile:
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    seccompProfile:
      type: RuntimeDefault

  containers:
    - name: payments-api
      image: registry.internal/payments-api@sha256:<digest>  # pin by digest (placeholder shown), NEVER :latest
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        runAsNonRoot: true
        runAsUser: 1000
        capabilities:
          drop: ["ALL"]     # drop every Linux capability, add nothing back
      volumeMounts:
        - name: tmp
          mountPath: /tmp   # only writable path, explicit

  volumes:
    - name: tmp
      emptyDir: {}

# What Restricted BLOCKS:
# - privileged: true
# - hostNetwork / hostPID / hostIPC: true
# - hostPath volumes (any)
# - allowPrivilegeEscalation: true (or absent)
# - running as root (UID 0)
# - capabilities beyond the minimal set

What PSA does NOT prevent

This is the part that matters for the interview:

  • Excessive RBAC permissions. PSA enforces what the pod looks like at creation time. It says nothing about what the pod can do once running. That is RBAC's job.
  • Insecure images. A Restricted pod can still run an image with 200 HIGH CVEs and a cryptominer baked in. Image scanning and signing are separate controls.
  • Lateral movement via the network. A Restricted pod can still make connections to every other pod in the cluster. That is NetworkPolicy's job.
  • Secrets already mounted. PSA does not inspect what secrets are in a pod's volumes. A Restricted pod with a mounted cluster-admin ServiceAccount token is still dangerous.

🚨 Interview Trap

The trap is treating PSA as a comprehensive security control. It is a container hardening control. A hardened container running with a cluster-admin ServiceAccount token is still a cluster-admin. PSA narrows what an attacker can do inside the container. RBAC narrows what they can do with the Kubernetes API. NetworkPolicy narrows where they can go. All three. Not one.

Q5: RBAC Is Read-Only. The Pod Still Reads Secrets. Why?

This is the question that separates engineers who understand the system from engineers who memorised the documentation. The wrong answer: “RBAC controls all secret access.”

The right answer: RBAC controls API access. Volume mounts bypass the API entirely.

When a Secret is mounted as a volume or injected as an environment variable, the kubelet retrieves it from the API server during pod setup, using the node's own credentials rather than the pod's ServiceAccount. For a volume, it writes the value to an in-memory filesystem mounted into the container at whatever path you specified; for an environment variable, it sets the value in the container's process environment. Once that happens, the container reads the value directly: no API call, no RBAC check against the ServiceAccount.

Restricting the ServiceAccount's RBAC (for example, removing get on secrets) prevents the container from making new Kubernetes API calls to read secrets programmatically. It does not remove secrets that were already delivered when the pod started. If your pod spec has a secretKeyRef or a secret volume, the container will read that secret regardless of what the ServiceAccount's RBAC says.

The implication: your security review cannot stop at “what RBAC does this ServiceAccount have?” You must also review what is mounted in every pod spec. Both vectors matter.
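
To make the two non-API vectors concrete, here is a minimal sketch of a pod that receives the same Secret both ways. It reuses the payments-db-creds Secret and payments-api-sa ServiceAccount from earlier examples; the variable name DB_PASSWORD, the mount path, and the image tag are assumptions. Note that the ServiceAccount token is not even mounted, yet the container still gets the credential.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: payments-api
  namespace: payments
spec:
  serviceAccountName: payments-api-sa
  automountServiceAccountToken: false   # no API token at all in this pod
  containers:
    - name: payments-api
      image: registry.internal/payments-api:v1.2.3
      env:
        # Environment variable vector: resolved by the kubelet at container start.
        - name: DB_PASSWORD               # hypothetical variable name
          valueFrom:
            secretKeyRef:
              name: payments-db-creds
              key: password
      volumeMounts:
        # Volume vector: the Secret appears as a file at /etc/secrets/db/password.
        - name: db-creds
          mountPath: /etc/secrets/db      # hypothetical mount path
          readOnly: true
  volumes:
    - name: db-creds
      secret:
        secretName: payments-db-creds
```

Who may create a pod like this in the namespace therefore matters as much as who may get the Secret: anyone who can create pods in payments can mount any Secret that lives there.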

🧠 Memory Trick

Three ways a pod can access a secret:
1. API call → RBAC-controlled.
2. Environment variable (secretKeyRef) → set at pod creation, no RBAC check after.
3. Volume mount → kubelet retrieves it at pod start, then it's a file.

RBAC only protects vector 1. Auditing pod specs for mounted secrets covers vectors 2 and 3. Both are required. Neither is optional.

⚡ Pro Tip

To audit what secrets are mounted in your cluster:
kubectl get pods -A -o json | jq '.items[] | {ns: .metadata.namespace, pod: .metadata.name, secrets: [.spec.volumes[]? | select(.secret) | .secret.secretName]} | select(.secrets | length > 0)'
This shows every pod and which secrets it has volume-mounted. Run this before your next security review. There will be things on this list you forgot about.

Supply Chain Security: The Attack Vector Nobody Audits Until After

RBAC, NetworkPolicy, and PodSecurity protect your cluster from the inside out. Supply chain security protects you from what enters the cluster in the first place.

The attack surface is: your application code, your dependencies, your base images, your build tools, and your registry. A malicious package introduced via a transitive dependency can deploy a cryptominer just as effectively as an over-privileged ServiceAccount. Both vectors are in the incident reports. Both require controls.

Image scanning

Scan every image with Trivy before it reaches production. Fail CI on HIGH and CRITICAL CVEs with available fixes. Running an image with 47 HIGH vulnerabilities is a choice. It should be a deliberate, documented exception, not the default.
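
One possible shape for that CI gate, sketched as a GitHub Actions workflow. The choice of GitHub Actions, the workflow and job names, and the image name are all assumptions, and the runner is assumed to already have Docker and the trivy CLI installed.

```yaml
name: image-scan              # hypothetical workflow name
on: [push]
jobs:
  scan:
    runs-on: ubuntu-latest    # assumption: Docker and the trivy CLI are available here
    env:
      IMAGE: registry.internal/payments-api:${{ github.sha }}
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t "$IMAGE" .
      - name: Fail on fixable HIGH/CRITICAL CVEs
        # --ignore-unfixed skips CVEs with no available fix;
        # --exit-code 1 makes the job fail when anything remains.
        run: >
          trivy image
          --severity HIGH,CRITICAL
          --ignore-unfixed
          --exit-code 1
          "$IMAGE"
```

Documented exceptions can then live in a reviewed ignore file in the repository instead of a silently loosened severity threshold.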

Image signing with Cosign

Cosign signs container images and stores the signature in the OCI registry alongside the image. An admission controller (Kyverno or OPA Gatekeeper) can then block any unsigned image from entering the cluster. An attacker who compromises your registry cannot deploy their image without your CI signing key.

ImagePullPolicy: Always

In production, set imagePullPolicy: Always. IfNotPresent (the default for any tag other than :latest) means a node that already has the image cached will use that cached copy, even if the tag in the registry has since been re-pushed to include a security fix. For production workloads, always verify freshness on every start.

Restrict registries via admission

Block Docker Hub and public registries in production via an admission webhook policy. Only images from your internal registry, which you scan, sign, and control, should be deployable. A developer who pulls an unvetted image from Docker Hub in production is a supply chain breach waiting to happen.
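
A registry allow-list can be written in the same Kyverno style this article uses for signature verification. The sketch below relies on Kyverno's validate pattern syntax; the policy and rule names are illustrative, and newer Kyverno releases may prefer a per-rule failure action over the spec-level field shown here.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries      # illustrative policy name
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: allow-internal-registry-only
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "Images must be pulled from registry.internal."
        pattern:
          spec:
            # =(key) means: only check this list if the pod defines it.
            =(initContainers):
              - image: "registry.internal/*"
            =(ephemeralContainers):
              - image: "registry.internal/*"
            containers:
              - image: "registry.internal/*"
```

System namespaces that legitimately pull from upstream registries would need an exclude block or their own mirrored images before a policy like this is switched to Enforce.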

Cosign image signing and a Kyverno policy that enforces it
# Sign an image with Cosign after CI builds it:
cosign sign --key cosign.key registry.internal/payments-api:v1.2.3

# Verify before deploying:
cosign verify --key cosign.pub registry.internal/payments-api:v1.2.3

---
# Kyverno policy: only allow signed images from your registry
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-image-signature
      match:
        any:
          - resources:
              kinds: ["Pod"]
      verifyImages:
        - imageReferences:
            - "registry.internal/*"
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE...
                      -----END PUBLIC KEY-----
  # If an image is not signed by your CI key, it is blocked at admission.
  # An attacker who compromises the registry cannot deploy their image
  # without the private signing key.
audit-policy.yaml: the CCTV configuration for your cluster
# Kubernetes Audit Policy: the CCTV system for your cluster
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Every Secret access logged with request + response body
  # (you need to know WHAT was returned, not just that it was accessed)
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["secrets"]

  # Log privilege escalation attempts (new ClusterRoleBindings, etc.)
  - level: Request
    verbs: ["create", "update", "patch"]
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["clusterrolebindings", "rolebindings", "clusterroles", "roles"]

  # Log exec into pods: the most common post-compromise action
  - level: Request
    resources:
      - group: ""
        resources: ["pods/exec", "pods/attach", "pods/portforward"]

  # Drop read-only noise from system components
  - level: None
    verbs: ["get", "list", "watch"]
    users: ["system:kube-proxy", "system:serviceaccount:kube-system:kube-proxy"]

  # Default: metadata only for everything else
  - level: Metadata

The Production Disasters (Two Real Failures, One Lesson)

Story 1: The tutorial cluster-admin

A developer joined a six-person startup. They were setting up Helm in a new EKS cluster and hit a permission error on the third helm install. A Stack Overflow answer from 2019 said to run:

kubectl create clusterrolebinding default-admin --clusterrole=cluster-admin --serviceaccount=default:default

It worked. Helm installed. The developer moved on. Nobody documented it. Nobody reviewed it. The cluster grew. Six months later: a third-party analytics container had a server-side request forgery vulnerability. The attacker used it to read the pod's mounted ServiceAccount token. That token had cluster-admin.

The audit logs, which did exist, showed the API calls: the equivalent of kubectl get secrets -A, then database credentials exfiltrated, then a cryptominer deployed via the API, then lateral movement to three other namespaces. The ServiceAccount binding had been there the whole time. Audit logs caught it after 72 hours, not in real time, because nobody had set alerts.

Prevention: monthly ClusterRoleBinding audit. Falco alert on unexpected kubectl exec and bulk secret reads. The cluster-admin binding was visible in plain sight in kubectl get clusterrolebindings for six months. No one looked.

Story 2: Secrets in environment variables in CI logs

A fintech team stored their database password as a Kubernetes Secret and mounted it as an environment variable in their API deployment. Standard practice. Secure enough.

A junior developer, debugging a startup failure, ran kubectl describe pod and pasted the output into the team's public Slack channel to ask for help. The output included the full environment variable list, including a DATABASE_URL that was not from a Secret: it was a plain value: field in the Deployment manifest. Someone had added it during a debug session fourteen months earlier and never removed it. The password was in plaintext in Slack. In fourteen months of build logs. In three Git commit histories.

This was not a Kubernetes failure. It was a Secret hygiene failure. The Kubernetes Secret was fine. The plain value: field in the Deployment was not. The lesson: securing the Kubernetes object is not enough. You need to audit the full propagation path: Deployment manifest → Git → CI logs → pod describe output → Slack. At each step, is the value exposed?

🔥 Production Reality

Both stories have the same root cause: a security control existed but nobody checked whether it was working. The ClusterRoleBinding was reviewable. The Deployment manifest was in Git. Audit logging was enabled. The answers were findable before the incidents. Security is not a deployment. It is a practice. The controls mean nothing without the quarterly review, the alert configuration, and the culture that treats “just give it cluster-admin” as a security finding, not a shortcut.

The Wall of Shame: Eight Mistakes That Are in Production Right Now

😅 Senior Engineer Confession

Every item on this list appears in real production clusters at companies you have heard of, running right now. The first step is recognising them. The second step is YAML.
  1. cluster-admin on the default ServiceAccount. Giving every visitor to the building a master key to the server room because they needed to use the printer. The default ServiceAccount is shared by every pod in the namespace that doesn't specify otherwise. Granting it cluster-admin grants cluster-admin to every pod in that namespace that does not name its own ServiceAccount. An attacker who gets code execution in any of those pods owns the cluster. This is the exact mechanism of the 2:17 AM incident. Run kubectl get clusterrolebindings -o wide right now. Fix what you find.
  2. Kubernetes Secrets are base64, and people think that's encryption. Hiding a spare key under the doormat and calling it a “secure location.” Base64 is reversible by every Unix system in the world with one command. Without etcd encryption at rest, a stolen etcd backup is a complete credential dump. Enable encryption at rest. Use an external secret store. Treat every unencrypted etcd backup as a breach event.
  3. No audit logging configured. A building with no CCTV. You know someone took something. You have no idea who, when, or what they touched on the way out. Without audit logs, a breach investigation is guesswork. The audit policy is 40 lines of YAML. The regret of not having it when you need it is considerably longer.
  4. Running containers as root. Giving every contractor in the building full admin rights to all systems because it was easier than scoping their access. Running as UID 0 inside a container does not give you root on the host, until there is a container escape vulnerability. At that point, root in the container means root on the node. Set runAsNonRoot: true and runAsUser: 1000 in the securityContext. Every container. Every workload. No exceptions.
  5. imagePullPolicy: IfNotPresent in production. Trusting that the milk you bought last week is still fine because you don't want to check. A cached image on a node is the version that was there when the node was provisioned or when the image was last pulled. If a security patch was released and you updated the registry, nodes with cached images will not pick it up until forced. In production, imagePullPolicy: Always. Saving a second on pod start is not worth the compromise window.
  6. No NetworkPolicy (flat network). An office where every employee can walk into every room because “we trust our people.” The moment one person is compromised, every room is compromised. A default-deny NetworkPolicy with explicit allow rules means a compromised frontend pod cannot reach the database directly. The blast radius of a compromise is contained to the permitted network paths. Thirty lines of YAML prevent lateral movement across the entire cluster.
  7. Wildcard RBAC rules (resources: ["*"], verbs: ["*"]). Writing an access badge that says “yes to everything” because listing specific rooms seemed tedious. A wildcard RBAC rule is cluster-admin by another name. It grants access to resources that don't even exist yet. Every new resource type added to the cluster is automatically accessible to every wildcard binding. Always enumerate specific resources and verbs. Always.
  8. Secrets mounted as env vars, then logged in CI. Writing your password on a post-it, photographing it, uploading the photo to twelve different systems, and being surprised when it leaks. Environment variables appear in crash dumps, in kubectl describe pod output when they are set as plain value: fields, in CI job logs when the environment is printed during debugging, and in application logs when the application does what many applications do and logs its configuration at startup. Mount secrets as files. Read them at the specific path. Never log the environment.

Production Best Practices

  1. Audit all ClusterRoleBindings quarterly. Any non-system subject with cluster-admin is a finding. Treat it as one. Fix it in the same sprint you find it.
  2. automountServiceAccountToken: false on every ServiceAccount by default. Opt in per pod for the workloads that genuinely need API access. This is the single control that would have prevented the cryptominer incident.
  3. Enable etcd encryption at rest with the KMS provider. A stolen etcd backup should be useless. Without this, it is a complete credential dump.
  4. Apply PSA Restricted to every production namespace. Label the namespace. Run in audit mode first to see what breaks. Remediate. Then enforce.
  5. Default-deny NetworkPolicy in every namespace on day one. Include the DNS egress exception. Open only the network paths you explicitly need.
  6. Use External Secrets Operator with an external store. AWS Secrets Manager, HashiCorp Vault, or GCP Secret Manager. The Kubernetes Secret is a cache. The external store is the source of truth.
  7. Configure audit logging with RequestResponse on secrets. Alert when unexpected subjects access secrets. Without this, a breach investigation is guesswork.
  8. Sign all production images with Cosign. Enforce signing in admission. An attacker who compromises your registry cannot deploy without your CI signing key. Kyverno can block unsigned images at admission.
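
Practices 2 and 5 are the cheapest to apply on day one. A minimal sketch, assuming a namespace called production, a hypothetical ServiceAccount named payments-api, and a cluster DNS deployment that carries the conventional k8s-app: kube-dns label in kube-system (verify the label on your cluster before relying on it):

```yaml
# Sketch only: resource names are illustrative; the DNS pod label is an assumption.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payments-api
  namespace: production
automountServiceAccountToken: false   # pods must opt in to get an API token
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}          # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress
    - Egress               # no rules listed, so all traffic in both directions is denied
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

NetworkPolicies are additive, so every further allow rule is a separate, reviewable object layered on top of the deny.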

FAQ

Can I use PodSecurityPolicy instead of PSA?

No. PodSecurityPolicy was deprecated in Kubernetes 1.21 and removed in 1.25. If you are running 1.25 or later (which is likely), PSP does not exist. PodSecurity Admission is the replacement: it is built in, requires no installation, and is enforced via namespace labels. If you are on an older cluster still using PSP, migrating to PSA before upgrading is the correct sequence.
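
Since PSA is driven entirely by namespace labels, a staged rollout is a labeling exercise. A sketch of the first stage, assuming a namespace named production: audit and warn at Restricted to see what would break, while enforcement stays at Baseline.

```yaml
# Sketch only: the namespace name is an assumption; the label keys are the built-in PSA labels.
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/audit: restricted     # record violations in the audit log
    pod-security.kubernetes.io/warn: restricted      # show warnings to kubectl users
    pod-security.kubernetes.io/enforce: baseline     # reject only Baseline violations for now
    pod-security.kubernetes.io/enforce-version: latest
```

Once the audit log is clean, change the enforce label to restricted.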

Does Flannel enforce NetworkPolicies?

No. Flannel does not implement NetworkPolicy enforcement. If you create a NetworkPolicy in a Flannel cluster, it is accepted by the API server and silently ignored at the network level. All pods still have unrestricted access to all other pods. NetworkPolicy enforcement requires a CNI plugin that supports it: Calico, Cilium, Antrea, or Weave. Verify your CNI before relying on NetworkPolicies for security.

What is the difference between seccomp and AppArmor?

seccomp restricts which system calls a process can make. AppArmor restricts which file paths a process can access, which network operations it can perform, and which capabilities it can use. They operate at different levels and are complementary. For most clusters, starting with seccompProfile: RuntimeDefault (blocks a few dozen dangerous syscalls out of the 300+ the kernel exposes, with zero application changes) is the right first step. AppArmor runtime/default is available on Ubuntu nodes and adds file path restrictions on top.
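
Turning on the runtime's default seccomp profile is a three-line change. A minimal sketch, with a hypothetical pod name and image:

```yaml
# Sketch only: pod name and image are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-demo
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault   # use the container runtime's built-in syscall filter
  containers:
    - name: app
      image: registry.example.com/app:1.0.0
```

Set at the pod level, the profile applies to every container in the pod unless a container overrides it.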

How do I rotate Secrets without downtime?

With External Secrets Operator: set a refreshInterval on the ExternalSecret. When you rotate in the external store, ESO picks up the new value within one refresh interval and updates the Kubernetes Secret. Applications that read secrets from mounted files pick up the new value dynamically: Kubernetes updates the mounted file when the Secret changes (with up to 60 seconds propagation delay). Applications that read from environment variables require a pod restart. This is why mount-as-file beats inject-as-env for secrets that rotate.
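
A rotation-friendly ExternalSecret might look like the sketch below. The store name aws-secrets-manager, the remote key prod/payments/db, and the 15-minute interval are hypothetical, and the apiVersion depends on the ESO release installed (newer releases serve external-secrets.io/v1).

```yaml
# Sketch only: store name, remote key, and interval are illustrative assumptions.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: payments-db-credentials
  namespace: production
spec:
  refreshInterval: 15m            # how often ESO re-reads the external store
  secretStoreRef:
    name: aws-secrets-manager     # a ClusterSecretStore configured separately
    kind: ClusterSecretStore
  target:
    name: payments-db-credentials # the Kubernetes Secret that ESO creates and updates
  data:
    - secretKey: password         # key inside the Kubernetes Secret
      remoteRef:
        key: prod/payments/db     # entry in the external store
        property: password        # field within that entry
```

Pick the interval as the longest window you can tolerate between rotating in the store and the cluster seeing the new value.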

Is it ever acceptable to use privileged: true in production?

Rarely. Legitimate cases: node-level observability DaemonSets that need eBPF access (Falco, Cilium, Datadog Agent), some storage CSI drivers, and specific network plugin DaemonSets. These belong in dedicated system namespaces with strong RBAC and explicit PSA Privileged labeling, not in application namespaces. If an application developer says their workload needs privileged: true, the correct response is: “Which specific capability does it need?” It is almost always possible to grant a specific Linux capability instead of full privilege.
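
As an illustration of the capability-instead-of-privilege answer, here is a sketch for a hypothetical web server that only needs to bind a port below 1024. The pod name and image are assumptions.

```yaml
# Sketch only: pod name and image are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
    - name: web
      image: registry.example.com/web:1.0.0
      securityContext:
        privileged: false
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]                 # start from zero capabilities
          add: ["NET_BIND_SERVICE"]     # grant back only the one the workload needs
```

NET_BIND_SERVICE is also the only capability the PSA Restricted profile allows a pod to add. One caveat: when the process runs as a non-root user, an added capability may not become effective unless the binary carries matching file capabilities, so test this on your own image.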

The 60-Second Interview Answer

Back in the interview room. The whiteboard is still there. You've answered all five follow-up questions. Here is how you deliver the complete answer, covering the surface-level checklist and the architecture depth that gets you the offer:

Say This Out Loud Until You Own It

“Kubernetes security is defense in depth across four layers: the cloud infrastructure, the cluster, the container, and the code. Most cluster-level breaches trace back to three root causes: over-privileged ServiceAccounts, unencrypted Secrets, and missing NetworkPolicies.

For RBAC: every workload gets a dedicated ServiceAccount with automountServiceAccountToken disabled by default. Roles are namespace-scoped; ClusterRoles are cluster-scoped, but a ClusterRole bound with a RoleBinding only applies in that one namespace. That distinction is how you reuse role definitions without granting cluster-wide access.

For Secrets: they are base64-encoded by default, not encrypted. Security comes from etcd encryption at rest with KMS, RBAC restricting who can read them, and ideally an external secret store synced via External Secrets Operator. Audit logging at RequestResponse level on secrets is non-negotiable: it is your forensic record.

For pod hardening: PodSecurity Admission Restricted profile blocks privileged containers, hostPath volumes, root processes, and privilege escalation. It does not protect against excessive RBAC, insecure images, or lateral network movement. Those are NetworkPolicy and image signing's jobs.

The critical production detail: RBAC controls API access. Volume mounts bypass the API. A read-only ServiceAccount RBAC policy does not protect Secrets that are already mounted in the pod at creation time. Review both vectors. That is how the cryptominer story ends before 2:17 AM.”

If you can say that in one breath, you're getting the job.

Key Takeaways

  • ClusterRole + ClusterRoleBinding = cluster-wide. ClusterRole + RoleBinding = namespace-scoped. The binding determines scope, not the role type.
  • Kubernetes Secrets are base64, not encrypted. Security comes from etcd encryption at rest + RBAC + external secret stores.
  • Every pod gets the default ServiceAccount token unless you opt out. Set automountServiceAccountToken: false at the SA level.
  • PSA Restricted blocks container escape vectors. It does not protect against excessive RBAC, insecure images, or network lateral movement.
  • RBAC controls API access. Volume mounts bypass the API. Both vectors require separate auditing.
  • Audit logs at RequestResponse level on secrets are your forensic record. Configure them before the incident, not after.
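
The first takeaway is easiest to remember with the manifests in front of you. A minimal sketch, assuming a hypothetical config-reader ClusterRole and a payments-api ServiceAccount in the production namespace:

```yaml
# Sketch only: role, binding, and ServiceAccount names are illustrative assumptions.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: config-reader            # defined once, reusable in any namespace
rules:
  - apiGroups: [""]
    resources: ["configmaps"]    # enumerated resources, no wildcards
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding                # RoleBinding, not ClusterRoleBinding
metadata:
  name: payments-api-config-reader
  namespace: production          # the grant applies in this namespace only
subjects:
  - kind: ServiceAccount
    name: payments-api
    namespace: production
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: config-reader
```

Swap the RoleBinding for a ClusterRoleBinding and the same three verbs apply to ConfigMaps in every namespace. The role did not change; the binding did.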
