AWS Interview Questions & Answers (2026)
90+ AWS interview questions with detailed answers. EC2, S3, Lambda, VPC, IAM, RDS, EKS, CloudFormation, architecture design, and real-world scenarios. Beginner to expert.
Beginner
Q: What is the difference between a Region and an Availability Zone in AWS?
**Region:**
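A geographic area where AWS operates data centers. Examples: us-east-1 (N. Virginia), eu-west-1 (Ireland), ap-southeast-1 (Singapore). AWS has 33+ regions globally (2026). Regions are designed to be isolated from one another, so a failure in one Region should not affect workloads in another. The main caveat is global services such as IAM and Route 53, whose control planes are hosted in a single Region.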
When to choose a Region: based on data residency requirements (GDPR, data sovereignty), latency to your users, and AWS service availability (not all services are in all regions).
**Availability Zone (AZ):**
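One or more discrete data centers within a Region, each with independent power, cooling, and networking. AWS states that every Region has at least three AZs, and some have more (us-east-1 has six). AZs within a Region are physically separated (different flood plains, power grids) but connected by low-latency, high-bandwidth fiber links.
AZ naming: us-east-1a, us-east-1b, us-east-1c. These names are mapped per account, so us-east-1a in one account may be a different physical AZ than us-east-1a in another. AZ IDs (for example use1-az1) are consistent across accounts.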
**Why AZs matter for architecture:**
Deploying resources across multiple AZs is the foundation of high availability in AWS:
- An EC2 Auto Scaling Group spanning 3 AZs survives the failure of any single AZ.
- An RDS Multi-AZ deployment automatically fails over to a standby in another AZ.
- An ALB distributes traffic across healthy instances in multiple AZs.
**Rule:** Deploy production workloads across at least 2 AZs. Critical workloads: 3 AZs.
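The effect of the rule above can be sketched with a small simulation. This is only an illustration, assuming a simple round-robin placement; the function names are hypothetical and nothing here calls AWS.

```python
from collections import Counter
from itertools import cycle


def spread_across_azs(instance_count: int, azs: list[str]) -> Counter:
    """Assign instances to AZs round-robin and count instances per AZ."""
    placement = Counter({az: 0 for az in azs})
    for _, az in zip(range(instance_count), cycle(azs)):
        placement[az] += 1
    return placement


def surviving_instances(placement: Counter, failed_az: str) -> int:
    """Instances still running if a single AZ becomes unavailable."""
    return sum(count for az, count in placement.items() if az != failed_az)


azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
placement = spread_across_azs(6, azs)
print(dict(placement))
print(surviving_instances(placement, "us-east-1a"))
```

With six instances over three AZs, losing any one AZ leaves four running. The same six instances in a single AZ would leave none.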
Q: What is an AWS IAM Role, and how is it different from an IAM User?
**IAM User:**
Represents a specific person or application. Has long-term credentials: a password (for console) and/or access keys (for API/CLI). Access keys are static credentials that can be compromised if leaked.
**IAM Role:**
An identity that can be assumed by a trusted entity (EC2 instance, Lambda function, another AWS account, a Kubernetes service account). Has no long-term credentials — when a role is assumed, AWS generates temporary security credentials (valid 15 minutes to 12 hours).
**Why Roles are preferred over Users:**
- **EC2 instances:** Instead of storing AWS access keys in environment variables (risky), assign an IAM Role to the instance profile. The application running on EC2 calls the Instance Metadata Service (IMDS) to get temporary credentials automatically — no credential management needed.
- **Lambda functions:** Execution roles grant the function permissions to call DynamoDB, S3, etc.
- **Cross-account access:** Role assumption enables secure delegation between AWS accounts without sharing long-term credentials.
- **EKS with IRSA:** Kubernetes service accounts can be annotated to assume IAM Roles via OIDC federation — Pods get temporary AWS credentials scoped to exactly what they need.
**Best practice:** Never create long-term access keys for EC2 instances or Lambda functions. Always use IAM Roles. Enforce MFA for IAM Users with console access. Use IAM Identity Center (SSO) for human access to multiple accounts.
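As a rough illustration of what "assumed by a trusted entity" means, the sketch below builds the trust policy document that lets the EC2 service assume a role. The helper names are hypothetical, and the clamp function simply encodes the 15 minute to 12 hour range quoted above rather than any AWS API behavior.

```python
import json

MIN_SESSION_SECONDS = 15 * 60        # 15 minutes
MAX_SESSION_SECONDS = 12 * 60 * 60   # 12 hours


def ec2_trust_policy() -> dict:
    """Trust policy allowing the EC2 service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def clamp_session_seconds(requested: int) -> int:
    """Keep a requested session duration inside the range quoted above."""
    return max(MIN_SESSION_SECONDS, min(requested, MAX_SESSION_SECONDS))


print(json.dumps(ec2_trust_policy(), indent=2))
print(clamp_session_seconds(3600))
```

The trust policy says who may assume the role. What the role can do once assumed is defined separately in its permission policies.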
Q: Explain the difference between S3 storage classes and when to use each.
AWS S3 offers multiple storage classes optimized for different access patterns and cost profiles.
**S3 Standard:**
- Frequent access, millisecond retrieval.
- 99.99% availability, 3 AZ redundancy.
- Use for: actively served website assets, application data, frequently accessed logs.
**S3 Intelligent-Tiering:**
- Automatically moves objects between frequent and infrequent access tiers based on access patterns.
- No retrieval fee. Small monitoring/automation fee per object.
- Use for: data with unpredictable or changing access patterns.
**S3 Standard-IA (Infrequent Access):**
- Lower storage cost but per-retrieval fee and 30-day minimum storage.
- 3 AZ redundancy, millisecond retrieval.
- Use for: disaster recovery data, backups accessed occasionally.
**S3 One Zone-IA:**
- Like Standard-IA but stored in a single AZ (cheaper, but data lost if AZ is destroyed).
- Use for: reproducible data (can be regenerated), secondary backups.
**S3 Glacier Instant Retrieval:**
- Millisecond retrieval, very low storage cost. 90-day minimum.
- Use for: medical images, media archives accessed a few times per year.
**S3 Glacier Flexible Retrieval:**
- Retrieval: minutes to hours (expedited/standard/bulk). Very low cost.
- Use for: long-term backups, compliance archives.
**S3 Glacier Deep Archive:**
- Cheapest storage tier. Standard retrieval within 12 hours (bulk within 48 hours). 180-day minimum storage.
- Use for: data retained for 7+ years for compliance, rarely (if ever) accessed.
**Cost optimization strategy:** Implement S3 Lifecycle Policies to automatically transition objects through storage classes as they age (e.g., Standard → Standard-IA after 30 days → Glacier after 90 days → Deep Archive after 180 days).
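The lifecycle example above could be expressed roughly as follows. The dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` takes for its `LifecycleConfiguration` argument. The rule ID, the `logs/` prefix, and the `storage_class_at` helper are assumptions made for illustration, and the code does not call AWS.

```python
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "age-out-logs",            # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},   # hypothetical prefix
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
                {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
}


def storage_class_at(age_days: int, rule: dict) -> str:
    """Return the storage class an object of the given age would be in."""
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = lifecycle_configuration["Rules"][0]
for age in (10, 30, 100, 365):
    print(age, storage_class_at(age, rule))
```

Each transition must respect the minimum storage durations listed above, which is why Standard-IA is not entered before day 30.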
Intermediate
Q: How does AWS VPC networking work? Explain subnets, route tables, and internet gateways.
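A Virtual Private Cloud (VPC) is a logically isolated network within AWS that you fully control. Network-attached resources such as EC2 instances, RDS databases, and load balancers live in a VPC. Services such as S3 and DynamoDB run outside it and are reached over their public endpoints or through VPC endpoints.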
**CIDR Block:**
A VPC is defined by a CIDR block (e.g., 10.0.0.0/16), which gives you 65,536 IP addresses. You then divide this into subnets.
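The CIDR arithmetic can be checked with Python's standard `ipaddress` module. This is a small sketch: the two /24 subnets are example values, and the constant reflects the five addresses AWS reserves in every subnet (the first four and the last).

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses and the last one

vpc = ipaddress.ip_network("10.0.0.0/16")
all_24s = list(vpc.subnets(new_prefix=24))

subnet_a = ipaddress.ip_network("10.0.1.0/24")  # example subnet
subnet_b = ipaddress.ip_network("10.0.2.0/24")  # example subnet

print(vpc.num_addresses)                                    # 65536
print(len(all_24s))                                         # 256
print(subnet_a.num_addresses - AWS_RESERVED_PER_SUBNET)     # 251
print(subnet_a.subnet_of(vpc), subnet_a.overlaps(subnet_b)) # True False
```

A /16 VPC therefore holds up to 256 non-overlapping /24 subnets, each with 251 addresses you can actually assign.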
**Subnets:**
A subnet is a range of IPs within your VPC, tied to a specific Availability Zone. Types:
**Public Subnet (10.0.1.0/24 in us-east-1a):**
- Has a route to the Internet Gateway in its route table.
- Resources in a public subnet with a public IP can reach and be reached from the internet.
- Common residents: Application Load Balancers, NAT Gateways, bastion hosts.
**Private Subnet (10.0.2.0/24 in us-east-1a):**
- No direct route to the internet. Can reach the internet via a NAT Gateway in the public subnet.
- Common residents: EC2 application servers, RDS databases, EKS worker nodes.
- **Security principle:** Always put databases and application servers in private subnets.
**Route Tables:**
Each subnet has an associated route table that determines where traffic is sent. The public subnet route table has:
```
0.0.0.0/0 → igw-xxxxxx (Internet Gateway)
10.0.0.0/16 → local (local VPC traffic)
```
The private subnet route table has:
```
0.0.0.0/0 → nat-xxxxxx (NAT Gateway - for outbound internet, not inbound)
10.0.0.0/16 → local
```
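Route selection uses the most specific matching route (longest prefix), which is why VPC-internal traffic takes the `local` route even though `0.0.0.0/0` also matches. A minimal sketch of that lookup, using the two tables above with their placeholder gateway IDs and a hypothetical `next_hop` helper:

```python
import ipaddress

PUBLIC_ROUTES = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-xxxxxx"}
PRIVATE_ROUTES = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-xxxxxx"}


def next_hop(route_table: dict[str, str], destination: str) -> str:
    """Pick the target of the most specific route that contains the destination."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if dest in ipaddress.ip_network(cidr)
    ]
    if not matches:
        raise LookupError(f"no route to {destination}")
    # Longest prefix wins: /16 beats /0 for addresses inside the VPC.
    _, target = max(matches, key=lambda match: match[0].prefixlen)
    return target


print(next_hop(PUBLIC_ROUTES, "10.0.2.15"))
print(next_hop(PUBLIC_ROUTES, "8.8.8.8"))
print(next_hop(PRIVATE_ROUTES, "8.8.8.8"))
```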
**Internet Gateway:** Enables bidirectional internet communication for public subnet resources.
**NAT Gateway:** Enables private subnet resources to initiate outbound internet connections (software updates, API calls) without being reachable from the internet.
**Security Groups:** Stateful firewall at the resource level (instance, ENI). Only allow rules, no deny rules.
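**Network ACLs:** Stateless firewall at the subnet level. Both allow and deny rules. Rules are evaluated in ascending rule-number order and the first match wins, so return traffic must be allowed explicitly.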
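The ordered evaluation of Network ACL rules can be modeled in a few lines. This is a simplified sketch that matches only on source CIDR and a single port. The rule numbers, CIDRs, and the `NaclRule` type are illustrative assumptions, not AWS data structures.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int
    cidr: str
    port: int
    action: str  # "allow" or "deny"


def evaluate_nacl(rules: list[NaclRule], source_ip: str, port: int) -> str:
    """Return the action of the lowest-numbered matching rule."""
    source = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if source in ipaddress.ip_network(rule.cidr) and port == rule.port:
            return rule.action
    return "deny"  # the implicit final rule denies anything unmatched


rules = [
    NaclRule(200, "0.0.0.0/0", 443, "allow"),
    NaclRule(100, "203.0.113.0/24", 443, "deny"),
]

print(evaluate_nacl(rules, "203.0.113.7", 443))
print(evaluate_nacl(rules, "198.51.100.1", 443))
print(evaluate_nacl(rules, "198.51.100.1", 22))
```

Rule 100 is checked before rule 200, so the blocked range is denied even though a broader allow exists. A Security Group could not express this, because it has no deny rules.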
Q: What is AWS Lambda and what are its limitations?
AWS Lambda is a serverless compute service that runs code in response to events without provisioning or managing servers. You provide the code; AWS handles the infrastructure.
**How it works:**
- Upload code (or a container image up to 10GB) and define a trigger.
- Lambda creates execution environments on demand, runs your function, and charges only for requests and compute time (billed in 1 ms increments).
- AWS manages scaling (up to 1,000 concurrent executions per region by default), patching, and availability.
**Triggers:** S3 events, API Gateway, SQS/SNS, DynamoDB Streams, EventBridge, Kinesis, Step Functions, CloudFront, ALB, and more.
**Lambda Limitations (important for interviews):**
- **Execution timeout:** Maximum 15 minutes. Not suitable for long-running processes.
- **Memory:** 128MB–10GB. CPU is proportional to memory.
- **Deployment package size:** 50MB (zip, direct upload) / 250MB (unzipped) / 10GB (container image).
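- **Ephemeral storage:** 512MB–10GB in /tmp. Contents may survive between invocations that reuse the same warm execution environment, but this is never guaranteed, so treat /tmp as scratch space.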
- **Cold starts:** The first invocation of a function (or after inactivity) requires environment initialization (100ms–5s depending on runtime and package size). Can be mitigated with Provisioned Concurrency (pre-initialized environments) at additional cost.
- **Concurrency limits:** 1,000 concurrent executions per region (soft limit, can be increased). Invocations beyond the limit are throttled.
- **No guaranteed state or shared connections:** Lambda is stateless, and each concurrent execution environment opens its own database connections, so a traffic spike can exhaust a database's connection limit. Connection pooling via RDS Proxy is essential for functions accessing RDS.
- **VPC cold starts:** Lambda functions inside a VPC previously had significant cold start overhead. AWS largely resolved this with the Hyperplane ENI implementation, but VPC-attached Lambdas still have slightly higher cold starts.
**When NOT to use Lambda:** Long-running batch jobs (use ECS/Fargate/Batch), servers that hold long-lived connections themselves (for example a self-hosted WebSocket server, which fits ECS better), or CPU-heavy workloads where EC2/container pricing is more efficient.
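The cold start and statelessness points above shape how handlers are usually written: expensive setup goes at module level so it runs once per execution environment, not once per invocation. The sketch below assumes a hypothetical `_create_client` stand-in for real setup such as opening a database connection. Only the `handler(event, context)` signature is Lambda's own convention.

```python
import time

INIT_COUNT = 0  # counts how many times the expensive setup ran


def _create_client() -> dict:
    """Stand-in for expensive setup (SDK client, DB connection, config load)."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"created_at": time.time()}


# Module-level code runs during the cold start, once per execution environment.
CLIENT = _create_client()


def handler(event, context):
    """Invoked on every request; reuses CLIENT while the environment stays warm."""
    name = event.get("name", "world")
    return {"statusCode": 200, "body": f"hello {name}", "initCount": INIT_COUNT}


# Simulate two warm invocations in the same environment.
print(handler({"name": "aws"}, None))
print(handler({}, None))
```

Both calls report an `initCount` of 1 because the setup ran only at import time. Anything cached this way is an optimization only, since a new environment starts from scratch.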
Q: Explain AWS EKS and how it differs from self-managed Kubernetes.
Amazon Elastic Kubernetes Service (EKS) is a managed Kubernetes service where AWS operates the Kubernetes control plane (API server, etcd, scheduler, controller manager) on your behalf.
**What AWS manages in EKS:**
- Kubernetes control plane (across 3 AZs for HA).
- Control plane upgrades (you initiate, AWS executes).
- etcd backups.
- Control plane scaling.
- AWS-native integrations (IAM authentication mapped to Kubernetes RBAC via the aws-auth ConfigMap or EKS Access Entries, VPC CNI for Pod networking, EBS/EFS CSI drivers).
**What you still manage:**
- Worker nodes (EC2 instances or Fargate) — their OS, updates, kubelet.
- Kubernetes add-ons (CoreDNS, kube-proxy, VPC CNI versions).
- Application deployments, Kubernetes objects.
- Node scaling (Cluster Autoscaler or Karpenter).
**EKS Pricing:**
- $0.10/hour per EKS cluster (~$73/month).
- Worker node EC2 costs (your responsibility).
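The monthly figure above is the hourly fee multiplied by an average month of 730 hours (8,760 hours per year divided by 12). A quick check, using `Decimal` to avoid floating-point rounding. The hourly rate is the one quoted above and the function name is hypothetical.

```python
from decimal import Decimal

CONTROL_PLANE_HOURLY_USD = Decimal("0.10")  # rate quoted above
HOURS_PER_MONTH = Decimal(8760) / Decimal(12)  # 730 hours in an average month


def monthly_control_plane_fee(cluster_count: int = 1) -> Decimal:
    """Control plane fee only; worker node costs are billed separately."""
    return CONTROL_PLANE_HOURLY_USD * HOURS_PER_MONTH * cluster_count


print(monthly_control_plane_fee())
print(monthly_control_plane_fee(3))
```

The fee is per cluster, so separate dev, staging, and production clusters each add to it before any worker nodes are counted.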
**Self-managed Kubernetes (kubeadm):**
- You provision and manage master nodes, etcd, all control plane components.
- Full control over configuration (useful for very specific requirements).
- Significantly higher operational burden: you handle upgrades, HA for masters, etcd backup/restore, certificate rotation.
- Only justifiable for cost optimization at very large scale or specific compliance requirements.
**EKS vs. EKS Anywhere vs. ECS:**
- **EKS Anywhere:** Run EKS on-premises (VMware, bare metal) — same API as cloud EKS.
- **ECS (Elastic Container Service):** AWS proprietary container orchestrator. Simpler than Kubernetes, tighter AWS integration, but vendor lock-in and smaller ecosystem. ECS is often faster to get started with for AWS-only workloads.
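**Karpenter:** AWS-developed, open-source node autoscaler and an alternative to Cluster Autoscaler. It provisions right-sized nodes directly in response to pending Pods, which makes it more responsive and efficient, and it supports Spot instances, Graviton, and mixed instance types natively. Often the preferred autoscaler for new EKS clusters.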
Advanced / Architecture
Q: Design a highly available, fault-tolerant web application architecture on AWS.
This is the classic AWS architecture interview question. The answer demonstrates understanding of multi-AZ resilience, managed services, and defense-in-depth.
**Architecture (for a 3-tier web application):**
**DNS & CDN Layer:**
- **Route 53:** Latency-based or geolocation routing across regions. Health checks that fail over to a DR region if primary is unhealthy.
- **CloudFront:** CDN for static assets and cacheable content. Reduces latency globally, offloads origin traffic, provides DDoS protection (Shield Standard included).
**Load Balancing Layer (Public Subnet):**
- **Application Load Balancer (ALB):** Deployed across 3 AZs in public subnets. Provides HTTP/HTTPS load balancing, path-based routing, host-based routing, and SSL termination. Integrated with WAF for OWASP protection.
**Compute Layer (Private Subnet):**
- **EC2 Auto Scaling Group** (or **ECS Fargate**) spanning 3 AZs. Health checks remove unhealthy instances automatically. Scaling policies respond to CPU or custom metrics.
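- **Target: minimum 2 instances per AZ.** To survive 1 full AZ failure without capacity loss, size the group so the remaining 2 AZs can carry peak load alone (with 3 AZs, that means provisioning about 150% of peak capacity).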
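The sizing behind the per-AZ target can be worked out directly: divide the instances needed at peak by the number of AZs expected to survive, and round up. A minimal sketch with a hypothetical helper, assuming instances are spread evenly:

```python
import math


def instances_per_az(required_at_peak: int, az_count: int,
                     az_failures_tolerated: int = 1) -> int:
    """Instances to run in each AZ so surviving AZs still cover peak load."""
    surviving_azs = az_count - az_failures_tolerated
    if surviving_azs < 1:
        raise ValueError("at least one AZ must survive")
    return math.ceil(required_at_peak / surviving_azs)


per_az = instances_per_az(required_at_peak=4, az_count=3)
print(per_az, per_az * 3)
```

If peak load needs 4 instances, 2 per AZ (6 in total) keeps 4 running after one AZ fails. If peak load needs all 6, the same layout loses a third of its capacity, and 3 per AZ would be required.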
**Database Layer (Private Subnet):**
- **Aurora (Multi-AZ):** Aurora with 2+ read replicas across AZs. If the primary fails, a replica is promoted automatically, typically in about 30 seconds. Aurora's storage layer is inherently multi-AZ: it keeps six copies of the data across 3 AZs.
- **ElastiCache (Redis) with cluster mode:** Caches frequently accessed data, reducing DB load. Replication groups across AZs.
**Asynchronous Processing:**
- **SQS:** Decouple write-heavy operations. Background workers in a separate ASG process messages. Dead-letter queue (DLQ) for failed processing.
- **Lambda:** For event-driven, lightweight processing tasks.
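The dead-letter queue behavior can be illustrated with an in-memory simulation. This is a simplified model, not the SQS API: the `drain` function, the message bodies, and the retry limit of 3 (mirroring the `maxReceiveCount` setting of an SQS redrive policy) are assumptions for the example.

```python
from collections import deque

MAX_RECEIVE_COUNT = 3  # assumed value for the redrive policy's maxReceiveCount


def drain(messages, process, max_receive_count=MAX_RECEIVE_COUNT):
    """Process messages, retrying failures and dead-lettering repeat offenders."""
    queue = deque((body, 0) for body in messages)
    processed, dead_letters = [], []
    while queue:
        body, receives = queue.popleft()
        receives += 1
        try:
            process(body)
            processed.append(body)
        except Exception:
            if receives >= max_receive_count:
                dead_letters.append(body)        # give up, park for inspection
            else:
                queue.append((body, receives))   # becomes visible again later
    return processed, dead_letters


attempts = {}


def process(body: str) -> None:
    """Hypothetical worker that always fails on 'bad' messages."""
    attempts[body] = attempts.get(body, 0) + 1
    if body.startswith("bad"):
        raise ValueError(f"cannot process {body}")


processed, dead_letters = drain(["ok-1", "bad-2", "ok-3"], process)
print(processed, dead_letters, attempts["bad-2"])
```

The poison message is attempted three times and then set aside, so it cannot block the healthy messages behind it.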
**Storage:**
- **S3:** Static assets, user uploads. Multi-AZ by default. Versioning enabled. Replication to another region for DR.
**Security:**
- **VPC:** Public subnets (ALB, NAT GW), private subnets (EC2, RDS). SGs restrict traffic to minimum required ports.
- **WAF:** Block OWASP Top 10, rate limiting, bot detection.
- **KMS:** Encrypt data at rest (RDS, S3, EBS).
- **Secrets Manager:** Database credentials with automatic rotation.
- **AWS Config + GuardDuty:** Continuous compliance monitoring and threat detection.
**Observability:**
- **CloudWatch:** Metrics, logs, dashboards, alarms to SNS → PagerDuty.
- **X-Ray:** Distributed tracing for the application tier.
- **AWS Health Dashboard:** Track service events affecting your resources.
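**Recovery objectives (example targets):** RTO < 15 minutes, RPO < 5 minutes for AZ-level failures with Multi-AZ + automated failover. Recovery from a full regional outage depends on the cross-region replication and Route 53 failover described above.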