REST (Representational State Transfer) uses HTTP methods to perform CRUD operations on resources identified by URIs.
Method | Action           | Idempotent
GET    | Read resource    | ✅ Yes
POST   | Create resource  | ❌ No
PUT    | Replace resource | ✅ Yes
PATCH  | Partial update   | ⚠️ Not guaranteed
DELETE | Delete resource  | ✅ Yes
# REST resource: /users/{id}/orders
GET    /users/42/orders     ← list
POST   /users/42/orders     ← create
GET    /users/42/orders/7   ← get one
PUT    /users/42/orders/7   ← replace
PATCH  /users/42/orders/7   ← update
DELETE /users/42/orders/7   ← delete

# Status codes:
200 OK · 201 Created · 204 No Content · 400 Bad Request · 401 Unauthorized
404 Not Found · 429 Too Many Requests · 500 Internal Server Error · 503 Service Unavailable
System architecture, scalability, reliability, and CAP theorem
Architecture Evolution
Monolithic Architecture — all components (UI, business logic, data access) live in ONE deployable unit. The entire application ships as a single artifact: one build, one deploy, one process.
Pros: simple deployment · easy local dev · simple debugging · low latency (in-process).
Cons: hard to scale selectively · tech-stack lock-in · long build/test cycles · all-or-nothing deploy risk.
When to use a Monolith
✓Early-stage startups — move fast, validate product first
✓Small teams (<10 engineers) — microservices overhead not worth it
✓Well-understood domain where service boundaries aren't clear yet
✗Avoid when components need very different scaling profiles
✗Avoid when teams need to deploy independently at high velocity
💡 Real world: Instagram, Shopify, and Stack Overflow ran monoliths for years at massive scale. The "Majestic Monolith" is a valid, often overlooked pattern. Don't rush to microservices.
// All calls happen in-process — no network hops
class UserController {
  async register(req, res) {
    const user = await this.db.save(req.body);
    await this.emailSvc.sendWelcome(user);
    res.json(user); // single process
  }
}
Layered (N-Tier) Architecture — strict separation into horizontal layers. Each layer ONLY communicates with the adjacent layer directly below it. Classic enterprise pattern — MVC is a variant.
Pros: clear separation of concerns · each layer testable in isolation · well-known pattern.
Cons: sinkhole anti-pattern risk · rigid layer dependencies · every request traverses all layers.
① Presentation Layer
HTTP request/response, input validation, auth. No business logic. Controllers, REST handlers, GraphQL resolvers.
② Business Logic Layer
Core application rules, workflows. No knowledge of HTTP or DB. Pure functions — the heart of your app. Services, Use Cases, Domain objects.
③ Data Access Layer (DAL)
Abstracts storage behind Repositories/DAOs. Business layer doesn't know if you use MySQL or MongoDB. Enables zero-change swaps of storage backends.
④ Database Layer
Actual persistence — SQL, NoSQL, Cache. Transactions, indexes, and query optimization live here.
// Each layer only knows the layer directly below
class OrderController { // Presentation
  async create(req, res) {
    return res.json(await this.orderService.create(req.body));
  } // ← no DB knowledge here
}

class OrderService { // Business
  async create(data) {
    await this.payGateway.charge(data);    // business rule
    return await this.orderRepo.save(data); // via DAL
  }
}
Microservices Architecture — each service is independently deployable, owns its own database, has a single business responsibility, and communicates via REST/gRPC (sync) or message queues (async). The dominant pattern for large-scale internet systems.
Synchronous — REST/gRPC
Direct request-response: the caller waits for the reply. gRPC is faster (Protocol Buffers); REST is simpler (JSON). Risk: cascading failures — use circuit breakers.
Asynchronous — Events (Kafka/SQS)
Fire and forget. Best for notifications, analytics, billing. Decouples services, better fault tolerance, eventual consistency. Dead letter queues for failures.
Decomposition Strategies
①By Business Capability — User, Order, Payment, Notification. Most common.
②By Domain (DDD Bounded Contexts) — Aligns service boundaries with team ownership.
⚠ Conway's Law: "Systems mirror the communication structure of the organization." Design your team structure first, then your service boundaries.
Non-Functional Requirements (NFRs)
NFRs define how well the system performs — not what it does. They're the quality attributes that separate a production system from a toy. In FAANG interviews, always clarify NFRs before designing.
⚡ Performance / Latency
How fast does the system respond? Always ask about percentile SLOs: p50/p95/p99. Tail latency matters — the 1% of slow requests affect real users.
p50:  12ms  (median user)
p95:  45ms  (most users)
p99:  180ms (1 in 100)
p999: 800ms (1 in 1000)
💡 Google found 500ms added latency reduced searches by 20%. Amazon: 100ms = 1% revenue loss.
🟢 Availability / SLA
% of time the system is operational and serving requests. Each additional "9" means 10× less downtime. Most FAANG systems target 99.99%+.
99%     = 87.6 hrs/yr down
99.9%   = 8.76 hrs/yr down
99.99%  = 52.6 min/yr down
99.999% = 5.26 min/yr down
SLI (metric) → SLO (target) → SLA (contract with penalties)
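The downtime table above is plain arithmetic on the minutes in a (365-day) year; a quick sketch to reproduce the figures:

```javascript
// Allowed downtime per year at a given availability target.
// A 365-day year has 365 × 24 × 60 = 525,600 minutes.
function downtimeMinutesPerYear(availabilityPct) {
  const minutesPerYear = 365 * 24 * 60;
  return minutesPerYear * (1 - availabilityPct / 100);
}

for (const nines of [99, 99.9, 99.99, 99.999]) {
  console.log(`${nines}% → ${downtimeMinutesPerYear(nines).toFixed(2)} min/yr down`);
}
```

Running this reproduces the table: 99.99% allows about 52.56 minutes of downtime per year.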
📈 Scalability / Throughput
Ability to handle growing load without degradation. Throughput = requests per second (RPS). Design for 3× peak load.
🛡️ Reliability / Fault Tolerance
System continues operating correctly despite component failures. MTBF (Mean Time Between Failures) and MTTR (Mean Time To Recovery) are the key engineering metrics.
✓Graceful degradation (serve stale data > error)
✓Circuit breakers (fail fast, don't cascade)
✓Bulkhead isolation (contain failures)
✓Retry with exponential backoff + jitter
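The last item in the list is worth a sketch: exponential backoff with full jitter. The function names and default values here are illustrative, not from any particular library:

```javascript
// Exponential backoff with full jitter:
// delay = random(0, base × 2^attempt), capped at maxDelayMs.
function backoffDelay(attempt, baseMs = 100, maxDelayMs = 10_000) {
  const exp = Math.min(maxDelayMs, baseMs * 2 ** attempt);
  return Math.random() * exp; // full jitter spreads out retry storms
}

// Retry wrapper: give up after maxAttempts, waiting between tries.
async function retry(fn, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err; // out of attempts
      await new Promise(r => setTimeout(r, backoffDelay(attempt)));
    }
  }
}
```

Jitter matters because without it, clients that failed together retry together, hammering the recovering service in synchronized waves.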
🔒 Security
CIA Triad: Confidentiality, Integrity, Availability. Defense-in-depth — multiple layers of controls.
→TLS 1.3 for all data in transit
→AES-256 encryption at rest
→OAuth2/JWT for authentication
→Zero-trust network architecture
→Rate limiting at API gateway
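Rate limiting at the gateway is commonly implemented as a token bucket. A minimal in-memory sketch (a production gateway would typically keep counters in Redis; the injectable clock here is just for testability):

```javascript
// Token bucket: refills at `rate` tokens/sec, bursts up to `capacity`.
class TokenBucket {
  constructor(rate, capacity, now = Date.now) {
    this.rate = rate;
    this.capacity = capacity;
    this.tokens = capacity; // start full
    this.now = now;
    this.last = now();
  }
  allow() {
    const t = this.now();
    // Refill proportionally to elapsed time, never past capacity
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((t - this.last) / 1000) * this.rate
    );
    this.last = t;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false; // caller should respond 429 Too Many Requests
  }
}
```

Usage: `new TokenBucket(100, 200)` allows a sustained 100 req/s with bursts up to 200.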
🌍 Durability & Data Consistency
Data must survive failures. Design RPO (data loss tolerance) and RTO (recovery time). Replication factor ≥ 3 for critical data.
→Write-ahead logs (WAL) for crash recovery
→Synchronous replication for strong durability
→Cross-region backups for disaster recovery
Scaling Strategies
Vertical Scaling (Scale Up)
Add more CPU/RAM/Disk to a single server. Zero code changes. Simple but has hard limits (largest AWS instance is 448 vCPUs, 24TB RAM) and creates a single point of failure.
Pros: zero code changes · simple ops.
Cons: single point of failure · cost grows steeply at the high end · hard ceiling.
Horizontal Scaling (Scale Out)
Add more commodity servers behind a load balancer. Requires stateless services (store session in Redis, not in-memory). How all internet-scale systems work — cloud auto-scaling groups.
Round Robin
Distributes requests sequentially: 1→2→3→1→2→3. Simple, zero overhead. Works well when all servers have equal capacity and requests have uniform cost.
Req 1 → Server A
Req 2 → Server B
Req 3 → Server C
Req 4 → Server A ↺
Pros: simple. Cons: ignores server load.
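The rotation is a one-liner in code; a minimal sketch using a closure to hold the cursor:

```javascript
// Round-robin selector: cycles through servers, wrapping at the end.
function roundRobin(servers) {
  let i = 0;
  return () => servers[i++ % servers.length];
}

const next = roundRobin(['A', 'B', 'C']);
next(); // 'A'
next(); // 'B'
next(); // 'C'
next(); // 'A' — wrapped around
```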
Least Connections
Route to server with fewest active connections. Dynamically adapts to variable request duration. Best for WebSocket, streaming, or long-running DB queries.
Server A: 120 conns
Server B:  45 conns ← next
Server C:  98 conns
→ Route to B (fewest)
Pros: adaptive · handles slow requests well.
IP Hash / Sticky Sessions
hash(clientIP) % N → always same server for same client. Maintains session affinity. Avoid this — store sessions in Redis instead for true stateless scaling.
hash("10.0.1.4") = 4822
4822 % 3 = 1 → Server B
Same IP always → Server B
Pros: session affinity. Cons: uneven distribution.
CAP Theorem (Brewer, 2000) — In any distributed system, you can only guarantee 2 of 3 properties: Consistency, Availability, and Partition Tolerance. Since network partitions are inevitable in real distributed systems, the real choice is between CP or AP.
Consistency
Every read receives the most recent write — or returns an error. All nodes see identical data at any moment. Required for banking, inventory, distributed locks.
Availability
Every request receives a non-error response — though it may not be the latest data. System stays online even during partial failures. Required for social feeds, product catalogs.
Partition Tolerance
System operates despite network splits between nodes. Not optional — network partitions are a physical reality. Every distributed system must be partition tolerant.
PACELC Extension: even without a partition (E), you still trade Latency (L) against Consistency (C). In PACELC terms, Cassandra and DynamoDB are PA/EL (favor availability and low latency), while HBase is PC/EC (favors consistency).
Consistency ↔ Availability is a spectrum, not a binary choice: DynamoDB, for example, lets you choose eventually consistent reads (AP-leaning, cheaper) or strongly consistent reads per request.
These patterns solve recurring architectural challenges at system scale. Understanding them is critical for FAANG system design interviews. Each solves a specific class of problem.
Event-Driven Architecture
Components communicate via events published to a message bus (Kafka, EventBridge, SNS). Publishers don't know about subscribers. Excellent for async workflows and high throughput.
CQRS (Command Query Responsibility Segregation)
Separate the write model (Commands) from the read model (Queries), so reads can be optimized independently from writes. The write side emits events; the read side builds materialized views.
Write: POST /orders → OrderDB → event → rebuild view
Read:  GET /orders/feed → Elasticsearch view
Pros: reads and writes optimized independently. Cons: extra complexity.
Event Sourcing
Never store current state directly — store the sequence of events that led to the current state. Rebuild any past state by replaying events. Complete audit trail, temporal queries.
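Rebuilding state by replay is just a fold (reduce) over the event log. A minimal sketch with illustrative event names:

```javascript
// Event sourcing: state is never stored directly; it is the fold
// of all events. Replaying a prefix of the log yields any past state.
function replay(events) {
  return events.reduce((balance, e) => {
    switch (e.type) {
      case 'Deposited': return balance + e.amount;
      case 'Withdrawn': return balance - e.amount;
      default:          return balance; // unknown events are ignored
    }
  }, 0);
}

const log = [
  { type: 'Deposited', amount: 100 },
  { type: 'Withdrawn', amount: 30 },
  { type: 'Deposited', amount: 5 },
];
replay(log);             // current state: 75
replay(log.slice(0, 2)); // state as of event 2: 70
```

Because the log is append-only, you get a complete audit trail and temporal queries for free; the cost is that reads need either replay or a cached snapshot.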
Saga Pattern
Manage distributed transactions across microservices using a sequence of local transactions plus compensating transactions on failure. No distributed 2PC needed.
Backend for Frontend (BFF)
A dedicated API layer tailored to each frontend client (Web BFF, Mobile BFF, TV BFF). Aggregates backend services and shapes responses for each client's specific needs.
Mobile BFF → lighter payloads
Web BFF    → richer data
TV BFF     → streaming data
Each team owns their own BFF.
Pros: client-optimized responses · team autonomy.
Strangler Fig Pattern
Incrementally replace a monolith by building new features as microservices and routing traffic. Named after the fig tree that grows around a host tree. Zero-downtime migration strategy.
Phase 1: Monolith handles all traffic
Phase 2: /auth   → Auth Service
Phase 3: /orders → Order Service
Phase 4: Monolith retired
Pros: low-risk migration · zero downtime.
Infrastructure Components
The building blocks that make production systems reliable and fast
Load Balancer — L4 vs L7
Load balancers distribute traffic across multiple servers to prevent overload and ensure high availability.
Feature  | L4 (Transport)   | L7 (Application)
Layer    | TCP/UDP          | HTTP/HTTPS
Routing  | IP + port        | URL, headers, cookies
Speed    | Very fast        | Slower
SSL      | No               | Yes (SSL termination)
Examples | AWS NLB, HAProxy | AWS ALB, Nginx
Redis Caching
Cache stores frequently accessed data in memory, reducing database load and improving latency dramatically.
Cache hit:  data found in cache → return immediately (~1ms)
Cache miss: not in cache → query DB → store in cache → return
// Cache-aside read path
const cached = await redis.get(`user:${id}`);
if (cached) return JSON.parse(cached); // HIT — ~1ms

// MISS — fetch from DB, then cache with a 1-hour TTL
const user = await db.findUser(id);
await redis.setex(`user:${id}`, 3600, JSON.stringify(user));
return user;
Content Delivery Network
CDNs cache static content at edge servers geographically closer to users, reducing latency and origin server load.
Without CDN: user in India → server in US          ≈ 200ms
With CDN:    user in India → edge server in Mumbai ≈ 10ms
→Static assets (images, CSS, JS) cached at edge
→User request routed to nearest PoP (Point of Presence)
→Cache miss → fetch from origin → cache at edge
→Reduces DDoS impact by distributing traffic
Message Queues — Kafka & RabbitMQ
Message queues decouple producers and consumers, enabling async processing, buffering traffic spikes, and reliable delivery.
Circuit Breaker Pattern
Prevents cascading failures. When a service fails repeatedly, the circuit "opens" and immediately returns fallback instead of waiting for timeouts.
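The breaker is a small state machine (CLOSED → OPEN → HALF-OPEN). A minimal sketch; the class name, thresholds, and timing defaults are illustrative, not from any particular library:

```javascript
// Circuit breaker: opens after `threshold` consecutive failures,
// fails fast while open, and allows one trial call after `resetMs`.
class CircuitBreaker {
  constructor(fn, { threshold = 3, resetMs = 30_000, now = Date.now } = {}) {
    Object.assign(this, { fn, threshold, resetMs, now, failures: 0, openedAt: null });
  }
  async call(...args) {
    if (this.openedAt !== null) {
      if (this.now() - this.openedAt < this.resetMs) {
        throw new Error('circuit open: failing fast'); // no network call made
      }
      this.openedAt = null; // HALF-OPEN: let one trial request through
    }
    try {
      const result = await this.fn(...args);
      this.failures = 0; // success closes the circuit
      return result;
    } catch (err) {
      if (++this.failures >= this.threshold) this.openedAt = this.now();
      throw err;
    }
  }
}
```

The key property: while open, callers get an immediate error (or a fallback) instead of stacking up timed-out requests that drag down their own service too.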
Distributed Systems
Sharding, replication, consistency, leader election, and observability
Database Sharding
Sharding splits data across multiple databases (shards). Each shard holds a subset of the data, enabling horizontal database scaling.
Sharding Strategies
→Range-based: shard by ID ranges (1-1M, 1M-2M). Simple but can be uneven.
→Hash-based: hash(user_id) % N shards. Even distribution but hard to range query.
→Directory-based: lookup table maps keys to shards. Flexible but single point of failure.
Hotspot problem: Celebrity users (millions of followers) on one shard overwhelm it. Solution: add secondary shard or special-case hot keys.
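Hash-based routing from the list above can be sketched with any stable string hash (32-bit FNV-1a here). Note that changing the shard count remaps most keys, which is exactly the problem consistent hashing addresses:

```javascript
// Stable 32-bit FNV-1a string hash
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Hash-based sharding: same key always lands on the same shard
function shardFor(userId, numShards) {
  return fnv1a(String(userId)) % numShards;
}
```

Don't use a language's built-in, per-process randomized hash for this; the mapping must be identical on every app server.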
Database Replication
Replication copies data across multiple nodes for fault tolerance and read scaling.
Leader-Follower (Primary-Replica)
→All writes go to Leader
→Reads can be served by any follower
→Leader sends replication log to followers
→If leader fails, a follower is promoted
Synchronous replication: the leader waits for follower acks. Slower, but no data loss.
Asynchronous replication: the leader doesn't wait. Fast, but possible data loss on failover.
Eventual Consistency
In distributed systems, achieving strong consistency has a high cost. Eventual consistency allows temporary divergence — all nodes will converge to the same state eventually.
Propagation delay: Write hits Node A → A syncs to B and C after some delay. During this window, reading from B returns stale data.
Consistency Models
→Strong: All reads return latest write
→Eventual: Will be consistent, but timing unknown
→Monotonic Read: Never see older data after newer
→Read-your-writes: Always see own writes
Leader Election (Raft Simplified)
When a leader node fails, remaining nodes must elect a new leader. Raft uses randomized timeouts and majority votes.
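A toy sketch of the core idea (not full Raft: no terms, log matching, or heartbeats): the node whose randomized timeout fires first becomes the candidate, and it wins only with a strict majority of votes:

```javascript
// Toy leader election: the node with the smallest randomized election
// timeout becomes candidate; it needs a strict majority to win.
function electLeader(timeoutsMs, votesGranted) {
  const candidate = timeoutsMs.indexOf(Math.min(...timeoutsMs));
  const votes = votesGranted.filter(Boolean).length;
  const majority = Math.floor(timeoutsMs.length / 2) + 1;
  return votes >= majority ? candidate : null; // null → retry with new timeouts
}

// 5 nodes; node 2 times out first and receives 3/5 votes → leader
electLeader([300, 280, 150, 310, 290], [true, false, true, false, true]);
```

Randomized timeouts make split votes unlikely; when one does happen (no majority), every node picks a fresh random timeout and the election repeats.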
Observability — Metrics, Logs, Traces
📊 Metrics: numerical measurements over time — CPU %, request rate, error rate, p99 latency. Stored in a time-series DB (Prometheus).
📋 Logs: timestamped event records, structured (JSON) or unstructured. ELK stack (Elasticsearch, Logstash, Kibana).
🔍 Traces: track a request across multiple services (Jaeger, Zipkin). See the full call graph and latency at each hop.
Distributed Systems Simulator
Live simulation of distributed system behaviors — interact in real time
Consistent Hashing
A hashing technique that minimizes key remapping when nodes are added or removed. Each node owns a range on the hash ring. Virtual nodes (vnodes) ensure even distribution.
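A minimal ring sketch with virtual nodes; the hash function (FNV-1a) and vnode count are illustrative choices:

```javascript
// Stable 32-bit FNV-1a string hash
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Consistent hash ring: each node appears `vnodes` times on the ring;
// a key maps to the first vnode position clockwise from hash(key).
class HashRing {
  constructor(nodes, vnodes = 100) {
    this.ring = nodes
      .flatMap(n => Array.from({ length: vnodes }, (_, i) => [fnv1a(`${n}#${i}`), n]))
      .sort((a, b) => a[0] - b[0]);
  }
  nodeFor(key) {
    const h = fnv1a(key);
    const entry = this.ring.find(([pos]) => pos >= h) ?? this.ring[0]; // wrap around
    return entry[1];
  }
}
```

Adding or removing a node only remaps the keys on the ring segments that node owns (roughly 1/N of them), instead of nearly all keys as with `hash % N`.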
Interview Practice Mode
Simulate real FAANG system design interviews with guided steps and AI evaluation
🐦 Design Twitter · 🔗 URL Shortener · 🎬 Design YouTube · 🚗 Design Uber · 💬 Design WhatsApp · 🎞 Design Netflix
1. Requirements → 2. Scale → 3. Architecture → 4. Components → 5. Scaling → 6. Failures → 7. Evaluate
Architecture Builder
Drag, drop and connect components to design your own system architecture
Network
🧑 Client
📱 Mobile App
🌐 CDN
⚖️ Load Balancer
Services
🔧 API Gateway
🖥 Web Server
⚡ Microservice
🔐 Auth Service
Data
🗄 Database
⚡ Cache (Redis)
🔍 Search (ES)
📦 Object Storage
Infra
📨 Message Queue
🌊 Stream Processor
📊 Monitor
Component Config
Select a component to configure its properties.
How to Use
1. Drag components from the palette onto the canvas
2. Select the Connect tool, then drag between nodes
3. Click a node to configure it
4. Press Simulate to animate traffic flow
Tips
• Nodes snap to a 40px grid automatically
• Double-click a node to rename it
• Connections show animated request flow
• Red glow = bottleneck in simulation mode
Architecture Advisor
Build an architecture to receive AI-powered suggestions.
Interview Question Bank
Real FAANG system design questions with animated solutions
Distributed Systems Game
Keep your system alive under increasing load — scale or fail!
System Design Game
Keep your distributed system alive as traffic increases. React to failures by adding components!
Scoring
+1 per second alive
+10 per 1,000 requests handled
−50 per server crash
Bonus for high availability
Events
🔴 Server crash — reroutes traffic
⚡ Cache failure — DB load spikes
🌊 Traffic spike — 3× normal load
🔀 Network partition — split cluster
Win Condition
Survive 3 minutes with >95% availability. Latency must stay below 500ms. Achieve 10,000 score to win!
Architecture Whiteboard
Sketch system architectures freely — like Excalidraw for engineers
Keyboard Shortcuts
P Pen · L Line · A Arrow · R Rect · T Text · E Eraser · Ctrl+Z Undo · Del Clear selection
Shape Stamps
Click SVR/DB/CAC/QUE/LB/CLI toolbar buttons to stamp pre-drawn system components at cursor position. Then connect with Arrow tool.
Export
Click 💾 to export the whiteboard as PNG. Great for saving your architecture sketches to share with your team.
Performance Analyzer
Latency heatmaps, scalability analysis, and bottleneck detection for your architecture
Example analyzer output:
Est. RPS capacity: 48K requests/second
DB write load: 72% of capacity
Cache hit ratio: 82% (estimated)
P99 latency: 340ms under peak load
Network egress: 2.4 GB/s estimated bandwidth
Fault tolerance: High (single-node failure: OK)
Bottleneck Analysis
Database Write Throughput
Single primary at 72% capacity. Add write queue + worker consumers or switch to multi-primary.
API Gateway Single Point
API gateway not horizontally scaled. Add 2+ instances behind load balancer.
Cache Eviction Under Load
Cache hit ratio drops under 3× load spike. Pre-warm critical keys on startup.
Throughput vs. Latency Curve
Capacity Calculator
AI Architecture Generator
Describe any system — instantly generate an interactive architecture diagram
Enter a system prompt and click Generate to see an interactive architecture diagram with detailed component analysis.
Key Tradeoffs
Back-of-envelope Numbers
Real-Time Traffic Simulator
Visualize requests flowing through your architecture — watch bottlenecks emerge in real time
Architecture Mode
Chaos Engineering Lab
Inject failures. Break things deliberately. Build resilient systems.
CAP Theorem Playground
Simulate network partitions and see how real databases sacrifice Consistency vs Availability
Cassandra
Cassandra is an AP system. It prioritizes availability over consistency. During a network partition, all nodes remain available but may serve stale data.