System Design interviews test your ability to architect scalable, reliable, and maintainable large-scale distributed systems. Questions range from foundational concepts to complex trade-offs in real-world architectures.
Vertical scaling adds more resources (CPU, RAM) to a single machine; horizontal scaling adds more machines to a pool. Prefer vertical scaling for simplicity and for low-latency single-node workloads, and horizontal scaling when you need fault tolerance, capacity beyond what the largest single machine can provide, or the ability to serve geographically distributed traffic.
A load balancer distributes incoming traffic across multiple servers to prevent any single server from becoming a bottleneck. Common algorithms include Round Robin, Least Connections, IP Hash, and Weighted Round Robin; choice depends on session stickiness needs and server heterogeneity.
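Two of the algorithms named above can be sketched in a few lines. This is a minimal in-memory illustration, not a production balancer; the class names `RoundRobinBalancer` and `LeastConnectionsBalancer` and the server labels are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation, ignoring current load."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Sends each new request to the server with the fewest open connections."""

    def __init__(self, servers):
        # Maps server name -> number of connections currently open.
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # On a tie, min() returns the first server in insertion order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when the request finishes so the count stays accurate.
        self.active[server] -= 1
```

Round Robin needs no state about the servers, which suits homogeneous pools with short requests; Least Connections tracks load and tends to cope better when request durations vary widely.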
The CAP theorem states that a distributed system can guarantee at most two of three properties: Consistency (every read sees the latest write), Availability (every request gets a response), and Partition Tolerance (the system keeps operating despite network partitions). Since network partitions are unavoidable, the real choice is what to give up while a partition lasts: CP systems (e.g., HBase) reject or delay some requests to stay consistent, while AP systems (e.g., Cassandra) keep answering and may return stale data.
SQL databases are relational, schema-based, and ACID-compliant, ideal for structured data with complex queries. NoSQL databases (document, key-value, column-family, graph) offer flexible schemas and horizontal scalability, better suited for unstructured data, high write throughput, or specific access patterns. Choose based on data model, consistency requirements, and scale.
Caching stores frequently accessed data in a fast-access layer (e.g., Redis, Memcached) to reduce latency and database load. Common strategies include Cache-Aside (application manages cache), Write-Through (write to cache and DB simultaneously), Write-Back (write to cache first, async to DB), and Read-Through; each offers different consistency and performance trade-offs.
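As an illustration of the Cache-Aside strategy, here is a minimal sketch in which plain dictionaries stand in for both the cache (Redis or Memcached in practice) and the database. The class name `CacheAside` and the `db_reads` counter are assumptions made for the example.

```python
class CacheAside:
    """Application-managed cache: read through on a miss, invalidate on write."""

    def __init__(self, db):
        self.db = db        # dict standing in for the database
        self.cache = {}     # dict standing in for Redis/Memcached
        self.db_reads = 0   # counts database lookups, for demonstration only

    def get(self, key):
        if key in self.cache:
            return self.cache[key]          # cache hit: no database access
        self.db_reads += 1
        value = self.db.get(key)            # cache miss: fall back to the database
        if value is not None:
            self.cache[key] = value         # populate the cache for later reads
        return value

    def put(self, key, value):
        self.db[key] = value                # write to the source of truth first
        self.cache.pop(key, None)           # then invalidate the stale cache entry
```

Invalidating on write rather than updating the cache keeps the write path simple; the next read repopulates the entry. A real deployment would also set a TTL so entries missed by an invalidation eventually expire.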
A Content Delivery Network (CDN) is a geographically distributed network of servers that caches and serves static assets (images, JS, CSS, video) from locations closer to the user. Use a CDN to reduce latency, offload origin server traffic, and improve global availability for static or cacheable content.
Synchronous communication requires the caller to wait for a response before continuing (e.g., REST, gRPC), which is simpler but creates tight coupling and cascading failure risk. Asynchronous communication uses message queues or event streams (e.g., Kafka, RabbitMQ) so the caller continues without waiting, improving decoupling, resilience, and throughput at the cost of added complexity.
Sharding horizontally partitions a database across multiple nodes, each holding a subset of data, to scale beyond a single machine's capacity. Common strategies include Range-Based sharding (partition by key range), Hash-Based sharding (partition by hashed key for even distribution), and Directory-Based sharding (a lookup service maps keys to shards); each involves trade-offs in rebalancing complexity and hot-spot risk.
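The hash-based and range-based strategies can be sketched as two routing functions. This is a simplified illustration: the function names, the choice of MD5 as the hash, and the example boundaries are assumptions, not a prescribed scheme.

```python
import bisect
import hashlib


def hash_shard(key, num_shards):
    """Hash-based: spread keys evenly, at the cost of losing key order."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def range_shard(key, upper_bounds):
    """Range-based: upper_bounds is a sorted list of exclusive upper bounds.

    With bounds ["h", "p"], shard 0 holds keys < "h", shard 1 holds
    "h" <= key < "p", and shard 2 holds everything from "p" upward.
    """
    return bisect.bisect_right(upper_bounds, key)
```

Range-based routing keeps adjacent keys together, which helps range scans but can create hot spots when traffic clusters in one range. The modulo in `hash_shard` distributes load evenly but remaps most keys when `num_shards` changes.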
A message queue (e.g., RabbitMQ, SQS) delivers each message to one consumer and typically deletes it after acknowledgment, which makes it well suited to task distribution. An event streaming platform (e.g., Kafka) retains an ordered, immutable log of events for a configurable retention period, so multiple consumer groups can read and replay events independently; this makes it suitable for event sourcing and stream processing.
A rate limiter restricts how many requests a client can make in a time window using algorithms like Token Bucket (allows bursts), Leaky Bucket (smooths traffic), Fixed Window Counter, or Sliding Window Log. In a distributed system, use a centralized store like Redis so counters stay consistent across multiple API gateway instances, and make the increment-and-expire step atomic (INCR plus EXPIRE wrapped in a Lua script or a MULTI/EXEC transaction) so that a crash between the two commands cannot leave a counter without a TTL.
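A single-process sketch of the Token Bucket algorithm follows. It takes the current time as an argument so the behavior is deterministic; the class name `TokenBucket` and its parameters are hypothetical, and a distributed version would keep this state in a shared store rather than in memory.

```python
class TokenBucket:
    """Allows bursts up to `capacity`, then refills at `refill_rate` tokens/second."""

    def __init__(self, capacity, refill_rate, now=0.0):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)   # start full so an initial burst is allowed
        self.last = now                 # time of the most recent refill

    def allow(self, now):
        # Refill lazily based on elapsed time, never exceeding capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1            # spend one token for this request
            return True
        return False                    # bucket empty: reject the request
```

In real use, `now` would come from a monotonic clock such as `time.monotonic()`.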
Eventual consistency means that given no new updates, all replicas will converge to the same value over time, but reads may return stale data in the interim. Applications handle this by designing idempotent operations, using version vectors or timestamps to resolve conflicts, communicating staleness to users (e.g., 'may take a few minutes to update'), and avoiding strong consistency requirements where possible.
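To show how version vectors help resolve conflicts, here is a small sketch that compares two vectors and reports whether one update happened before the other or whether they are concurrent (a true conflict the application must merge). The function name `compare` and the dictionary representation (replica ID to counter) are assumptions for illustration.

```python
def compare(a, b):
    """Compare two version vectors given as {replica_id: counter} dicts.

    Returns "equal", "before" (a happened before b), "after"
    (a happened after b), or "concurrent" (neither dominates).
    """
    replicas = set(a) | set(b)
    # A missing entry means that replica has not been seen: counter 0.
    a_le_b = all(a.get(r, 0) <= b.get(r, 0) for r in replicas)
    b_le_a = all(b.get(r, 0) <= a.get(r, 0) for r in replicas)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"
```

When the result is "before" or "after", the newer version can safely replace the older one; "concurrent" means both writes must be kept or merged by application logic.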
REST typically uses HTTP/1.1 with JSON (human-readable, easy to debug, broad client support), while gRPC uses HTTP/2 with Protocol Buffers (binary, strongly typed, lower latency, with support for streaming and multiplexing). Choose gRPC for internal service-to-service communication when performance matters, when you need bidirectional streaming, or when you want strict contract enforcement via Protobuf schemas.
High Availability (HA) minimizes downtime through redundancy, failover mechanisms, health checks, and eliminating single points of failure, typically targeting 99.9%+ uptime. Fault Tolerance goes further—the system continues operating correctly even during failure without any degradation or interruption. Achieving HA involves active-passive or active-active replication, circuit breakers, graceful degradation, and multi-AZ deployments.
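The uptime targets above translate into concrete downtime budgets, and redundancy improves them in a predictable way. The sketch below works through the arithmetic; the function names are hypothetical, and the redundancy formula assumes replica failures are independent, which real deployments only approximate.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_pct):
    """Allowed downtime per year for a given availability percentage."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


def parallel_availability(single, replicas):
    """Availability of N redundant replicas when any one can serve traffic.

    Assumes independent failures: the system is down only if all replicas are.
    """
    return 1 - (1 - single) ** replicas
```

For example, 99.9% availability allows about 8.76 hours of downtime per year, and two independent replicas that are each 99% available give roughly 99.99% combined.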
Core components of a URL shortener: an API service that generates a unique short code (base62 encoding of an auto-incremented ID or hash), a key-value store (e.g., Redis or Cassandra) mapping short code to original URL, and a redirect service returning HTTP 301 (permanent, cacheable by browsers) or 302 (temporary, so each click reaches the service and can be counted). Scale with read replicas and a CDN for redirect caching, use a distributed ID generator (Snowflake) to avoid collisions at scale, and handle analytics asynchronously via a message queue.
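The base62 encoding of a numeric ID mentioned above can be sketched as follows. The alphabet ordering (digits, lowercase, uppercase) is one common convention chosen here as an assumption; any fixed ordering of the 62 characters works as long as encoder and decoder agree.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62


def encode(number):
    """Convert a non-negative integer ID into a base62 short code."""
    if number == 0:
        return ALPHABET[0]
    chars = []
    while number > 0:
        number, remainder = divmod(number, BASE)
        chars.append(ALPHABET[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))


def decode(code):
    """Convert a base62 short code back into its integer ID."""
    number = 0
    for ch in code:
        number = number * BASE + ALPHABET.index(ch)
    return number
```

Seven base62 characters cover 62^7 (about 3.5 trillion) IDs, which is why short codes stay compact even at large scale.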
Two-phase commit (2PC) is a distributed transaction protocol in which a coordinator first sends a PREPARE request to all participants (who vote to commit or abort), then in phase two sends COMMIT if the vote was unanimous or ROLLBACK otherwise. Its drawbacks are blocking behavior (participants hold locks while waiting for the coordinator), a single point of failure at the coordinator, and poor performance at scale; alternatives such as the Saga pattern or eventual consistency with compensating transactions are often preferred.
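The two phases can be illustrated with an in-memory sketch. It deliberately omits what makes 2PC hard in practice (network failures, timeouts, durable logging, coordinator crashes); the `Participant` class and `two_phase_commit` function are hypothetical names for the example.

```python
class Participant:
    """A resource manager that votes in phase one and obeys in phase two."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote. A "prepared" participant holds its locks until told
        # the outcome, which is the source of 2PC's blocking behavior.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes yes."""
    votes = [p.prepare() for p in participants]      # phase 1: collect votes
    if all(votes):
        for p in participants:                       # phase 2: commit everywhere
            p.commit()
        return True
    for p in participants:                           # phase 2: roll back everywhere
        p.rollback()
    return False
```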
The Saga pattern manages distributed transactions as a sequence of local transactions, each publishing an event/message to trigger the next step; if one fails, compensating transactions roll back previous steps. Use Saga (choreography or orchestration-based) over 2PC in microservices where long-running transactions, service autonomy, and avoiding distributed locks are priorities, accepting that you deal with eventual consistency rather than atomicity.
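An orchestration-style saga can be sketched as a loop that runs each local step and, on failure, applies the compensations for the steps that already succeeded, in reverse order. The `run_saga` function and the step names in the usage note are hypothetical; a real orchestrator would also persist progress so it can resume after a crash.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; undo completed steps on failure.

    Returns True if every action succeeded, False if the saga was rolled back.
    """
    completed = []  # compensations for the steps that have succeeded so far
    for action, compensate in steps:
        try:
            action()
        except Exception:
            # Undo in reverse order; the failed step itself made no change
            # that needs compensating in this simplified model.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True
```

For example, with the steps "reserve inventory", "charge card", and a failing "book shipment", the saga would refund the card and then release the inventory. Compensations should be idempotent, since an orchestrator may retry them.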
Consistent hashing maps both data keys and server nodes onto a virtual ring using a hash function, assigning each key to the nearest node clockwise on the ring. When a node is added or removed, only the keys in the ring segment adjacent to that node are remapped (roughly 1/N of the keys for N nodes), whereas modular hashing remaps nearly every key when the node count changes. Virtual nodes, where each server occupies many points on the ring, even out the distribution. It's used in distributed caches (e.g., Memcached client libraries), load balancers, and distributed databases (e.g., Cassandra, DynamoDB) to enable elastic scaling.
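A compact sketch of such a ring follows. It places each node at several points on the ring (virtual nodes) to smooth the distribution. The class name `HashRing`, the `vnodes` default of 100, and the use of MD5 are assumptions made for illustration.

```python
import bisect
import hashlib


def ring_hash(value):
    """Map a string to a position on the ring (a 64-bit integer)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node) tuples
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node occupies `vnodes` positions on the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def lookup(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First ring entry at or after the key's position, wrapping around.
        idx = bisect.bisect_left(self._ring, (ring_hash(key),))
        return self._ring[idx % len(self._ring)][1]
```

Adding a node only claims keys for the new node; no key moves between the existing nodes, which is the property that keeps rebalancing cheap.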
Distributed consensus algorithms allow a cluster of nodes to agree on a single value or sequence of values even in the presence of failures. Raft elects a leader via randomized timeouts; the leader accepts client writes, appends them to its log, replicates them to a majority of followers, and only then commits—guaranteeing consistency as long as a majority (quorum) of nodes are available. Raft is designed to be more understandable than Paxos and is used in etcd, CockroachDB, and TiKV.
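The majority rule at the heart of Raft's commit step can be expressed in two small functions. This is only the quorum arithmetic, not a Raft implementation: it omits elections, terms, and the rule that a leader commits by counting replicas only for entries from its own term. The function names are hypothetical.

```python
def quorum(cluster_size):
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1


def commit_index(match_index):
    """Highest log index stored on a majority of nodes.

    match_index lists, for every node including the leader, the highest
    log index known to be replicated on that node.
    """
    ranked = sorted(match_index, reverse=True)
    # The quorum-th highest value is present on at least a majority.
    return ranked[quorum(len(match_index)) - 1]
```

With five nodes whose logs reach indexes 7, 7, 5, 4, and 3, index 5 is the highest entry present on three nodes, so entries up to 5 can be committed. The same arithmetic shows why a five-node cluster tolerates two failures while a four-node cluster tolerates only one.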
Key components of a distributed job scheduler include a persistent job store (e.g., PostgreSQL or Redis) holding job definitions and next-run times, a leader-election mechanism (via ZooKeeper or etcd) to ensure only one scheduler node triggers jobs at a time, a task queue (Kafka or SQS) to dispatch triggered jobs to worker pools, and idempotency keys so that at-least-once delivery does not cause duplicate execution. Monitor with dead letter queues and heartbeat checks on workers.
Back-pressure is a feedback mechanism that signals upstream producers to slow down when downstream consumers are overwhelmed, preventing unbounded queue growth and memory exhaustion. Implementation strategies include bounded queues that block or drop when full, reactive streams (e.g., Project Reactor, Akka Streams) with explicit demand signaling, consumer-controlled poll rates in Kafka, and circuit breakers that reject requests when error rates exceed thresholds.
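The bounded-queue strategy is the simplest to demonstrate. The sketch below uses Python's standard `queue.Queue` with a size limit and reports a full buffer to the producer instead of growing without bound; the helper name `try_produce` is hypothetical.

```python
import queue


def try_produce(buffer, item):
    """Offer an item to a bounded queue without blocking.

    Returns True if accepted, or False if the queue is full, which is the
    signal for the producer to slow down, retry later, or drop the item.
    """
    try:
        buffer.put_nowait(item)
        return True
    except queue.Full:
        return False
```

Calling `buffer.put(item)` instead would block the producer until a consumer frees a slot, which applies back-pressure implicitly; the non-blocking form shown here lets the producer decide between retrying, shedding load, or propagating the signal further upstream.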
© RM Full Stack & AI Engineer