Question 1

What is horizontal scaling vs vertical scaling? When would you choose each?

Accepted Answer

Vertical scaling adds more resources (CPU, RAM) to a single machine; horizontal scaling adds more machines to a pool. Prefer vertical scaling for simplicity and low-latency single-node workloads, and horizontal scaling when you need fault tolerance, near-infinite capacity, or need to handle distributed traffic.

Question 2

What is a load balancer and what are common load-balancing algorithms?

Accepted Answer

A load balancer distributes incoming traffic across multiple servers to prevent any single server from becoming a bottleneck. Common algorithms include Round Robin, Least Connections, IP Hash, and Weighted Round Robin; choice depends on session stickiness needs and server heterogeneity.

Question 3

What is the CAP theorem?

Accepted Answer

CAP theorem states a distributed system can guarantee at most two of three properties: Consistency (every read sees the latest write), Availability (every request gets a response), and Partition Tolerance (system operates despite network partitions). Since network partitions are unavoidable, real systems choose between CP (e.g., HBase) or AP (e.g., Cassandra) trade-offs.

Question 4

What is the difference between SQL and NoSQL databases? How do you choose?

Accepted Answer

SQL databases are relational, schema-based, and ACID-compliant, ideal for structured data with complex queries. NoSQL databases (document, key-value, column-family, graph) offer flexible schemas and horizontal scalability, better suited for unstructured data, high write throughput, or specific access patterns. Choose based on data model, consistency requirements, and scale.

Question 5

What is caching and what are common caching strategies?

Accepted Answer

Caching stores frequently accessed data in a fast-access layer (e.g., Redis, Memcached) to reduce latency and database load. Common strategies include Cache-Aside (application manages cache), Write-Through (write to cache and DB simultaneously), Write-Back (write to cache first, async to DB), and Read-Through; each offers different consistency and performance trade-offs.

Question 6

What is a CDN and when should you use one?

Accepted Answer

A Content Delivery Network is a geographically distributed network of servers that caches and serves static assets (images, JS, CSS, video) from locations closer to the user. Use a CDN to reduce latency, offload origin server traffic, and improve global availability for static or cacheable content.

Question 7

Explain the difference between synchronous and asynchronous communication in distributed systems.

Accepted Answer

Synchronous communication requires the caller to wait for a response before continuing (e.g., REST, gRPC), which is simpler but creates tight coupling and cascading failure risk. Asynchronous communication uses message queues or event streams (e.g., Kafka, RabbitMQ) so the caller continues without waiting, improving decoupling, resilience, and throughput at the cost of added complexity.

Question 8

What is database sharding and what are the main sharding strategies?

Accepted Answer

Sharding horizontally partitions a database across multiple nodes, each holding a subset of data, to scale beyond a single machine's capacity. Common strategies include Range-Based sharding (partition by key range), Hash-Based sharding (partition by hashed key for even distribution), and Directory-Based sharding (a lookup service maps keys to shards); each involves trade-offs in rebalancing complexity and hot-spot risk.

Question 9

What is the difference between a message queue and an event streaming platform?

Accepted Answer

A message queue (e.g., RabbitMQ, SQS) delivers each message to one consumer and typically deletes it after acknowledgment, optimized for task distribution. An event streaming platform (e.g., Kafka) retains ordered, immutable log of events for a configurable retention period, allowing multiple consumer groups to replay events independently, making it suitable for event sourcing and stream processing.

Question 10

How would you design a rate limiter?

Accepted Answer

A rate limiter restricts how many requests a client can make in a time window using algorithms like Token Bucket (allows bursts), Leaky Bucket (smooths traffic), Fixed Window Counter, or Sliding Window Log. In a distributed system, use a centralized store like Redis with atomic operations (INCR + EXPIRE) or Lua scripts to maintain consistent counters across multiple API gateway instances.

Question 11

What is eventual consistency and how do you handle it in application design?

Accepted Answer

Eventual consistency means that given no new updates, all replicas will converge to the same value over time, but reads may return stale data in the interim. Applications handle this by designing idempotent operations, using version vectors or timestamps to resolve conflicts, communicating staleness to users (e.g., 'may take a few minutes to update'), and avoiding strong consistency requirements where possible.

Question 12

What are the key differences between REST and gRPC? When would you choose gRPC?

Accepted Answer

REST uses HTTP/1.1 with JSON (human-readable, easy to debug, broad client support), while gRPC uses HTTP/2 with Protocol Buffers (binary, strongly typed, lower latency, supports streaming and multiplexing). Choose gRPC for internal microservice-to-microservice communication where performance matters, you need bi-directional streaming, or you want strict contract enforcement via Protobuf schemas.

Question 13

How do you design a system for high availability? What is the difference between HA and fault tolerance?

Accepted Answer

High Availability (HA) minimizes downtime through redundancy, failover mechanisms, health checks, and eliminating single points of failure, typically targeting 99.9%+ uptime. Fault Tolerance goes further—the system continues operating correctly even during failure without any degradation or interruption. Achieving HA involves active-passive or active-active replication, circuit breakers, graceful degradation, and multi-AZ deployments.

Question 14

Design a URL shortening service (like bit.ly). Walk through your architecture.

Accepted Answer

Core components: an API service that generates a unique short code (base62 encoding of an auto-incremented ID or hash), a key-value store (e.g., Redis or Cassandra) mapping short code to original URL, and a redirect service returning HTTP 301/302. Scale with read replicas and a CDN for redirect caching, use a distributed ID generator (Snowflake) to avoid collisions at scale, and handle analytics asynchronously via a message queue.

Question 15

What is the two-phase commit (2PC) protocol and what are its drawbacks?

Accepted Answer

2PC is a distributed transaction protocol where a coordinator first sends a PREPARE phase to all participants (who vote commit or abort), then sends a COMMIT or ROLLBACK in phase two based on unanimous agreement. Its drawbacks are blocking behavior (participants hold locks waiting for coordinator), single point of failure at the coordinator, and poor performance at scale; alternatives like Saga pattern or eventual consistency with compensating transactions are often preferred.

Question 16

What is the Saga pattern and when would you use it over 2PC?

Accepted Answer

The Saga pattern manages distributed transactions as a sequence of local transactions, each publishing an event/message to trigger the next step; if one fails, compensating transactions roll back previous steps. Use Saga (choreography or orchestration-based) over 2PC in microservices where long-running transactions, service autonomy, and avoiding distributed locks are priorities, accepting that you deal with eventual consistency rather than atomicity.

Question 17

How does consistent hashing work and why is it used in distributed systems?

Accepted Answer

Consistent hashing maps both data keys and server nodes onto a virtual ring using a hash function, assigning each key to the nearest node clockwise on the ring. When a node is added or removed, only the keys that were assigned to that node need to be remapped, minimizing data movement compared to modular hashing. It's used in distributed caches (e.g., Memcached), load balancers, and distributed databases (e.g., Cassandra, DynamoDB) to enable elastic scaling.

Question 18

What is a distributed consensus algorithm? Briefly explain Raft.

Accepted Answer

Distributed consensus algorithms allow a cluster of nodes to agree on a single value or sequence of values even in the presence of failures. Raft elects a leader via randomized timeouts; the leader accepts client writes, appends them to its log, replicates them to a majority of followers, and only then commits—guaranteeing consistency as long as a majority (quorum) of nodes are available. Raft is designed to be more understandable than Paxos and is used in etcd, CockroachDB, and TiKV.

Question 19

How would you design a distributed job scheduler (like a cron service at scale)?

Accepted Answer

Key components include a persistent job store (e.g., PostgreSQL or Redis) holding job definitions and next-run times, a leader-election mechanism (via Zookeeper or etcd) to ensure only one scheduler node triggers jobs at a time, a task queue (Kafka or SQS) to dispatch triggered jobs to worker pools, and idempotency keys with at-least-once delivery handling to prevent duplicate execution. Monitor with dead letter queues and heartbeat checks on workers.

Question 20

What is back-pressure and how do you implement it in a distributed system?

Accepted Answer

Back-pressure is a feedback mechanism that signals upstream producers to slow down when downstream consumers are overwhelmed, preventing unbounded queue growth and memory exhaustion. Implementation strategies include bounded queues that block or drop when full, reactive streams (e.g., Project Reactor, Akka Streams) with explicit demand signaling, consumer-controlled poll rates in Kafka, and circuit breakers that reject requests when error rates exceed thresholds.

System Design Interview Questions

1. What is horizontal scaling vs vertical scaling? When would you choose each?

2. What is a load balancer and what are common load-balancing algorithms?

3. What is the CAP theorem?

4. What is the difference between SQL and NoSQL databases? How do you choose?

5. What is caching and what are common caching strategies?

6. What is a CDN and when should you use one?

7. Explain the difference between synchronous and asynchronous communication in distributed systems.

8. What is database sharding and what are the main sharding strategies?

9. What is the difference between a message queue and an event streaming platform?

10. How would you design a rate limiter?

11. What is eventual consistency and how do you handle it in application design?

12. What are the key differences between REST and gRPC? When would you choose gRPC?

13. How do you design a system for high availability? What is the difference between HA and fault tolerance?

14. Design a URL shortening service (like bit.ly). Walk through your architecture.

15. What is the two-phase commit (2PC) protocol and what are its drawbacks?

16. What is the Saga pattern and when would you use it over 2PC?

17. How does consistent hashing work and why is it used in distributed systems?

18. What is a distributed consensus algorithm? Briefly explain Raft.

19. How would you design a distributed job scheduler (like a cron service at scale)?

20. What is back-pressure and how do you implement it in a distributed system?