RM Full Stack & AI Engineer

System Design Interview Questions

System Design interviews test your ability to architect scalable, reliable, and maintainable large-scale distributed systems. Questions range from foundational concepts to complex trade-offs in real-world architectures.

1. What is horizontal scaling vs vertical scaling? When would you choose each?

beginner

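Vertical scaling (scaling up) adds more resources (CPU, RAM) to a single machine; horizontal scaling (scaling out) adds more machines to a pool. Prefer vertical scaling for simplicity and for workloads that fit on one node, keeping in mind that a single machine has a hard hardware ceiling and remains a single point of failure. Prefer horizontal scaling when you need fault tolerance, capacity beyond what one machine can provide, or traffic spread across many nodes.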

2. What is a load balancer and what are common load-balancing algorithms?

beginner

A load balancer distributes incoming traffic across multiple servers to prevent any single server from becoming a bottleneck. Common algorithms include Round Robin, Least Connections, IP Hash, and Weighted Round Robin; choice depends on session stickiness needs and server heterogeneity.
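
As a rough illustration of two of these algorithms, the sketch below implements Round Robin and Least Connections in Python. The class names and server labels are hypothetical, and a production load balancer would also handle health checks, weights, and concurrency.

```python
import itertools


class RoundRobinBalancer:
    """Cycles through the servers in a fixed order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Picks the server with the fewest in-flight requests."""

    def __init__(self, servers):
        # Maps each server to its current number of open connections.
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Ties are broken by insertion order of the servers.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1
```

Round Robin needs no per-server state, which suits homogeneous servers with short requests; Least Connections tracks load and tends to work better when request durations vary widely.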

3. What is the CAP theorem?

beginner

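The CAP theorem states that a distributed system can guarantee at most two of three properties: Consistency (every read sees the latest write), Availability (every request gets a non-error response), and Partition Tolerance (the system keeps operating despite network partitions). Since network partitions are unavoidable, the practical choice is what to give up while a partition lasts: CP systems (e.g., HBase) reject or delay some requests to stay consistent, while AP systems (e.g., Cassandra as typically configured) keep answering and may return stale data.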

4. What is the difference between SQL and NoSQL databases? How do you choose?

beginner

SQL databases are relational, schema-based, and ACID-compliant, ideal for structured data with complex queries. NoSQL databases (document, key-value, column-family, graph) offer flexible schemas and horizontal scalability, better suited for unstructured data, high write throughput, or specific access patterns. Choose based on data model, consistency requirements, and scale.

5. What is caching and what are common caching strategies?

beginner

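Caching stores frequently accessed data in a fast-access layer (e.g., Redis, Memcached) to reduce latency and database load. Common strategies include Cache-Aside (the application checks the cache and loads from the DB on a miss), Read-Through (the cache itself loads from the DB on a miss), Write-Through (each write goes synchronously to both cache and DB before it is acknowledged), and Write-Back (write to the cache first and flush to the DB asynchronously, which risks data loss if the cache fails before the flush); each offers different consistency and performance trade-offs.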
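
A minimal sketch of the Cache-Aside strategy, assuming a plain dict stands in for the database and another dict for the cache. The `CacheAside` class, its TTL default, and the invalidate-on-write policy are illustrative assumptions, not a prescribed design.

```python
import time


class CacheAside:
    """Application-managed cache in front of a slower store."""

    def __init__(self, db, ttl_seconds=60.0, clock=time.monotonic):
        self.db = db          # dict standing in for the database
        self.cache = {}       # key -> (value, expires_at)
        self.ttl = ttl_seconds
        self.clock = clock
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self.cache.get(key)
        if entry is not None and entry[1] > self.clock():
            self.hits += 1
            return entry[0]
        # Miss or expired entry: read from the DB and populate the cache.
        self.misses += 1
        value = self.db[key]
        self.cache[key] = (value, self.clock() + self.ttl)
        return value

    def put(self, key, value):
        # Write to the DB, then invalidate so the next read reloads it.
        self.db[key] = value
        self.cache.pop(key, None)
```

Invalidating on write (rather than updating the cached value) is one common choice because it avoids caching data that may never be read again.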

6. What is a CDN and when should you use one?

beginner

A Content Delivery Network is a geographically distributed network of servers that caches and serves static assets (images, JS, CSS, video) from locations closer to the user. Use a CDN to reduce latency, offload origin server traffic, and improve global availability for static or cacheable content.

7. Explain the difference between synchronous and asynchronous communication in distributed systems.

intermediate

Synchronous communication requires the caller to wait for a response before continuing (e.g., REST, gRPC), which is simpler but creates tight coupling and cascading failure risk. Asynchronous communication uses message queues or event streams (e.g., Kafka, RabbitMQ) so the caller continues without waiting, improving decoupling, resilience, and throughput at the cost of added complexity.

8. What is database sharding and what are the main sharding strategies?

intermediate

Sharding horizontally partitions a database across multiple nodes, each holding a subset of data, to scale beyond a single machine's capacity. Common strategies include Range-Based sharding (partition by key range), Hash-Based sharding (partition by hashed key for even distribution), and Directory-Based sharding (a lookup service maps keys to shards); each involves trade-offs in rebalancing complexity and hot-spot risk.
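
The two key-based strategies can be sketched as small routing functions. The function names, the use of SHA-256, and the example boundaries are assumptions made for illustration.

```python
import bisect
import hashlib


def hash_shard(key: str, num_shards: int) -> int:
    """Hash-based sharding: spread keys evenly using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def range_shard(key: str, upper_bounds: list[str]) -> int:
    """Range-based sharding.

    upper_bounds holds the sorted, exclusive upper bound of every shard
    except the last one, which takes all remaining keys.
    """
    return bisect.bisect_right(upper_bounds, key)
```

With bounds `["h", "q"]`, keys below "h" go to shard 0, keys from "h" up to (not including) "q" go to shard 1, and the rest go to shard 2. Range sharding keeps neighboring keys together for range scans but can create hot spots; the modulo in `hash_shard` spreads load evenly but remaps most keys when `num_shards` changes.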

9. What is the difference between a message queue and an event streaming platform?

intermediate

A message queue (e.g., RabbitMQ, SQS) delivers each message to one consumer and typically deletes it after acknowledgment, which makes it well suited to task distribution. An event streaming platform (e.g., Kafka) retains an ordered, immutable log of events (ordered within each partition, in Kafka's case) for a configurable retention period, allowing multiple consumer groups to replay events independently, which makes it suitable for event sourcing and stream processing.

10. How would you design a rate limiter?

intermediate

A rate limiter restricts how many requests a client can make in a time window using algorithms like Token Bucket (allows bursts), Leaky Bucket (smooths traffic), Fixed Window Counter, or Sliding Window Log. In a distributed system, keep the counters in a centralized store like Redis so they stay consistent across multiple API gateway instances. A single INCR is atomic, but INCR followed by a separate EXPIRE is not, so run the pair inside a Lua script or a MULTI/EXEC transaction.
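
A single-process sketch of the Token Bucket algorithm follows. The `TokenBucket` class and its parameters are hypothetical names; a distributed version would keep the same state in a shared store instead of in memory.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `refill_rate` tokens/second."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = float(capacity)   # start with a full bucket
        self.last_refill = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        elapsed = now - self.last_refill
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Passing the clock in as a parameter is a testing convenience: a fake clock makes the refill behavior deterministic without sleeping.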

11. What is eventual consistency and how do you handle it in application design?

intermediate

Eventual consistency means that given no new updates, all replicas will converge to the same value over time, but reads may return stale data in the interim. Applications handle this by designing idempotent operations, using version vectors or timestamps to resolve conflicts, communicating staleness to users (e.g., 'may take a few minutes to update'), and avoiding strong consistency requirements where possible.
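
To make the version-vector idea concrete, here is a small sketch that compares two vectors and merges them. Representing a vector as a dict from a replica name to a counter is an assumption for illustration, as are the function names.

```python
def compare(a: dict, b: dict) -> str:
    """Return how version vector `a` relates to `b`."""
    nodes = set(a) | set(b)
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"   # conflicting writes: the application must resolve
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"


def merge(a: dict, b: dict) -> dict:
    """Element-wise maximum, used after a conflict has been resolved."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}
```

Unlike wall-clock timestamps, version vectors can tell a genuinely newer write apart from two concurrent writes, at the cost of storing one counter per replica.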

12. What are the key differences between REST and gRPC? When would you choose gRPC?

intermediate

REST typically uses JSON over HTTP, most often HTTP/1.1 (human-readable, easy to debug, broad client support), while gRPC uses HTTP/2 with Protocol Buffers (binary, strongly typed, usually smaller payloads and lower latency, with built-in streaming and multiplexing). Choose gRPC for internal microservice-to-microservice communication where performance matters, you need bi-directional streaming, or you want strict contract enforcement via Protobuf schemas.

13. How do you design a system for high availability? What is the difference between HA and fault tolerance?

intermediate

High Availability (HA) minimizes downtime through redundancy, failover mechanisms, health checks, and eliminating single points of failure, typically targeting 99.9%+ uptime (at 99.9%, just under nine hours of downtime per year). Fault Tolerance goes further: the system continues operating correctly through a component failure with no user-visible degradation or interruption, usually at the cost of fully redundant components. Achieving HA involves active-passive or active-active replication, circuit breakers, graceful degradation, and multi-AZ deployments.
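
The arithmetic behind uptime targets can be sketched as follows. The function names are hypothetical, and the redundancy formula assumes replicas fail independently, which real deployments only approximate.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Expected yearly downtime for an availability such as 0.999."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def parallel_availability(availability: float, replicas: int) -> float:
    """Availability of N redundant replicas when any one of them suffices.

    Assumes independent failures: the system is down only if all are down.
    """
    return 1.0 - (1.0 - availability) ** replicas
```

For example, 99.9% availability allows about 525.6 minutes of downtime per year, and two independent 99% replicas behind a failover mechanism reach roughly 99.99%.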

14. Design a URL shortening service (like bit.ly). Walk through your architecture.

intermediate

Core components: an API service that generates a unique short code (base62 encoding of an auto-incremented ID, or of a hash of the URL, which then needs collision handling), a key-value store (e.g., Redis or Cassandra) mapping short code to original URL, and a redirect service returning HTTP 301 (permanent and cacheable by browsers, which reduces load but hides repeat clicks from analytics) or 302 (temporary, so every click reaches the service). Scale with read replicas and a CDN for redirect caching, use a distributed ID generator (Snowflake-style) to avoid collisions at scale, and handle analytics asynchronously via a message queue.
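
The base62 step can be sketched as below. The alphabet ordering (digits, then lowercase, then uppercase) is an arbitrary assumption; any fixed ordering of the 62 characters works as long as encoding and decoding agree.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_base62(n: int) -> str:
    """Turn a non-negative numeric ID into a short code."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, remainder = divmod(n, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Recover the numeric ID from a short code."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Seven base62 characters cover 62^7 (about 3.5 trillion) distinct IDs, which is why short codes can stay short even at large scale.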

15. What is the two-phase commit (2PC) protocol and what are its drawbacks?

advanced

2PC is a distributed transaction protocol where a coordinator first sends PREPARE to all participants (each votes commit or abort), then in phase two sends COMMIT only if every participant voted to commit, and ROLLBACK otherwise. Its drawbacks are blocking behavior (participants hold locks while waiting for the coordinator, and can stay blocked if it crashes after PREPARE), a single point of failure at the coordinator, and poor performance at scale; alternatives like the Saga pattern or eventual consistency with compensating transactions are often preferred.
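
The two phases can be sketched with in-memory participants. The `Participant` class and `two_phase_commit` function are hypothetical, and the sketch deliberately omits timeouts, durable logging, and coordinator crashes, which are where the real difficulties lie.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote. A real participant would also acquire locks
        # and persist its vote before replying.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit only on a unanimous yes vote."""
    votes = [p.prepare() for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

Between `prepare()` and the final decision, every participant that voted yes is stuck in the "prepared" state, which is the blocking window described above.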

16. What is the Saga pattern and when would you use it over 2PC?

advanced

The Saga pattern manages distributed transactions as a sequence of local transactions, each publishing an event/message to trigger the next step; if one fails, compensating transactions semantically undo the steps that already completed. Use Saga (choreography or orchestration-based) over 2PC in microservices where long-running transactions, service autonomy, and avoiding distributed locks are priorities, accepting eventual consistency and the lack of isolation between steps in place of an atomic commit.
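
An orchestration-style saga can be sketched as a list of (action, compensation) pairs. The `run_saga` function is a hypothetical helper; a real orchestrator would persist its progress and retry compensations that fail.

```python
def run_saga(steps):
    """Run each action in order; on failure, undo completed steps in reverse.

    `steps` is a list of (action, compensation) callables.
    Returns True if every action succeeded, False if the saga was rolled back.
    """
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            # Compensate only the steps that actually finished.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True
```

Compensations run in reverse order so that later steps, which may depend on earlier ones, are undone first.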

17. How does consistent hashing work and why is it used in distributed systems?

advanced

Consistent hashing maps both data keys and server nodes onto a virtual ring using a hash function, assigning each key to the nearest node clockwise on the ring. When a node is added or removed, only the keys in the ring segments adjacent to that node move (roughly K/N of K keys across N nodes), whereas modular hashing (hash mod N) remaps almost every key. Virtual nodes, meaning several ring positions per server, are commonly used to even out the distribution. It's used in distributed caches (e.g., Memcached client libraries), load balancers, and distributed databases (e.g., Cassandra, DynamoDB) to enable elastic scaling.
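
A compact sketch of a hash ring with virtual nodes follows. The `HashRing` class, the choice of SHA-256, and the default of 100 virtual nodes per server are assumptions for illustration.

```python
import bisect
import hashlib


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node) points
        for node in nodes:
            self.add(node)

    @staticmethod
    def _position(value: str) -> int:
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node: str) -> None:
        # Place several virtual points per node to smooth the distribution.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._position(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [point for point in self._ring if point[1] != node]

    def lookup(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First point at or after the key's position, wrapping around.
        idx = bisect.bisect_left(self._ring, (self._position(key),))
        return self._ring[idx % len(self._ring)][1]
```

Adding a node only inserts new points on the ring, so any key that changes owner must now belong to the new node; every other key keeps its previous owner.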

18. What is a distributed consensus algorithm? Briefly explain Raft.

advanced

Distributed consensus algorithms allow a cluster of nodes to agree on a single value or sequence of values even in the presence of failures. Raft elects a leader via randomized timeouts; the leader accepts client writes, appends them to its log, replicates them to followers, and commits an entry only once it is stored on a majority of the cluster (counting itself). The cluster stays consistent and keeps making progress as long as a majority (quorum) of nodes is available. Raft is designed to be more understandable than Paxos and is used in etcd, CockroachDB, and TiKV.
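
The majority rule can be sketched numerically. This is only the quorum arithmetic, not a Raft implementation: the function names are hypothetical, and the sketch ignores Raft's additional requirement that a leader only directly commits entries from its current term.

```python
def quorum(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1


def commit_index(match_index: list[int]) -> int:
    """Highest log index stored on a majority of nodes.

    `match_index` holds the last replicated log index for every node,
    including the leader itself.
    """
    ordered = sorted(match_index, reverse=True)
    return ordered[quorum(len(match_index)) - 1]
```

In a five-node cluster with replicated indexes `[5, 5, 3, 2, 1]`, index 3 is the highest one present on three nodes, so entries up to 3 can be committed. The same arithmetic shows why a five-node cluster tolerates two failures and a four-node cluster only one.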

19. How would you design a distributed job scheduler (like a cron service at scale)?

advanced

Key components include a persistent job store (e.g., PostgreSQL or Redis) holding job definitions and next-run times, a leader-election mechanism (via ZooKeeper or etcd) to ensure only one scheduler node triggers jobs at a time, a task queue (Kafka or SQS) to dispatch triggered jobs to worker pools, and idempotency keys so that at-least-once delivery does not cause duplicate execution. Monitor with dead letter queues and heartbeat checks on workers.
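
The idempotency-key part can be sketched as a worker that records which (job, scheduled time) pairs it has already handled. The `IdempotentWorker` class and key format are hypothetical, and the in-memory set stands in for a durable store with a uniqueness constraint.

```python
class IdempotentWorker:
    """Executes each scheduled run at most once despite redelivery."""

    def __init__(self):
        self.seen = set()      # stand-in for a durable, shared key store
        self.executed = []

    def handle(self, job_id: str, scheduled_for: str) -> bool:
        # The same job at the same scheduled time always yields the same key.
        key = f"{job_id}:{scheduled_for}"
        if key in self.seen:
            return False       # duplicate delivery: skip
        self.seen.add(key)
        self.executed.append(key)
        return True
```

Deriving the key from the job ID and its scheduled time, rather than from the delivery attempt, is what lets redelivered messages be recognized as duplicates.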

20. What is back-pressure and how do you implement it in a distributed system?

advanced

Back-pressure is a feedback mechanism that signals upstream producers to slow down when downstream consumers are overwhelmed, preventing unbounded queue growth and memory exhaustion. Implementation strategies include bounded queues that block or drop when full, reactive streams (e.g., Project Reactor, Akka Streams) with explicit demand signaling, consumer-controlled poll rates in Kafka, and circuit breakers that reject requests when error rates exceed thresholds.
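
The bounded-queue strategy can be sketched with Python's standard `queue.Queue`. The `offer` helper is a hypothetical name; it reports a full queue to the producer instead of blocking, so the producer can slow down, retry later, or drop the item.

```python
import queue


def offer(q: queue.Queue, item) -> bool:
    """Try to enqueue without blocking; False signals back-pressure."""
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False
```

Calling `q.put(item)` instead would block the producer until a consumer frees a slot, which is the other bounded-queue behavior described above.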
