WebSockets and Advanced Backend Communication

March 2025 · 18 min read

In production applications, we don't just have one big backend doing everything. We break our backend into multiple services, and these services talk to each other in fascinating ways.

The PhonePe Problem

Let's start with a real example. Imagine I'm sending 200 rupees to my friend through PhonePe. The money transfer needs to happen immediately — that's the core flow. But PhonePe also sends notifications and SMS confirmations.

Here's where things get tricky. If we build everything in one backend and the SMS service goes down, suddenly our entire payment system is affected. Users can't send money because we're waiting for the SMS service to respond. That's a terrible user experience.

The notification service being down shouldn't stop people from sending money. That's where microservices architecture comes in.

Enter Message Queues

When a request comes in, it reaches our primary backend. The debit and the credit happen immediately, because that part is synchronous and critical. For services like SMS and notifications, we use a queue instead.

Think of a queue like a todo list. Our primary backend says "hey, I need to send an SMS to this user" and puts that job in the queue. Then it moves on. It doesn't wait. A separate worker service picks up jobs from the queue and handles them one by one.

The beauty? Money transfers happen instantly. Notifications can take their own time. If the notification worker crashes, users can still send money.
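Here is a minimal sketch of that idea, assuming an in-process queue.Queue as a stand-in for a real broker like Redis or RabbitMQ. The names transfer_money and sms_worker are hypothetical, made up for illustration, and this is not how PhonePe actually implements it.

```python
import queue
import threading

# In-process stand-in for a real message broker (assumption for this sketch).
jobs = queue.Queue()
sent_sms = []


def transfer_money(balances, sender, receiver, amount):
    """Critical path: update balances synchronously, enqueue the SMS job, return."""
    if balances[sender] < amount:
        raise ValueError("insufficient balance")
    balances[sender] -= amount
    balances[receiver] += amount
    # The backend does not wait for the SMS; it only records that one is needed.
    jobs.put({"to": receiver, "text": f"Received {amount} rupees from {sender}"})


def sms_worker():
    """Separate worker: picks jobs off the queue one by one."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel, used only to stop this sketch cleanly
            break
        sent_sms.append(f"{job['to']}: {job['text']}")


balances = {"me": 500, "friend": 100}
transfer_money(balances, "me", "friend", 200)  # returns before any SMS is sent

worker = threading.Thread(target=sms_worker)
worker.start()
jobs.put(None)
worker.join()
```

Notice that the transfer completes even though the worker thread has not started yet. The SMS job simply sits in the queue until something picks it up.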

The LeetCode Architecture

When you submit a solution on LeetCode, here's what happens:

  1. The request hits LeetCode's primary backend
  2. The backend (producer) puts a job in a queue with: userId, problemId, code, language
  3. A worker picks up the job and executes the code in an isolated container

Why not execute code in the primary backend? Two reasons:

Security: We're running untrusted user code. Workers run in isolated containers — sandboxed. If something goes wrong, we just kill that worker.

Reliability: What if the code has while(true)? Our primary backend would be stuck, and other users would suffer. Workers are isolated from this problem.
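To make the reliability point concrete, here is a rough sketch of a worker that runs a submission in a child process and kills it after a time limit. This is only an assumption about how a time limit could be enforced. A plain child process is not a sandbox, so a real judge would still need container isolation on top. The function name run_submission and the status strings are hypothetical.

```python
import subprocess
import sys


def run_submission(code, timeout_seconds=5.0):
    """Run Python source in a child process; kill it if it exceeds the time limit."""
    try:
        completed = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return {"status": "TIME_LIMIT_EXCEEDED", "output": ""}
    status = "OK" if completed.returncode == 0 else "RUNTIME_ERROR"
    return {"status": status, "output": completed.stdout.strip()}


# Job shape taken from the list above: userId, problemId, code, language.
job = {"userId": "u1", "problemId": "two-sum", "code": "print(1 + 1)", "language": "python"}

result_ok = run_submission(job["code"])
result_loop = run_submission("while True: pass", timeout_seconds=0.5)
```

The infinite loop costs the worker half a second and nothing more. The primary backend never sees it.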

Types of Communication

Synchronous:

  - HTTP/HTTPS: Service A sends a request, waits for Service B to respond.
  - WebSockets: Start as HTTP, then upgrade. Allow full-duplex communication: both sides can send messages anytime without reopening connections.

Asynchronous:

  - Message Queues (Redis, RabbitMQ): Producer adds jobs, consumers pick them up when ready. Producer doesn't wait.
  - Pub/Sub: Publishers send messages, all subscribers receive them. Great for scaling.
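The difference between a queue and pub/sub is easiest to see side by side. The sketch below uses two toy in-memory classes (JobQueue and PubSub are hypothetical names, not a real library) to show the delivery semantics: a queued job goes to exactly one consumer, while a published message goes to every subscriber.

```python
from collections import defaultdict, deque


class JobQueue:
    """Each job is handed to exactly one consumer."""

    def __init__(self):
        self._jobs = deque()

    def push(self, job):
        self._jobs.append(job)

    def pop(self):
        return self._jobs.popleft() if self._jobs else None


class PubSub:
    """Each message is delivered to every current subscriber of the channel."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in self._subscribers[channel]:
            callback(message)
        return len(self._subscribers[channel])  # how many subscribers got it


q = JobQueue()
q.push("send-sms")
worker_a_got = q.pop()  # the first worker takes the job
worker_b_got = q.pop()  # nothing is left for the second worker

bus = PubSub()
inbox_a, inbox_b = [], []
bus.subscribe("results", inbox_a.append)
bus.subscribe("results", inbox_b.append)
delivered = bus.publish("results", "accepted")  # both inboxes receive it
```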

The Full LeetCode Flow

  1. You submit code
  2. Primary backend puts job in queue
  3. Worker executes code in isolated container
  4. Worker publishes result to pub/sub
  5. WebSocket server subscribed to pub/sub receives the result
  6. WebSocket server sends event to your browser
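The six steps above can be wired together in a few lines. This is an in-memory simulation under heavy assumptions: a deque stands in for the queue, a dict of callbacks stands in for pub/sub, a browser connection is just a list of received events, and the verdict is hard-coded because no code is actually executed. The channel name and function names are hypothetical.

```python
from collections import defaultdict, deque

job_queue = deque()              # stand-in for the message queue
subscribers = defaultdict(list)  # stand-in for pub/sub channels
RESULTS = "submission-results"   # hypothetical channel name


def publish(channel, message):
    for callback in subscribers[channel]:
        callback(message)


class WebSocketServer:
    """Holds open connections; a 'connection' here is a list of pushed events."""

    def __init__(self):
        self.connections = {}  # userId -> events pushed to that browser
        subscribers[RESULTS].append(self.on_result)  # step 5: subscribe

    def connect(self, user_id):
        self.connections[user_id] = []
        return self.connections[user_id]

    def on_result(self, result):
        browser = self.connections.get(result["userId"])
        if browser is not None:
            browser.append(result)  # step 6: push event to the browser


def submit(user_id, problem_id, code, language):
    """Steps 1-2: the primary backend enqueues the job and returns."""
    job_queue.append(
        {"userId": user_id, "problemId": problem_id, "code": code, "language": language}
    )


def worker_run_once():
    """Steps 3-4: take one job, 'execute' it, publish the result."""
    job = job_queue.popleft()
    verdict = "ACCEPTED"  # placeholder: a real worker would run job["code"] in a container
    publish(RESULTS, {"userId": job["userId"], "problemId": job["problemId"], "verdict": verdict})


ws_server = WebSocketServer()
browser_events = ws_server.connect("u1")
submit("u1", "two-sum", "print(1 + 1)", "python")
worker_run_once()
```

The worker only knows the channel name. It never learns which WebSocket server, if any, holds the user's connection.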

Why can't workers talk to browsers directly?

Workers are isolated by design — they should never have direct access to clients. They're also short-lived; they scale up and down and might die before sending a result. Pub/sub decouples them entirely.

How Redis Handles Crashes

Redis has two persistence strategies:

AOF (Append Only File): Every write operation is appended to a log on disk. If Redis crashes, it replays the log on restart to rebuild the data. Downside: large logs take time to replay.

RDB (Redis Database): Redis takes periodic point-in-time snapshots and writes them to a compact binary file. Recovery is fast, but you can lose any writes made after the last snapshot.
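The trade-off is easier to see with a toy model. The class below is not Redis and does not use any Redis API. It is a hypothetical key-value store that imitates the two recovery ideas: replaying a log of writes versus loading the last snapshot.

```python
import json


class ToyStore:
    """Toy key-value store that imitates log replay and snapshot recovery."""

    def __init__(self):
        self.data = {}
        self.aof = []         # stand-in for the append-only file
        self.snapshot = "{}"  # stand-in for the last snapshot file

    def set(self, key, value):
        self.aof.append(("SET", key, value))  # log the write
        self.data[key] = value

    def save_snapshot(self):
        self.snapshot = json.dumps(self.data)

    def recover_from_aof(self):
        restored = {}
        for _command, key, value in self.aof:  # replay every write in order
            restored[key] = value
        return restored

    def recover_from_snapshot(self):
        return json.loads(self.snapshot)


store = ToyStore()
store.set("balance:me", 500)
store.save_snapshot()
store.set("balance:me", 300)  # written after the last snapshot

store.data = {}  # simulate a crash: memory is gone, the log and snapshot survive

from_aof = store.recover_from_aof()       # sees the latest write
from_rdb = store.recover_from_snapshot()  # only sees what the snapshot captured
```

Log replay touches every write ever made, which is why it gets slow as the log grows. The snapshot loads in one step but has already forgotten the second write.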

Scaling WebSocket Servers

In production, we have many WebSocket servers distributed geographically. Indian users connect to Indian servers, US users to US servers — lower latency.

But if Vinod (India) and a friend (USA) are in the same room on different servers, how does a message from Vinod reach the friend?

Whoever produces the message (a worker, or the server that received Vinod's message) just publishes it to pub/sub. All WebSocket servers subscribe. Whichever server has the friend connected receives the message and forwards it. That's how we scale globally.
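Here is a small in-memory sketch of that room scenario, assuming one pub/sub channel per room and two WebSocket servers. Everything here is a stand-in: the class, the method names, and the room id are hypothetical, and inboxes are plain lists instead of real sockets.

```python
from collections import defaultdict

room_subscribers = defaultdict(list)  # stand-in for pub/sub, one channel per room


def publish(room_id, message):
    for callback in room_subscribers[room_id]:
        callback(room_id, message)


class WebSocketServer:
    def __init__(self, region):
        self.region = region
        self.rooms = defaultdict(dict)  # room id -> {user: inbox}

    def join(self, room_id, user):
        if not self.rooms[room_id]:
            # First local member of this room: subscribe this server to the channel.
            room_subscribers[room_id].append(self.deliver)
        inbox = []
        self.rooms[room_id][user] = inbox
        return inbox

    def on_client_message(self, room_id, sender, text):
        # The server does not know where other members are connected; it just publishes.
        publish(room_id, {"from": sender, "text": text})

    def deliver(self, room_id, message):
        # Forward to locally connected members, skipping the sender.
        for user, inbox in self.rooms[room_id].items():
            if user != message["from"]:
                inbox.append(message)


india = WebSocketServer("india")
usa = WebSocketServer("usa")
vinod_inbox = india.join("room-1", "vinod")
friend_inbox = usa.join("room-1", "friend")

india.on_client_message("room-1", "vinod", "hello")
```

The Indian server never needs to know that the friend is connected in the US. Each server only looks at its own local connections when a message arrives.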

Modern production systems aren't monoliths. They're distributed systems with synchronous communication for critical operations, asynchronous queues for non-critical tasks, pub/sub for scaling across servers, and WebSockets for real-time communication. Understanding these patterns changed how I think about building applications.