Filed under Node.js, Work.

Earlier this week I spent some time on the problem of scaling Socket.IO in a Node.js cluster. The essence of the problem: when you run multiple Node app processes (workers) on one server, or on multiple servers, client connections are routed by the cluster in a round-robin manner, so handshaken / authorized Socket.IO client requests get handed to workers where they are not handshaken / authorized, and that's where the mess begins. This happens when the Socket.IO servers created by the workers use the default memory store and do not share transports with each other; in other words, they are not scale-ready.

This is a known issue and StackOverflow has a few similar questions:

And there are more mentions elsewhere on the web:

  • a Quora thread (see the top answer by Drew Harry)
  • a collection of links by alessioalex

As a native solution, Socket.IO's developers LearnBoost suggest using the Redis store, which is built into Socket.IO:

io.set('store', new RedisStore());
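For reference, the fuller wiring documented for Socket.IO 0.9.x looks roughly like the sketch below. The three separate Redis clients follow the old wiki example (a Redis connection in subscriber mode cannot also run regular commands); the port number is an assumption. Treat this as a configuration sketch, not a recommendation, given how the testing went:

```javascript
// RedisStore wiring as documented for Socket.IO 0.9.x (sketch).
// Three clients are needed: one for publishing, one for subscribing,
// and one for ordinary Redis commands.
var sio = require('socket.io');
var redis = require('redis');

var RedisStore = sio.RedisStore;
var pub = redis.createClient();
var sub = redis.createClient();
var client = redis.createClient();

var io = sio.listen(8080); // port is illustrative

io.set('store', new RedisStore({
  redisPub: pub,
  redisSub: sub,
  redisClient: client
}));
```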

I have tested this approach, and it does not work; it seems I'm not the only one for whom it doesn't. That is why many of the sources above suggest different approaches and different architectures for scaling Socket.IO. In my case, client connections seemed to repeatedly try to re-handshake after a disconnect, and the server would not emit events to clients, because transports[id] would be null after the initial connect. I spent a few hours looking into these issues, but I do not have a definitive answer.

Other approaches

Drew Harry (see the Quora link above) suggests splitting the Node app into three separate pieces that talk to each other via a message queue or pub/sub:

  1. Application core. This does all the actual application logic, and holds the state of the system in its own memory, or relies on some datastore. These application cores can usually be easily scaled up by partitioning in some application-specific way.
  2. A Socket.IO layer. Clients connect directly to this, and it passes any messages from clients to the app core. Messages from the app core to clients are dispatched to the appropriate Socket.IO process, which then sends the message on to the client.
  3. A load balancer. This could be nginx like in the examples elsewhere in this thread, or it could be a smarter app that can talk back and forth with the Socket.IO layers to measure their actual load and direct new connections appropriately.

I don't quite see how this approach solves the problem of Socket.IO running on different workers; possibly his point is that the solution is to manage the load on each Socket.IO server instead of trying to scale the Socket.IO server itself.

Another company facing the same issue is Trello, who rely heavily on Socket.IO. They describe exactly the same problem:

The [Socket.IO] server currently has some problems with scaling up to more than 10K simultaneous client connections when using multiple processes and the Redis store, and the client has some issues that can cause it to open multiple connections to the same server, or not know that its connection has been severed. There are some issues with submitting our fixes (hacks!) back to the project – in many cases they only work with WebSockets (the only transport we use). We are working to get those changes which are fit for general consumption ready to submit back to the project.


Other developers turn away from Socket.IO completely in favor of other libraries, such as SockJS. This comes from Ryan Smith, who posted this question on StackOverflow:

Sadly we turned away from [Socket.IO] due to the issues we encountered with this project, switched to SockJS, and have yet to look back. I haven't seen the latest changes to [Socket.IO] but I have heard that version 1.0 will include many fixes, including the issue with the Redis store. One thing to keep in mind if you consider SockJS is that it is a much lower-level library than [Socket.IO], so if you need channels and groups you will have to build that out yourself.

As for myself, I will need to revisit this issue later. For now, the main takeaway for me is to run Socket.IO servers as a separate layer, not even try to scale them, and scale only the core application itself.

10 Responses to “Scaling Socket.IO with Node.js cluster: unresolved”

        • Alexander

The problem occurs if you're using a cluster in Node.js with XHR-polling instead of the WebSocket transport. The master routes XHR requests from a client to different processes in a round-robin fashion, so the processes constantly need to re-connect the client, do the handshake, etc., which takes time and, mostly, doesn't make sense. Theoretically you can use RedisStore as the store, which fixes the problem, but it never really worked for me, and if you google it you'll see many more people having the same problem; some companies who use Socket.IO extensively on a cluster actually had to hack its source. See the quote above from Trello, as an example.
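A minimal sketch of the usual sticky-routing workaround for the round-robin problem this comment describes: pin each client IP to one worker, so the XHR-polling requests keep hitting the process that did the handshake. The hash function and worker count below are illustrative assumptions; real setups typically use nginx's ip_hash or a sticky-session module:

```javascript
// Sticky routing sketch: map a client IP deterministically to a worker
// index, so every request from that client lands on the same worker
// and its handshake state stays valid.
function workerIndex(ip, numWorkers) {
  var hash = 0;
  for (var i = 0; i < ip.length; i++) {
    hash = (hash * 31 + ip.charCodeAt(i)) >>> 0; // unsigned 32-bit hash
  }
  return hash % numWorkers;
}

// A cluster master using this would accept raw TCP connections itself
// and hand each socket to workers[workerIndex(remoteAddress, n)]
// instead of letting the default round-robin scheduler pick.
var idx = workerIndex('192.168.0.10', 4);
console.log(idx >= 0 && idx < 4); // prints true
```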

      • joseph

alex, can you help with scaling Socket.IO and Node.js?
use my direct email and leave yours

    • Alexander

Theoretically. It never really worked for me and, it looks like, for a bunch of other people, judging by Google search results on this topic.

    • Alexander

Hi, sorry, I just saw this comment. No, we have not yet found a solution. Currently we still run only one instance of the Socket.IO server, and the preferred transport is WebSockets, so it's not an issue for us. It becomes an issue only when a client uses HTTP polling and you have more than one Socket.IO server.

