Web server clusters are used for scalability (handling more requests), availability (always having a server available) and fault tolerance (handling the failure of a server). Of these, fault tolerance is the hardest and most costly to implement, as it requires that all session data be replicated off the web server so that it is available to another web server if the initial web server becomes unavailable.

Techniques for replicating session data include distributing it to all nodes in a cluster, to a subset of the nodes in the cluster, to a database, or to a cluster server. To this range of solutions, cometd/Bayeux now adds the possibility of using the client machine to store the replicated session.

While it sounds strange, if you think about it, the client machine is the perfect place to store a replicated copy of the server session data. It is a machine that is always in communication with the correct web server, and if the client machine crashes, the session is by definition over, so you don’t need to time out the replicated data. Client machines scale with the number of users, so if you have more users, then you have more machines available to store replicated sessions.

To test this concept I have created the BayeuxSessionManager for Jetty. This session manager keeps a dirty bit on the session, and if the session has been modified by the end of a request, the session is serialized (it should also be encrypted), Base64 encoded and sent to the client as a Bayeux message. The client saves the opaque blob of session data. If the client is directed to a new server (due to policy, maintenance, failure or happenstance), then the Bayeux protocol will see that the client ID is not known and a re-handshake will be invoked. The session extension includes any saved blobs of session data with the handshake request, so that as the Bayeux connection is established with the new server, the session is restored and is again available.
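
The dirty-bit and serialize-then-encode steps described above can be sketched roughly as follows. This is not the actual BayeuxSessionManager source: the class ReplicatedSession and its methods are hypothetical names used only for illustration, and the Bayeux transport and the encryption step are left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the real Jetty session manager.
class ReplicatedSession {
    private final HashMap<String, Object> attributes = new HashMap<>();
    private boolean dirty; // set whenever the session is modified

    void setAttribute(String name, Object value) {
        attributes.put(name, value);
        dirty = true;
    }

    Object getAttribute(String name) {
        return attributes.get(name);
    }

    boolean isDirty() {
        return dirty;
    }

    // Intended to be called at the end of a request. Returns a Base64 blob
    // to send to the client, or null if the session was not modified.
    String snapshotIfDirty() throws IOException {
        if (!dirty) {
            return null;
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(attributes);
        }
        dirty = false;
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Intended to be called on the new server when a handshake carries a
    // saved blob. A real implementation should authenticate the blob first.
    @SuppressWarnings("unchecked")
    static ReplicatedSession restore(String blob)
            throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(blob);
        ReplicatedSession session = new ReplicatedSession();
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(raw))) {
            session.attributes.putAll((Map<String, Object>) in.readObject());
        }
        return session;
    }
}
```

In this sketch a request that only reads the session produces no blob, so nothing extra is sent to the client. Deserializing a blob that came back from a client is only safe once the blob has been verified as unmodified.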

Now I don’t think this approach is ever going to be 100% reliable, and I certainly would not like to see any pacemakers using this to communicate with a medical database about how many volts to zap you with to save you from a heart attack. But then it may be good enough to deal with enough common failures that you will be able to rest easy and avoid the stress levels that could cause a heart attack in the first place!

At this stage, this is only a proof of concept and more details need to be worked out, including:

  • encryption of the session data blob so that clients cannot see inside server data structures.
  • how to deal with page reloads: should the client use some client-side persistence (or is it good enough to expect that page reloads are unlikely to coincide with server failures?)
  • how to deal with requests that need sessions and that arrive on the new server before the Bayeux connection is established.
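
For the first item, one possible approach (an assumption, not something the proof of concept does) is to seal the serialized session with authenticated encryption such as AES-GCM from the standard javax.crypto API, so that the client can neither read nor alter the blob. The class SessionBlobSealer below is a hypothetical name, and the sketch assumes every node in the cluster shares the same secret key, which would have to be distributed by some means not shown here.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper; assumes all cluster nodes hold the same AES key.
class SessionBlobSealer {
    private static final int IV_BYTES = 12;   // usual IV size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();

    SessionBlobSealer(SecretKey key) {
        this.key = key;
    }

    // Generates a fresh 128-bit AES key (for demonstration only).
    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    // Encrypts the serialized session and returns Base64(IV || ciphertext).
    String seal(byte[] serializedSession) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        random.nextBytes(iv); // a fresh random IV for every blob
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(serializedSession);
        byte[] out = new byte[iv.length + ciphertext.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ciphertext, 0, out, iv.length, ciphertext.length);
        return Base64.getEncoder().encodeToString(out);
    }

    // Decrypts a blob; throws GeneralSecurityException if it was modified.
    byte[] open(String blob) throws GeneralSecurityException {
        byte[] in = Base64.getDecoder().decode(blob);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
        return cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
    }
}
```

Sealing alone does not stop a client from replaying an older but still valid blob, so expiry or one-time markers would need to be handled separately.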

I think all of these are solvable, and for applications that are clustered primarily for scalability and availability, this technique could provide a bit of fault tolerance without the need for complex server interconnections or repositories that can hold all sessions from all nodes.

What this technique needs now is a real use-case to help drive it towards a complete solution. So if anybody is interested in this style of replication and has a project that can afford to experiment with a novel approach, please contact me and we can see where this leads.

Client-side session replication for clusters

2 thoughts on “Client-side session replication for clusters”

  • January 9, 2008 at 11:49 am

    Very neat idea.

    But aren’t you really going back/forward/towards a "REST" like solution? REST-like in that if you just always sent over the session data, the server could forget about session ids and memory retention of session data altogether – each access would be unique to the server. Furthermore, this could once and for all fix the problem of multiple windows/tabs towards the same server (two years ago I saw a demo of some RedHat stuff (IIRC) that also handled this, don’t quite remember how, but I basically think it kept a session id in the url to distinguish). And it could make the server scale forever, in regard to the number of simultaneous users, as each user doesn’t add any footprint. (However, obviously sacrificing throughput due to much heavier CPU demand on serialization and encryption).

    My last server side project was a somewhat advanced search application (slightly more than simple pagination through the result set). This was in its previous incarnation developed using sessions. I ditched it all, and established all the necessary info using echoed variables in the XML file, which was made into different URL strings, your basic REST-like application. But oh how powerful that solution felt afterwards – one could work with a bunch of different searches at the same time, bookmark a certain drill-downed/faceted/paged result, send it to a colleague, and best of all, the server was utterly dumb, viewing every access as a standalone service request. I was very happy with the end result, and will obviously strive to make other applications behave in a similar way.

    What I see in your solution if you pull it to the edge, is the ability to do this transparently: you may use standard Session stuff and code rather freely – but on deploy, it will behave as a REST-like client.

    Aspects on security: the need to encrypt and seal the session data is very apparent: You state "not to look inside the server structures". That’s not the big deal, IMHO – it is to completely make it impossible to change data inside the server structure. Also, some features should probably be developed to make it easy to time out certain kinds of data (e.g. the login-info, so that one must log in again (but then one could continue where one left off)), or to make some data "once only" – to hinder "replays" of specific parts of runs, e.g. "buy that ticket". This so that you can’t send a person a link (or a page, or cookies, or whatever you do to keep the data), and if he clicks it, one more ticket is bought.

  • January 10, 2008 at 7:09 am

    Endre,

    At one level it is RESTful, in that the client receives all the state.  But then it is not a single "document". It allows separation of concerns, in that security filters, application components, frameworks etc can all independently put their stuff in the session and it will be replicated out to the client.

    As you say, security is a concern and the encryption would need to be strong.
