I have done some very rough preliminary benchmarks on the latest cometd-2.4.0-SNAPSHOT with the latest Jetty-7.5.0-SNAPSHOT and the results are rather impressive. The features that these two releases have added are:
- Optimised Jetty NIO with latest JVMs and JITs considered.
- Latest websocket draft implemented and optimised.
- Websocket client implemented.
- Jackson JSON parser/generator used for cometd.
- Websocket cometd transport for the server improved.
- Websocket cometd transport for the bayeux client implemented.
The benchmarks that I’ve done have all been on my notebook using the localhost network, which is not the most realistic of environments, but it still tells us a lot about the raw performance of cometd/jetty. Specifically:
- Both the server and the client are running on the same machine, so they are effectively sharing the 8 CPUs available. The client typically takes 3x more CPU than the server (for the same load), so this is kind of like running the server on a dual core and the client on a 6 core machine.
- The local network has very high throughput which would only be matched by gigabit networks. It also has practically no latency, which is unlike any real network. The long polling transport is more dependent on good network latency than the websocket transport, so the true comparison between these transports will need testing on a real network.
The cometd load test is a simulated chat application. For this test I tried long-polling and websocket transports for 100, 1000 and 10,000 clients that were each logged into 10 randomly selected chat rooms from a total of 100 rooms. The messages sent were all 50 characters long and were published in batches of 10 messages at once, each to a randomly selected room. The pause between batches was adjusted to find a good throughput without bad latency. However, little effort was put into finding the optimal settings to maximise throughput.
The runs were all done on JVMs that had been warmed up, but the runs were moderately short (approx 30s), so steady state was not guaranteed and the margin of error on these numbers will be pretty high. However, I also did a long run test at one setting just to make sure that steady state can be achieved.
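As a rough sanity check on the fan-out this setup implies, the sketch below estimates how many deliveries each published message generates. It assumes rooms are picked uniformly at random; the function name and that uniformity assumption are illustrative and are not taken from the cometd load test code.

```python
def expected_subscribers_per_room(clients: int, rooms_per_client: int, total_rooms: int) -> float:
    """Average number of subscribers in a room.

    Assumption (not from the load test source): every client picks its
    rooms uniformly at random, so subscriptions spread evenly over rooms.
    """
    return clients * rooms_per_client / total_rooms


# The three client counts used in the benchmark, each with 10 of 100 rooms.
for clients in (100, 1_000, 10_000):
    fan_out = expected_subscribers_per_room(clients, rooms_per_client=10, total_rooms=100)
    print(f"{clients:>6} clients: about {fan_out:.0f} deliveries per published message")
```

Under this assumption the delivered message rate is the published rate multiplied by the fan-out, which is why a modest publish rate can still produce a very large number of delivered messages per second.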
The bubble chart above plots messages per second against number of clients for both long-polling and websocket transports. The size of each bubble is the maximal latency of the test, with the smallest bubble being 109ms and the largest 646ms. Observations from the results are:
- Regardless of transport we achieved hundreds of thousands of messages per second! These are great numbers and show that we can cycle the cometd infrastructure at high rates.
- The long-polling throughput is probably over-reported, because many messages are queued into each HTTP response. The highest rate I saw was 22,000 HTTP responses per second, so for many applications it will be the HTTP rate that limits the throughput rather than the cometd rate. However, the websocket throughput did not benefit from any such batching.
- The maximal latency for all websocket measurements was significantly better than long polling, with all websocket messages being delivered in < 200ms and the average was < 1ms.
- The websocket throughput increased with connections, which probably indicates that at low numbers of connections we were not generating a maximal load.
A Long Run
The throughput tests above need to be redone on a real network and with longer runs. However, I did do one long run (approx. 3 hours) of 1,000,013,657 messages at 93,856/sec. The results suggest no immediate problems with long runs. Neither the client nor the server needed to do an old generation collection, and young generation collections averaged only about 12ms on the server and 14ms on the client.
The output from the client is below:
Statistics Started at Fri Aug 19 15:44:48 EST 2011
Operative System: Linux 2.6.38-10-generic amd64
JVM : Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM runtime 17.1-b03 1.6.0_22-b04
Processors: 8
System Memory: 55.35461% used of 7.747429 GiB
Used Heap Size: 215.7406 MiB
Max Heap Size: 1984.0 MiB
Young Generation Heap Size: 448.0 MiB
- - - - - - - - - - - - - - - - - - - -
Testing 1000 clients in 100 rooms, 10 rooms/client
Sending 1000000 batches of 10x50 bytes messages every 10000 µs
- - - - - - - - - - - - - - - - - - - -
Statistics Ended at Fri Aug 19 18:42:23 EST 2011
Elapsed time: 10654717 ms
Time in JIT compilation: 57 ms
Time in Young Generation GC: 118473 ms (8354 collections)
Time in Old Generation GC: 0 ms (0 collections)
Garbage Generated in Young Generation: 2576746.8 MiB
Garbage Generated in Survivor Generation: 336.53125 MiB
Garbage Generated in Old Generation: 532.35156 MiB
Average CPU Load: 433.23907/800
----------------------------------------
Outgoing: Elapsed = 10654716 ms | Rate = 938 msg/s = 93 req/s = 0.4 Mbs
All messages arrived 1000013657/1000013657
Messages - Success/Expected = 1000013657/1000013657
Incoming - Elapsed = 10654716 ms | Rate = 93856 msg/s = 90101 resp/s(96.00%) = 35.8 Mbs
Thread Pool - Queue Max = 972 | Latency avg/max = 3/62 ms
Messages - Wall Latency Min/Ave/Max = 0/8/135 ms
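The headline rates can be cross-checked from the raw counters in the client output above. This is only a back-of-the-envelope sketch: the variable names are mine, and the values are copied by hand from the report rather than parsed from it.

```python
# Values copied from the client report above.
elapsed_ms = 10_654_716            # "Incoming - Elapsed"
messages_received = 1_000_013_657  # "All messages arrived"
batches_sent = 1_000_000           # "Sending 1000000 batches"
messages_per_batch = 10            # "of 10x50 bytes messages"
reported_resp_per_s = 90_101       # "resp/s" on the Incoming line

elapsed_s = elapsed_ms / 1000
messages_published = batches_sent * messages_per_batch

# Delivered and published message rates over the whole run.
incoming_rate = messages_received / elapsed_s
outgoing_rate = messages_published / elapsed_s

# Average number of deliveries caused by one published message.
fan_out = messages_received / messages_published

# Average number of messages carried by each response, using the reported response rate.
msgs_per_response = incoming_rate / reported_resp_per_s

print(f"incoming: {incoming_rate:.0f} msg/s, outgoing: {outgoing_rate:.0f} msg/s")
print(f"fan-out: {fan_out:.1f} deliveries per published message")
print(f"messages per response: {msgs_per_response:.2f}")
```

The derived incoming and outgoing rates agree with the reported 93856 msg/s and 938 msg/s, and the fan-out of roughly 100 is what 1000 clients in 10 of 100 rooms would be expected to produce.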
Note that the client was using 433/800 of the available CPU, while you can see that the server (below) was using only 170/800. This suggests that the server has plenty of spare capacity if it were given the entire machine.
Statistics Started at Fri Aug 19 15:44:47 EST 2011
Operative System: Linux 2.6.38-10-generic amd64
JVM : Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM runtime 17.1-b03 1.6.0_22-b04
Processors: 8
System Memory: 55.27913% used of 7.747429 GiB
Used Heap Size: 82.58406 MiB
Max Heap Size: 2016.0 MiB
Young Generation Heap Size: 224.0 MiB
- - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - - -
Statistics Ended at Fri Aug 19 18:42:23 EST 2011
Elapsed time: 10655706 ms
Time in JIT compilation: 187 ms
Time in Young Generation GC: 140973 ms (12073 collections)
Time in Old Generation GC: 0 ms (0 collections)
Garbage Generated in Young Generation: 1652646.0 MiB
Garbage Generated in Survivor Generation: 767.625 MiB
Garbage Generated in Old Generation: 1472.6484 MiB
Average CPU Load: 170.20532/800
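To put the garbage collection and CPU figures from the two reports side by side, here is a small sketch that derives the average young generation collection time, the share of elapsed time spent in young generation GC, and the share of the 8 processors used. The `RunStats` class and its field names are hypothetical helpers for this illustration; the numbers are copied by hand from the client and server outputs.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Hypothetical container for a few counters from one statistics report."""
    elapsed_ms: int
    young_gc_ms: int
    young_gc_count: int
    cpu_load: float
    cpu_capacity: float = 800.0  # 8 processors at 100% each, as in "x/800"

    @property
    def avg_young_gc_ms(self) -> float:
        # Mean time per young generation collection.
        return self.young_gc_ms / self.young_gc_count

    @property
    def gc_overhead_pct(self) -> float:
        # Share of the elapsed time reported as young generation GC.
        return 100 * self.young_gc_ms / self.elapsed_ms

    @property
    def cpu_share_pct(self) -> float:
        # Share of the whole machine used by this JVM.
        return 100 * self.cpu_load / self.cpu_capacity


client = RunStats(elapsed_ms=10_654_717, young_gc_ms=118_473, young_gc_count=8_354, cpu_load=433.23907)
server = RunStats(elapsed_ms=10_655_706, young_gc_ms=140_973, young_gc_count=12_073, cpu_load=170.20532)

for name, stats in (("client", client), ("server", server)):
    print(
        f"{name}: {stats.avg_young_gc_ms:.1f} ms per young GC, "
        f"{stats.gc_overhead_pct:.2f}% of time in young GC, "
        f"{stats.cpu_share_pct:.0f}% of the machine's CPU"
    )
```

Both JVMs spend only a little over 1% of the run in young generation collections, and the server uses well under a third of the machine, which is consistent with the spare capacity noted above.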
These results are preliminary, but excellent nonetheless! The final releases of jetty 7.5.0 and cometd 2.4.0 will be out within a week or two and we will be working to bring you some more rigorous benchmarks with those releases.