Prelim Cometd WebSocket Benchmarks

I have done some very rough preliminary benchmarks on the latest cometd-2.4.0-SNAPSHOT with the latest Jetty-7.5.0-SNAPSHOT and the results are rather impressive. The features that these two releases have added are:

Optimised Jetty NIO with latest JVMs and JITs considered.
Latest websocket draft implemented and optimised.
Websocket client implemented.
Jackson JSON parser/generator used for cometd.
Websocket cometd transport for the server improved.
Websocket cometd transport for the bayeux client implemented.

The benchmarks that I’ve done have all been on my notebook using the localhost network, which is not the most realistic of environments, but it still tells us a lot about the raw performance of cometd/jetty. Specifically:

Both the server and the client are running on the same machine, so they are effectively sharing the 8 CPUs available. The client typically takes 3x more CPU than the server (for the same load), so this is kind of like running the server on a dual core and the client on a 6 core machine.
The local network has very high throughput which would only be matched by gigabit networks. It also has practically no latency, which is unlike any real network. The long polling transport is more dependent on good network latency than the websocket transport, so the true comparison between these transports will need testing on a real network.
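The dual core versus 6 core comparison above follows from simple proportions. A minimal sketch of that arithmetic, assuming the 3:1 client-to-server CPU ratio holds across the whole run (the class and method names are only illustrative):

```java
// Sketch only: splits a machine's cores between client and server
// given the client-to-server CPU ratio quoted in the post (3:1 on 8 CPUs).
final class CpuShare {
    /** Cores effectively left to the server when the client needs `clientToServerRatio` times as much CPU. */
    static double serverCores(int totalCores, double clientToServerRatio) {
        return totalCores / (1.0 + clientToServerRatio);
    }

    /** Cores effectively consumed by the client: whatever the server does not use. */
    static double clientCores(int totalCores, double clientToServerRatio) {
        return totalCores - serverCores(totalCores, clientToServerRatio);
    }

    public static void main(String[] args) {
        System.out.println("server cores: " + serverCores(8, 3.0));
        System.out.println("client cores: " + clientCores(8, 3.0));
    }
}
```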

The Test

The cometd load test is a simulated chat application. For this test I tried long-polling and websocket transports for 100, 1000 and 10,000 clients that were each logged into 10 randomly selected chat rooms from a total of 100 rooms. The messages sent were all 50 characters long and were published in batches of 10 messages at once, each to randomly selected rooms. There was a pause between batches that was adjusted to find a good throughput without bad latency. However, little effort was put into finding the optimal settings to maximise throughput.
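The room layout determines how many deliveries each published message causes. As a rough sketch (not the actual cometd load test code), the following simulates the subscriptions for the 1000 client case, assuming each client picks 10 distinct rooms and that every subscriber of a room, including the publisher, receives each message. With these figures the expected fan-out is 100 deliveries per published message, which is in line with the roughly 100:1 ratio between the incoming (93856 msg/s) and outgoing (938 msg/s) rates in the long-run output further down.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative simulation of the chat room layout; names are hypothetical.
final class ChatFanOut {
    /** Average subscribers per room, i.e. deliveries caused by one published message. */
    static double expectedFanOut(int clients, int roomsPerClient, int rooms) {
        return (double) clients * roomsPerClient / rooms;
    }

    /** Subscribes each client to roomsPerClient distinct random rooms; returns subscriber counts per room. */
    static int[] subscribe(int clients, int roomsPerClient, int rooms, Random random) {
        List<Integer> roomIds = new ArrayList<>();
        for (int r = 0; r < rooms; r++) {
            roomIds.add(r);
        }
        int[] subscribers = new int[rooms];
        for (int c = 0; c < clients; c++) {
            // Shuffle and take the first roomsPerClient ids so the rooms are distinct.
            Collections.shuffle(roomIds, random);
            for (int i = 0; i < roomsPerClient; i++) {
                subscribers[roomIds.get(i)]++;
            }
        }
        return subscribers;
    }

    public static void main(String[] args) {
        int[] subs = subscribe(1000, 10, 100, new Random(42));
        int min = Integer.MAX_VALUE;
        int max = 0;
        for (int s : subs) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        System.out.println("expected fan-out: " + expectedFanOut(1000, 10, 100));
        System.out.println("simulated subscribers per room: min=" + min + " max=" + max);
    }
}
```

Individual rooms will deviate from the average because the selection is random, so the per-room load in a real run is not perfectly even.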

The runs were all done on JVMs that had been warmed up, but the runs were moderately short (approx 30s), so steady state was not guaranteed and the margin of error on these numbers will be pretty high. However, I also did a long run test at one setting just to make sure that steady state can be achieved.

The Results

The bubble chart above plots messages per second against number of clients for both long-polling and websocket transports. The size of each bubble is the maximal latency of the test, with the smallest bubble being 109ms and the largest 646ms. Observations from the results are:

Regardless of transport we achieved hundreds of thousands of messages per second! These are great numbers and show that we can cycle the cometd infrastructure at high rates.
The long-polling throughput is probably over-reported because there are many messages being queued into each HTTP response. The highest rate of HTTP responses I saw was 22,000 responses per second, so for many applications it will be the HTTP rate that limits the throughput rather than the cometd rate. However, the websocket throughput did not benefit from any such batching.
The maximal latency for all websocket measurements was significantly better than long polling, with all websocket messages being delivered in < 200ms and an average of < 1ms.
The websocket throughput increased with connections, which probably indicates that at low numbers of connections we were not generating a maximal load.
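The batching effect mentioned above can be quantified as the average number of messages carried per response. The post does not give the long-polling message rate at the 22,000 responses per second peak, so this sketch applies the calculation to the incoming figures from the long-run output further down (93856 msg/s and 90101 resp/s) instead; the helper names are made up for illustration.

```java
// Illustrative helpers relating message rate to response rate.
final class BatchingRatio {
    /** Average number of messages delivered in each response. */
    static double messagesPerResponse(double messagesPerSecond, double responsesPerSecond) {
        return messagesPerSecond / responsesPerSecond;
    }

    /** Responses as a percentage of messages, the form used in the load test output. */
    static double responsesPerMessagePercent(double messagesPerSecond, double responsesPerSecond) {
        return 100.0 * responsesPerSecond / messagesPerSecond;
    }

    public static void main(String[] args) {
        // Figures taken from the "Incoming" line of the long-run client statistics.
        double msgRate = 93856;
        double respRate = 90101;
        System.out.println("messages per response: " + messagesPerResponse(msgRate, respRate));
        System.out.println("responses per message (%): " + responsesPerMessagePercent(msgRate, respRate));
    }
}
```

A ratio close to 1 means almost every message travelled in its own response, while heavy batching would push the ratio well above 1.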

A Long Run

The throughput tests above need to be redone on a real network and with longer runs. However, I did do one long run (3 hours) of 1,000,013,657 messages at 93,856/sec. The results suggest no immediate problems with long runs. Neither the client nor the server needed to do an old generation collection, and young generation collections averaged only about 12ms on the server and 14ms on the client.
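As a quick sanity check on these figures, the sketch below recomputes the run duration from the message count and rate, and the average young generation pause from the GC times and collection counts reported in the client and server statistics in this post. The class and method names are only illustrative.

```java
// Sanity-check arithmetic for the long run; all inputs come from the post's statistics.
final class LongRunCheck {
    /** Seconds needed to deliver `messages` at a steady `messagesPerSecond`. */
    static double secondsFor(long messages, double messagesPerSecond) {
        return messages / messagesPerSecond;
    }

    /** Average pause in milliseconds per collection. */
    static double averagePauseMillis(long totalGcMillis, long collections) {
        return (double) totalGcMillis / collections;
    }

    public static void main(String[] args) {
        double seconds = secondsFor(1_000_013_657L, 93_856);
        System.out.println("run length in hours: " + seconds / 3600.0);
        // Young generation GC: total time in ms and number of collections.
        System.out.println("client avg young GC ms: " + averagePauseMillis(118_473, 8_354));
        System.out.println("server avg young GC ms: " + averagePauseMillis(140_973, 12_073));
    }
}
```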

The output from the client is below:

Statistics Started at Fri Aug 19 15:44:48 EST 2011
Operative System: Linux 2.6.38-10-generic amd64
JVM : Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM runtime 17.1-b03 1.6.0_22-b04
Processors: 8
System Memory: 55.35461% used of 7.747429 GiB
Used Heap Size: 215.7406 MiB
Max Heap Size: 1984.0 MiB
Young Generation Heap Size: 448.0 MiB
- - - - - - - - - - - - - - - - - - - -
Testing 1000 clients in 100 rooms, 10 rooms/client
Sending 1000000 batches of 10x50 bytes messages every 10000 µs
- - - - - - - - - - - - - - - - - - - -
Statistics Ended at Fri Aug 19 18:42:23 EST 2011
Elapsed time: 10654717 ms
	Time in JIT compilation: 57 ms
	Time in Young Generation GC: 118473 ms (8354 collections)
	Time in Old Generation GC: 0 ms (0 collections)
Garbage Generated in Young Generation: 2576746.8 MiB
Garbage Generated in Survivor Generation: 336.53125 MiB
Garbage Generated in Old Generation: 532.35156 MiB
Average CPU Load: 433.23907/800
----------------------------------------
Outgoing: Elapsed = 10654716 ms | Rate = 938 msg/s = 93 req/s =   0.4 Mbs
All messages arrived 1000013657/1000013657
Messages - Success/Expected = 1000013657/1000013657
Incoming - Elapsed = 10654716 ms | Rate = 93856 msg/s = 90101 resp/s(96.00%) =  35.8 Mbs
Thread Pool - Queue Max = 972 | Latency avg/max = 3/62 ms
Messages - Wall Latency Min/Ave/Max = 0/8/135 ms

Note that the client was using 433/800 of the available CPU, while you can see that the server (below) was using only 170/800. This suggests that the server has plenty of spare capacity if it were given the entire machine.

Statistics Started at Fri Aug 19 15:44:47 EST 2011
Operative System: Linux 2.6.38-10-generic amd64
JVM : Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM runtime 17.1-b03 1.6.0_22-b04
Processors: 8
System Memory: 55.27913% used of 7.747429 GiB
Used Heap Size: 82.58406 MiB
Max Heap Size: 2016.0 MiB
Young Generation Heap Size: 224.0 MiB
- - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - - -
Statistics Ended at Fri Aug 19 18:42:23 EST 2011
Elapsed time: 10655706 ms
	Time in JIT compilation: 187 ms
	Time in Young Generation GC: 140973 ms (12073 collections)
	Time in Old Generation GC: 0 ms (0 collections)
Garbage Generated in Young Generation: 1652646.0 MiB
Garbage Generated in Survivor Generation: 767.625 MiB
Garbage Generated in Old Generation: 1472.6484 MiB
Average CPU Load: 170.20532/800
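To put the two CPU load figures side by side, here is a small sketch that converts them into cores used and fraction of the machine. It assumes the load is reported as a percentage per core, so that 800 corresponds to all 8 processors fully busy; the names are hypothetical.

```java
// Illustrative conversion of the "Average CPU Load: x/800" figures.
final class CpuLoad {
    /** Number of cores kept busy, assuming 100 load units per core. */
    static double coresUsed(double load) {
        return load / 100.0;
    }

    /** Fraction of the whole machine in use, between 0 and 1. */
    static double machineFraction(double load, double maxLoad) {
        return load / maxLoad;
    }

    public static void main(String[] args) {
        double client = 433.23907;
        double server = 170.20532;
        System.out.println("client cores used: " + coresUsed(client));
        System.out.println("server cores used: " + coresUsed(server));
        System.out.println("server share of machine: " + machineFraction(server, 800));
        System.out.println("client/server CPU ratio: " + client / server);
    }
}
```

On these numbers the client used roughly two and a half times the CPU of the server during the long run, with the server occupying a bit over a fifth of the machine.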

Conclusion

These results are preliminary, but excellent nonetheless! The final releases of jetty 7.5.0 and cometd 2.4.0 will be out within a week or two and we will be working to bring you some more rigorous benchmarks with those releases.

4 Comments

Maks · 19/08/2011 at 13:24

Hi Greg,
cool numbers! Thanks for the great work!
And I have a small question:
The WebSocketClient currently sends “http://example.com” in the Sec-WebSocket-Origin header field regardless of the host I am connecting to. Is this intended behavior or just an oversight?

gregw · 20/08/2011 at 08:44

Ooops that would be a bug!

Prelim Cometd WebSocket Benchmarks | Eclipse | Syngu · 20/08/2011 at 05:35

[…] have added are: Optimised Jetty NIO with latest JVMs and JITs considered.Latest websocket draft… Read more… Categories: Eclipse Share | Related […]

CometD 2.4.0.beta1 Released | Webtide Blogs · 12/09/2011 at 12:05

[…] have performed some preliminary benchmarks with WebSocket; they look really promising, although have been done before the latest changes to […]

Published by admin on 19/08/2011
