Slightly more than one year has passed since the last CometD 2 benchmarks, and more than three years since the CometD 1 benchmark. During this year we have done a lot of work on CometD, both by adding features and by continuously improving performance and stability to make it faster and more scalable.
With the upcoming CometD 2.4.0 release, one of the biggest changes is the implementation of a WebSocket transport for both the Java client and the Java server.
The WebSocket protocol is being finalized at the IETF, and all major browsers support various draft versions of the protocol (Jetty supports all draft versions). While WebSocket adoption is slowly picking up, it is interesting to compare how WebSocket behaves with respect to HTTP in the typical scenarios that use CometD.
HTTP Benchmark Results
Below you can find the benchmark result graph when using the CometD long-polling transport, based on plain HTTP.
Unlike the previous benchmark, where we reported the average latency, this time we report the median latency, which is a better indicator of the latencies seen by the clients because a few slow outliers can skew the average.
Comparison with the previous benchmark would be unfair, since the hosts were different (both in number and computing power), and the JVM was also different.
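To illustrate why the median is more representative than the average, here is a minimal sketch that computes mean, median and 99th percentile over a set of latency samples. The class name `LatencyStats`, the sample values and the nearest-rank percentile definition are assumptions made for this example; they are not taken from the CometD benchmark code.

```java
import java.util.Arrays;

// Hypothetical helper, not part of CometD: basic latency statistics.
final class LatencyStats {
    /** Arithmetic mean of the samples. */
    static double mean(double[] samples) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        double sum = 0;
        for (double sample : samples) sum += sample;
        return sum / samples.length;
    }

    /** Nearest-rank percentile: the smallest sample with at least p% of the samples at or below it. */
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Median as the 50th percentile (lower middle value for an even sample count). */
    static double median(double[] samples) {
        return percentile(samples, 50.0);
    }

    public static void main(String[] args) {
        // Made-up latencies in ms: mostly fast, with a single slow outlier.
        double[] latencies = {2, 3, 3, 4, 2, 3, 4, 3, 2, 800};
        System.out.println("mean   = " + mean(latencies) + " ms");
        System.out.println("median = " + median(latencies) + " ms");
        System.out.println("p99    = " + percentile(latencies, 99.0) + " ms");
    }
}
```

With these made-up samples, a single 800 ms outlier pulls the mean far above what nine clients out of ten experienced, while the median stays at the typical value and the 99th percentile exposes the outlier separately.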
As you can see from the graph above, the median latency is pretty much the same no matter the number of clients, with the exception of 50k clients at 50k messages/s.
The median latency stays well under 200 ms even at more than 50k messages/s; it is in the range of 2-4 ms up to 10k messages/s, and around 50 ms at 20k messages/s, even for 50k clients.
The result for 50k clients and 50k messages/s is a bit strange, since the hosts (both server and clients) had plenty of CPU available and plenty of threads available (which rules out lock contention issues in the code, which would have increased thread usage).
Could it be that at that message rate we hit some limit of the EC2 platform? It might be, and this blog post confirms that there are indeed limits in the virtualization of the network interfaces between host and guest. I have heard from other people who have performed benchmarks on EC2 that they also hit limits very close to what the blog post above describes.
In any case, one server with 20k clients serving 50k messages/s with 150 ms median latency is a very good result.
For completeness, the 99th percentile latency is around 350 ms for 20k and 50k clients at 20k messages/s, around 1500 ms for 20k clients at 50k messages/s, and much lower (quite close to the median latency) for the other results.
WebSocket Benchmark Results
The results for the same benchmarks using the WebSocket transport were quite impressive, and you can see them below.
Note that this graph uses a totally different scale for latencies and number of clients.
Whereas for HTTP we had a maximum latency of 800 ms (on the Y axis), for WebSocket we have 6 ms (yes, you read that right); and whereas for HTTP we topped out at 50k clients per server, here we could go up to 200k.
We did not merge the two graphs into a single one, because the WebSocket trend lines would have collapsed onto the X axis.
With HTTP, having more than 50k clients on the server was troublesome at any message rate, but with WebSocket 200k clients were stable up to 20k messages/s. Beyond that, we probably hit EC2 limits again, and the results were unstable: some runs completed successfully, others did not.
- The median latencies, for almost any number of clients and any message rate, are below 10 ms, which is quite impressive.
- The 99th percentile latency is around 300 ms for 200k clients at 20k messages/s, and around 200 ms for 50k clients at 50k messages/s.
We have also run some benchmarks varying the payload size from the default of 50 bytes to 500 bytes and to 2000 bytes, but the results we obtained with the different payload sizes were very similar, so we can say that payload size has very little impact (if any) on latencies in this benchmark configuration.
We have also monitored memory consumption in “idle” state (that is, with clients connected and sending meta connect requests every 30 seconds, but not sending messages):
- HTTP: 50k clients occupy around 2.1 GiB
- WebSocket: 50k clients occupy around 1.2 GiB, and 200k clients occupy 3.2 GiB.
The benefits of WebSocket being a lighter-weight protocol than HTTP are clear in all cases.
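The idle memory figures above can be turned into a rough per-client footprint by dividing the total by the number of clients. The sketch below does only that arithmetic; the class name `MemoryPerClient` is hypothetical, and the result is an approximation, since the reported totals presumably also include memory that does not depend on the number of clients.

```java
// Hypothetical helper: rough per-client memory from the idle figures reported above.
final class MemoryPerClient {
    /** Converts a total expressed in GiB into KiB per connected client. */
    static double kibPerClient(double totalGiB, int clients) {
        return totalGiB * 1024 * 1024 / clients;
    }

    public static void main(String[] args) {
        // Totals and client counts are the ones listed in the text.
        System.out.println("HTTP, 50k clients:       " + kibPerClient(2.1, 50_000) + " KiB/client");
        System.out.println("WebSocket, 50k clients:  " + kibPerClient(1.2, 50_000) + " KiB/client");
        System.out.println("WebSocket, 200k clients: " + kibPerClient(3.2, 200_000) + " KiB/client");
    }
}
```

This works out to roughly 44 KiB per idle HTTP client, against roughly 25 KiB per idle WebSocket client at 50k clients and roughly 17 KiB at 200k clients.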
The conclusions are:
- The work the CometD project has done to improve performance and scalability was worth the effort, and CometD offers a truly scalable solution for server-side event-driven web applications, for both HTTP and WebSocket.
- As the WebSocket protocol gains adoption, CometD can leverage the new protocol without any change required to applications; they will just perform faster.
- Server-to-server CometD communication can now be extremely fast by using WebSocket. We have already updated the CometD scalability cluster Oort to take advantage of these enhancements.
The server was one EC2 instance of type “m2.4xlarge” (67 GiB RAM, 8 cores Intel(R) Xeon(R) X5550 @2.67GHz) running Ubuntu Linux 11.04 (2.6.38-11-virtual #48-Ubuntu SMP 64-bit).
The clients were 10 EC2 instances of type “c1.xlarge” (7 GiB RAM, 8 cores Intel Xeon E5410 @2.33GHz) running Ubuntu Linux 11.04 (2.6.38-11-virtual #48-Ubuntu SMP 64-bit).
The JVM used was Oracle’s Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode) version 1.7.0 for both clients and server.
The server was started with the following options:
-Xmx32g -Xms32g -Xmn16g -XX:-UseSplitVerifier -XX:+UseParallelOldGC -XX:-UseAdaptiveSizePolicy -XX:+UseNUMA
while the clients were started with the following options:
-Xmx6g -Xms6g -Xmn3g -XX:-UseSplitVerifier -XX:+UseParallelOldGC -XX:-UseAdaptiveSizePolicy -XX:+UseNUMA
The OS was tuned to allow a larger number of file descriptors, as described here.
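When reproducing a setup like this, it can help to verify from inside the JVM that the options and the file descriptor limit actually took effect. The following is a minimal sketch using standard JDK management APIs; the class name `JvmSettingsCheck` is hypothetical, and the file descriptor check assumes a HotSpot-style JDK on a Unix-like OS (it reports -1 otherwise).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.List;

// Hypothetical diagnostic class, not part of CometD or of the benchmark.
final class JvmSettingsCheck {
    /** The options passed to the JVM on the command line (for example -Xmx32g). */
    static List<String> jvmOptions() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** The maximum amount of memory the JVM will attempt to use, in bytes. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** The per-process file descriptor limit, or -1 if the platform does not expose it. */
    static long maxFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount();
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("JVM options:          " + jvmOptions());
        System.out.println("Max heap (MiB):       " + maxHeapBytes() / (1024 * 1024));
        System.out.println("Max file descriptors: " + maxFileDescriptors());
    }
}
```

Since each connected client needs a socket on the server, and each socket consumes one file descriptor, the reported limit should comfortably exceed the number of clients targeted by the benchmark.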