One can only improve what can be reliably measured. To assert that Jetty’s performance is as good as it can be, to make sure it does not degrade over time, and to facilitate future optimization work, we need to be able to measure that performance reliably.

The primary goal

The Jetty project wanted an automated performance test suite. Every now and then some performance measurements were done, but with ad-hoc tools and a lot of manual steps.

We have been working on and off on such a test suite over the past few months. The primary goal was to write a reliable, fully automated test that can be used to measure, understand and compare Jetty’s performance over time, and that helps us better visualize the performance characteristics of the tested scenarios.

A basic load-testing scenario

A test must be stable over time, and performance tests are no exception: they ought to report stable performance figures across runs to be considered repeatable. Since this is already a challenge in itself, we decided to start with the simplest possible scenario: one that is limited in realism, but easy to grasp and still useful to get a quick overview of the server’s overall performance.

The basis of that scenario is a simple HTTPS (i.e.: HTTP/1.1 over TLS) GET on a single resource that returns a few bytes of in-memory hard-coded data. To avoid a lot of complexity, the test is going to run on dedicated physical machines that are hosted in an environment entirely under our control. This way, it is easy to assert what kind of performance they’re capable of, that the performance is repeatable, that those machines are not doing anything else, that the network between them is capable enough and not overloaded, and so on.
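To make the scenario concrete, here is a minimal sketch of what such a test handler could look like with the Jetty 10 handler API; the class name and response body below are made up for illustration and are not the test’s actual code.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.eclipse.jetty.server.Request;
import org.eclipse.jetty.server.handler.AbstractHandler;

// Serves a few bytes of in-memory, hard-coded data for every request.
public class TestHandler extends AbstractHandler
{
    private static final byte[] BODY = "Hello from Jetty".getBytes(StandardCharsets.UTF_8);

    @Override
    public void handle(String target, Request baseRequest, HttpServletRequest request, HttpServletResponse response) throws IOException
    {
        response.setContentType("text/plain");
        response.setContentLength(BODY.length);
        response.getOutputStream().write(BODY);
        baseRequest.setHandled(true); // mark the request as handled so no other handler runs
    }
}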

Load, don’t strangle

As recommended in the Jetty Load Generator documentation, to get meaningful measurements we want one machine running Jetty (the server), one generating a fixed 100 requests/s load (the probe) and four machines each generating a fixed 60K requests/s load (the loaders). This setup loads Jetty with around 240K requests per second (4 loaders doing 60K each), which is a good figure given the hardware we have: it is enough traffic to make the server machine burn around 50% of its total CPU time, i.e. loading it but not strangling it. We found this figure simply by trial and error.

Choosing a load that will not push the server to a constant 100% CPU is important: while running a test that applies the heaviest possible load does have its use, such a test is not a load test but a limit test. A limit test is good for figuring out how software behaves under a load too heavy for the hardware it runs on, for instance to make sure that it degrades gracefully instead of crashing and burning into flames when a certain limit is reached. But such a test is of very limited use for figuring out how fast your software responds under a manageable (i.e. normal) load, which is what we are most commonly interested in.

Planning the scenario

The server’s code is pretty easy since it’s just about setting up Jetty: configuring the connector, SSL context and test handler is basically all it takes. For the loaders, the Jetty Load Generator is meant for exactly that task, so it’s again fairly easy to write this code by making use of that library. The same is true for the probe, as the Jetty Load Generator can be used for it too and can be configured to record each request’s latency. We want to generate that load for three minutes, to get a somewhat realistic idea of how the server behaves under a flat load.
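As a rough illustration, the server-side setup boils down to something like the following sketch, which installs the test handler shown earlier; the keystore path, password and port are placeholders, not the test’s actual configuration.

Server server = new Server();

// TLS configuration; the keystore details below are hypothetical.
SslContextFactory.Server sslContextFactory = new SslContextFactory.Server();
sslContextFactory.setKeyStorePath("/path/to/keystore.p12");
sslContextFactory.setKeyStorePassword("changeit");

HttpConfiguration httpConfig = new HttpConfiguration();
httpConfig.addCustomizer(new SecureRequestCustomizer());

// HTTPS connector: HTTP/1.1 over TLS.
ServerConnector connector = new ServerConnector(server,
    new SslConnectionFactory(sslContextFactory, "http/1.1"),
    new HttpConnectionFactory(httpConfig));
connector.setPort(8443);

server.addConnector(connector);
server.setHandler(new TestHandler());
server.start();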

Deploying and running a test over multiple machines can be a daunting task, which is why we wrote the Jetty Cluster Orchestrator, whose job is to make it easy to write Java code and have it distributed, executed and controlled on a set of machines using only the SSH protocol. Thanks to this tool, getting code to run on the six necessary machines can be done simply, from a plain, standard JUnit test.

So we basically have these three methods that we need to get running across the six machines:

void startServer() { ... }

void runProbeGenerator() { ... }

void runLoadGenerator() { ... }
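To give an idea of what the generator methods look like, here is a rough sketch of runLoadGenerator() based on the Jetty Load Generator’s builder API; the host name is a placeholder and the exact builder calls of the real test may differ. runProbeGenerator() is essentially the same with a 100 requests/s rate and a resource listener that records each request’s latency.

void runLoadGenerator() throws Exception
{
    LoadGenerator generator = LoadGenerator.builder()
        .scheme("https")
        .host("jetty-server-host")    // placeholder for the server machine's address
        .port(8443)
        .resource(new Resource("/"))  // the single tested resource
        .resourceRate(60_000)         // fixed 60K requests/s for each loader
        .runFor(4, TimeUnit.MINUTES)  // total run duration: warmup + recorded load, see below
        .build();
    generator.begin().get();          // begin() returns a CompletableFuture
}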

We also need a warmup phase during which the test runs but no recording is made. The Jetty Load Generator is configured with a duration, so the original three-minute duration has to grow by that warmup duration. We decided to go with one minute of warmup, so the total load generation duration is now four minutes, meaning both runProbeGenerator() and runLoadGenerator() run for four minutes each. After the first minute, a flag is flipped to indicate the end of the warmup phase and to make the recording start. Once runProbeGenerator() and runLoadGenerator() return, the test is over and the server is stopped, then the recordings are collected and analyzed.
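One simple way to implement that flag, sketched here under the assumption that latencies are recorded into a histogram, is an AtomicBoolean flipped by a scheduled task one minute after load generation starts; the recording listener only records once the flag is set.

AtomicBoolean recording = new AtomicBoolean(false);
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
// End of warmup: start recording one minute after load generation begins.
scheduler.schedule(() -> recording.set(true), 1, TimeUnit.MINUTES);

// In the latency-recording listener:
// if (recording.get())
//     histogram.recordValue(latencyInNanoseconds);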

Summarizing the test

Here’s a summary of the procedure the test is implementing:

  1. Start the Jetty server on one server machine: call startServer().
  2. Start the Jetty Load Generator with a 100/s throughput on one probe machine: call runProbeGenerator().
  3. Start the Jetty Load Generator with a 60K/s throughput on four load machines: call runLoadGenerator().
  4. Wait one minute for the warmup to be done.
  5. Start recording statistics on all six machines.
  6. Wait three minutes for the run to be done.
  7. Stop the Jetty server.
  8. Collect and process the recorded statistics.
  9. (Optional) Perform assertions based on the recorded statistics.

Results

It took some iterations to get to the above scenario, and to get it to run repeatably. Once we got confident the test’s reported performance figures could be trusted, we started seriously analyzing our latest release (Jetty 10.0.2 at that time) with it.

We quickly found a performance problem caused by a stack trace being generated on the fast path, thanks to the Async Profiler flame graph that is generated on each run for each machine. Issue #6157 was opened to track this problem; it has been solved and the fix made it into Jetty 10.0.4.

After spending more time looking at the reported performance, we noticed that the ByteBuffer pool we use by default is heavily contended and reported as a major time consumer by the generated flame graphs. Issue #6379 was opened to track this. A quick investigation of that code showed that minor modifications could provide an appreciable performance boost, which made it into Jetty 10.0.6.

While working on our backlog of general cleanups and improvements, issue #6322 made it to the top of the pile. Investigating it, it became apparent that we could improve the ByteBuffer pool a step further by adopting the RetainableByteBuffer interface everywhere in the input path and slightly modifying its contract, in a way that enabled us to write a much more performant ByteBuffer pool. This work was released as part of Jetty 10.0.7.

Current status of Jetty’s performance

Here are a few figures to give you some idea of what Jetty can achieve: while our test server (powered by a 16-core Intel Core i9-7960X) is under a 240,000 HTTPS requests per second load, the probe measured that most of the time, 99% of its own HTTPS requests were served in less than 1 millisecond, as can be seen on this graph.

Thanks to the collected measurements, we could add performance-related assertions to the test and have it run regularly against the 10.0.x and 11.0.x branches to make sure their performance does not unknowingly degrade over time. We are now also running the same test over HTTP/1.1 clear text and TLS as well as HTTP/2 clear text and TLS.
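As an illustration, an assertion of that kind could look like the following sketch, assuming the probe’s latencies were recorded into an HdrHistogram Histogram; the threshold shown is hypothetical, not the value the test actually uses.

// Fail the test if the probe's 99th percentile latency regresses.
long p99Nanos = probeHistogram.getValueAtPercentile(99.0);
Assertions.assertTrue(TimeUnit.NANOSECONDS.toMillis(p99Nanos) < 2,   // hypothetical 2 ms threshold
    "p99 latency regressed: " + TimeUnit.NANOSECONDS.toMillis(p99Nanos) + " ms");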

The test also works against the 9.4.x branch, but we do not yet have assertions for that branch: it has a different performance profile, so a different load profile is needed and different performance figures are to be expected. This has yet to happen, but it is on our todo list.

More test scenarios are going to be added to the test suite over time as we see fit: to measure certain load scenarios we deem important, to cover specific aspects or features, or for any other reason we might have to measure performance and ensure its stability over time.

In the end, making Jetty as performant as possible and continuously optimizing it has always been on Webtide’s mind and that trend will continue in the future!


2 Comments

stolsvik · 24/11/2021 at 22:12

Would it not also be interesting to see how many reqs/s it was possible to handle with the different versions, now and in the future? That is, gradually push it from the “load test” towards the “limit test”, to see where the limit is – for some reasonable metric of what constitutes a fail? That would give valuable information, like e.g. whether a new version could handle more or fewer reqs/s than the previous one on the same hardware. Also, it could give information on how the degradation when reaching the limit was experienced – some modes of failure might be better than others. (E.g. random requests receiving 503 Overloaded might be better than wildly differing latencies when approaching the limit)

    Ludovic Orban · 26/11/2021 at 10:14

    Limit testing is a subject that follows up on performance testing. It’s basically the thing to do to be able to answer these questions (which pretty much are what you describe with your examples):

    – How much overload can my hardware sustain before degrading the service to unacceptable levels?
    – How badly and how fast does my service degrade when overloaded?
    – What can I do to mitigate the degradation of my service when overloaded?

    It’s certainly a subject that we will touch on sometime in the future, as more time gets spent on the performance effort.
