The IBM developerWorks article “Java EE meets Web 2.0” argues well that asynchronous concerns must be addressed in Web 2.0 applications. However, it concludes that Jetty Continuations are a quick hack and that Tomcat has the more sane and straightforward approach! Let’s investigate this claim.
The article contains a theoretical analysis of a webapp that needs to wait for a database connection and a remote web service call before producing a response. The analysis indeed shows that an asynchronous approach provides much better scalability, fairness and uniformity of response latency, i.e. better Quality of Service (QoS).
Unfortunately, the article then makes a classic mistake when it examines the available “solutions” that could realize the theoretical gains of the asynchronous approach. It confuses the ability to perform asynchronous IO with the ability to schedule asynchronous handling of events such as the availability of a database connection or the arrival of a web service response.
Frameworks such as Grizzly and Tomcat’s asynchronous IO support are targeted at asynchronous IO: they provide an asynchronous callback when more IO data has arrived or when it is possible to write more IO data without blocking. Asynchronous IO is a valuable mechanism, but it only assists with reading requests and writing responses. To quote from the article:

Reading the request from the client. Our model ignores this cost because a HTTP GET request is assumed. In this situation, the time needed to read the request from the client does not add to the servlet-request duration.
Sending the response to the client. Our model ignores this cost because for short servlet responses, an application server can buffer the response in the memory and send it later to the client using NIO. And we assume that the response is a short one. In this situation, time needed to write the response to the client doesn’t add to the servlet-request duration.

I agree completely! While there are frequent use-cases where the time taken to read requests and write responses is significant, for most webapps these are not the critical waits that prevent scalability and QoS, and they can be ignored, as this analysis has done. When a Web 2.0 webapp is waiting to send a comet message, waiting for a database connection or waiting for a web service response, it is NOT waiting for servlet IO. A framework that provides asynchronous servlet IO therefore provides little or no support for creating an asynchronous web application, so I do not understand how the article can conclude that the facilities of Grizzly and Tomcat are suitable solutions when they are targeted at the very use-case that its own analysis has chosen to ignore.
What these async IO APIs do provide are user extension points that can be invoked outside of the blocking servlet API model. Thus if you wish to asynchronously wait for a JMS message, a datasource or a web service response, you may do so by initiating and completing these actions from an asynchronous IO handler provided by Grizzly or Tomcat. But the asynchronous IO features themselves would not be used; the callbacks are used simply to allow application-supplied code to be invoked outside the scope of a synchronous call into Servlet.service. If anything is to be labeled a hack, I think using a callback for its calling context rather than its real purpose qualifies.
More importantly, there is a big downside to avoiding calls into Servlet.service. While it is a blocking call, it is the API that all existing web frameworks and applications are written to. So if you write your asynchronous handling in an asynchronous IO handler, then you cannot use any of the servlet framework or container facilities to:

  • authenticate and/or authorize the request
  • establish transactional and JNDI contexts
  • map the request and its parameters to application handlers and Java objects via MVC or some other framework abstraction
  • create the response via templates, markup, translations, components or whatever abstraction is provided by your framework of choice
  • filter the request/response with servlet filters for logging, security or some other aspect to be applied
  • be the target or source of a request dispatch that allows web components to be aggregated into larger webapps (e.g. portlets)

I.e. you are on your own, and you can’t use JSP, JSF, Struts, Spring, Tapestry, Wicket, Stripes or any other existing framework to handle requests or generate content. This might be OK while you are writing your first comet chat application that just exchanges a few simple JSON messages, but for any non-trivial web application you will soon be missing all the facilities that you need and that drove the creation of so many web frameworks.
Typically, when a webapp wants to wait for a message, a datasource or a web service response, it is deep inside application code, which is itself deep inside a web framework, which has been invoked via the Servlet.service call. The asynchronous IO approach of Grizzly and Tomcat provides absolutely no support for this. Once you have entered Servlet.service, you are blocking and blocking you will remain. Their solution is to throw away 10 years of framework development and developer experience and to start again from asynchronous event handlers, which are themselves challenging to write even for experienced IO developers (e.g. what are you going to do when you are called with 5 bytes of a 6-byte UTF-8 character?). The solution is not to avoid calling Servlet.service, but to find a way to invoke it asynchronously.
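To illustrate that last point, here is a minimal sketch (a hypothetical helper, not any container’s API, error handling omitted) of the state an asynchronous read callback must carry, because the bytes of one character may arrive split across two callbacks:

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

// Hypothetical helper: the state an asynchronous read callback must
// keep, because a multi-byte UTF-8 character may be split across calls.
class Utf8ChunkDecoder {
    private final CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
    private ByteBuffer input = ByteBuffer.allocate(4096);

    // Called by the (hypothetical) IO framework each time bytes arrive.
    synchronized String onBytes(byte[] chunk, int offset, int length) {
        if (input.remaining() < length) {
            // grow the buffer to fit the new chunk plus any leftovers
            ByteBuffer bigger = ByteBuffer.allocate(input.position() + length);
            input.flip();
            bigger.put(input);
            input = bigger;
        }
        input.put(chunk, offset, length);
        input.flip();
        CharBuffer out = CharBuffer.allocate(input.remaining());
        // endOfInput=false: an incomplete trailing sequence is left
        // unconsumed in 'input' rather than reported as an error
        decoder.decode(input, out, false);
        input.compact(); // carry the partial character to the next callback
        out.flip();
        return out.toString();
    }
}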
Jetty’s Solution
Jetty also has asynchronous IO facilities, but these are used behind the scenes and are not exposed to application developers (e.g. large requests may be asynchronously read before dispatch to a servlet, and large responses asynchronously flushed after execution of a servlet). Instead, Jetty provides a Continuation mechanism whereby handling of a request within Servlet.service may be suspended and restarted in response to an event, the availability of a DataSource or the arrival of a web service response. Jetty allows asynchronous actions to be started within a normal servlet and their completion to also be handled within a servlet. Normal frameworks and techniques can be used to process requests and generate responses.
The key feature of Continuations is that request handling can be suspended and held in a low resource state by the container before being resumed. In order to do this, Jetty needed to solve two problems: 1) how to get the execution out of the servlet/framework/application when suspend is called, and 2) how to get the execution back into the servlet/framework/application when resume is called. Luckily, there are existing mechanisms that web applications must already deal with (exceptions and retried requests), and Jetty has extended these approaches to support Continuations. It is the approaches taken to these two problems that have attracted most of the criticism of Jetty, and I accept that to some extent they are compromises taken while there is no standard support for suspendable request handling. However, these compromises should not detract from the key idea of suspendable request handling, and future enhancements of the servlet API may well remove the need for compromise (see the Servlet 3.0 proposal below).
Suspending a Request
When the flow of control is deep inside application code, inside a web framework, inside an invocation of Servlet.service, there is already a mechanism to abort execution: throwing an exception. Jetty uses a special RuntimeException (created once and reused for efficiency) to abort the execution of a servlet and return control to the container. For frameworks that use the container’s handling of exceptions, this works without modification. For frameworks that catch and attempt to handle exceptions, some modification may be needed to either prevent or discard an error response (this is also addressed in the Servlet 3.0 proposal). The Jetty container catches the special exception and puts the request into a low resource state while it waits for a resume or a timeout. Up until the point of suspension, the request was handled normally by framework, filter and JEE mechanisms.
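In code, a suspension looks roughly like the following sketch against Jetty 6’s Continuation API (org.mortbay.util.ajax); the message list and the onMessage delivery path are assumed application details, and re-suspension after a retry and timeout handling are glossed over:

import java.io.IOException;
import java.util.LinkedList;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.mortbay.util.ajax.Continuation;
import org.mortbay.util.ajax.ContinuationSupport;

public class ChatServlet extends HttpServlet {
    private final LinkedList<String> messages = new LinkedList<String>();
    private final LinkedList<Continuation> waiting = new LinkedList<Continuation>();

    protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws IOException {
        synchronized (this) {
            if (messages.isEmpty()) {
                Continuation continuation =
                    ContinuationSupport.getContinuation(request, this);
                waiting.add(continuation);
                // On the first pass this throws Jetty's RetryRequest
                // (a RuntimeException), unwinding the stack out of
                // service() so the container can park the request in a
                // low resource state. On the retried request it returns.
                continuation.suspend(30000L);
            }
            if (!messages.isEmpty())
                response.getWriter().println(messages.removeFirst());
        }
    }

    // Called by whatever delivers chat messages (a JMS listener, say).
    public void onMessage(String message) {
        synchronized (this) {
            messages.add(message);
            for (Continuation continuation : waiting)
                continuation.resume(); // parked requests are now retried
            waiting.clear();
        }
    }
}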
Resuming a Request
When the waited-for event occurs (datasource available, web service response arrived, etc.), the Continuation is resumed, which causes the suspended request to be re-run. Theoretically we want the request to be re-run from the exact point at which it was suspended, but we don’t have that ability without resorting to bytecode manipulation. Instead we simply run the request again, as if it had just been received from the client. This works because the HTTP protocol is stateless and frameworks have already been written to handle that statelessness and the possibility that clients, caches and proxies may retry requests. The state the application deals with is in the request URI, the request parameters, the session and any application data accessed via keys derived from the URI, parameters and/or session. All this state is preserved over a request retry, so request handling proceeds to the same point at which the request was suspended. At the point where it previously suspended to wait for an event, a response or a connection, there will be a “here’s one I prepared earlier” moment as the code discovers the event that caused the resume and proceeds without suspending. It is not always quite that simple, as requests are not 100% stateless: input can be consumed. However, it is a very simple modification to place the results of the parsed input in a request attribute so that they are available to the retried request.
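A minimal sketch of that modification; the parseBody() helper and the attribute name are illustrative, not a real API:

// parse the input once; the retried request finds the result ready-made
Object parsed = request.getAttribute("com.example.parsedBody");
if (parsed == null) {
    parsed = parseBody(request.getInputStream()); // consumes the input
    request.setAttribute("com.example.parsedBody", parsed);
}
// 'parsed' is now available on both the initial and the retried request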
Obviously there is a cost involved here, as the code before the suspension is executed twice, so there is benefit in architecting applications so that any suspension happens as early as possible. A good example of this is applying a throttling filter in front of an existing web application. The filter can use Continuations to restrict the number of requests allowed into normal servlet handling to the number of DB connections available. Requests that arrive when no DB connection is available may be suspended before much application code is executed. Because requests can be suspended after authentication, application user details may be used to preferentially resume some users when connections become available. Once requests are resumed, they continue past the filter and are handled normally by the existing servlet code. If asynchronous IO handlers were used instead of Continuations, then none of the existing code for authentication or response generation could be used, and the application would need to be rewritten as a non-servlet-based application, potentially without the use of an existing framework.
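A rough sketch of such a throttling filter, again against Jetty 6’s Continuation API; the pool size of 20 is an assumption, and edge cases (such as a retry arriving after a timeout) are glossed over:

import java.io.IOException;
import java.util.LinkedList;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.mortbay.util.ajax.Continuation;
import org.mortbay.util.ajax.ContinuationSupport;

public class ThrottlingFilter implements Filter {
    private int passes = 20; // assumed to match the DB connection pool size
    private final LinkedList<Continuation> waiting = new LinkedList<Continuation>();

    public void doFilter(ServletRequest request, ServletResponse response,
                         FilterChain chain) throws IOException, ServletException {
        synchronized (this) {
            if (passes == 0) {
                Continuation continuation = ContinuationSupport
                    .getContinuation((HttpServletRequest) request, this);
                waiting.add(continuation);
                continuation.suspend(10000L); // unwinds now, returns on the retry
            }
            if (passes == 0) { // still saturated after the retry or a timeout
                ((HttpServletResponse) response)
                    .sendError(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
                return;
            }
            passes--;
        }
        try {
            chain.doFilter(request, response); // normal servlet/framework handling
        } finally {
            synchronized (this) {
                passes++;
                // FIFO here; a real filter could instead pick a preferred
                // authenticated user to resume first
                if (!waiting.isEmpty())
                    waiting.removeFirst().resume();
            }
        }
    }

    public void init(FilterConfig config) {}

    public void destroy() {}
}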
Servlet 3.0
I admit that the use of an exception to suspend request handling is not as elegant as I would like. I normally subscribe to the line of thought that exceptions are for exceptional circumstances, and suspending requests should be a normal operation.
Thus in my proposal for JSR 315 (Servlet 3.0), I have removed the need for an exception to be thrown, but kept the ability for requests to be suspended and retried. With this proposal, after suspending, request handling simply returns out of the service method. If the servlet has been invoked by new code that is aware of suspension, then that code will also simply return. If the servlet has been invoked by code that is unaware of suspension, then the response object is disabled so that any actions performed on it are ignored. Also, with support in the API, any code that does need to deal with the differences between an initial and a retried request will be simplified.
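In code, handling under the proposed API might look roughly like the fragment below. The isInitial() and suspend() methods are hypothetical stand-ins for whatever the expert group finally adopts, and startAsyncLookup() is assumed application code:

protected void doGet(HttpServletRequest request, HttpServletResponse response)
    throws IOException {
    if (request.isInitial()) {      // hypothetical: first dispatch of this request
        startAsyncLookup(request);  // kick off the asynchronous wait
        request.suspend(10000L);    // hypothetical: no exception is thrown
        return;                     // handling simply returns out of service
    }
    // the retried request: the awaited result is now available
    // ("result" would be stored by the application when the async work completed)
    response.getWriter().println(request.getAttribute("result"));
}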
The essential nature of the idea is that we need to access asynchronous behavior from within the Servlet.service method, where we have the benefit of all the container, framework and application provided facilities for authentication, authorization, unmarshalling, marshalling, MVC, components, templates, portlets, etc., and the benefit of all our developers’ experience with creating web applications. Let’s not throw the baby out with the bath water!
Note, however, that I am not saying that asynchronous servlet IO like that provided by Tomcat, Grizzly (and Jetty) is not valuable. It too has a place in Servlet 3.0, but in my proposal it is not part of the servlet API. Instead, asynchronous IO would be provided on a new content handler API that would allow asynchronous code to be written to parse/generate byte streams to/from higher level objects. These higher level objects could then be made available to servlets via a getContent and setContent API. Thus a content converter could asynchronously process bytes into an XML Document and just pass the completed DOM to the servlet. Moreover, the proposal allows for the container to provide common content converters so that every application would not need to provide its own. More importantly, container-provided converters could be written to the container’s IO and buffering mechanisms in the most efficient way, without needing to go via a standard API that must pick one technology (e.g. streams with byte arrays, NIO channels with buffers, etc.). Applications with specific conversion needs could still provide their own converters, but for things like file upload and XML handling common converters would be provided.
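To make the idea concrete, here is an entirely hypothetical sketch of what a converter contract might look like; only the getContent/setContent names come from the proposal above, the rest is invented for illustration:

import java.nio.ByteBuffer;

// Hypothetical contract for the proposed content handler API;
// interface and method names are invented for illustration.
public interface ContentConverter {
    // called asynchronously by the container as request bytes arrive
    void onContent(ByteBuffer chunk, boolean last);

    // the completed high-level object (e.g. a parsed XML Document),
    // which the container would hand to the servlet via getContent()
    Object getContent();
}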
One would think that the asynchronous IO facilities of Grizzly, Tomcat and Jetty would be perfect for implementing converters that would be optimally efficient for those containers.
Conclusion
The article correctly identifies asynchronous issues as a key use-case that needs to be addressed for Web 2.0 and JEE, and it provides some valuable analysis. Unfortunately, the article mixes the concerns of asynchronous IO with those of asynchronous access to application resources such as databases and web services. This mixing has clouded its analysis of the available solutions. I believe it has also prevented the authors from looking past the exceptions used by Jetty Continuations to see the true value of suspendable requests within the servlet model. While Grizzly and Tomcat do provide valuable asynchronous IO features, I do not believe they address the key asynchronous use-cases of Web 2.0; indeed, even the article’s own analysis chose to ignore the asynchronous IO contributions to latency and request handling.
The good thing is that effort is being made to educate developers about the need to consider asynchronous concerns.  Hopefully this education can continue to the point that we can correctly separate the issues of asynchronous IO from asynchronous application event handling/scheduling.


2 Comments

Anonymous · 21/11/2007 at 20:23

Great write-up… I hope to see Continuations become a more heralded technique in Ajax development.

Vishal Santoshi · 24/11/2007 at 04:21

This is very interesting. Like Continuations, we developed an almost analogous technique to avoid multiple executions of the same request (we identified the request by an id), together with the ability to asynchronously deliver the response to the user, by basing the architecture on the basically stateless nature of the HTTP protocol.

The meat of our architecture is in this discussion:

An application server thread hits some servlet to get the result for a request id RID1. The following parts of the system should exist:

  • Create a FIFO lock: really a simple lock that guarantees its release to threads in the order in which they tried to acquire it. When a thread acquires the lock, it goes on its way and starts executing; subsequent threads with request id = RID1 will try to acquire the lock but will be blocked. Make sure this lock is created per request id: threads with different request ids will have different FIFO lock objects (a minimal sketch of such a lock appears below, after this description). This gives the following:
    • No need to spawn ANY threads for the waiting application server threads; we just use them to wait until the time is right to dispatch a response back to the client.
    • No need for native thread-level notification; the FIFO mutex (lock) will handle that.
  • Create an internal thread pool whose sole purpose is to cater to the active threads that are doing the real execution, with the following properties:
    • It only creates non-daemon threads.
    • The core size of the pool is the number of processors in the system running the JVM. We may revisit the exact configuration of the pool based on tests and observations.
    • Use ThreadPoolExecutor.CallerRunsPolicy() when the traffic coming into the system is more than the executor can handle. Again, this may be revised based on tests and observations.
  • Once the app server thread gets hold of the lock, it creates an ActiveProcess callable and submits it to the executor noted above, after successful entry into the queuing system. The app server thread then blocks on the execution of the request by a thread from the pool. The app server thread stays blocked until we get a successful completion of the Callable, or until the blocking call is interrupted when a pause/timeout is reached. Note that all dispatches are done within the framework, and the blocking call may not return anything of substance, only an indication that the operation completed successfully, or it may simply wait to be interrupted.
  • The blocking can be resolved (the block on the app server thread waiting on the get() of the Future in the 1.5 implementation) by a normal successful execution, or by a signal when a pause/timeout state is reached (this is the framework signalling the user that the request is taking long and he/she has to wait). In both cases the get()/blocking call unblocks and the app server thread waiting on it is returned to the app server (the same way Continuations would handle concurrent requests). Our ability to handle multiple requests with the least overhead in open TCP connections depends on the timeout (before a pause page is shown and the server thread is released back).
  • Please note that a successful unblocking of any thread, including the app server thread executing the above Callable through a pool, is preceded by a dispatch of some payload to a user agent. This assertion is always true except in the case where, to limit the number of threads waiting on an RID, we take measures to relieve the app server thread that has been waiting longest, to accommodate a new request thread on the same RID. In that case we unblock the oldest thread without any dispatch on it. Of course, a reentrant, interruptible FIFO mutex will make this easy.

The result of the execution stays with the mapping for the RID. Any subsequent requests for the same RID will get that result posted to them. These subsequent requests occur via a resubmission of the request through the PausePage that was dispatched to the user when the timeout occurred.
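A minimal sketch of the per-request-id FIFO lock described in the first bullet, assuming the java.util.concurrent approach mentioned at the end of this comment (class and method names are illustrative only):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// A fair ReentrantLock hands the lock to waiting threads in roughly
// arrival order, giving the per-request-id FIFO behaviour described above.
public class RequestLocks {
    private final Map<String, ReentrantLock> locks =
        new HashMap<String, ReentrantLock>();

    public ReentrantLock lockFor(String requestId) {
        synchronized (locks) {
            ReentrantLock lock = locks.get(requestId);
            if (lock == null) {
                lock = new ReentrantLock(true); // fair == FIFO hand-off
                locks.put(requestId, lock);
            }
            return lock;
        }
    }
}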

It seems very much like Continuations in these ways.

1. A server thread is held on to for as long as the execution is not complete or the requested timeout has not occurred.

Thus if there is a timeout, the server thread is released back to the pool, and a “pause page”, which has the ability to resubmit the same RID through JavaScript, is sent to the user.

2. The result of an execution is retained in a WeakHashMap, with a strong reference to the key, until we no longer have at least one user thread to submit the result to.

We have tested this framework with very promising results, and it has been running our production systems for about 3 years now… One implementation was on a 1.4 JVM and the most recent one is on 1.5. The framework is also generic enough to work with any driver (in this case an HTTPDriver) and configurable timeouts, and the newer version is based heavily on the synchronizer framework in the java.util.concurrent package.

Good to know that this may be something you are proposing as a standard.

Comments are closed.