The Tomcat 6 developers have proposed an asynchronous IO extension as their solution for Comet and Ajax push. I have long argued that asynchronous handling is needed for servlets (for Comet and other use-cases), but that it is very important to distinguish between asynchronous IO and asynchronous handling of requests. Asynchronous programming is hard and asynchronous IO is even harder. I maintain that asynchronous IO should be implemented by the container and that only asynchronous events should be delivered to an asynchronous servlet. As if to illustrate my point, the example code that Tomcat provides for its asynchronous IO contains some classic bugs and inefficiencies. I examine them here because they illustrate well why we should make every effort to encapsulate asynchronous IO below the level of the servlet API.

What to do with zero or more bytes?
The first error in the Tomcat example is that it does not properly handle the fact that an asynchronous read may not return all the bytes you need to handle the content. Content can be delivered in little chunks by a simple client, a slow network, a busy OS or a malicious attacker. The code that the Tomcat example has for handling read events is:

    ...
    if (event.getEventType() == CometEvent.EventType.READ) {
        InputStream is = request.getInputStream();
        byte[] buf = new byte[512];
        do {
            int n = is.read(buf); //can throw an IOException
            if (n > 0) {
                log("Read " + n + " bytes: " + new String(buf, 0, n)
                    + " for session: " + request.getSession(true).getId());
            }
            ...
        } while (is.available() > 0);
    }

The bug in this code is that it assumes any bytes read can be converted to a String, as if bytes mapped 1:1 to characters. If the JVM is using UTF-8 as the default encoding, or the example is extended to handle character encodings explicitly, then the read may return only part of a multi-byte character. You can’t convert 2 bytes of a 3-byte Unicode character with new String(…)!

So this seemingly simple example would need to be made a lot more complex before it could be exposed to the real world. Real-world code would need to do something like:

  • parse the bytes to determine the boundary between content that can be handled and content that must be buffered waiting for more bytes to arrive.
  • persist unused bytes in a buffer
  • handle any of the bytes/characters that can be handled, so as to free space in the buffer; otherwise a full buffer will prevent the extra bytes required from being received.
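To give a feel for what those steps involve, here is a minimal sketch using the standard java.nio.charset.CharsetDecoder. The class name ChunkedUtf8Decoder and its feed method are hypothetical, invented for illustration; they are not part of Tomcat or the servlet API. The sketch assumes UTF-8 content and does no limit checking on the carried-over bytes.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tomcat or the servlet API.
class ChunkedUtf8Decoder {
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    // Bytes of an incomplete trailing character, carried over between calls.
    private ByteBuffer pending = ByteBuffer.allocate(0);

    // Returns the characters completed by this chunk; buffers any partial tail.
    String feed(byte[] chunk, int offset, int length) {
        // Prepend the bytes left over from the previous read event.
        ByteBuffer in = ByteBuffer.allocate(pending.remaining() + length);
        in.put(pending);
        in.put(chunk, offset, length);
        in.flip();

        // UTF-8 never yields more chars than bytes, so this cannot overflow.
        CharBuffer out = CharBuffer.allocate(in.remaining());
        // endOfInput=false: an incomplete sequence is left unread in 'in'.
        CoderResult result = decoder.decode(in, out, false);
        if (result.isError()) {
            throw new IllegalArgumentException("Undecodable input: " + result);
        }

        // Persist the unread bytes for the next read event.
        pending = ByteBuffer.allocate(in.remaining());
        pending.put(in);
        pending.flip();

        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] euro = "\u20AC".getBytes(StandardCharsets.UTF_8); // 3 bytes
        ChunkedUtf8Decoder d = new ChunkedUtf8Decoder();
        System.out.println("[" + d.feed(euro, 0, 2) + "]"); // first two bytes only
        System.out.println("[" + d.feed(euro, 2, 1) + "]"); // final byte arrives
    }
}
```

The first call returns an empty string because the character is not yet complete; the second call returns the whole character. Note that every call allocates two temporary buffers and each connection keeps its own leftover buffer, which is exactly the kind of overhead at issue here.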

This is tricky code and the result will be horribly inefficient:

  • Decoding UTF-8 is non-trivial.
  • Many extra temporary byte buffers will be created.
  • If there are many connected users, then there could be many additional buffers persisted between callbacks, consuming significant memory.
  • The container will have already buffered the content in its own efficient buffers, so the data is duplicated when it is moved to the temporary byte buffer.
  • Data that is stored in efficient container buffers must be copied into user memory to be handled as a byte array.  If the content is destined for a File or another network connection, it would be better to allow the container’s efficient buffers to be directly accessed by the operating system and avoid user space handling entirely.
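As a rough illustration of the last point, plain Java NIO already exposes one way to hand a transfer to the operating system: FileChannel.transferTo, which on many platforms can move data without copying it through a user-space byte array. The sketch below is only an analogy using two files; the class name ZeroCopySketch is hypothetical, and how a container would expose its own buffers is not something the Tomcat example shows.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch; an analogy, not how any container is implemented.
class ZeroCopySketch {
    // Copies src to dst via FileChannel.transferTo; returns bytes transferred.
    static long transfer(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long position = 0;
            long size = in.size();
            // transferTo may move fewer bytes than requested, so loop.
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("upload", ".bin");
        Path dst = Files.createTempFile("stored", ".bin");
        Files.write(src, new byte[64 * 1024]);
        System.out.println(transfer(src, dst) + " bytes transferred");
    }
}
```

No byte[] appears anywhere in the transfer path, which is the property that is lost once the API hands raw bytes to user code.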

Asynchronous IO is hard and it is even harder to make it efficient. The flaw with this example is both in the actual execution (not handling partial characters) and in the approach of expecting user-supplied code to deal with the asynchronous IO in the first place. A far better approach, and the one that I advocate for Servlet 3.0, is to allow the container to handle asynchronous IO and data conversions. For example, if the application wants the request content as a String, then the container can perform the conversion efficiently without extra buffers or copies.

Did I write or should I go now?
The second bug in the Tomcat example is in the writing of the response content. It is unclear from the supporting text whether the writer is in blocking mode or not, but either way this code is buggy:

    // Send any pending message on all the open connections
    for (int i = 0; i < connections.size(); i++) {
        try {
            PrintWriter writer = connections.get(i).getWriter();
            for (int j = 0; j < pendingMessages.length; j++) {
                writer.println(pendingMessages[j] + "<br>");
            }
            writer.flush();
            ...
        }
        ...
    }

If the underlying stream is in asynchronous non-blocking mode, then there is no guarantee that the messages will be written, and the PrintWriter.println method has no way to tell the caller that not all the content has been written. Of course the horrid multi-byte character issue remains, as partial characters can be written and the unwritten bytes need to be buffered.

Thus it is probably the case that the stream is in blocking mode, and the problem becomes that with a single thread writing the responses to all clients, one slow (or malicious) client can block that thread and prevent all other clients from receiving their messages. Even without the complexities of asynchronous writes, this example would need to be modified to have threads dispatched to handle each client and a thread pool to efficiently recycle those threads. But wait… isn’t that all part of the mechanisms provided by the servlet container? By not doing the work inside Servlet.service, the developer is going to have to re-invent quite a few wheels: buffering, dispatching, thread pools and so on.
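To make the blocking case concrete, here is a minimal sketch of the per-client dispatch that the application would end up writing for itself. The class name BroadcastSketch and its broadcast method are hypothetical, and the sketch assumes blocking writers and a pool supplied by the caller; it is not taken from the Tomcat example.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; names are illustrative and not from the Tomcat example.
class BroadcastSketch {
    // Submits one write task per client, so a stalled client ties up only
    // the pool thread running its own task.
    static List<Future<?>> broadcast(List<PrintWriter> clients,
                                     String[] pendingMessages,
                                     ExecutorService pool) {
        List<Future<?>> results = new ArrayList<>();
        for (PrintWriter writer : clients) {
            results.add(pool.submit(() -> {
                for (String message : pendingMessages) {
                    writer.println(message + "<br>");
                }
                writer.flush();
                // PrintWriter swallows IOException; checkError() is the only signal.
                if (writer.checkError()) {
                    writer.close();
                }
            }));
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        StringWriter a = new StringWriter();
        StringWriter b = new StringWriter();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Future<?>> results = broadcast(
                Arrays.asList(new PrintWriter(a), new PrintWriter(b)),
                new String[] {"hello"}, pool);
        for (Future<?> f : results) {
            f.get(); // wait for each client's write task
        }
        pool.shutdown();
        System.out.print(a);
    }
}
```

Even this is incomplete: a fixed pool can still be exhausted by enough slow clients, and nothing here bounds how long a write may block. Those are the wheels a servlet container has already invented.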

Conclusion
Tomcat has good asynchronous IO buffers, dispatching and thread pooling built inside the container, yet when the experienced developers who wrote Tomcat came to write a simple example of using their IO API, they included some significant bugs (or completely over-simplified the real work that needs to be done). Asynchronous IO is hard and it is harder to make efficient. It is simply not something that we want application or framework developers to have to deal with: if the container developers can’t get it right, what chance do other developers not versed in the complexities have? An extensible asynchronous IO API is a good thing to have in a container, but I think it is the wrong API to solve the use-cases of Comet, Ajax push or any other asynchronous scheduling concerns that a framework developer may need to deal with.

 


2 Comments

Constantine Plotnikov · 23/11/2007 at 14:34

Hardness of asynchronous IO is not a necessary state of affairs. It is just the current state in Java. Thus your blog entry overgeneralizes a bit. We have to think about how to make it easier rather than dismissing it as impossible.

The Tomcat 6 API has an obvious defect: it does not notify about changes in availability for write. This is a bug, and it needs to be fixed. If it is adequately developed, Tomcat’s approach could be quite a good one.

The problem with most of the current asynchronous APIs in Java is that they are designed in an ad hoc manner. There is neither a set of clear requirements nor lower-level components that can be reused. Each reinvented API wheel comes with its own defects and missing use cases.

As for relatively easy asynchronous IO, you could check the AsyncObjects framework (see the dev guide, samples, and the net.sf.asyncobjects.io.IOUtils class). Note that the provided IO API is just an unfinished experiment with immutable buffers. I have not even done comparative benchmarks with them yet. However, in the past there was a version with mutable buffers that was just slightly more difficult to use. There are also experimental adapters for Java EE in svn.

You could also check the E programming language for what could be the best language-level support for asynchronous components that I know of right now.

Greg Wilkins · 25/11/2007 at 13:50

Constantine,

I completely agree. I should say that asynchronous IO is hard if you do it all yourself. What I’m trying to say is that the solution needs a bit more than an API based on streamed byte arrays and event callbacks if it is to be widely used. I’m advocating that the async solutions applied to the servlet model should learn from more evolved solutions and provide more assistance for the application and/or framework developers.

cheers
