Jetty 5.0.0 is out the door and the 2.4 servlet spec is implemented.
So what’s next for Jetty and what’s next for the servlet API? It’s been a long journey for
Jetty from its birth in late 1995 to the 5.0.0 release. When the core of Jetty was written,
there was no J2ME, J2EE or even J2SE, no servlet API and no non-blocking IO; multi-CPU
machines were rare and no JVM used them. The situation is much the same for the servlet
API, which has grown from a simple protocol handler to a core component architecture for
enterprise solutions.
While these changing environments and requirements have mostly been handled well, the
results are not perfect: I have previously blogged about the servlet API problems,
and Jetty is no longer best of breed when it comes to raw speed or features.
Thus I believe the mid-term future for Jetty and the servlet API should involve a bit more
revolution than evolution. For this purpose the JettyExperimental (JE) branch
has been created and is being used to test ideas to greatly improve the raw HTTP performance
as well as the application API. This blog introduces JE and some of my ideas for how Jetty and
the servlet API could change.
Push Me Pull You
At the root of many problems with the servlet API is that it is a pull-push API, where the servlet
is given control and pulls headers, parameters and other content from the request object before pushing
response code, headers and content at the response object. This style of API, while very convenient for
println style dynamic content generation, has many undesirable consequences:
- The request headers must be buffered in complex/expensive hash structures so that the application
can access them in arbitrary order. One could ask why application code should be handling HTTP headers anyway…
- The application code contains the IO loops to read and write content. These IO loops
are written assuming the blocking IO API.
- The pull-push API is based on stream, reader and writer abstractions, which makes it impossible for the
servlet application code to use efficient IO mechanisms such as gather-writes or memory mapped file buffers for
static content.
- The response headers must be buffered in complex/expensive hash structures so that applications
can set and reset them in arbitrary order. One could ask why application code should be writing HTTP headers anyway…
- The application code needs to be aware of HTTP codes and headers. The API itself provides
no support for separating the concerns of content generation and content transport.
From a container implementer's point of view, it would be far more efficient for
the servlet API to be push-pull, where the container pushes headers, parameters and content
into the API as they are parsed from a request and then pulls headers and content from the
application as they are needed to construct the response.
This would remove the need for applications to do IO, additional buffering, arbitrary
ordering and dealing with application developers that don’t read HTTP RFCs.
Unfortunately a full push-pull API would also push an event driven model onto the application,
which is not an easy model to deal with nor suitable for the simple println style of
dynamic content generation used for most “hello world” inspired servlets.
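To make the contrast concrete, here is a minimal sketch of what a push-pull contract could look like. It is not code from Jetty or JE: PushPullHandler, EchoHandler and all method names are invented for illustration, and it is written in current Java rather than the Java of the time. The container calls header() and content() as it parses the request, then calls nextResponseChunk() when it is ready to write.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical push-pull contract (not a Jetty or servlet API). */
interface PushPullHandler {
    void header(String name, String value); // pushed by the container as each header is parsed
    void content(ByteBuffer chunk);         // pushed by the container as each body chunk arrives
    ByteBuffer nextResponseChunk();         // pulled by the container; null means no more content
}

/** Toy handler that answers with the request body in upper case. */
class EchoHandler implements PushPullHandler {
    final List<String> seenHeaders = new ArrayList<>();
    private final StringBuilder body = new StringBuilder();
    private boolean sent;

    public void header(String name, String value) {
        // Keep only the header this handler cares about; nothing else is stored.
        if (name.equalsIgnoreCase("Content-Type")) {
            seenHeaders.add(name + ": " + value);
        }
    }

    public void content(ByteBuffer chunk) {
        // Simplification: assumes chunks never split a multi-byte character.
        body.append(StandardCharsets.UTF_8.decode(chunk));
    }

    public ByteBuffer nextResponseChunk() {
        if (sent) {
            return null;
        }
        sent = true;
        return StandardCharsets.UTF_8.encode(body.toString().toUpperCase(Locale.ROOT));
    }
}
```

Note how the handler has become a set of callbacks with state carried between them. That is exactly the event driven burden that a println style servlet does not have to carry.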
The challenge of Jetty and servlet API reform is to allow the container to be written in
the efficient push-pull style, but to retain the fundamentals of pull-push in the application
development model we have come to know and live with. The way to do this is to change the
semantic level of what is being pushed and pulled, so that the container is written to
push-pull HTTP headers and bytes of data, but the application is written to pull-push content
in a non-IO style.
Content IO
Except for the application/x-www-form-urlencoded mime-type, the application must perform its
own IO to read content from the request and to write content to the response. Due to the
nature of the servlet API and threading model, this IO is written assuming blocking semantics.
Thus it is difficult to apply alternative IO methods, such as NIO.
Unfortunately the event driven nature of non-blocking IO is incompatible with the servlet
threading model, so it is not possible to simply ask developers to start writing IO assuming
non-blocking IO semantics or using NIO channels.
The NIO API cannot be effectively used without direct access to the low level IO classes, as
the low level API is required to efficiently write static content using a file
MappedByteBuffer to a WritableByteChannel, or to combine content
and HTTP header into a single packet without copying using a GatheringByteChannel.
The true power of the NIO API cannot be abstracted into InputStreams and OutputStreams.
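The following sketch shows the kind of low level code this refers to, using only standard java.nio classes. GatherWriteSketch and send() are hypothetical names, not JE code, and the response header is reduced to the bare minimum. The file is memory mapped and handed to the channel together with the header in one gathering write, so the content is never copied through a stream.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.GatheringByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class GatherWriteSketch {
    /**
     * Writes a minimal HTTP header followed by the mapped file content
     * using gathering writes. Returns the total number of bytes written.
     */
    static long send(GatheringByteChannel out, Path file) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = in.size();
            // The mapping stays valid after the channel is closed.
            ByteBuffer body = in.map(FileChannel.MapMode.READ_ONLY, 0, size);
            ByteBuffer header = StandardCharsets.US_ASCII.encode(
                "HTTP/1.1 200 OK\r\nContent-Length: " + size + "\r\n\r\n");
            ByteBuffer[] parts = { header, body };
            long total = 0;
            // A blocking channel usually finishes in one call. A non-blocking
            // container would wait for write readiness instead of looping.
            while (header.hasRemaining() || body.hasRemaining()) {
                total += out.write(parts);
            }
            return total;
        }
    }
}
```

Whether header and content really leave in a single packet depends on the operating system and the size of the file. The point is only that the application cannot express this through an OutputStream.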
Thus to use NIO, the servlet API must either expose these low levels (bad idea – as NIO might not
always be the latest and greatest) or take away content IO responsibilities from the application
developers.
The answer is to take away from the application servlets the responsibility for performing
IO. This has already been done for application/x-www-form-urlencoded, so
why not let the container handle the IO for text/xml, text/html etc.
If the responsibility for reading and writing bytes (or characters) was moved to the container,
then the application servlet code could deal with higher level content objects
such as org.w3c.dom.Document, java.io.File or java.util.HashMap. Such a container mechanism
would avoid the current need for many webapps to provide their own implementation of a
multipart request class or compression filter.
If we look at the client side of HTTP connections, the java.net package provides the
ContentHandlerFactory mechanism so that
the details of IO and parsing content can be hidden behind a simple call to
getContent(). Adding a similar mechanism (and a setContent() equivalent)
to the servlet API would move the IO responsibility
to the container. The container could push-pull bytes from the content factories and
the application could pull-push high level objects from the same factories.
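As a rough illustration of the idea, a server side content factory might be sketched as below. ContentReader and ContentRegistry are hypothetical names, not part of the servlet API or of JE, and only the form-encoded case is shown because that is the one the container already handles today. The registry does the reading and parsing; the application only sees the resulting object.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical container-side reader for one MIME type. */
interface ContentReader {
    Object read(InputStream in) throws IOException;
}

/** Hypothetical registry that a request.getContent() call could delegate to. */
class ContentRegistry {
    private final Map<String, ContentReader> readers = new HashMap<>();

    void register(String mimeType, ContentReader reader) {
        readers.put(mimeType, reader);
    }

    /** The container performs the IO; the caller receives a content object. */
    Object getContent(String mimeType, InputStream body) throws IOException {
        ContentReader reader = readers.get(mimeType);
        if (reader == null) {
            throw new IOException("no content reader for " + mimeType);
        }
        return reader.read(body);
    }

    /** Registry with a form reader that produces a Map of parameters. */
    static ContentRegistry withDefaults() {
        ContentRegistry registry = new ContentRegistry();
        registry.register("application/x-www-form-urlencoded", in -> {
            String raw = new String(in.readAllBytes(), StandardCharsets.US_ASCII);
            Map<String, String> params = new LinkedHashMap<>();
            for (String pair : raw.split("&")) {
                if (pair.isEmpty()) {
                    continue;
                }
                int eq = pair.indexOf('=');
                String key = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                // Simplification: a repeated parameter keeps only its last value.
                params.put(URLDecoder.decode(key, "UTF-8"), URLDecoder.decode(value, "UTF-8"));
            }
            return params;
        });
        return registry;
    }
}
```

A text/xml reader returning an org.w3c.dom.Document would be registered the same way, and a matching writer interface would cover the setContent() direction.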
Note that a content based API does not preclude streaming of content or require that large
content be held in memory. Content objects passed to and from the container could include
references to content (eg File), content handlers (JAXP handler) or even Readers, Writers,
InputStream and OutputStreams.
HTTP Headers
As well as the IO of content, the application is currently responsible for handling the
associated meta-data such as character and content encoding, modification dates and caching control.
This meta-data is mostly well specified in HTTP and MIME RFCs and could be best handled by
the container itself rather than by the application or the libraries bundled with it. For
example it would be far better for the container to handle gzip encoding of content directly
to/from its private buffers rather than for webapps to bundle their own CompressFilter.
Without knowledge of which HTTP headers an application uses or in what order they
will be accessed, the container is forced to parse incoming requests into expensive hashtables
of headers. The vast majority of applications do not deal with most headers in
an HTTP request, for example with the following request from mozilla-firefox:
GET /MB/search.png HTTP/1.1
Host: www.mortbay.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040614 Firefox/0.8
Accept: image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
keep-alive: 300
Connection: keep-alive
Referer: http://localhost:8080/MB/log/
Cookie: JSESSIONID=1ttffb8ss1idk
If-Modified-Since: Fri, 21 Nov 2003 16:59:29 GMT
Cache-Control: max-age=0
an application is likely to only make indirect usage of the Host and Cookie
headers and perhaps direct usage of the If-Modified-Since and Accept-Encoding
headers. Yet all these headers and values are available via the HttpServletRequest object to be pulled
by the application at any time during the request processing. Expensive hashmaps are created
and values received as bytes either have to be stringified or buffers kept aside for later
lazy evaluation.
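The lazy evaluation option can be sketched as follows. LazyHeaderValue is a hypothetical class, not taken from Jetty; it simply remembers where the value sits in the receive buffer and only builds a String if somebody asks for it.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical header value that is decoded from raw bytes on first access only. */
class LazyHeaderValue {
    private final byte[] buffer; // must not be reused while this value is still undecoded
    private final int offset;
    private final int length;
    private String decoded;

    LazyHeaderValue(byte[] buffer, int offset, int length) {
        this.buffer = buffer;
        this.offset = offset;
        this.length = length;
    }

    boolean isDecoded() {
        return decoded != null;
    }

    String get() {
        if (decoded == null) {
            // HTTP header bytes are treated here as ISO-8859-1.
            decoded = new String(buffer, offset, length, StandardCharsets.ISO_8859_1);
        }
        return decoded;
    }
}
```

This saves the String creation for headers nobody reads, but the cost has only moved: the receive buffer has to be kept aside for as long as the application might still pull a value.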
If the application was written at a content level, then most (if not all) HTTP header
handling could be performed by the content factories. For example, if given an org.w3c.dom.Document
to write, the container could set the HTTP headers for a content type of text/xml with
an acceptable charset and encoding selected by server configuration and the request headers.
Once the headers are set, the byte content can be generated accordingly by the container,
but scheduled so that excess buffering is not required and non-blocking IO can be done.
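The charset selection step could look something like the sketch below. CharsetNegotiation is a hypothetical helper, and the parsing is deliberately simplified: it assumes well formed q-values and ignores the special default that HTTP/1.1 gives ISO-8859-1 when it is not listed. The server configuration is represented as an ordered list of charsets it is able to produce.

```java
import java.util.List;

/** Hypothetical helper: picks a response charset from Accept-Charset and server preferences. */
class CharsetNegotiation {
    /**
     * Returns the server charset with the highest client q-value, or null
     * if none is acceptable. Ties go to the earlier server preference.
     */
    static String choose(String acceptCharset, List<String> serverPreference) {
        String best = null;
        double bestQ = 0.0;
        for (String candidate : serverPreference) {
            double q = quality(acceptCharset, candidate);
            if (q > bestQ) {
                best = candidate;
                bestQ = q;
            }
        }
        return best;
    }

    /** Client q-value for one charset: exact match first, then "*", else 0. */
    static double quality(String acceptCharset, String charset) {
        double exact = -1;
        double wildcard = -1;
        for (String item : acceptCharset.split(",")) {
            String[] parts = item.trim().split(";");
            String name = parts[0].trim();
            double q = 1.0; // a missing q parameter means q=1
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                if (param.startsWith("q=")) {
                    q = Double.parseDouble(param.substring(2));
                }
            }
            if (name.equalsIgnoreCase(charset)) {
                exact = q;
            } else if (name.equals("*")) {
                wildcard = q;
            }
        }
        if (exact >= 0) {
            return exact;
        }
        return wildcard >= 0 ? wildcard : 0.0;
    }
}
```

With the Accept-Charset header from the Firefox request above and a server that prefers utf-8 over iso-8859-1, this picks iso-8859-1, because the client gave it the higher quality. The application that handed over the Document never sees any of this.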
Unfortunately, not all headers will be able to be handled directly from the content objects.
For example, If-Modified-Since headers could be handled for File content objects,
but not for an org.w3c.dom.Document. So a mechanism for the application to communicate additional
meta-data will need to be provided.
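A sketch of how that split might work for If-Modified-Since is given below. ConditionalGet is a hypothetical helper: for a File the container can read the modification time itself, while for other content the application would have to supply it through whatever meta-data mechanism the API ends up offering (here simply a long parameter).

```java
import java.io.File;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

/** Hypothetical container-side evaluation of If-Modified-Since. */
class ConditionalGet {
    /** File content: the container can find the modification time on its own. */
    static int statusFor(File file, String ifModifiedSince) {
        return statusFor(file.lastModified(), ifModifiedSince);
    }

    /**
     * Other content: the modification time (milliseconds since the epoch)
     * has to come from the application. Returns 304 or 200.
     */
    static int statusFor(long lastModifiedMillis, String ifModifiedSince) {
        if (ifModifiedSince == null) {
            return 200;
        }
        try {
            long since = ZonedDateTime
                .parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
                .toInstant()
                .toEpochMilli();
            // HTTP dates have one second resolution, so compare whole seconds.
            return (lastModifiedMillis / 1000) <= (since / 1000) ? 304 : 200;
        } catch (DateTimeParseException e) {
            // An unparseable date is ignored and the full response is sent.
            return 200;
        }
    }
}
```

Only the RFC 1123 date format is parsed in this sketch; the older date formats that HTTP/1.1 servers are also expected to accept are left out.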
Summary and Status
JettyExperimental now implements most of HTTP/1.1 in a push-pull architecture that works with
either bio or nio. When using nio, gather writes are used to combine header and content into
a single write and static content is served directly from mapped file buffers. An advanced
NIO scheduler avoids many of the NIO problems
inherent with a producer/consumer model.
Thus JE is ready as a platform to experiment with the content API ideas introduced
above. I plan to initially work toward a pure content based application API and thus to
discover what cannot be easily and efficiently fitted into that model. Hopefully what
will result is lots of great ideas for the next generation servlet API and a great HTTP
infrastructure for Jetty6.