I have been working lately with the new JDK 7’s Async I/O APIs (“AIO” from here), and I would like to summarize here my findings, for future reference (mostly my own).
My understanding is that the design of the AIO API aimed at simplifying non-blocking operations, and it does: what in AIO requires 1-5 lines of code requires 50+ lines in JDK 1.4’s non-blocking APIs (“NIO” from here), plus a careful threading design of those lines.
The context I work in is that of scalable network servers, so this post is mostly about AIO seen from my point of view and from the point of view of API design.
Studying AIO served as a great stimulus to review ideas for Jetty and learn something new.

Introduction

Synchronous APIs are simple: ServerSocketChannel.accept() blocks until a channel is accepted; SocketChannel.read(ByteBuffer) blocks until some bytes are read; and SocketChannel.write(ByteBuffer) is guaranteed to write everything from the buffer, returning only when the write has completed.
With asynchronous I/O (and therefore with both AIO and NIO), the blocking guarantee is gone, and this alone complicates things a lot, and I mean a lot.

AIO Accept

To accept a connection with AIO, the application needs to call:

<A> AsynchronousServerSocketChannel.accept(A attachment, CompletionHandler<AsynchronousSocketChannel, ? super A> handler)

As you can see, the CompletionHandler is parametrized, and the parameters are an AsynchronousSocketChannel (the channel that will be accepted), and a generic attachment (that can be whatever you want).
This is a typical implementation of the CompletionHandler for accept():

class AcceptHandler implements CompletionHandler<AsynchronousSocketChannel, Void>
{
    public void completed(AsynchronousSocketChannel channel, Void attachment)
    {
        // Call accept() again
        AsynchronousServerSocketChannel serverSocket = ???
        serverSocket.accept(attachment, this);
        // Do something with the accepted channel
        ...
    }
    ...
}

Note that Void is used as the attachment type because, in general, there is not much to attach for the accept handler.
Nevertheless, the attachment feature is a powerful idea.
It becomes immediately apparent that the code needs the AsynchronousServerSocketChannel reference (see the ??? in the above code snippet), because it needs to call AsynchronousServerSocketChannel.accept() again (otherwise no further connections will be accepted).
Unfortunately the signature of the CompletionHandler does not contain any reference to the AsynchronousServerSocketChannel that the code needs.
Ok, no big deal, it can be referenced by other means.
After all, it is the application code that creates both the AsynchronousServerSocketChannel and the CompletionHandler, so the application can certainly pass the AsynchronousServerSocketChannel reference to the CompletionHandler.
Or the class can be implemented as an anonymous inner class, and therefore will have the AsynchronousServerSocketChannel reference in lexical scope.
It is even possible to use the attachment to pass the AsynchronousServerSocketChannel reference, instead of using Void.
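For example, here is a minimal sketch of the anonymous inner class approach (the port number and the failure handling are just placeholders for this example):

final AsynchronousServerSocketChannel serverSocket =
        AsynchronousServerSocketChannel.open().bind(new InetSocketAddress(8080));
serverSocket.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>()
{
    public void completed(AsynchronousSocketChannel channel, Void attachment)
    {
        // serverSocket is in lexical scope, so accept() can be called again
        serverSocket.accept(null, this);
        // Do something with the accepted channel
    }

    public void failed(Throwable failure, Void attachment)
    {
        // Handle the failure
    }
});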
I do not like this design of recovering needed references with application intervention; my reasoning is as follows: if the API forces me to do something, in this case call AsynchronousServerSocketChannel.accept(), would it not have been better to pass the AsynchronousServerSocketChannel reference as a parameter of CompletionHandler.completed(...)?
You will see in the following sections how this omission is just the tip of the iceberg.
Let’s move on for now, and see how you can connect with AIO.

AIO Connect

To connect using AIO, the application needs to call:

<A> AsynchronousSocketChannel.connect(SocketAddress remote, A attachment, CompletionHandler<Void, ? super A> handler);

The CompletionHandler is parametrized, but this time the first type parameter is forced to be Void.
The first thing to notice is the absence of a timeout parameter.
AIO solves the connect timeout problem in the following way: if the application wants a timeout for connection attempts, it has to use the blocking version:

channel.connect(address).get(10, TimeUnit.SECONDS);

The application can either block, with an optional timeout, by calling get(...), or be non-blocking and hope that the connection succeeds or fails, because there is no way to time it out.
This is a problem, because it is not uncommon for opening a connection to take a few hundred milliseconds (or even seconds), and if an application wants to open 5-10 connections concurrently, the right way to do it would be to use a non-blocking API (otherwise it has to open the first, wait, then open the second, wait, and so on).
Alas, it starts to appear that some facility (a “framework”) is needed on top of AIO, to provide additional useful features like asynchronous connect timeouts.
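For example, here is a minimal sketch of an asynchronous connect timeout that such a framework could provide (this is my own workaround, not part of the API; address is assumed to be defined elsewhere): a task is scheduled to close the channel if the connect does not complete in time, and closing the channel makes the pending connect fail, invoking failed(...).

final AsynchronousSocketChannel channel = AsynchronousSocketChannel.open();
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
final ScheduledFuture<?> timeout = scheduler.schedule(new Runnable()
{
    public void run()
    {
        try
        {
            // Closing the channel fails the pending connect
            channel.close();
        }
        catch (IOException ignored)
        {
        }
    }
}, 10, TimeUnit.SECONDS);
channel.connect(address, null, new CompletionHandler<Void, Void>()
{
    public void completed(Void result, Void attachment)
    {
        timeout.cancel(false);
        // Connected, proceed as usual
    }

    public void failed(Throwable failure, Void attachment)
    {
        timeout.cancel(false);
        // Either a genuine connect failure, or the timeout task closed the channel
    }
});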
This is a typical implementation of the CompletionHandler for connect(...):

class ConnectHandler implements CompletionHandler<Void, Void>
{
    public void completed(Void result, Void attachment)
    {
        // Connected, now must read
        ByteBuffer buffer = ByteBuffer.allocate(8192);
        AsynchronousSocketChannel channel = ???
        channel.read(buffer, null, readHandler);
    }
}

Like before, Void is used as the attachment (it is not evident what I would need to attach to a connect handler), so the signature of completed() takes two Void parameters. Uhm.
It turns out that after connecting, most often the application needs to signal its interest in reading from the channel and therefore needs to call AsynchronousSocketChannel.read(...).
Like before, the AsynchronousSocketChannel reference is not immediately available as a parameter of the API (and like before, the workarounds for this problem are similar).
The important thing to note here is that the API forces the application to allocate a ByteBuffer in order to call AsynchronousSocketChannel.read(...).
This is a problem because it wastes resources: imagine what happens if the application has 20k connections open, but none is actually reading: it has 20k * 8 KiB ≈ 160 MiB of buffers allocated, for nothing.
Most, if not all, scalable network servers out there use some form of buffer pooling (Jetty certainly does), and can serve 20k connections with a very small amount of allocated buffer memory, leveraging the fact that not all connections are active at exactly the same time.
This optimization is very similar to what is done with thread pooling: in asynchronous I/O, in general, threads are pooled and there is no need to allocate one thread per connection. You can happily run a busy server with very few threads, and ditto for buffers.
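A buffer pool can be as simple as the following sketch (real pools also deal with different buffer sizes, bounds, and direct versus heap buffers):

class BufferPool
{
    private final Queue<ByteBuffer> buffers = new ConcurrentLinkedQueue<>();

    public ByteBuffer acquire()
    {
        // Reuse a pooled buffer if available, otherwise allocate a new one
        ByteBuffer buffer = buffers.poll();
        return buffer != null ? buffer : ByteBuffer.allocate(8192);
    }

    public void release(ByteBuffer buffer)
    {
        buffer.clear();
        buffers.offer(buffer);
    }
}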
But in AIO, it is the API that forces the application to allocate a buffer even if there may be nothing (yet) to read, because you have to pass that buffer as a parameter to AsynchronousSocketChannel.read(...) to signal your interest in reading.
All right, 160 MiB is not that much with modern computers (my laptop has 8GiB), but differently from the connect timeout problem, there is not much that a “framework” on top of AIO can do here to reduce memory footprint. Shame.

AIO Read

Both accept and connect operations will normally need to read just after they complete.
To read using AIO, the application needs to call:

<A> AsynchronousSocketChannel.read(ByteBuffer buffer, A attachment, CompletionHandler<Integer, ? super A> handler)

This is a typical implementation of the CompletionHandler for read(...):

class ReadHandler implements CompletionHandler<Integer, ReadContext>
{
    public void completed(Integer read, ReadContext readContext)
    {
        // Read some bytes, process them, and read more
        if (read < 0)
        {
            // Connection closed by the other peer
            ...
        }
        else
        {
            // Process the bytes read
            ByteBuffer buffer = ???
            ...
            // Read more bytes
            AsynchronousSocketChannel channel = ???
            channel.read(buffer, readContext, this);
        }
    }
}

This is where things get really… weird: the application, in the read handler, is supposed to process the bytes just read, but it has no reference to the buffer that is supposed to contain those bytes.
And, as before, the application will need a reference to the channel in order to call read(...) again (to read more data), but that too is missing.
Like before, the application has the burden of packing the buffer and the channel into some sort of read context (shown in the code above as the ReadContext class) and passing it as the attachment (or of referencing them from the lexical scope).
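Such a read context can be as simple as the following sketch (ReadContext is not a JDK class, just a name used in this post); with it, the ??? in the handler above simply become readContext.buffer and readContext.channel:

class ReadContext
{
    final AsynchronousSocketChannel channel;
    final ByteBuffer buffer;

    ReadContext(AsynchronousSocketChannel channel, ByteBuffer buffer)
    {
        // Bundle the references that the completion handler needs
        this.channel = channel;
        this.buffer = buffer;
    }
}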
Again, a “framework” could take care of this step, which is always required, and it is required because of the way the AIO APIs have been designed.
The reason why the number of bytes read is passed as first parameter of completed(...) is that it can be negative when the connection is closed by the remote peer.
If it is non-negative, this parameter is basically useless, since the buffer must be available in the completion handler anyway, and one can figure out how many bytes were read from the buffer itself.
In my humble opinion, it is a vestige of the past that the application has to issue a read to know whether the other end has closed the connection. The I/O subsystem should do this, and notify the application of a remote close event, not of a read event. It would also save the application from having to check, on every read, whether the number of bytes read is negative.
I sorely missed this remote close event in NIO, and I am missing it in AIO too.
As before, a “framework” on top of AIO could take care of this.
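For example, a framework could offer a handler base class like the following sketch (the onRead and onRemoteClose callbacks are my invention, not part of the JDK):

abstract class CloseAwareReadHandler implements CompletionHandler<Integer, ReadContext>
{
    public final void completed(Integer read, ReadContext context)
    {
        if (read < 0)
            onRemoteClose(context); // The remote peer closed the connection
        else
            onRead(read, context); // The bytes read are in context.buffer
    }

    protected abstract void onRead(int read, ReadContext context);

    protected abstract void onRemoteClose(ReadContext context);
}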
Differently from the connect operation, asynchronous reads may take a timeout parameter (which makes the absence of this parameter in connect(...) look like an oversight).
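For reference, the timed overload looks like this:

<A> AsynchronousSocketChannel.read(ByteBuffer buffer, long timeout, TimeUnit unit, A attachment, CompletionHandler<Integer, ? super A> handler)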
Fortunately, there cannot be concurrent reads on the same connection (unless the application really messes up its threading), so the read handler normally stays quite simple, if you can bear the if statement that checks whether you read -1 bytes.
But things get more complicated with writes.

AIO Write

To write bytes in AIO, the application needs to call:

<A> AsynchronousSocketChannel.write(ByteBuffer buffer, A attachment, CompletionHandler<Integer, ? super A> handler)

This is a naive, non-thread safe, implementation of the CompletionHandler for write(...):

class WriteHandler implements CompletionHandler<Integer, WriteContext>
{
    public void completed(Integer written, WriteContext writeContext)
    {
        ByteBuffer buffer = ???
        // Decide whether all bytes have been written
        if (buffer.hasRemaining())
        {
            // Not all bytes have been written, write again
            AsynchronousSocketChannel channel = ???
            channel.write(buffer, writeContext, this);
        }
        else
        {
            // All bytes have been written
            ...
        }
    }
}

Like before, the write completion handler is missing the references required to do its work, in particular the write buffer and the AsynchronousSocketChannel needed to call write(...) again.
The completion handler parameters provide the number of bytes written, which may differ from the number of bytes requested to be written (the bytes remaining in the buffer at the time of the call to AsynchronousSocketChannel.write(...)).
This leads to partial writes: to fully write a buffer you may need multiple partial writes, and the application has the burden of packing some sort of write context (referencing the buffer and the channel), like it had to do for reads.
But the main problem here is that this write completion handler is not safe for concurrent writes, and applications – in general – may write concurrently.
What happens if one thread starts a write that cannot be fully completed (so only some of the bytes in the buffer are written), and another thread concurrently starts another write?
There are two cases. The first happens when the second thread starts a write while the first write is still in progress: a WritePendingException is thrown to the second thread. The second happens when the second write starts after the first thread has completed a partial write but has not yet started writing the remainder: the output will be garbled (a mix of the bytes of the two writes), but no error will be reported.
Asynchronous writes are hard, because each write must be fully completed before the next one starts, and differently from reads, writes can be – and often are – concurrent.
What AIO provides is a guard against concurrent partial writes (by throwing WritePendingException), but not against interleaved partial writes.
While in principle there is nothing wrong with this scheme (apart from being complex to use), my opinion is that it would have been better for the AIO API to have a “fully written” semantic, such that CompletionHandlers were invoked when the write fully completed, not for every partial write.
How can you allow applications to do concurrent asynchronous writes?
The typical solution is that the application must buffer concurrent writes by maintaining a queue of buffers to be written and by using the completion handler to dequeue the next buffer when a write is fully completed.
This is pretty complicated to get right (the enqueuing/dequeuing mechanism must be thread safe, fast and memory-leak free), and it is entirely a burden that the AIO APIs put on the application.
Furthermore, buffer queuing opens up more issues: deciding whether the queue can have an infinite size (or, if it is bounded, what to do when the limit is reached); deciding the exact lifecycle of the buffers, which impacts the buffer pooling strategy, if present (since buffers are enqueued, the application cannot assume they have been written and therefore cannot reuse them); deciding whether you can tolerate the extra latency a buffer incurs while sitting in the queue before it is written; and so on.
Like before, the buffer queuing can be taken care of by a “framework” on top of AIO.
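To make the pattern concrete, here is a much-simplified sketch of such a write queue (unbounded, no buffer pooling, error handling elided; the class and its names are mine, not part of the JDK):

class WriteQueue implements CompletionHandler<Integer, ByteBuffer>
{
    private final Queue<ByteBuffer> queue = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean active = new AtomicBoolean();
    private final AsynchronousSocketChannel channel;

    WriteQueue(AsynchronousSocketChannel channel)
    {
        this.channel = channel;
    }

    public void write(ByteBuffer buffer)
    {
        queue.offer(buffer);
        // Start writing only if no write is already in progress
        if (active.compareAndSet(false, true))
            next();
    }

    private void next()
    {
        ByteBuffer buffer = queue.poll();
        if (buffer != null)
        {
            channel.write(buffer, buffer, this);
            return;
        }
        active.set(false);
        // Re-check: another thread may have enqueued after poll() returned null
        if (!queue.isEmpty() && active.compareAndSet(false, true))
            next();
    }

    public void completed(Integer written, ByteBuffer buffer)
    {
        if (buffer.hasRemaining())
            // Partial write: finish this buffer before dequeuing the next one
            channel.write(buffer, buffer, this);
        else
            // Fully written: dequeue the next buffer, if any
            next();
    }

    public void failed(Throwable failure, ByteBuffer buffer)
    {
        // Error handling elided in this sketch
        active.set(false);
    }
}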

AIO Threading

AIO performs the actual reads and writes, and invokes completion handlers, via threads that are part of an AsynchronousChannelGroup.
If an I/O operation is requested by a thread that does not belong to the group, it is scheduled to be executed by a group thread, with the consequent context switch.
Compare this with NIO, where there is only one thread running the selector loop, waiting for I/O events; upon an I/O event, depending on the pattern used, either the selector thread performs the I/O operation and calls the application, or another thread is tasked to perform the I/O operation and invoke the application, freeing the selector thread.
In the NIO model, it is easy to block the I/O system by using the selector thread to invoke the application and then having the application perform a blocking call (for example, a JDBC query that lasts minutes): since there is only one thread doing I/O (the selector thread), and this thread is now blocked in the JDBC call, it cannot listen for other I/O events and the system stalls.
The AIO model “powers up” the NIO model because now there are multiple threads (the ones belonging to the group) that take care of I/O events, perform I/O operations and invoke the application (that is, the completion handlers).
This model is flexible and allows the configuration of the thread pool for the AsynchronousChannelGroup, so it is really a matter for the application to decide the size of the thread pool, whether it is bounded, and so on.
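For example, a group backed by a bounded thread pool can be created like this (the pool size is just an example):

AsynchronousChannelGroup group = AsynchronousChannelGroup.withFixedThreadPool(
        4, Executors.defaultThreadFactory());
// Channels opened with this group use its threads for I/O and completion handlers
AsynchronousServerSocketChannel serverSocket = AsynchronousServerSocketChannel.open(group);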

Conclusions

JDK 7’s AIO APIs are certainly an improvement over NIO, but my impression is that they are still too low level for the casual user (no remote close event, no asynchronous connect timeout, no fully-written semantic), and that they potentially scale less than a good framework built on top of NIO, due to the lack of buffer pooling strategies and the reduced control over threading.
Applications will probably need to write some sort of framework on top of AIO, which somewhat defeats what I think was one of the main goals of this new API: to simplify the use of asynchronous I/O.
For me, the glass is half empty because I had higher expectations.
But if you want to write a quick, small program that does network I/O asynchronously, and you don’t want any library dependencies, by all means use AIO and forget about NIO.