From: Christopher Sedore <cmsedore@mailbox.syr.edu>
Date: Wed, 28 Apr 1999 20:54:49 -0400 (EDT)
To: hackers@freebsd.org
Subject: async io and sockets update

I've mostly finished what I set out to do with the kernel aio routines.
Below is a summary:

1. I've added a new system call, aio_waitcomplete(struct aiocb **cb,
struct timespec *tv).  This system call causes a process to sleep until
the next async io operation completes or the timeout expires.  When an
operation completes, a pointer to the userland aiocb is placed in *cb.
This makes fire-and-forget async io programming both possible and easy
(a usage sketch appears after the performance numbers below).

2. I've changed the way that async operations on sockets are handled:

a. Each socket is checked to see whether the operation will complete
immediately.  If not, the operation is placed on a separate queue and
processed when the sowakeup routine issues an upcall.

b. When upcalled as writeable, all pending writes are moved to the
regular io queue to be processed.

c. When upcalled as readable, reads are executed in the upcall routine
for as long as the socket stays readable.

3. I believe I fixed a bug in aio_process that would allow it to try to
execute operations on descriptors that had already been closed, causing
a panic.

Notes:

Ideally, operations on sockets that would complete immediately should be
executed during the aio_read system call itself, with the results made
ready to be picked up later.

Benefits:

The old aio code passed socket operations on to the aio daemons
immediately, causing them to block (sbwait).  Once the maximum number of
aiods were blocked, no more operations would make progress until one of
the aiods could complete an operation.

This approach can be significantly faster than using select() to poll
sockets.  A simple test program showed that before optimization 2c
above, the async io routines were only faster than select() once about
37 descriptors were being monitored.  With optimization 2c, async io was
faster in all the testing I did (I did not test with fewer than 10
descriptors).

The performance difference (again, with a simple test program) between
aio and select() for reading looks something like this:

                  select()      aio_read()/aio_waitcomplete()
   num fds     kb/s    secs          kb/s    secs
      10      26315     19          35714     14
      20      20833     24          35714     14
      30      17241     29          33333     15
      40      14285     35          33333     15
      50      12195     41          33333     15
      60      10416     48          33333     15
      70       9259     54          31250     16
      80       8196     61          33333     15
      90       7575     66          31250     16
     100       6944     72          33333     15

select() continues to trail off up to 250 descriptors, while aio shows
no significant degradation.
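To make the fire-and-forget model concrete, here is a schematic of how
the read side can be driven.  This is an illustration of the interface
described in point 1, not the actual test program, and it makes two
assumptions beyond what is stated above: that aio_waitcomplete() returns
the completed operation's byte count (the way aio_return() would), and
that a NULL timespec means wait indefinitely.  The syscall is new, so
its prototype is declared by hand:

	/*
	 * Sketch: one outstanding aio_read() per descriptor, completions
	 * collected with the proposed aio_waitcomplete().  Assumptions
	 * (not from the post): return value is the byte count, NULL
	 * timeout blocks indefinitely.
	 */
	#include <sys/types.h>
	#include <aio.h>
	#include <err.h>
	#include <stdlib.h>
	#include <string.h>
	#include <time.h>

	/* New syscall; no header declares it yet. */
	ssize_t aio_waitcomplete(struct aiocb **cb, struct timespec *tv);

	#define BUFSZ	512

	static void
	arm_read(struct aiocb *cb, int fd, char *buf)
	{
		memset(cb, 0, sizeof(*cb));
		cb->aio_fildes = fd;
		cb->aio_buf = buf;
		cb->aio_nbytes = BUFSZ;
		if (aio_read(cb) == -1)
			err(1, "aio_read");
	}

	void
	read_loop(int *fds, int nfds)
	{
		struct aiocb *cbs, *done;
		char *bufs;
		ssize_t n;
		int i;

		if ((cbs = calloc(nfds, sizeof(*cbs))) == NULL ||
		    (bufs = malloc(nfds * BUFSZ)) == NULL)
			err(1, "malloc");

		/* Post one read per descriptor, then forget them all. */
		for (i = 0; i < nfds; i++)
			arm_read(&cbs[i], fds[i], bufs + i * BUFSZ);

		/*
		 * No per-descriptor polling: each call hands back the
		 * next completion, whichever descriptor it came in on.
		 */
		for (;;) {
			if ((n = aio_waitcomplete(&done, NULL)) == -1)
				err(1, "aio_waitcomplete");
			/* ... consume n bytes at done->aio_buf ... */
			arm_read(done, done->aio_fildes,
			    (char *)done->aio_buf);
		}
	}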
Note that using aio_suspend() instead of aio_waitcomplete() would
probably be non-trivially slower, but still faster than select() on
large numbers of descriptors (though perhaps not by much, depending on
how the completion order of the operations compares to the order of the
pointers passed into aio_suspend()).

The test program simply creates the requested number of descriptors
using socketpair(), and either places an outstanding aio_read() on each
or puts each in an fd_set for select().  Descriptors are then chosen at
random() from this set and written to.  aio_waitcomplete() or select()
is used to retrieve the completed aiocb or the readable fd, after which
the aio_read() is reissued or the fd_set is rebuilt, respectively (the
select() side is sketched in the P.S. below).  The tests above were done
with 1000000 writes of 512 bytes each, and a corresponding read of
1000000 buffers of 512 bytes each.

One remaining problem with the aio code is that aio operations won't
"cross over" to other kernel threads, because they are tied to the procs
that issue them rather than to the file descriptor itself.  I may
investigate creating a variation of NT's io completion ports to enable
async io with kernel threads.

I don't think the modifications are too invasive.  There are numerous
mods to kern/vfs_aio.c, some mods to uipc_socket.c and uipc_socket2.c,
and small changes to sys/aio.h and sys/socketvar.h (plus the syscall
addition).  I hope to do some more tweaking and see if I can get someone
to look it over with an eye to committing some or all of it.

-Chris
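P.S.  For reference, the select() side of the test loop is schematically
like this (a sketch only; buffer bookkeeping and the writer side are
elided).  The fd_set has to be rebuilt and rescanned on every pass,
which is O(num fds) work per wakeup and is where the steady trail-off in
the table above comes from:

	#include <sys/types.h>
	#include <sys/time.h>
	#include <err.h>
	#include <unistd.h>

	void
	select_loop(int *fds, int nfds)
	{
		fd_set rset;
		char buf[512];
		int i, maxfd = 0;

		for (i = 0; i < nfds; i++)
			if (fds[i] > maxfd)
				maxfd = fds[i];

		for (;;) {
			/* Rebuild the set every pass: O(nfds). */
			FD_ZERO(&rset);
			for (i = 0; i < nfds; i++)
				FD_SET(fds[i], &rset);
			if (select(maxfd + 1, &rset, NULL, NULL,
			    NULL) == -1)
				err(1, "select");
			/* Scan for readable descriptors: O(nfds) again. */
			for (i = 0; i < nfds; i++)
				if (FD_ISSET(fds[i], &rset))
					(void)read(fds[i], buf,
					    sizeof(buf));
		}
	}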