From owner-freebsd-questions Wed Mar 13 09:59:28 1996
Return-Path: owner-questions
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3)
          id JAA13170 for questions-outgoing; Wed, 13 Mar 1996 09:59:28 -0800 (PST)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211])
          by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id JAA13132;
          Wed, 13 Mar 1996 09:59:19 -0800 (PST)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9)
          id KAA08607; Wed, 13 Mar 1996 10:52:40 -0700
From: Terry Lambert
Message-Id: <199603131752.KAA08607@phaeton.artisoft.com>
Subject: Re: non-blocking read ?
To: msmith@atrad.adelaide.edu.au (Michael Smith)
Date: Wed, 13 Mar 1996 10:52:39 -0700 (MST)
Cc: luigi@labinfo.iet.unipi.it, leisner@sdsp.mc.xerox.com,
    msmith@atrad.adelaide.edu.au, questions@freebsd.org, current@freebsd.org
In-Reply-To: <199603122357.KAA00112@genesis.atrad.adelaide.edu.au> from
    "Michael Smith" at Mar 13, 96 10:27:08 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-questions@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

> > Someone suggested doing async IO and handling SIGIO (I suppose this
> > refers to doing
>
> See my previous message regarding not being sure about this, and
> definitely check how harvest does it; I'm sure I was wrong earlier 8(
>
> > But how much data will be available when I get the SIGIO (or select
> > will return that I/O is possible)?  The amount I requested (assuming
> > it is available), or the system's idea of a block, or what?
>
> "some"; you call read and examine the return value to see how much
> you get.
>
> Thinking about it further, I don't see how this would work for disk
> I/O; it's not until the read itself is issued that the disk request
> is queued ...

You get the SIGIO when there is data pending, not when the read has
completed (man fcntl, look for O_ASYNC).

All it is is a "data pending" notification -- it is a hack around
non-I/O-based messaging mechanisms so that they may be used (you catch
the SIGIO and then hit select) to let you multiplex, for instance,
System V message queues.

Unfortunately, it's not always possible to distinguish signal events
from their causes: if I have multiple completions in the time it takes
me to handle one signal, I'm screwed, even if SIGIO passed, say, the
address of the completed buffer to the handler.

Signals are persistent conditions.  Signals are *not* events.  There is
no "wait" equivalent call family for SIGIO, like there is for SIGCLD.
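For concreteness, the O_ASYNC/SIGIO arrangement above looks roughly
like the sketch below from user space.  This is illustrative only (the
names are mine), and it assumes fd is a socket or tty descriptor; SIGIO
is generally not delivered for reads on regular files, which is part of
the disk I/O problem raised above.

/*
 * Illustrative sketch of the SIGIO/O_ASYNC "data pending" pattern.
 */
#include <sys/types.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t io_pending;

static void
sigio_handler(int sig)
{
        /*
         * The signal carries no buffer address and no byte count, and
         * several completions may coalesce into a single delivery; all
         * the handler can do is note that *something* is pending.
         */
        io_pending = 1;
}

static void
setup_async_io(int fd)
{
        int flags;

        signal(SIGIO, sigio_handler);

        /* Direct SIGIO for this descriptor to this process. */
        fcntl(fd, F_SETOWN, getpid());

        /*
         * O_ASYNC: raise SIGIO when I/O becomes possible.  O_NONBLOCK
         * so the drain loop below never sleeps on a coalesced or
         * spurious signal.
         */
        flags = fcntl(fd, F_GETFL, 0);
        fcntl(fd, F_SETFL, flags | O_ASYNC | O_NONBLOCK);
}

static void
drain(int fd)
{
        char buf[4096];
        ssize_t n;

        /* "some"; you call read and examine the return value. */
        while ((n = read(fd, buf, sizeof buf)) > 0) {
                /* consume n bytes of buf here */
        }
        if (n == -1 && errno != EAGAIN) {
                /* hard error on the descriptor */
        }
}

Note that the handler learns nothing about which request finished or
how much data arrived; the main loop still has to go back and read (or
select) to find out, which is the coalescing problem described above.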
It would be best if you dealt with the kernel reentrancy issues and
implemented aioread/aiowrite/aiowait/aiocancel.  This was a lot easier
under 1.1.5.1 (where I fully implemented a SunOS LWP clone library at
one time), because the lock recursion on read/write reentrancy wasn't
overly complex, like it is in -current with the unified VM and the
vnode/inode dissociation code (which I still think is broken).

To handle this, you would need to:

1)  Move VM to using device/extent instead of vnode/extent for the
    buffer cache.  This would:

    o   Allow reuse of a vnode without discarding the buffer cache
        entry or needing an ihash entry as a second-chance cache.

    o   Allow you to get around the IN_RECURSE and lock complexity in
        the VOP_LOCK/VOP_UNLOCK code, which currently affects both the
        vnode and the underlying FS.

    This would limit device size from 8TB down to 1TB.  File size
    limitations would not change.

2)  With the VM change in place, you would need to change the VOP_LOCK
    code.  Specifically, you would need to define counting semaphore
    routines, probably called vn_lock and vn_unlock.  These routines
    would acquire the vnode lock (allowing recursion for the same PID
    in all cases) and then call the underlying FS's VOP_LOCK/VOP_UNLOCK,
    respectively.  If the VOP_LOCK call failed, the vn_lock would be
    released and the call would fail.  This allows for an FS-specific
    "veto", and is there solely to support FS layering for union,
    translucent, loopback, and overlay FS's (an overlay FS would be,
    for example, a umsdos-on-dos, vfat-on-dos, or quota-on-any type FS).

    The lock code changes would allow a process to have multiple
    outstanding read/write requests, as long as global structure
    modifications were semaphored.  This is the first step toward
    kernel reentrancy, allowing kernel multithreading or kernel
    preemption (necessary for POSIX RT processing); it is also the
    first step (with conversion of the semaphore to a mutex or, better,
    a hierarchical lock with a mutex "top") toward supporting
    multiple-processor reentrancy for the VFS kernel subsystem.

3)  With the lock code changed, a single multiplexed system call should
    be designated as "aio".  Yes, it's possible to use four call slots
    instead, but why?  This system call would use stub wrappers to pass
    down alternate argument list selectors, and would provide an
    aioread/aiowrite/aiowait/aiocancel mechanism.

    An aioread or aiowrite needs to be handled in the ordinary read or
    write path; when a blocking operation is issued, it needs to pass a
    completion routine and an argument address.  The completion routine
    is the same for all processes; the argument address is the address
    of a context structure, which points to the proc structure for the
    process that issued the request, as well as context information for
    the actual copyout (buffer length, buffer address for copyout,
    etc.).

    When an I/O completes, it needs to be unlinked from the "pending"
    list and linked onto the "completed" list.  These are two new
    pointers hung off the proc structure.  The aiowait/aiocancel
    operations operate on the context identifiers on these lists, with
    obvious results.

A more generic mechanism would be to convert the aio multiplex call
into a call gate instead.  This would allow the user to issue *any*
system call as an async operation.  A flag would be added to the sysent
structure to mark calls that are allowed to return to user space
pending completion; by default the flag would be 0, but it would be set
to 1 for read and write in the first rev of the code.  Any operation
that could require paging to be satisfied should, in fact, be capable
of being issued asynchronously.

A "middle ground" implementation would make the multiplex system call
something like "aiosyscall" -- that is, the same idea as "syscall", the
multiplex entry point that lets you call any existing system call by
its syscall.h manifest constant.

This would not be a horrific amount of work, but it would require
dragging several of the kernel people into the process (well, not
really, but you'd need their approval to commit the changes).


                                        Regards,
                                        Terry Lambert
                                        terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
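As a rough illustration of the bookkeeping described in step 3, the
context structure and the two lists hung off the proc structure might
look something like the sketch below.  These names and structures are
hypothetical -- this is not existing (or proposed) kernel source, just
the description above restated in C.

#include <sys/types.h>

struct proc;                            /* forward declaration only */

/*
 * One context per outstanding aioread/aiowrite; its address is the
 * argument handed to the common completion routine.
 */
struct aio_context {
        struct aio_context *ac_next;    /* link on pending/completed list */
        struct proc        *ac_proc;    /* process that issued the request */
        void               *ac_ubuf;    /* user buffer address for copyout */
        size_t              ac_ublen;   /* user buffer length */
        ssize_t             ac_result;  /* byte count or error when done */
        int                 ac_id;      /* identifier aiowait/aiocancel use */
};

/*
 * The "two new pointers hung off the proc structure": requests issued
 * but not yet complete, and requests complete but not yet collected.
 */
struct aio_perproc {
        struct aio_context *pa_pending;
        struct aio_context *pa_completed;
};

/*
 * Completion routine, the same for all processes: record the result,
 * move the context from the pending list to the completed list, and
 * wake any sleeper in aiowait (unlinking and wakeup are elided here).
 */
static void
aio_complete(struct aio_perproc *pa, struct aio_context *ac, ssize_t result)
{
        ac->ac_result = result;

        /* ... unlink ac from pa->pa_pending (elided) ... */

        ac->ac_next = pa->pa_completed; /* push onto the completed list */
        pa->pa_completed = ac;
}

In this picture, aiowait sleeps until pa_completed is non-empty and
hands back an identifier from that list, and aiocancel searches both
lists by identifier, with the obvious results.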