From: Robert Watson
Date: Wed, 20 Dec 2006 15:48:59 +0000 (GMT)
To: Daniel Eischen
Cc: David Xu, freebsd-arch@freebsd.org
Subject: Re: close() of active socket does not work on FreeBSD 6
Message-ID: <20061220153126.G85384@fledge.watson.org>

On Wed, 13 Dec 2006, Daniel Eischen wrote:

> [CC trimmed]
>
> On Wed, 13 Dec 2006, David Xu wrote:
>
>> On Wednesday 13 December 2006 04:49, Daniel Eischen wrote:
>>>
>>> Well, if threads waiting on IO are interruptable by signals, can't we make
>>> a new signal that's only used by the kernel and send it to all threads
>>> waiting on IO for that descriptor?
>>> When it gets out to actually set up the
>>> signal handler, it just resumes like it is returning from an SA_RESTART
>>> signal handler (which according to another posting would reissue the IO
>>> command and get EBADF).
>>
>> Even if you have implemented close() with the interruption, another
>> thread opening a file can still reuse the file handle immediately:
>> according to the specifications, the lowest free file handle will be
>> returned.  If SA_RESTART is used and the interrupted thread restarts the
>> syscall, it will be using the wrong file.  I think even if we have
>> implemented the feature in the kernel, userland threads still have a
>> serious race to fix.
>
> If you use a special signal that is only used for this purpose, there is no
> reason you have to try the IO operation again.  You can just return EBADF.
>
> Anyway, this was just a thought/idea.  I don't mean to argue against any of
> the other reasons why this isn't a good idea.

Whatever may be implemented to solve this issue will require a fairly serious
re-working of how we implement file descriptor reference counting in the
kernel.  Do you propose similar "cancellation" of other system calls blocked
on the file descriptor, including select(), etc?  Typically these system
calls interact with the underlying object associated with the file
descriptor, not the file descriptor itself, and often, they act directly on
the object and release the file descriptor before performing their operation.

I think before we can put any reasonable implementation proposal on the
table, we need a clear set of requirements:

- What is the scope of cancellation?  Are we cancelling outstanding
  simultaneous I/O operations on the same fd index in the process, use of
  any fd pointing at the same open file entry in the process (i.e., all
  dup'd instances), or the same open file entry across all processes?  I've
  been presuming only use of the same fd index in the same process is
  relevant, but if so, let's make sure we state that.
  If not, what do we mean?

- Exactly which potentially blocking operations will be cancelled as a
  result of close() of an "in use" file descriptor?  read()?  write()?
  sendfile()?  connect()?  ioctl()?  select()?  poll()?  close()?  Is the
  set of possible cancellation points equal to the existing set of
  interruptible sleeps?  Notice that in our current implementation, objects
  are often reached using a file descriptor, but then separately referenced
  for the duration of the operation, with the file descriptor being
  released.  This means that we currently don't maintain any useful list of
  threads currently interacting with the file descriptor, and only have a
  limited notion of which threads are interacting with the underlying
  object.

- What semantics are expected regarding the underlying object when an
  operation is cancelled due to simultaneous close() on the same file
  descriptor?  Keep in mind that the underlying object may be referenced by
  other file descriptor indexes pointing at the same open file state (shared
  offset, etc).  For example, if we cancel connect(), is it safe to say that
  what we've done is cancel the wait for connect() to complete, rather than
  the connection operation itself, which may continue and be visible on
  other file descriptor indexes referencing the same object, or to other
  processes also referencing it?

While providing Solaris-like semantics here makes some amount of sense, this
is a very tricky area, and one where we're still refining performance
behavior, reference counting behavior, etc.  I don't think there will be any
easy answers, and we need to think through the semantic and performance
implications of any change very carefully before starting to implement.

Robert N M Watson
Computer Laboratory
University of Cambridge
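
Two userland-visible behaviors that the discussion above relies on can be
sketched in C.  This is only an illustration under stated assumptions, not
kernel code and not a proposed implementation: the function names
fd_index_is_reused() and dup_shares_offset() are hypothetical, and the sketch
assumes a single-threaded POSIX process with /dev/null and a writable /tmp.
The first function shows the "lowest free file handle" reuse that makes a
restarted syscall hit the wrong file; the second shows that dup'd fd indexes
share one open file entry (offset included), which outlives close() of one
index.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the fd index released by close() is
 * handed straight back by the next open(), 0 if not, -1 on error.
 * Single-threaded, so no other open() can slip in between.
 */
int fd_index_is_reused(void)
{
    int a = open("/dev/null", O_RDONLY);
    if (a < 0)
        return -1;
    if (close(a) < 0)
        return -1;

    /* open() returns the lowest free index, which is now 'a' again. */
    int b = open("/dev/null", O_RDONLY);
    if (b < 0)
        return -1;

    int reused = (a == b);
    close(b);
    return reused;
}

/*
 * Hypothetical helper: returns 1 if two dup'd fd indexes share a single
 * file offset and the second index still works after the first is closed,
 * 0 if not, -1 on error.
 */
int dup_shares_offset(void)
{
    char path[] = "/tmp/fdsketchXXXXXX";   /* assumed writable location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                          /* file lives until last close */

    if (write(fd, "abcdef", 6) != 6) {
        close(fd);
        return -1;
    }

    int fd2 = dup(fd);
    if (fd2 < 0) {
        close(fd);
        return -1;
    }

    /* Move the offset through one index, observe it through the other. */
    if (lseek(fd, 2, SEEK_SET) != 2) {
        close(fd);
        close(fd2);
        return -1;
    }
    off_t seen = lseek(fd2, 0, SEEK_CUR);

    /* Closing one index does not tear down the shared open file entry. */
    close(fd);
    char c = 0;
    ssize_t n = read(fd2, &c, 1);
    close(fd2);

    return seen == 2 && n == 1 && c == 'c';
}
```

Both helpers can be called from a small main() and should each return 1 on a
POSIX system.  In a multi-threaded process the first case is exactly the
race described in the quoted text: any thread may win the freed index
between close() and a restarted syscall.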