From owner-freebsd-arch@FreeBSD.ORG Wed Dec 20 16:22:14 2006
Date: Wed, 20 Dec 2006 15:48:59 +0000 (GMT)
From: Robert Watson
To: Daniel Eischen
Cc: David Xu, freebsd-arch@freebsd.org
Subject: Re: close() of active socket does not work on FreeBSD 6
Message-ID: <20061220153126.G85384@fledge.watson.org>

On Wed, 13 Dec 2006, Daniel Eischen wrote:

> [CC trimmed]
>
> On Wed, 13 Dec 2006, David Xu wrote:
>
>> On Wednesday 13 December 2006 04:49, Daniel Eischen wrote:
>>>
>>> Well, if threads waiting on IO are interruptible by signals, can't we make
>>> a new signal that's only used by the kernel and send it to all threads
>>> waiting on IO for that descriptor? When it gets out to actually set up the
>>> signal handler, it just resumes like it is returning from an SA_RESTART
>>> signal handler (which according to another posting would reissue the IO
>>> command and get EBADF).
>>
>> Even if you have implemented close() with the interruption, another
>> thread opening a file can still reuse the file handle immediately:
>> according to the specifications, the lowest free file handle will be
>> returned. If SA_RESTART is used and the interrupted thread restarts the
>> syscall, it will be using the wrong file. I think even if we implement the
>> feature in the kernel, userland threads still have a serious race to fix.
>
> If you use a special signal that is only used for this purpose, there is no
> reason you have to try the IO operation again. You can just return EBADF.
>
> Anyway, this was just a thought/idea. I don't mean to argue against any of
> the other reasons why this isn't a good idea.

Whatever may be implemented to solve this issue will require a fairly serious
re-working of how we implement file descriptor reference counting in the
kernel. Do you propose similar "cancellation" of other system calls blocked
on the file descriptor, including select(), etc? Typically these system
calls interact with the underlying object associated with the file
descriptor, not the file descriptor itself, and often, they act directly on
the object and release the file descriptor before performing their operation.
I think before we can put any reasonable implementation proposal on the
table, we need a clear set of requirements:

- What is the scope of cancellation? Are we cancelling outstanding
  simultaneous I/O operations on the same fd index in the process, use of any
  fd pointing at the same open file entry in the process (i.e., all dup'd
  instances), or the same open file entry across all processes? I've been
  presuming only use of the same fd index in the same process is relevant,
  but if so, let's make sure we state that. If not, what do we mean?

- Exactly which potentially blocking operations will be cancelled as a result
  of close() of an "in use" file descriptor? read()? write()? sendfile()?
  connect()? ioctl()? select()? poll()? close()? Is the set of possible
  cancellation points equal to the existing set of interruptible sleeps?
  Notice that in our current implementation, objects are often reached using
  a file descriptor, but then separately referenced for the duration of the
  operation, with the file descriptor being released. This means that we
  currently don't maintain any useful list of threads currently interacting
  with the file descriptor, and only have a limited notion of which threads
  are interacting with the underlying object.

- What semantics are expected regarding the underlying object when an
  operation is cancelled due to simultaneous close() on the same file
  descriptor? Keep in mind that the underlying object may be referenced by
  other file descriptor indexes pointing at the same open file state (shared
  offset, etc). For example, if we cancel connect(), is it safe to say that
  what we've done is cancel the wait for connect() to complete, rather than
  the connection operation itself, which may continue and be visible on other
  file descriptor indexes referencing the same object, or to other processes
  also referencing it?

While providing Solaris-like semantics here makes some amount of sense, this
is a very tricky area, and one where we're still refining performance
behavior, reference counting behavior, etc. I don't think there will be any
easy answers, and we need to think through the semantic and performance
implications of any change very carefully before starting to implement.

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-arch@FreeBSD.ORG Wed Dec 20 18:28:22 2006
Date: Wed, 20 Dec 2006 13:18:11 -0500 (EST)
From: Daniel Eischen
To: Robert Watson
Cc: David Xu, freebsd-arch@freebsd.org
Subject: Re: close() of active socket does not work on FreeBSD 6
In-Reply-To: <20061220153126.G85384@fledge.watson.org>

On Wed, 20 Dec 2006, Robert Watson wrote:

> On Wed, 13 Dec 2006, Daniel Eischen wrote:
>
>> Anyway, this was just a thought/idea. I don't mean to argue against any of
>> the other reasons why this isn't a good idea.
>
> Whatever may be implemented to solve this issue will require a fairly serious
> re-working of how we implement file descriptor reference counting in the
> kernel. Do you propose similar "cancellation" of other system calls blocked
> on the file descriptor, including select(), etc? Typically these system
> calls interact with the underlying object associated with the file
> descriptor, not the file descriptor itself, and often, they act directly on
> the object and release the file descriptor before performing their operation.
> I think before we can put any reasonable implementation proposal on the
> table, we need a clear set of requirements:

[ ... ]

> While providing Solaris-like semantics here makes some amount of sense, this
> is a very tricky area, and one where we're still refining performance
> behavior, reference counting behavior, etc. I don't think there will be any
> easy answers, and we need to think through the semantic and performance
> implications of any change very carefully before starting to implement.

I don't think the behavior here has to be any different than what we
currently (or desire to) do with regard to (unblocked) signals interrupting
threads waiting on IO. You can spend a lot of time thinking about how
close() should affect IO operations on the same file descriptor, but a very
simple approach is to treat them the same as if the operations were
interrupted by a signal. I'm not suggesting it is implemented the same way,
just that it seems to make a lot of sense to me that the behavior is
consistent between the two.

-- DE

From owner-freebsd-arch@FreeBSD.ORG Thu Dec 21 00:20:14 2006
Date: Thu, 21 Dec 2006 08:20:09 +0800
From: David Xu
To: freebsd-arch@freebsd.org, Daniel Eischen
Cc: Robert Watson
Subject: Re: close() of active socket does not work on FreeBSD 6
Message-Id: <200612210820.09955.davidxu@freebsd.org>

On Thursday 21 December 2006 02:18, Daniel Eischen wrote:
> On Wed, 20 Dec 2006, Robert Watson wrote:
> > [ ... ]
>
> I don't think the behavior here has to be any different than
> what we currently (or desire to) do with regard to (unblocked)
> signals interrupting threads waiting on IO. You can spend
> a lot of time thinking about how close() should affect IO
> operations on the same file descriptor, but a very simple
> approach is to treat them the same as if the operations were
> interrupted by a signal. I'm not suggesting it is implemented
> the same way, just that it seems to make a lot of sense to me
> that the behavior is consistent between the two.

I think the main concern is whether we will record every thread using an
fd. That means that when you call read() on an fd, you record your thread
pointer in the fd's thread list; when someone wants to close the fd, it has
to notify all the threads in the list and set a flag on each thread
indicating that it was interrupted because the fd was closed. When the
thread returns from the deep code path to the read() syscall, it should
check the flag and return EBADF to the user if it was set. Either a
reserved signal or TDF_INTERRUPT could interrupt the thread. But since
there are many file operations, I don't know if we are willing to pay such
overhead on every file syscall; extra locking is not welcome.
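
The bookkeeping described here can be sketched as a small userland model. Everything below is hypothetical: the struct and function names are invented for illustration, there is no actual sleeping or wakeup, and the locking that a kernel would need around every one of these steps (the overhead in question) is left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_USERS 8                     /* arbitrary bound for the sketch */

struct thread_model {
    int fd_closed_flag;                 /* set when close() hit an fd in use */
};

struct fd_model {
    struct thread_model *users[MAX_USERS];  /* threads inside a syscall on it */
    size_t nusers;
};

/* Syscall entry: record the calling thread on the fd's list. */
int fd_enter(struct fd_model *f, struct thread_model *td)
{
    assert(f != NULL && td != NULL);
    if (f->nusers == MAX_USERS)
        return -1;
    td->fd_closed_flag = 0;
    f->users[f->nusers++] = td;
    return 0;
}

/* close() path: flag every recorded thread, then empty the list. */
void fd_close_notify(struct fd_model *f)
{
    assert(f != NULL);
    for (size_t i = 0; i < f->nusers; i++)
        f->users[i]->fd_closed_flag = 1;
    f->nusers = 0;
}

/* Syscall exit: unlink the thread and override the result with EBADF. */
int fd_exit(struct fd_model *f, struct thread_model *td, int error)
{
    assert(f != NULL && td != NULL);
    for (size_t i = 0; i < f->nusers; i++) {
        if (f->users[i] == td) {
            f->users[i] = f->users[--f->nusers];
            break;
        }
    }
    return td->fd_closed_flag ? EBADF : error;
}
```

Even in this toy form, every read() or write() pays for an fd_enter() and an fd_exit() whether or not a close() ever races with it, which is the cost being weighed.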

Regards,
David Xu

From owner-freebsd-arch@FreeBSD.ORG Thu Dec 21 13:02:53 2006
Date: Wed, 20 Dec 2006 17:48:02 -0800
From: Julian Elischer
To: David Xu
Cc: Daniel Eischen, Robert Watson, freebsd-arch@freebsd.org
Subject: Re: close() of active socket does not work on FreeBSD 6
Message-ID: <4589E7D2.9010608@ironport.com>
In-Reply-To: <200612210820.09955.davidxu@freebsd.org>

David Xu wrote:
> On Thursday 21 December 2006 02:18, Daniel Eischen wrote:
>> [ ... ]
>
> I think the main concern is whether we will record every thread using an
> fd. That means that when you call read() on an fd, you record your thread
> pointer in the fd's thread list; when someone wants to close the fd, it has
> to notify all the threads in the list and set a flag on each thread
> indicating that it was interrupted because the fd was closed. When the
> thread returns from the deep code path to the read() syscall, it should
> check the flag and return EBADF to the user if it was set. Either a
> reserved signal or TDF_INTERRUPT could interrupt the thread. But since
> there are many file operations, I don't know if we are willing to pay such
> overhead on every file syscall; extra locking is not welcome.

I think you are only interested in threads that are sleeping, so you allow
a sleeping thread to save a pointer to the fd (or whatever) on which it is
sleeping, along with the sleep address.

Items that are not sleeping are either already returning, or are going to
sleep, in which case they can check at that time.
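
A rough userland model of this cheaper variant follows. It is only a sketch with invented names: nothing actually sleeps, the sleep address is omitted, and locking is ignored. A thread records the fd only when it is about to sleep and checks for a prior close at that moment, while close() scans for sleepers parked on the fd.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct fd_state {
    int closed;                          /* set once close() has run */
};

struct sleeper {
    const struct fd_state *sleeping_on;  /* NULL when not asleep on an fd */
    int woken_by_close;
};

/* About to sleep: refuse if the fd is already closed, else record it. */
int sleep_on_fd(struct sleeper *td, const struct fd_state *f)
{
    assert(td != NULL && f != NULL);
    if (f->closed)
        return EBADF;                    /* the "check at that time" case */
    td->sleeping_on = f;
    td->woken_by_close = 0;
    return 0;
}

/* close() path: mark the fd closed and wake every thread parked on it. */
size_t close_and_wake(struct fd_state *f, struct sleeper *thr, size_t n)
{
    size_t woken = 0;

    assert(f != NULL && thr != NULL);
    f->closed = 1;
    for (size_t i = 0; i < n; i++) {
        if (thr[i].sleeping_on == f) {
            thr[i].sleeping_on = NULL;
            thr[i].woken_by_close = 1;
            woken++;
        }
    }
    return woken;
}
```

Only sleepers pay any cost here, but a single pointer per thread would not be enough for calls such as select() or poll(), which can wait on many descriptors at once.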

>
> Regards,
> David Xu

From owner-freebsd-arch@FreeBSD.ORG Thu Dec 21 13:45:58 2006
Date: Thu, 21 Dec 2006 10:38:33 +0000 (GMT)
From: Robert Watson
To: David Xu
Cc: Daniel Eischen, freebsd-arch@freebsd.org
Subject: Re: close() of active socket does not work on FreeBSD 6
Message-ID: <20061221102909.O83974@fledge.watson.org>
In-Reply-To: <200612210820.09955.davidxu@freebsd.org>

On Thu, 21 Dec 2006, David Xu wrote:

> On Thursday 21 December 2006 02:18, Daniel Eischen wrote:
>> [ ... ]
>
> I think the main concern is whether we will record every thread using an
> fd. That means that when you call read() on an fd, you record your thread
> pointer in the fd's thread list; when someone wants to close the fd, it has
> to notify all the threads in the list and set a flag on each thread
> indicating that it was interrupted because the fd was closed. When the
> thread returns from the deep code path to the read() syscall, it should
> check the flag and return EBADF to the user if it was set. Either a
> reserved signal or TDF_INTERRUPT could interrupt the thread. But since
> there are many file operations, I don't know if we are willing to pay such
> overhead on every file syscall; extra locking is not welcome.

Yes, as well as adding quite a bit of complexity and opening the door for
some rather odd/unfortunate races. You can inspect the bulk of the Solaris
implementation by looking at three spots:

http://fxr.watson.org/fxr/ident?v=OPENSOLARIS;i=closeandsetf
http://fxr.watson.org/fxr/ident?v=OPENSOLARIS;i=post_syscall
http://fxr.watson.org/fxr/search?v=OPENSOLARIS&string=MUSTRETURN

In closeandsetf(), you can see that an additional layer of indirection
associated with the file descriptor is maintained in order to count consumers
of a particular fd, not just the open file record, and the set of active fds
for each thread is maintained. When a close() is performed and there are
still other open consumers, the process is suspended and all threads are
inspected to see if the fd is active for the thread, in which case a thread
flag indicating a stale fd is set. I believe that the interrupt here is an
implicit part of the process suspend/restart, and in post_syscall() the EINTR
returns are remapped to EBADF. That extra level of indirection and use
tracking will be both complex and a performance hit in a critical kernel
path.
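
The flow described above can be modelled loosely in userland. The sketch below is not the OpenSolaris code and makes no claim about its data structures: the names, the fixed-size per-thread fd set, and the single stale flag are assumptions made purely to illustrate the two steps (flagging threads at close time, remapping EINTR to EBADF at syscall exit).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_ACTIVE 4                    /* arbitrary bound for the sketch */

struct thr_model {
    int    active_fds[MAX_ACTIVE];      /* fds this thread is currently using */
    size_t nactive;
    int    stale_fd;                    /* set when one of them was closed */
};

/* close() path: flag every thread that has `fd` active; return the count. */
size_t mark_stale_users(struct thr_model *thr, size_t nthr, int fd)
{
    size_t flagged = 0;

    assert(thr != NULL);
    for (size_t t = 0; t < nthr; t++) {
        for (size_t i = 0; i < thr[t].nactive; i++) {
            if (thr[t].active_fds[i] == fd) {
                thr[t].stale_fd = 1;
                flagged++;
                break;
            }
        }
    }
    return flagged;
}

/* Syscall-exit path: turn an interrupted call into EBADF if the fd is stale. */
int remap_on_exit(struct thr_model *t, int error)
{
    assert(t != NULL);
    if (t->stale_fd && error == EINTR) {
        t->stale_fd = 0;
        return EBADF;
    }
    return error;
}
```

The scan in mark_stale_users() touches every thread in the process, and the per-thread fd set has to be kept up to date on every syscall, which gives a feel for where the complexity and cost come from.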

I'm not opposed to investigating implementing something along these lines,
but I think we should defer this for some time while we sort out more
pressing issues in our kernel file descriptor/socket/etc code and revisit
this in a few months. We will need to carefully evaluate the performance
costs, and if they are significant, figure out how to avoid this causing a
significant hit. It's worth observing that removing one level of reference
counting from the socket send/receive paths (using the file descriptor
reference instead of the socket reference) made a 5%+ difference in high
speed send performance.

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-arch@FreeBSD.ORG Thu Dec 21 15:22:17 2006
Date: Thu, 21 Dec 2006 15:22:16 +0000 (GMT)
From: Robert Watson
To: Julian Elischer
Cc: Daniel Eischen, David Xu, freebsd-arch@freebsd.org
Subject: Re: close() of active socket does not work on FreeBSD 6
Message-ID: <20061221152115.U83974@fledge.watson.org>
In-Reply-To: <4589E7D2.9010608@ironport.com>

On Wed, 20 Dec 2006, Julian Elischer wrote:

>> I think the main concern is whether we will record every thread using an
>> fd. That means that when you call read() on an fd, you record your thread
>> pointer in the fd's thread list; when someone wants to close the fd, it
>> has to notify all the threads in the list and set a flag on each thread
>> indicating that it was interrupted because the fd was closed. When the
>> thread returns from the deep code path to the read() syscall, it should
>> check the flag and return EBADF to the user if it was set. Either a
>> reserved signal or TDF_INTERRUPT could interrupt the thread. But since
>> there are many file operations, I don't know if we are willing to pay such
>> overhead on every file syscall; extra locking is not welcome.
>
> I think you are only interested in threads that are sleeping, so you allow
> a sleeping thread to save a pointer to the fd (or whatever) on which it is
> sleeping, along with the sleep address.
>
> Items that are not sleeping are either already returning, or are going to
> sleep, in which case they can check at that time.

Hence my question about select and poll: should they throw an exception state
when a file descriptor is closed out from under them? They often sleep on
hundreds or thousands of file descriptors, and not just one.

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-arch@FreeBSD.ORG Thu Dec 21 17:15:49 2006
Date: Thu, 21 Dec 2006 11:40:31 -0500 (EST)
From: Daniel Eischen
To: Robert Watson
Cc: Julian Elischer, David Xu, freebsd-arch@freebsd.org
Subject: Re: close() of active socket does not work on FreeBSD 6
In-Reply-To: <20061221152115.U83974@fledge.watson.org>

On Thu, 21 Dec 2006, Robert Watson wrote:

> On Wed, 20 Dec 2006, Julian Elischer wrote:
>
>>> I think the main concern is whether we will record every thread using an
>>> fd. That means that when you call read() on an fd, you record your thread
>>> pointer in the fd's thread list; when someone wants to close the fd, it
>>> has to notify all the threads in the list and set a flag on each thread
>>> indicating that it was interrupted because the fd was closed. When the
>>> thread returns from the deep code path to the read() syscall, it should
>>> check the flag and return EBADF to the user if it was set. Either a
>>> reserved signal or TDF_INTERRUPT could interrupt the thread. But since
>>> there are many file operations, I don't know if we are willing to pay
>>> such overhead on every file syscall; extra locking is not welcome.
>>
>> I think you are only interested in threads that are sleeping, so you allow
>> a sleeping thread to save a pointer to the fd (or whatever) on which it is
>> sleeping, along with the sleep address.
>>
>> Items that are not sleeping are either already returning, or are going to
>> sleep, in which case they can check at that time.
>
> Hence my question about select and poll: should they throw an exception state
> when a file descriptor is closed out from under them? They often sleep on
> hundreds or thousands of file descriptors, and not just one.

Yes, I would think so. Solaris behaves this way also, although there seems
to be a bug in Solaris 8 (version tested) in that select() returns -1 but
errno isn't properly set (it is 0).

-- DE

From owner-freebsd-arch@FreeBSD.ORG Fri Dec 22 02:18:15 2006
Date: Thu, 21 Dec 2006 18:01:01 -0800
From: John-Mark Gurney
To: Robert Watson
Cc: Daniel Eischen, Julian Elischer, David Xu, freebsd-arch@FreeBSD.org
Subject: Re: close() of active socket does not work on FreeBSD 6
Message-ID: <20061222020101.GC4982@funkthat.com>
In-Reply-To: <20061221152115.U83974@fledge.watson.org>

Robert Watson wrote this message on Thu, Dec 21, 2006 at 15:22 +0000:
> > I think you are only interested in threads that are sleeping, so you
> > allow a sleeping thread to save a pointer to the fd (or whatever) on
> > which it is sleeping, along with the sleep address.
> >
> > Items that are not sleeping are either already returning, or are going to
> > sleep, in which case they can check at that time.
>
> Hence my question about select and poll: should they throw an exception
> state when a file descriptor is closed out from under them? They often
> sleep on hundreds or thousands of file descriptors, and not just one.

IMO, your program is buggy if you close the file descriptor before
everything is out of the kernel wrt the fd. It means that your close
statement isn't waiting for things to be cleanly shut down, and that
you still have dangling reference counts to the parts of the code that
are in the kernel.

I used to expect something similar w/ a kqueue-based event-driven
web server, and found that I had bugs due to assuming that I could
close it whenever I wanted. What happens if you close the fd between
the time select returns and the time you process it? What happens if the
fd gets closed, and another thread (or an earlier fd that accepts
connections) reuses that fd? Then your state machine isn't ready
to get an event, since it isn't supposed to get one yet.

The kernel isn't buggy wrt closing a fd when another thread is using
it, it's the program that's buggy.

--
John-Mark Gurney				Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."
From owner-freebsd-arch@FreeBSD.ORG Fri Dec 22 02:43:15 2006 Return-Path: X-Original-To: freebsd-arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.org Received: from [127.0.0.1] (localhost [127.0.0.1]) by hub.freebsd.org (Postfix) with ESMTP id 37E8816A407; Fri, 22 Dec 2006 02:43:11 +0000 (UTC) (envelope-from davidxu@freebsd.org) Message-ID: <458B4641.5080808@freebsd.org> Date: Fri, 22 Dec 2006 10:43:13 +0800 From: David Xu User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.13) Gecko/20061204 X-Accept-Language: en-us, en MIME-Version: 1.0 To: John-Mark Gurney References: <32874.1165905843@critter.freebsd.dk> <20061220153126.G85384@fledge.watson.org> <200612210820.09955.davidxu@freebsd.org> <4589E7D2.9010608@ironport.com> <20061221152115.U83974@fledge.watson.org> <20061222020101.GC4982@funkthat.com> In-Reply-To: <20061222020101.GC4982@funkthat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Cc: Daniel Eischen , Julian Elischer , Robert Watson , freebsd-arch@FreeBSD.org Subject: Re: close() of active socket does not work on FreeBSD 6 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Dec 2006 02:43:15 -0000 John-Mark Gurney wrote: > Robert Watson wrote this message on Thu, Dec 21, 2006 at 15:22 +0000: > >>>I think you are only interested in threads that are sleeping.. so you allow >>>a sleeping thread to save a pointer to the fd (or whatever) on which it is >>>sleeping, along with the sleep address. >>> >>>items that are not sleeping are either already returning, or are going to >>>sleep, in which case they can check at that time. >> >>Hence my question about select and poll: should they throw an exception >>state when a file descriptor is closed out from under them?
They often >>sleep on hundreds or thousands of file descriptors, and not just one. > > > IMO, your program is buggy if you close the file descriptor before > everything is out of the kernel wrt the fd... It means that your close > statement isn't waiting for things to be cleanly shut down, and that > you still have dangling reference counts to the parts of the code that > are in the kernel... > > I used to expect something similar w/ a kqueue-based event-driven > web server, and found that I had bugs due to assuming that I could > close it whenever I want... What happens if you close the fd between > the time select returns and you process it? What happens if the fd > gets closed, and another thread (or an earlier fd that accepts > connections) reuses that fd? And then your state machine isn't ready > to get an event since it isn't supposed to get one yet... > > The kernel isn't buggy wrt closing a fd when another thread is using > it, it's the program that's buggy... > I agree with you here. As I said before, the kernel may be able to do things correctly, but user code still has to struggle with race conditions between multiple threads. If user code has to work out a way to avoid those race conditions anyway, why not just use a signal to interrupt the target thread and synchronize between threads? The requested extra close() feature seems to be a wrongly defined problem.
Regards, David Xu From owner-freebsd-arch@FreeBSD.ORG Fri Dec 22 03:36:00 2006 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id B7ECC16A4A7; Fri, 22 Dec 2006 03:36:00 +0000 (UTC) (envelope-from deischen@freebsd.org) Received: from mail.ntplx.net (mail.ntplx.net [204.213.176.10]) by mx1.freebsd.org (Postfix) with ESMTP id 5EC7413C447; Fri, 22 Dec 2006 03:36:00 +0000 (UTC) (envelope-from deischen@freebsd.org) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) by mail.ntplx.net (8.13.8/8.13.8/NETPLEX) with ESMTP id kBM3Zwao029546; Thu, 21 Dec 2006 22:35:59 -0500 (EST) Date: Thu, 21 Dec 2006 22:35:58 -0500 (EST) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net To: John-Mark Gurney In-Reply-To: <20061222020101.GC4982@funkthat.com> Message-ID: References: <32874.1165905843@critter.freebsd.dk> <20061220153126.G85384@fledge.watson.org> <200612210820.09955.davidxu@freebsd.org> <4589E7D2.9010608@ironport.com> <20061221152115.U83974@fledge.watson.org> <20061222020101.GC4982@funkthat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Message whitelisted by DRAC access database, not delayed by milter-greylist-3.0 (mail.ntplx.net [204.213.176.10]); Thu, 21 Dec 2006 22:35:59 -0500 (EST) X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.ntplx.net) Cc: Julian Elischer , Robert Watson , David Xu , freebsd-arch@freebsd.org Subject: Re: close() of active socket does not work on FreeBSD 6 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Daniel Eischen List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Dec 2006 03:36:00 -0000 On Thu, 21 Dec 2006, John-Mark Gurney wrote: > Robert Watson wrote this message on Thu, Dec 21, 2006 at 15:22 
+0000: >>> I think you are only interested in threads that are sleeping.. so you allow >>> a sleeping thread to save a pointer to the fd (or whatever) on which it is >>> sleeping, along with the sleep address. >>> >>> items that are not sleeping are either already returning, or are going to >>> sleep, in which case they can check at that time. >> >> Hence my question about select and poll: should they throw an exception >> state when a file descriptor is closed out from under them? They often >> sleep on hundreds or thousands of file descriptors, and not just one. > > IMO, your program is buggy if you close the file descriptor before > everything is out of the kernel wrt the fd... It means that your close > statement isn't waiting for things to be cleanly shut down, and that > you still have dangling reference counts to the parts of the code that > are in the kernel... > > I used to expect something similar w/ a kqueue-based event-driven > web server, and found that I had bugs due to assuming that I could > close it whenever I want... What happens if you close the fd between > the time select returns and you process it? What happens if the fd > gets closed, and another thread (or an earlier fd that accepts > connections) reuses that fd? And then your state machine isn't ready > to get an event since it isn't supposed to get one yet... > > The kernel isn't buggy wrt closing a fd when another thread is using > it, it's the program that's buggy... I agree also, but hanging without return isn't very detectable. The best thing to do is to tell the programmer that he is doing something stupid, and returning with an error is the way that it is typically done. Solaris seems to have jumped through some hoops to achieve this behavior, so I doubt it is without merit.
OTOH, I'm not going to argue that it is one of the more important things we should be worried about ;-) -- DE From owner-freebsd-arch@FreeBSD.ORG Fri Dec 22 04:07:40 2006 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 5795016A403; Fri, 22 Dec 2006 04:07:40 +0000 (UTC) (envelope-from jmg@hydrogen.funkthat.com) Received: from hydrogen.funkthat.com (gate.funkthat.com [69.17.45.168]) by mx1.freebsd.org (Postfix) with ESMTP id 130AF13C41A; Fri, 22 Dec 2006 04:07:40 +0000 (UTC) (envelope-from jmg@hydrogen.funkthat.com) Received: from hydrogen.funkthat.com (kci2bk4mc4426o7z@localhost.funkthat.com [127.0.0.1]) by hydrogen.funkthat.com (8.13.6/8.13.3) with ESMTP id kBM47d0G033271; Thu, 21 Dec 2006 20:07:39 -0800 (PST) (envelope-from jmg@hydrogen.funkthat.com) Received: (from jmg@localhost) by hydrogen.funkthat.com (8.13.6/8.13.3/Submit) id kBM47d8w033270; Thu, 21 Dec 2006 20:07:39 -0800 (PST) (envelope-from jmg) Date: Thu, 21 Dec 2006 20:07:38 -0800 From: John-Mark Gurney To: Daniel Eischen Message-ID: <20061222040738.GD4982@funkthat.com> Mail-Followup-To: Daniel Eischen , Julian Elischer , Robert Watson , David Xu , freebsd-arch@freebsd.org References: <32874.1165905843@critter.freebsd.dk> <20061220153126.G85384@fledge.watson.org> <200612210820.09955.davidxu@freebsd.org> <4589E7D2.9010608@ironport.com> <20061221152115.U83974@fledge.watson.org> <20061222020101.GC4982@funkthat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.1i X-Operating-System: FreeBSD 5.4-RELEASE-p6 i386 X-PGP-Fingerprint: B7 EC EF F8 AE ED A7 31 96 7A 22 B3 D8 56 36 F4 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html Cc: Julian Elischer , Robert Watson , David Xu , freebsd-arch@freebsd.org 
Subject: Re: close() of active socket does not work on FreeBSD 6 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: John-Mark Gurney List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Dec 2006 04:07:40 -0000 Daniel Eischen wrote this message on Thu, Dec 21, 2006 at 22:35 -0500: > On Thu, 21 Dec 2006, John-Mark Gurney wrote: > > >Robert Watson wrote this message on Thu, Dec 21, 2006 at 15:22 +0000: > >>>I think you are only interested in threads that are sleeping.. so you allow > >>>a sleeping thread to save a pointer to the fd (or whatever) on which it > >>>is > >>>sleeping, along with the sleep address. > >>> > >>>items that are not sleeping are either already returning, or are going to > >>>sleep, in which case they can check at that time. > >> > >>Hence my question about select and poll: should they throw an exception > >>state when a file descriptor is closed out from under them? They often > >>sleep on hundreds or thousands of file descriptors, and not just one. > > > >IMO, your program is buggy if you close the file descriptor before > >everything is out of the kernel wrt the fd... It means that your close > >statement isn't waiting for things to be cleanly shut down, and that > >you still have dangling reference counts to the parts of the code that > >are in the kernel... > > > >I used to expect something similar w/ a kqueue-based event-driven > >web server, and found that I had bugs due to assuming that I could > >close it whenever I want... What happens if you close the fd between > >the time select returns and you process it? What happens if the fd > >gets closed, and another thread (or an earlier fd that accepts > >connections) reuses that fd? And then your state machine isn't ready > >to get an event since it isn't supposed to get one yet...
> > > >The kernel isn't buggy wrt closing a fd when another thread is using > >it, it's the program that's buggy... > > I agree also, but hanging without return isn't very detectable. It's a lot more detectable than working 99% of the time and failing when things get corrupted due to a race.. :) > The best thing to do is to tell the programmer that he is doing > something stupid, and returning with an error is the way that > it is typically done. Solaris seems to have jumped through As long as it's EDOOFUS... I don't see any other error that would be appropriate... > some hoops to achieve this behavior, so I doubt it is without > merit. OTOH, I'm not going to argue that it is one of the > more important things we should be worried about ;-) As long as it doesn't cost much more to do it... Hanging is just as good an indication as returning an error... And I'd say it's better as it forces the buggy software to be fixed as opposed to simply ignoring the error which is likely what the programmer will do... -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not."
From owner-freebsd-arch@FreeBSD.ORG Fri Dec 22 04:16:45 2006 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E028A16A403; Fri, 22 Dec 2006 04:16:45 +0000 (UTC) (envelope-from deischen@freebsd.org) Received: from mail.ntplx.net (mail.ntplx.net [204.213.176.10]) by mx1.freebsd.org (Postfix) with ESMTP id 8661E13C442; Fri, 22 Dec 2006 04:16:45 +0000 (UTC) (envelope-from deischen@freebsd.org) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) by mail.ntplx.net (8.13.8/8.13.8/NETPLEX) with ESMTP id kBM4GiRX028600; Thu, 21 Dec 2006 23:16:44 -0500 (EST) Date: Thu, 21 Dec 2006 23:16:44 -0500 (EST) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net To: John-Mark Gurney In-Reply-To: <20061222040738.GD4982@funkthat.com> Message-ID: References: <32874.1165905843@critter.freebsd.dk> <20061220153126.G85384@fledge.watson.org> <200612210820.09955.davidxu@freebsd.org> <4589E7D2.9010608@ironport.com> <20061221152115.U83974@fledge.watson.org> <20061222020101.GC4982@funkthat.com> <20061222040738.GD4982@funkthat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Message whitelisted by DRAC access database, not delayed by milter-greylist-3.0 (mail.ntplx.net [204.213.176.10]); Thu, 21 Dec 2006 23:16:44 -0500 (EST) X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.ntplx.net) Cc: Julian Elischer , Robert Watson , David Xu , freebsd-arch@freebsd.org Subject: Re: close() of active socket does not work on FreeBSD 6 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Daniel Eischen List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Dec 2006 04:16:46 -0000 On Thu, 21 Dec 2006, John-Mark Gurney wrote: > Daniel Eischen wrote this message on Thu, Dec 
21, 2006 at 22:35 -0500: >> On Thu, 21 Dec 2006, John-Mark Gurney wrote: >> >>> I used to expect something similar w/ a kqueue-based event-driven >>> web server, and found that I had bugs due to assuming that I could >>> close it whenever I want... What happens if you close the fd between >>> the time select returns and you process it? What happens if the fd >>> gets closed, and another thread (or an earlier fd that accepts >>> connections) reuses that fd? And then your state machine isn't ready >>> to get an event since it isn't supposed to get one yet... >>> >>> The kernel isn't buggy wrt closing a fd when another thread is using >>> it, it's the program that's buggy... >> >> I agree also, but hanging without return isn't very detectable. > > It's a lot more detectable than working 99% of the time and > failing when things get corrupted due to a race.. :) I dunno, I think returning an appropriate error on the actual call(s) that are problematic is easier to detect than trying to figure out just what is causing the hang, corruption, whatever. Perhaps I mean "debug" instead of "detect". >> The best thing to do is to tell the programmer that he is doing >> something stupid, and returning with an error is the way that >> it is typically done. Solaris seems to have jumped through > > As long as it's EDOOFUS... I don't see any other error that would > be appropriate... EBADF. That's what Solaris returns, and it makes more sense to me. >> some hoops to achieve this behavior, so I doubt it is without >> merit. OTOH, I'm not going to argue that it is one of the >> more important things we should be worried about ;-) > > As long as it doesn't cost much more to do it... Hanging is just as > good an indication as returning an error... And I'd say it's better > as it forces the buggy software to be fixed as opposed to simply ignoring > the error which is likely what the programmer will do... Yes, unfortunately, ignoring the error would probably happen a lot. -- DE
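[Editorial sketch, not part of the original message.] For reference, EBADF is already what a system call reports when it is issued after the descriptor has been closed and the number has not been reused. The sketch below shows only that uncontroversial case. It does not show what happens to a thread that was already blocked inside the kernel when close() ran, which is the behavior this thread is debating and which differs between systems. The helper name errno_after_close is hypothetical, and /dev/null is assumed to exist.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: close a descriptor, then try to read from the
 * stale number. Returns the errno set by read(), or -1 if setup failed
 * or read() unexpectedly succeeded.
 */
static int errno_after_close(void)
{
    char byte;
    int fd;

    fd = open("/dev/null", O_RDONLY);
    if (fd < 0)
        return -1;
    if (close(fd) != 0)
        return -1;

    /* Nothing reopens the number here, so the descriptor is invalid. */
    errno = 0;
    if (read(fd, &byte, 1) != -1)
        return -1;
    return errno;
}
```

In a single-threaded caller, errno_after_close() should return EBADF. As noted in the message above, the error only helps if the caller actually checks it.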