From: Robert Watson
Date: Thu, 21 Dec 2006 10:38:33 +0000 (GMT)
To: David Xu
Cc: Daniel Eischen, freebsd-arch@freebsd.org
Subject: Re: close() of active socket does not work on FreeBSD 6
Message-ID: <20061221102909.O83974@fledge.watson.org>
In-Reply-To: <200612210820.09955.davidxu@freebsd.org>

On Thu, 21 Dec 2006, David Xu wrote:

> On Thursday 21 December 2006 02:18, Daniel Eischen wrote:
>> On Wed, 20 Dec 2006, Robert Watson wrote:
>>> On Wed, 13 Dec 2006, Daniel Eischen wrote:
>>>> Anyway, this was just a thought/idea.  I don't mean to argue against any
>>>> of the other reasons why this isn't a good idea.
>>>
>>> Whatever may be implemented to solve this issue will require a fairly
>>> serious re-working of how we implement file descriptor reference counting
>>> in the kernel.  Do you propose similar "cancellation" of other system
>>> calls blocked on the file descriptor, including select(), etc?  Typically
>>> these system calls interact with the underlying object associated with the
>>> file descriptor, not the file descriptor itself, and often, they act
>>> directly on the object and release the file descriptor before performing
>>> their operation.  I think before we can put any reasonable implementation
>>> proposal on the table, we need a clear set of requirements:
>>
>> [ ... ]
>>
>>> While providing Solaris-like semantics here makes some amount of sense,
>>> this is a very tricky area, and one where we're still refining performance
>>> behavior, reference counting behavior, etc.  I don't think there will be
>>> any easy answers, and we need to think through the semantic and
>>> performance implications of any change very carefully before starting to
>>> implement.
>>
>> I don't think the behavior here has to be any different than what we
>> currently (or desire to) do with regard to (unblocked) signals interrupting
>> threads waiting on IO.  You can spend a lot of time thinking about how
>> close() should affect IO operations on the same file descriptor, but a very
>> simple approach is to treat them the same as if the operations were
>> interrupted by a signal.  I'm not suggesting it is implemented the same
>> way, just that it seems to make a lot of sense to me that the behavior is
>> consistent between the two.
>
> I think the main concern is if we will record every thread using a fd, that
> means, when you call read() on a fd, you record your thread pointer into the
> fd's thread list, when one wants to close the fd, it has to notify all the
> threads in the list, set a flag for each thread, the flag indicates a thread
> is interrupted because the fd was closed, when the thread returns from deep
> code path to read() syscall, it should check the flag, and return EBADF to
> user if it was set.  whatever, a reserved signal or TDF_INTERRUPT may
> interrupt a thread.  but since there are many file operations, I don't know
> if we are willing to pay such overheads to every file syscall, extra locking
> is not welcomed.

Yes, as well as adding quite a bit of complexity and opening the door for
some rather odd/unfortunate races.  You can inspect the bulk of the Solaris
implementation by looking at three spots:

http://fxr.watson.org/fxr/ident?v=OPENSOLARIS;i=closeandsetf
http://fxr.watson.org/fxr/ident?v=OPENSOLARIS;i=post_syscall
http://fxr.watson.org/fxr/search?v=OPENSOLARIS&string=MUSTRETURN

In closeandsetf(), you can see that an additional layer of indirection
associated with the file descriptor is maintained in order to count
consumers of a particular fd, not just the open file record, and the set of
active fds for each thread is maintained.  When a close() is performed and
there are still other open consumers, the process is suspended and all
threads are inspected to see if the fd is active for the thread, in which
case a thread flag is set to indicate that the fd is stale.  I believe that
the interrupt here is an implicit part of the process suspend/restart, and
in post_syscall() the EINTR returns are remapped to EBADF.  That extra level
of indirection and use tracking will be both complex and a performance hit
in a critical kernel path.

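To make the bookkeeping described above concrete, here is a toy user-space
model in Python.  It is only a sketch of my reading of the mechanism, not the
Solaris code: the class names, the must_return flag (named loosely after the
MUSTRETURN search term), and the method names are all hypothetical, and the
process suspend/restart step is reduced to a plain loop over threads.

```python
import errno


class Thread:
    """Hypothetical stand-in for a kernel thread."""

    def __init__(self, name):
        self.name = name
        self.active_fds = set()    # fds this thread is using inside a syscall
        self.must_return = False   # set when one of those fds is closed


class Process:
    """Hypothetical per-process fd table with per-thread use tracking."""

    def __init__(self):
        self.threads = []
        self.open_fds = set()

    def add_thread(self, name):
        thread = Thread(name)
        self.threads.append(thread)
        return thread

    def syscall_enter(self, thread, fd):
        # Every fd-based syscall pays for this extra tracking step.
        if fd not in self.open_fds:
            return errno.EBADF
        thread.active_fds.add(fd)
        return 0

    def close(self, fd):
        if fd not in self.open_fds:
            return errno.EBADF
        self.open_fds.discard(fd)
        # Stand-in for "suspend the process and inspect all threads".
        for thread in self.threads:
            if fd in thread.active_fds:
                thread.must_return = True
        return 0

    def post_syscall(self, thread, fd, error):
        # On the way out of the syscall, remap EINTR to EBADF if the fd
        # went stale while the thread was blocked on it.
        thread.active_fds.discard(fd)
        if thread.must_return:
            thread.must_return = False
            if error == errno.EINTR:
                error = errno.EBADF
        return error
```

Even in this simplified form, every syscall entry and exit touches the
per-thread set, and close() has to walk every thread in the process, which
is where the cost and the potential for races come from.
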
I'm not opposed to investigating implementing something along these lines,
but I think we should defer this for some time while we sort out more
pressing issues in our kernel file descriptor/socket/etc code and revisit
this in a few months.  We will need to carefully evaluate the performance
costs, and if they are significant, figure out how to avoid this causing a
significant hit.  It's worth observing that removing one level of reference
counting from the socket send/receive paths (using the file descriptor
reference instead of the socket reference) made a 5%+ difference in high
speed send performance.

Robert N M Watson
Computer Laboratory
University of Cambridge