Date: Thu, 21 Dec 2006 10:38:33 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
To: David Xu <davidxu@freebsd.org>
Cc: Daniel Eischen <deischen@freebsd.org>, freebsd-arch@freebsd.org
Subject: Re: close() of active socket does not work on FreeBSD 6
Message-ID: <20061221102909.O83974@fledge.watson.org>
In-Reply-To: <200612210820.09955.davidxu@freebsd.org>
References: <32874.1165905843@critter.freebsd.dk> <20061220153126.G85384@fledge.watson.org> <Pine.GSO.4.64.0612201308220.23942@sea.ntplx.net> <200612210820.09955.davidxu@freebsd.org>
On Thu, 21 Dec 2006, David Xu wrote:

> On Thursday 21 December 2006 02:18, Daniel Eischen wrote:
>> On Wed, 20 Dec 2006, Robert Watson wrote:
>>> On Wed, 13 Dec 2006, Daniel Eischen wrote:
>>>> Anyway, this was just a thought/idea.  I don't mean to argue against any
>>>> of the other reasons why this isn't a good idea.
>>>
>>> Whatever may be implemented to solve this issue will require a fairly
>>> serious re-working of how we implement file descriptor reference counting
>>> in the kernel.  Do you propose similar "cancellation" of other system
>>> calls blocked on the file descriptor, including select(), etc?  Typically
>>> these system calls interact with the underlying object associated with the
>>> file descriptor, not the file descriptor itself, and often, they act
>>> directly on the object and release the file descriptor before performing
>>> their operation.  I think before we can put any reasonable implementation
>>> proposal on the table, we need a clear set of requirements:
>>
>> [ ... ]
>>
>>> While providing Solaris-like semantics here makes some amount of sense,
>>> this is a very tricky area, and one where we're still refining performance
>>> behavior, reference counting behavior, etc.  I don't think there will be
>>> any easy answers, and we need to think through the semantic and
>>> performance implications of any change very carefully before starting to
>>> implement.
>>
>> I don't think the behavior here has to be any different that what we
>> currently (or desire to) do with regard to (unblocked) signals interrupting
>> threads waiting on IO.  You can spend a lot of time thinking about how
>> close() should affect IO operations on the same file descriptor, but a very
>> simple approach is to treat them the same as if the operations were
>> interrupted by a signal.  I'm not suggesting it is implemented the same
>> way, just that it seems to make a lot of sense to me that the behavior is
>> consistent between the two.
>
> I think the main concern is if we will record every thread using a fd, that
> means, when you call read() on a fd, you record your thread pointer into the
> fd's thread list, when one wants to close the fd, it has to notify all the
> threads in the list, set a flag for each thread, the flag indicates a thread
> is interrupted because the fd was closed, when the thread returns from deep
> code path to read() syscall, it should check the flag, and return EBADF to
> user if it was set. whatever, a reserved signal or TDF_INTERRUPT may
> interrupt a thread. but since there are many file operations, I don't know
> if we are willing to pay such overheads to every file syscall, extra locking
> is not welcomed.

Yes, as well as adding quite a bit of complexity and opening the door for some
rather odd/unfortunate races.  You can inspect the bulk of the Solaris
implementation by looking at three spots:

  http://fxr.watson.org/fxr/ident?v=OPENSOLARIS;i=closeandsetf
  http://fxr.watson.org/fxr/ident?v=OPENSOLARIS;i=post_syscall
  http://fxr.watson.org/fxr/search?v=OPENSOLARIS&string=MUSTRETURN

In closeandsetf(), you can see that an additional layer of indirection
associated with the file descriptor is maintained in order to count consumers
of a particular fd, not just the open file record, and the set of active fds
for each thread is maintained.  When a close() is performed and there are
still other open consumers, the process is suspended and all threads are
inspected to see if the fd is active for the thread, in which case a thread
flag is set to indicate that the fd is stale.  I believe that the interrupt
here is an implicit part of the process suspend/restart, and in post_syscall()
the EINTR returns are remapped to EBADF.  That extra level of indirection and
use tracking will be both complex and a performance hit in a critical kernel
path.
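
To make the shape of that mechanism concrete, here is a rough userland toy
model in C.  It is not Solaris or FreeBSD kernel code: every name in it
(toy_thread, toy_proc, toy_fd_hold, toy_close, toy_post_syscall, the
must_return flag, the array limits) is hypothetical, and it leaves out the
process suspend/restart and all locking, which is where the real cost and the
races would be.  It only illustrates the three pieces described above: a
per-thread set of active fds, a close() that flags threads still using the fd,
and a return path that remaps EINTR to EBADF.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_THREADS 4   /* toy limits, chosen arbitrarily */
#define MAX_ACTIVE  4

/* Hypothetical per-thread state: fds in use by an in-progress syscall. */
struct toy_thread {
    int  active_fd[MAX_ACTIVE];
    int  nactive;
    bool must_return;   /* stands in for a "stale fd" thread flag */
};

struct toy_proc {
    struct toy_thread threads[MAX_THREADS];
    int nthreads;
};

/* Syscall entry: record that thread t is now a consumer of fd. */
void toy_fd_hold(struct toy_thread *t, int fd)
{
    assert(t->nactive < MAX_ACTIVE);
    t->active_fd[t->nactive++] = fd;
}

/* Syscall exit: drop fd from the thread's active set, if present. */
void toy_fd_release(struct toy_thread *t, int fd)
{
    for (int i = 0; i < t->nactive; i++) {
        if (t->active_fd[i] == fd) {
            t->active_fd[i] = t->active_fd[--t->nactive];
            return;
        }
    }
}

/*
 * close(): inspect every thread and flag those that still have fd active.
 * Returns the number of threads flagged.  A real kernel would have to stop
 * or lock the threads while doing this; the toy does not.
 */
int toy_close(struct toy_proc *p, int fd)
{
    int flagged = 0;

    for (int i = 0; i < p->nthreads; i++) {
        struct toy_thread *t = &p->threads[i];

        for (int j = 0; j < t->nactive; j++) {
            if (t->active_fd[j] == fd) {
                t->must_return = true;
                flagged++;
                break;
            }
        }
    }
    return flagged;
}

/* Syscall return path: a flagged thread sees EBADF instead of EINTR. */
int toy_post_syscall(struct toy_thread *t, int error)
{
    if (t->must_return) {
        t->must_return = false;
        if (error == EINTR)
            return EBADF;
    }
    return error;
}
```

Even in this toy form, every syscall entry and exit touches the per-thread
set, and close() has to walk all threads, which is the bookkeeping overhead
in question.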

I'm not opposed to investigating implementing something along these lines, but
I think we should defer this for some time while we sort out more pressing
issues in our kernel file descriptor/socket/etc code and revisit this in a few
months.  We will need to carefully evaluate the performance costs, and if they
are significant, figure out how to avoid this causing a significant hit.  It's
worth observing that removing one level of reference counting from the socket
send/receive paths (using the file descriptor reference instead of the socket
reference) made a 5%+ difference in high speed send performance.

Robert N M Watson
Computer Laboratory
University of Cambridge