From owner-freebsd-bugs@FreeBSD.ORG Thu Mar 23 02:20:17 2006 Return-Path: X-Original-To: freebsd-bugs@hub.freebsd.org Delivered-To: freebsd-bugs@hub.freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D7D6116A400 for ; Thu, 23 Mar 2006 02:20:17 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7CB0643D45 for ; Thu, 23 Mar 2006 02:20:17 +0000 (GMT) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.13.4/8.13.4) with ESMTP id k2N2KHCl052764 for ; Thu, 23 Mar 2006 02:20:17 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.13.4/8.13.4/Submit) id k2N2KHVV052763; Thu, 23 Mar 2006 02:20:17 GMT (envelope-from gnats) Date: Thu, 23 Mar 2006 02:20:17 GMT Message-Id: <200603230220.k2N2KHVV052763@freefall.freebsd.org> To: freebsd-bugs@FreeBSD.org From: Bruce Evans Cc: Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Bruce Evans List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Mar 2006 02:20:17 -0000 The following reply was made to PR kern/94772; it has been noted by GNATS. From: Bruce Evans To: Oliver Fromme Cc: bug-followup@freebsd.org Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken Date: Thu, 23 Mar 2006 13:13:15 +1100 (EST) On Wed, 22 Mar 2006, Oliver Fromme wrote: > Oliver Fromme wrote: > > Bruce Evans wrote: > > > See also: > > > > > > PR 76125 (about the same bug) > The first one (76125) seems completely unrelated. Typo? Yes, it's actually 76525. > > (By the way, DEC UNIX 4.0D _does_ have a bug: If the FIFO > > has O_NONBLOCK set and no writer has opened the FIFO, then > > select() doesn't block. > > Actually, it's not a bug. I've read SUSv3 wrong. That > behaviour is perfectly fine. In fact, SUSv3 (a.k.a. > POSIX-2001) requires that select() doesn't block in that > case, and the behaviour of select() and poll() must be > independet of whether O_NONBLOCK is set or not. I have tried to find POSIX saying that many times since I think it is the correct behaviour, but I couldn't find it for either select() or poll() before today. Now I can find it for [p]select() but not for poll(). From POSIX.1-2001-draft7.txt for pselect(): %%% 31193 A descriptor shall be considered ready for reading when a call to an input function with 31194 O_NONBLOCK clear would not block, whether or not the function would transfer data 31195 successfully. (The function might return data, an end-of-file indication, or an error other than 31196 one indicating that it is blocked, and in each of these cases the descriptor shall be considered 31197 ready for reading.) %%% Other parts of POSIX make it clear that O_NONBLOCK reads must never block, so if O_NONBLOCK is set then pselect() for read must never block either. This requires the behaviour of pselect() dependent on O_NONBLOCK but not on the current or previous presence of a writer. I still can't find similar words for poll(). The spec for poll() seems to be fuzzier in general, and the closest I could find is: %%% 27931 POLLIN Data other than high-priority data may be read without blocking. %%% "Data" doesn't seem to be defined anywhere. Is null data (at EOF) data?... poll() presumably does depend on the previous presence of a writer, since POLLHUP only makes sense if there was a previous presence. But POLLHUP seems to be specified even more fuzzily than POLLIN. Clearly previous presences of writers shouldn't count if the previous set of readers, writers and data all went away, but this doesn't seem to be specified in detail, and what happens with multiple readers and/or writers living across sessions either intentionally or due to races or bugs? I intened to check the behaviour for this in my test programs but don't seem to have done it. I intended to follow Linux's behaviour even if this is nonstandard. Linux used to have some special cases including a gripe in a comment about having to have them to match Sun's behaviour, but I couldn't find these when I last checked. Perhaps the difference is precisely between select() and poll(), to follow the standard for select() and exploit the fuzziness for poll(). > > I would be happy to help out, but I'm not familiar with > > that part of the kernel code. > > Well, now (a few hours later) I'm a little bit more > familiar with it. Patch attached below, and also > available from this URL: > http://www.secnetix.de/~olli/tmp/fifodiff.txt > > With that patch, my own test program (the one from > the top of this PR) runs successfully, i.e. EOF is > recognized correctly in all cases that I have tested > (with and without select()), and it also behaves > standard-compliant when O_NONBLOCK is used (see > above). I'll add tests for the O_NONBLOCK behaviour before mailing the test for poll(). > The patch addresses the following things: > - implement POLLHUP in sopoll(). > - remove POLLINIGNEOF. > - selscan() doesn't need to be patched: it already > works as expected when fo_poll() returns POLLHUP. > - I don't think the comment is entirely incorrect, > but I'm not sure, so I left it alone. Ah, selscan() just uses the result of fo_poll() as a boolean to decide whether a descriptor is ready (I thought that it checked only for the bits that it asked for). fo_poll() returns revents. Thus selscan() returns for when one of the output-only parameter bits like POLLHUP is set even if none of the input-output parameter bits are set. I think the comment should say this more directly. > --- src/sys/fs/fifofs/fifo_vnops.c.orig Tue Mar 21 09:42:32 2006 > +++ src/sys/fs/fifofs/fifo_vnops.c Wed Mar 22 18:49:27 2006 > @@ -661,31 +661,11 @@ > ... Good. > --- src/sys/kern/uipc_socket.c.orig Wed Dec 28 19:05:13 2005 > +++ src/sys/kern/uipc_socket.c Wed Mar 22 18:44:08 2006 > @@ -2036,23 +2036,26 @@ > if (soreadable(so)) > revents |= events & (POLLIN | POLLRDNORM); > > - if (events & POLLINIGNEOF) > - if (so->so_rcv.sb_cc >= so->so_rcv.sb_lowat || > - !TAILQ_EMPTY(&so->so_comp) || so->so_error) > - revents |= POLLINIGNEOF; > - Good. > - if (events & (POLLOUT | POLLWRNORM)) > - if (sowriteable(so)) > - revents |= events & (POLLOUT | POLLWRNORM); > + if (events & (POLLOUT | POLLWRNORM) && sowriteable(so)) > + revents |= events & (POLLOUT | POLLWRNORM); > + else { > + /* > + * POLLOUT and POLLHUP shall not both be set. > + * Therefore check only for POLLHUP if POLLOUT > + * has not been set. (Note that POLLHUP need > + * not be in events; it's always checked.) > + */ > + if (so->so_rcv.sb_state & SBS_CANTRCVMORE && > + so->so_rcv.sb_cc == 0) > + revents |= POLLHUP; > + } I think SBS_CANTSENDMORE in so_snd should be checked here. This flag has already been checked to be clear for in sowritable() in most cases. I think the receiver count shouldn't be checked here. I'm surprised that my test succeeds with this -- doesn't it prevent POLLHUP being set in the hangup+ case? Old versions of fifo_vnops.c had bugs from confusing these flags and/or the sender/receiver. I hope these are all fixed now. This might be clearer with SBS_CANTSENDMORE checked first. SBS_CANTSENDMORE set implies !sowriteable() so the behaviour is the same, and I think it is clearer to not even look at the output bits in `events' in the hangup case. I think that none of the other checks in sowriteable() is related to hangup, but I don't understand the PR_CONNREQUIRED one. >... The rest looks good. This also fixes poll() on sockets. Sockets are more often used than named pipes so the change needs a few weeks of testing before MFC. Applications might be confused by poll() actually setting POLLHUP. It sets only POLLIN for hangup now (this is because SBS_CANTRCVMORE implies soreadable()). Bruce