From owner-freebsd-bugs@FreeBSD.ORG Wed Mar 22 01:09:24 2006 Return-Path: X-Original-To: freebsd-bugs@freebsd.org Delivered-To: freebsd-bugs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3E03C16A422; Wed, 22 Mar 2006 01:09:24 +0000 (UTC) (envelope-from bde@zeta.org.au) Received: from mailout2.pacific.net.au (mailout2.pacific.net.au [61.8.0.115]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6FDBE43D48; Wed, 22 Mar 2006 01:09:23 +0000 (GMT) (envelope-from bde@zeta.org.au) Received: from mailproxy1.pacific.net.au (mailproxy1.pacific.net.au [61.8.0.86]) by mailout2.pacific.net.au (Postfix) with ESMTP id 23DBFADCCB; Wed, 22 Mar 2006 12:09:22 +1100 (EST) Received: from epsplex.bde.org (katana.zip.com.au [61.8.7.246]) by mailproxy1.pacific.net.au (8.13.4/8.13.4/Debian-3) with ESMTP id k2M19H5d016998; Wed, 22 Mar 2006 12:09:18 +1100 Date: Wed, 22 Mar 2006 12:09:16 +1100 (EST) From: Bruce Evans X-X-Sender: bde@epsplex.bde.org To: Oliver Fromme In-Reply-To: <200603210956.k2L9ungW094654@lurza.secnetix.de> Message-ID: <20060322081616.X83498@epsplex.bde.org> References: <200603210956.k2L9ungW094654@lurza.secnetix.de> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-bugs@freebsd.org, FreeBSD-gnats-submit@freebsd.org Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Mar 2006 01:09:24 -0000 On Tue, 21 Mar 2006, Oliver Fromme wrote: >> Description: > > I recently wondered why several of my scripts that use > a named pipe (FIFO) don't work on FreeBSD. > > After some debugging it turned out that select() seems > to be broken when used with FIFOs on FreeBSD 6. > Particularly, this is the bug I'm suffering from: > > When a FIFO had been opened for reading and the writing > process closes it, the reading process blocks in select(), > even though the descriptor is ready for read(). If the > select() call is omitted, read() returns 0 immediately > indicating EOF. But as soon as you use select(), it blocks > and there is no chance to detect the EOF condition. See also: PR 76125 (about the same bug) PR 76144 (about a related bug in poll()) PR 53447 (about a another or the same related bug in poll()) PR 34020 (about the inverse of the bug (return on non-EOF) for select() and poll()) The fix for PR 34020 inverted (a symptom of) a bug for poll() to give a worse bug, and gave the (inverted) bug for select() where there was no bug before. fifo_poll() now instructs lower layers to ignore EOF by setting POLLINIGNEOF. This moves the bug for poll() and isn't directly harmful for poll(), but it is directly harmful for select() and it makes the real bug for poll() more harmful. The real bug for poll() is that POLLHUP needs to be set to indicate EOF, but it isn't actually set for many file types including named pipes. So we now always get no indication of EOF where we should get POLLHUP or POLLIN | POLLHUP. Previously, we always got EOF indicated by POLLIN, and we also got EOF indicated by POLLIN in the tricky case where other systems don't indicate EOF. The tricky case is for a named pipe that has not had any writers during the lifetime of current readers or thereabouts (races seem to be a problem). Such a pipe can never have had a connection on it, so it cannot be in the hangup state and it is clear that poll() should not return POLLHUP for it. It is less clear that select() and poll() should block waiting for a writer, but that is what other systems do for poll() at least. I fixed FreeBSD long ago to not always block in read() on a named piped when there are no current writers, since blocking in read() is just wrong in the O_NONBLOCK case. This had the side effect of making select() never block when there are no current writers. poll() inherited this behaviour from select(). This behaviour is wrong since select()/poll are the only reasonable ways to block waiting for a writer, but it doesn't cause many problems. >> How-To-Repeat: > > Please see the small test program below. Compile it like this: > cc -O -o fifotest fifotest.c > ... > The same test program (with "err()" replaced by a small > self-made function) runs without error on all other UNIX > systems that I've tried: Linux 2.4.32, Solaris 10, and > DEC UNIX 4.0 (predecessor of Tru64). By the way, it's > even sufficient to do "cat /dev/null > fifo", i.e. not > writing anything at all, but issuing EOF immediately. > Under FreeBSD, nothing happens at all in that case. > All other UNIX systems recognize EOF (select() returns). Here is a program that tests more cases. I made it give no output (for no errors) under Linux-2.6.10. It also gives no output for the nameless pipe case under FreeBSD-4.10 and FreeBSD-oldcurrent and for the named piped case under FreeBSD-4.10, but it fails with (only) the error in this PR under FreeBSD-oldcurrent. Please test it on Solaris etc. Compile it with -DNAMEDPIPE for the named pipe case. In this case, it creates and leaves a fifo "p" in the current directory (or doesn't handle the error if "p" exists but is not an accessible fifo) but shouldn't have any other side effects. %%% #include #include #include #include #include #include #include static pid_t cpid; static pid_t ppid; static volatile sig_atomic_t state; static void catch(int sig) { state++; } static void child(int fd) { fd_set rfds; struct timeval tv; char buf[256]; #ifdef NAMEDPIPE fd = open("p", O_RDONLY | O_NONBLOCK); if (fd < 0) err(1, "open for read"); #endif kill(ppid, SIGUSR1); /* XXX should check that fd fits in rfds. */ usleep(1); while (state != 1) ; #ifndef NAMEDPIPE /* * The connection cannot be restablished. Use the code that delays * the read until after the writer disconnects since that case is * more interesting. */ state = 4; goto state4; #endif FD_ZERO(&rfds); FD_SET(fd, &rfds); tv.tv_sec = 0; tv.tv_usec = 0; if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0) err(1, "select"); if (FD_ISSET(fd, &rfds)) warnx("state 1: expected clear; got set"); kill(ppid, SIGUSR1); usleep(1); while (state != 2) ; FD_ZERO(&rfds); FD_SET(fd, &rfds); tv.tv_sec = 0; tv.tv_usec = 0; if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0) err(1, "select"); if (!FD_ISSET(fd, &rfds)) warnx("state 2: expected set; got clear"); if (read(fd, buf, sizeof buf) != 1) err(1, "read"); FD_ZERO(&rfds); FD_SET(fd, &rfds); tv.tv_sec = 0; tv.tv_usec = 0; if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0) err(1, "select"); if (FD_ISSET(fd, &rfds)) warnx("state 2a: expected clear; got set"); kill(ppid, SIGUSR1); usleep(1); while (state != 3) ; FD_ZERO(&rfds); FD_SET(fd, &rfds); tv.tv_sec = 0; tv.tv_usec = 0; if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0) err(1, "select"); if (!FD_ISSET(fd, &rfds)) warnx("state 3: expected set; got clear"); kill(ppid, SIGUSR1); /* * Now we expect a new writer, and a new connection too since * we read all the data. The only new point is that we didn't * start quite from scratch since the read fd is not new. Check * startup state as above, but don't do the read as above. */ usleep(1); while (state != 4) ; state4: FD_ZERO(&rfds); FD_SET(fd, &rfds); tv.tv_sec = 0; tv.tv_usec = 0; if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0) err(1, "select"); if (FD_ISSET(fd, &rfds)) warnx("state 4: expected clear; got set"); kill(ppid, SIGUSR1); usleep(1); while (state != 5) ; FD_ZERO(&rfds); FD_SET(fd, &rfds); tv.tv_sec = 0; tv.tv_usec = 0; if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0) err(1, "select"); if (!FD_ISSET(fd, &rfds)) warnx("state 5: expected set; got clear"); kill(ppid, SIGUSR1); usleep(1); while (state != 6) ; /* * Now we have no writer, but should still have data from the old * writer. Check that we have both a data condition and a hangup * condition, and that the data can read the data in the usual way. * Since Linux does this, programs must not quite reading when they * see POLLHUP; they must see POLLHUP without POLLIN (or another * input condition) before they decide that there is EOF. gdb-6.1.1 * is an example of a broken program that quits on POLLHUP only -- * see its event-loop.c. */ FD_ZERO(&rfds); FD_SET(fd, &rfds); tv.tv_sec = 0; tv.tv_usec = 0; if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0) err(1, "select"); if (!FD_ISSET(fd, &rfds)) warnx("state 6: expected set; got clear"); if (read(fd, buf, sizeof buf) != 1) err(1, "read"); FD_ZERO(&rfds); FD_SET(fd, &rfds); tv.tv_sec = 0; tv.tv_usec = 0; if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0) err(1, "select"); if (!FD_ISSET(fd, &rfds)) warnx("state 6a: expected set; got clear"); close(fd); kill(ppid, SIGUSR1); exit(0); } static void parent(int fd) { usleep(1); while (state != 1) ; #ifdef NAMEDPIPE fd = open("p", O_WRONLY | O_NONBLOCK); if (fd < 0) err(1, "open for write"); #endif kill(cpid, SIGUSR1); usleep(1); while (state != 2) ; if (write(fd, "", 1) != 1) err(1, "write"); kill(cpid, SIGUSR1); usleep(1); while (state != 3) ; if (close(fd) != 0) err(1, "close for write"); kill(cpid, SIGUSR1); usleep(1); while (state != 4) ; #ifndef NAMEDPIPE return; #endif fd = open("p", O_WRONLY | O_NONBLOCK); if (fd < 0) err(1, "open for write"); kill(cpid, SIGUSR1); usleep(1); while (state != 5) ; if (write(fd, "", 1) != 1) err(1, "write"); kill(cpid, SIGUSR1); usleep(1); while (state != 6) ; if (close(fd) != 0) err(1, "close for write"); kill(cpid, SIGUSR1); usleep(1); while (state != 7) ; } int main(void) { int fd[2]; int i; #ifdef NAMEDPIPE if (mkfifo("p", 0666) != 0 && errno != EEXIST) err(1, "mkfifo"); #endif signal(SIGUSR1, catch); ppid = getpid(); for (i = 0; i < 2; i++) { #ifndef NAMEDPIPE if (pipe(fd) != 0) err(1, "pipe"); #else fd[0] = -1; fd[1] = -1; #endif state = 0; switch (cpid = fork()) { case -1: err(1, "fork"); case 0: (void)close(fd[1]); child(fd[0]); break; default: (void)close(fd[0]); parent(fd[1]); break; } } return (0); } %%% The error output from this is: %%% selectp: state 3: expected set; got clear selectp: state 6a: expected set; got clear selectp: state 3: expected set; got clear selectp: state 6a: expected set; got clear %%% These messages are all caused by the same bug. "state 3" is EOF without any data ever having been readable. "state 6" is EOF with data readable (the test for this passed so there is no output for it). "state 6a" is EOF after having read the data available in state 6. It is good that state 6a fails in the same way as state 3 -- at least the bug doesn't seem to involve races. The duplicate messages are caused by iterating the test to see if the bug depends on previous activity on the pipe (but the program is probably too careful cleaning up for iteration to show problems). A similar test program (not enclosed) shows many more bugs for poll(): - no output under Linux-2.6.10 - FreeBSD-4.10 on nameless pipes: % poll: state 6a: expected POLLHUP; got 0x11 % poll: state 6a: expected POLLHUP; got 0x11 0x11 is POLLIN | POLLHUP. Nameless pipes are one of the few file types for which POLLHUP is actually implemented (POLLHUP is also implemented (not quite right) for ttys but isn't implemented for any other important file type). Linux returns only POLLHUP here. This is best, since it allows distinguishing the case of pure EOF from the case of EOF with data reasable. However, buggy applications like gdb don't actually understand the difference between EOF-with-data and pure EOF (see the comment in the test program). Also, select() depends on pipe_poll() returning POLLIN to work. - FreeBSD-oldcurrent on nameless pipes: % poll: state 6a: expected POLLHUP; got 0x11 % poll: state 6a: expected POLLHUP; got 0x11 No change. The kernel code for select() and poll() hasn't been either fixed or broken for nameless pipes. - FreeBSD-4.10 on named pipes: % pollp: state 3: expected POLLHUP; got 0x1 % pollp: state 6: expected POLLIN | POLLHUP; got 0x1 % pollp: state 6a: expected POLLHUP; got 0x1 % pollp: state 3: expected POLLHUP; got 0x1 % pollp: state 6: expected POLLIN | POLLHUP; got 0x1 % pollp: state 6a: expected POLLHUP; got 0x1 0x1 is POLLIN. FreeBSD-4.10 never returns POLLHUP for named pipes. This and/or returning POLLIN in too many cases causes the problem in PRs 34020, 53447 and 76144. - FreeBSD-oldcurrent on nameless pipes: % pollp: state 6: expected POLLIN | POLLHUP; got 0x1 % pollp: state 6: expected POLLIN | POLLHUP; got 0x1 FreeBSD-current still never returns POLLHUP for nameless pipes. However, my test program determines whether POLLHUP should have been returned in some cases and reduces the bugs to the above. It uses FreeBSD(>4)'s (my) POLLINIGNEOF to do this. POLLINIGNEOF was supposed to be usable for this to limit the damage caused by the fix for PR34020, since I knew that this fix would break EOF handling in some cases. However, POLLINIGNEOF doesn't really work. To use it, the test program has to use nonblocking syscalls for everything including poll() (it gets nonblocking polls using a timeout of 0). Even with this, EOF can't be detected in state 6 (EOF-with-data). But the bug in state 6 is small (it even compensates for the bug in gdb). Here is the code to determine POLLHUP: %%% #ifdef POLLINIGNEOF /* * FreeBSD's POLLINIGNEOF (which causes half of the bugs when the kernel * uses it) can be used to fix up the broken cases 3 and 6a if the kernel * uses it, i.e., for named pipes but not for pipes. Note that the sense * of POLLINIGNEOF is reversed when passed to the kernel -- it means * don't-ignore-EOF in .events and if it is set there then it means * not-POLLHUP in .revents. */ int mypoll(struct pollfd *fds, nfds_t nfds, int timeout) { struct pollfd mypfd; int r; r = poll(fds, nfds, timeout); if (nfds != 1 || timeout != 0 || fds[0].revents & POLLIN) return (r); mypfd = fds[0]; mypfd.events |= POLLINIGNEOF; r = poll(&mypfd, 1, 0); if (r >= 0) { if (mypfd.revents &= POLLIN) { mypfd.revents &= ~POLLIN; mypfd.revents |= POLLHUP; } fds[0].revents = mypfd.revents; } return (r); } #define poll(fds, nfds, timeout) mypoll((fds), (nfds), (timeout)) #endif %%% With this userland fixup for the missing POLLHUP, the above shows that states 3 and 6a have been fixed for poll() on named pipes. These are precisely the states that have been broken for select() on named pipes. Toggling the seting of POLLIN for these states toggles the location of one of the bugs. Without this userland fixup, the output for FreeBSD-oldcurrent on named pipes is: % state 3: expected POLLHUP; got 0 % state 6: expected POLLIN | POLLHUP; got 0x1 % state 6a: expected POLLHUP; got 0 % state 3: expected POLLHUP; got 0 % state 6: expected POLLIN | POLLHUP; got 0x1 % state 6a: expected POLLHUP; got 0 The POLLHUP flag is now never set, so states 3 and 6a aren't actually fixed; in fact they are more broken than before, just like for select() -- now no poll flag is set for these cases, so poll() and select() don't even see normal hangups unless they are used with a timeout and/or with the negative-logic POLLINIGNEOF as in my test program. IIRC, PRs 53447 and 76144 are about this problem. Quick fix (?): #defining POLLINIGNEOF as 0 in should give the FreeBSD-4.10 behaviour. Fix (?): - actually implement returning POLLHUP in sopoll() and other places. Return POLLHUP but not POLLIN for the pure-EOF case. Return POLLIN* | POLLHUP for EOF-with-data. - remove POLLINIGNEOF and associated complications in sopoll(), fifo_poll() and - change selscan() to check for POLLHUP so that POLLIN, POLLIN | POLLHUP and POLLHUP all act the same for select() - remove POLLHUP from the comment in selscan(). Fix the rest of this comment or remove it (most backends are too broken to return poll flags if appropriate, and the comment only mentions one of the other poll flags that selscan() ignores) - remove the corresponding comment in pollscan() since it is wrong and says nothing relevant (pollscan() just accepts whatever flags the backends set). Bruce