From owner-freebsd-current@FreeBSD.ORG Mon Jun 16 03:48:53 2003
Date: Mon, 16 Jun 2003 20:48:47 +1000 (EST)
From: Bruce Evans
To: Don Lewis
cc: current@freebsd.org
cc: tjr@freebsd.org
Subject: Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade
In-Reply-To: <200306160817.h5G8HZM7048277@gw.catspoiler.org>
Message-ID: <20030616200848.U27906@gamplex.bde.org>
References: <200306160817.h5G8HZM7048277@gw.catspoiler.org>
List-Id: Discussions about the use of FreeBSD-current

On Mon, 16 Jun 2003, Don Lewis wrote:

> On 16 Jun, I wrote:
> > On 16 Jun, Tim Robbins wrote:
>
> >>> This looks like a bug in the named pipe code. Reverting
> >>> sys/fs/fifofs/fifo_vnops.c to the RELENG_5_0 version makes the problem go
> >>> away. I haven't tracked down exactly what change between RELENG_5_0 and
> >>> RELENG_5_1 caused the problem.
> >>
> >> Looks like revision 1.86 works, but it stops working with 1.87. Moving the
> >> soclose() calls to fifo_inactive() may have caused it.
> >
> > This is an interesting observation, but I'm not sure why it would make a
> > difference.
> > I haven't looked at the qmail source, but it looks like it
> > is doing a non-blocking open on the fifo, calling select() on the fd,
> > and hoping that select() waits for a writer to open the fifo before
> > returning with an indication that the descriptor is readable.

In my review of 1.87, I forgot to ask you how atomic the close is with
part of it moved out to fifo_inactive(). I think it's important that
all traces of the old open have gone away (as far as applications can
tell) when the last close returns.

> > It looks like the select code is calling the soreadable() macro to
> > determine if the fifo descriptor is readable, and the soreadable() macro
> > returns a true value if the SS_CANTRCVMORE socket flag is set, which
> > would indicate an EOF condition.

fifo_close() sets this flag and the corresponding send flag on last close,
so there is no direct problem here.

> > ...
> > The posted qmail syscall trace looks like what I would expect to see in
> > the present implementation. I can't explain why it would behave any
> > differently prior to 1.87 ...
>
> The plot thickens ...
>
> I ran this bit of code on both 5.1 current with version 1.88 of
> fifo_vnops.c, and 4.8-stable:
>
> #include
> #include
> #include
> #include
> main()
> {
> 	int fd;
> 	fd_set readfds;
>
> 	fd = open("myfifo", O_RDONLY | O_NONBLOCK);
>
> 	printf("before the loop\n");
> 	while (1) {
> 		FD_ZERO(&readfds);
> 		FD_SET(fd, &readfds);
> 		printf("%d %d\n", fd, select(20, &readfds, NULL, NULL, NULL));
> 	}
> 	exit(0);
> }
>
> On 4.8-stable, select() immediately returns a "1", whether or not the
> fifo has ever been opened for writing.
>
> On 5.1-current, select() waits forever, even if the fifo has been opened
> for writing by another process. Select() only returns when something
> has actually been written to the fifo, and since this process doesn't
> read anything from the fifo, it spins on select() forever.
>
> If some data is getting written to the fifo, it doesn't look like qmail
> consumes it, and since fifo_close in 1.87 doesn't destroy the sockets,
> it looks like the data is hanging around in the fifo while neither end
> is open, and qmail stumbles across this data when it calls select()
> after re-opening the fifo.
>
> Now there are two questions that I can't answer:
>
> 	Why is my analysis of select() and the SS_CANTRCVMORE flag
> 	incorrect in 5.1-current with version 1.87 or 1.88 of
> 	fifo_vnops.c.

I think it is correct, assuming that something writes to the fifo.
Writing might be part of synchronization, but actually reading the
data should not be necessary, since the last close must discard the
data (POSIX spec).

> 	Why doesn't qmail get stuck in a similar loop in 4.8-stable,
> 	since select always returns true for reading on a fifo with no
> 	writers?

Don't know. Maybe it uses autoconfig to handle the 4.8 behaviour. The
4.8 behaviour is normal compared with the buggy behaviour of not discarding
data on last close, so applications should handle it better :-). Maybe
qmail spins under 4.8 too, but only until synchronization is achieved.

Bruce
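
The last-close point can be checked with a small test along the
following lines. This is only a sketch and not code from this thread:
the function names poll_readable() and fifo_leftover_after_last_close()
and the path argument are made up for illustration, and the header list
is an assumption for a current POSIX system. It writes a few bytes into
a fifo, closes both ends without reading, reopens the read end and
reports how many bytes are still there. select() is called with a zero
timeout so that the probe cannot hang.

```c
#include <sys/types.h>
#include <sys/select.h>
#include <sys/stat.h>
#include <sys/time.h>

#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Ask select() whether fd is readable, using a zero timeout so the call
 * returns at once.  Returns select()'s own result: 1 if readable, 0 if
 * not, -1 on error.
 */
int
poll_readable(int fd)
{
	fd_set readfds;
	struct timeval tv = { 0, 0 };

	FD_ZERO(&readfds);
	FD_SET(fd, &readfds);
	return (select(fd + 1, &readfds, NULL, NULL, &tv));
}

/*
 * Hypothetical helper: put 5 bytes into the fifo at `path', close both
 * ends without reading, reopen for reading and count what is left.
 * Returns the byte count from read() after the reopen, or -1 if any
 * step fails.  If sel_after is not NULL it receives the select()
 * result for the reopened descriptor.
 */
long
fifo_leftover_after_last_close(const char *path, int *sel_after)
{
	char buf[64];
	int rfd, wfd;
	ssize_t n;

	(void)unlink(path);		/* start from a fresh fifo */
	if (mkfifo(path, 0600) == -1)
		return (-1);

	/* A non-blocking read open succeeds even with no writer. */
	rfd = open(path, O_RDONLY | O_NONBLOCK);
	if (rfd == -1) {
		(void)unlink(path);
		return (-1);
	}
	wfd = open(path, O_WRONLY | O_NONBLOCK);
	if (wfd == -1 || write(wfd, "hello", 5) != 5) {
		if (wfd != -1)
			(void)close(wfd);
		(void)close(rfd);
		(void)unlink(path);
		return (-1);
	}

	/* Close both ends with the data unread; this is the last close. */
	(void)close(wfd);
	(void)close(rfd);

	/* Reopen and see whether the old data survived. */
	rfd = open(path, O_RDONLY | O_NONBLOCK);
	if (rfd == -1) {
		(void)unlink(path);
		return (-1);
	}
	if (sel_after != NULL)
		*sel_after = poll_readable(rfd);
	n = read(rfd, buf, sizeof(buf));	/* 0 means EOF, nothing left */
	(void)close(rfd);
	(void)unlink(path);
	return ((long)n);
}
```

Called as fifo_leftover_after_last_close("/tmp/myfifo", &sel), a return
of 0 means the 5 bytes were discarded on last close, and a return of 5
would be the behaviour described above as buggy. The value stored in sel
is select()'s answer for a freshly reopened fifo with no writers, which
differs between systems, as the 4.8 and 5.1 results above show.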