From owner-freebsd-current@FreeBSD.ORG  Mon Jun 16 05:07:28 2003
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 659BE37B401; Mon, 16 Jun 2003 05:07:28 -0700 (PDT)
Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id A409343FBF; Mon, 16 Jun 2003 05:07:26 -0700 (PDT)
	(envelope-from bde@zeta.org.au)
Received: from katana.zip.com.au (katana.zip.com.au [61.8.7.246])
	by mailman.zeta.org.au (8.9.3p2/8.8.7) with ESMTP id WAA22387;
	Mon, 16 Jun 2003 22:07:23 +1000
Date: Mon, 16 Jun 2003 22:07:22 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@gamplex.bde.org
To: Don Lewis <truckman@freebsd.org>
In-Reply-To: <200306161109.h5GB9MM7048819@gw.catspoiler.org>
Message-ID: <20030616212958.O28213@gamplex.bde.org>
References: <200306161109.h5GB9MM7048819@gw.catspoiler.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: current@freebsd.org
cc: tjr@freebsd.org
Subject: Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 16 Jun 2003 12:07:28 -0000

On Mon, 16 Jun 2003, Don Lewis wrote:

> On 16 Jun, Bruce Evans wrote:
> > In my review of 1.87, I forgot to ask you how atomic the close is with part
> > of it moved out to fifo_inactive().  I think it's important that all
> > traces of the old open have gone away (as far as applications can tell)
> > when the last close returns.
>
> I hadn't taken queued data into consideration.  Now that I've looked at
> this more closely, there are other problems in both the old and new
> code.  If a process calls fcntl(fd, F_SETOWN, ...) on one end of the
> fifo, that should be undone when that end of the fifo is closed.  In the
> old implementation, that only happens when both ends of the fifo are
> closed and the sockets are deleted.

F_SETOWN (and associated signal delivery) is even more broken than that :-].
This fcntl() should be applied to the file (though not just the file descriptor),
so its effect should be limited to fd's open on the file instance and go
away when all these are closed.  However, F_SETOWN (and associated signal
delivery) actually applies to the socket for fifos.  It doesn't work quite
right for ttys either.  F_SETOWN apparently isn't used in ways complicated
enough to require it to work right.
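
A small probe can show where the owner is actually stored.  This is only a
sketch: fifo_owner_probe() is a made-up name, not something in the tree.
It sets an owner on the read end, closes that end while a writer keeps the
fifo alive, reopens the read end and reports the owner seen before and
after.  If the owner belonged to the open file, the second value would be 0.
If it lives on the fifo's socket, as described above, the old owner may
still show up.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX.1-2008 declarations */
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical probe: 'path' must not exist yet.  Stores the owner read
 * back right after F_SETOWN in *before, and the owner seen by a fresh
 * open of the read end in *after.  Returns 0 on success, -1 on failure.
 */
int
fifo_owner_probe(const char *path, pid_t *before, pid_t *after)
{
    int rfd, wfd, rfd2, ok = -1;

    if (mkfifo(path, 0600) == -1)
        return (-1);

    /* A non-blocking reader opens at once; the writer then succeeds too. */
    rfd = open(path, O_RDONLY | O_NONBLOCK);
    wfd = open(path, O_WRONLY | O_NONBLOCK);

    if (rfd != -1 && wfd != -1 && fcntl(rfd, F_SETOWN, getpid()) != -1) {
        *before = fcntl(rfd, F_GETOWN);

        /* Last close of the read end; the writer keeps the fifo alive. */
        close(rfd);
        rfd = -1;

        rfd2 = open(path, O_RDONLY | O_NONBLOCK);
        if (rfd2 != -1) {
            *after = fcntl(rfd2, F_GETOWN);
            close(rfd2);
            ok = 0;
        }
    }

    if (rfd != -1)
        close(rfd);
    if (wfd != -1)
        close(wfd);
    unlink(path);
    return (ok);
}
```

Comparing *after with 0 on the system under test shows which behaviour it
has; the sketch does not assume either result.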

> >> Now there are two questions that I can't answer:
> >>
> >> 	Why is my analysis of select() and the SS_CANTRCVMORE flag
> >>         incorrect in 5.1-current with version 1.87 or 1.88 of
> >>         fifo_vnops.c.
> >
> > I think it is correct, assuming that something writes to the fifo.
> > Writing might be part of synchronization but actually reading the
> > data should not be necessary since the last close must discard the
> > data (POSIX spec).
>
> It sure looks to me like SS_CANTRCVMORE is always set when the write end
> of the fifo is closed, no matter whether the sockets were freshly
> allocated by a fifo_open() call on the read end of the fifo, or because
> the last writer closed the write end of the fifo.  It sure looks
> like select() should immediately return if this flag is set, but it is
> not returning ...

Alfred changed the semantics for 5.x.  I thought that you knew this.
I finally gave up resisting this change after a lot of email :-).  In
5.x, SS_CANTRCVMORE often has no effect for fifos (it still works
normally for sockets).  fifo_poll() normally calls soo_poll() with
POLLIN converted to POLLINIGNEOF.  This causes soo_poll() (sopoll())
to skip the usual SS_CANTRCVMORE check (which is inside soreadable())
and check the watermark instead, so that select() on a fifo normally
waits for data even when the fifo is open in nonblocking mode and
SS_CANTRCVMORE is set.  Blocking in select() even in nonblocking mode
is usually what is wanted, but is not what is wanted for detecting
EOF.  4.8 handles EOF detection (== all writers going away in the context
of fifos) better at a cost of providing no good way to wait for the
first writer.  We changed it since all other systems seem to do it like
5.x and few applications understand this.
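
For experimenting with this from userland, something like the following can
be used.  It is a rough sketch, not the test program from this thread, and
the names fifo_open_both() and fifo_select_readable() are invented here.
The second function asks select() whether the read end is readable, with a
timeout so that it cannot hang.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX.1-2008 declarations */
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a fifo at 'path' (which must not exist) and
 * open both ends in non-blocking mode.  Returns 0 on success, -1 on error.
 */
int
fifo_open_both(const char *path, int *rfd, int *wfd)
{
    if (mkfifo(path, 0600) == -1)
        return (-1);
    *rfd = open(path, O_RDONLY | O_NONBLOCK);
    if (*rfd == -1) {
        unlink(path);
        return (-1);
    }
    *wfd = open(path, O_WRONLY | O_NONBLOCK);
    if (*wfd == -1) {
        close(*rfd);
        unlink(path);
        return (-1);
    }
    return (0);
}

/*
 * Returns 1 if select() reports 'fd' readable within timeout_ms
 * milliseconds, 0 if the timeout expires first, -1 on error.
 */
int
fifo_select_readable(int fd, long timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int n;

    if (fd < 0 || fd >= FD_SETSIZE)
        return (-1);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    n = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (n == -1)
        return (-1);
    return ((n > 0 && FD_ISSET(fd, &rfds)) ? 1 : 0);
}
```

With a writer open and data queued this should return 1 everywhere.  The
interesting case is calling it after the last writer has closed and the
data has been read: per the above, 4.8 would be expected to return 1 and
5.x to time out.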

> Actually, something seems broken.  I modified my little test program to
> actually read the data, which works just fine, but select() still blocks
> when the writer closes the fifo, so there doesn't seem to be a way to
> detect the EOF.

Hmm, we may have changed too much.  EOF can be detected using poll() instead
of select() and setting POLLIN and POLLINIGNEOF in the poll flags (this stops
fifo_poll() clearing POLLIN -- see the comment), but POLLINIGNEOF is
not documented at the application level and is probably never used there.
I suspect that other systems have more magic to handle EOF.  I tried to
avoid such magic since I think the state of the fifo should be the same
when there are no writers (and no data) no matter how the state of having
no writers was reached (otherwise I think the state depends too much on
races between open() for reading and close() by the last writer).  POSIX
is clear enough on this for read/write but fuzzy for select/poll.
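
The poll() approach could look roughly like this.  It is an untested
sketch: fifo_read_or_eof() is an invented name, and POLLINIGNEOF is only
requested if <poll.h> happens to define it, so elsewhere the code falls
back to plain POLLIN.  POLLHUP is accepted as well, since some systems
report a vanished writer that way.  The actual EOF decision is left to
read(), where POSIX is clear.

```c
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: POLLINIGNEOF may or may not be visible to applications. */
#ifdef POLLINIGNEOF
#define FIFO_POLL_EVENTS    (POLLIN | POLLINIGNEOF)
#else
#define FIFO_POLL_EVENTS    POLLIN
#endif

/*
 * Wait up to timeout_ms for input or EOF on the read end of a fifo.
 * Returns the number of bytes read (> 0), 0 at EOF (no data and no
 * writers), -1 on error, or -2 if poll() timed out.
 */
ssize_t
fifo_read_or_eof(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd;
    int n;

    pfd.fd = fd;
    pfd.events = FIFO_POLL_EVENTS;
    pfd.revents = 0;

    n = poll(&pfd, 1, timeout_ms);
    if (n == -1)
        return (-1);
    if (n == 0)
        return (-2);

    /* Data or hangup: let read() tell them apart. */
    if (pfd.revents & (POLLIN | POLLHUP))
        return (read(fd, buf, len));

    /* POLLERR, POLLNVAL or similar. */
    return (-1);
}
```

A return of -2 after the last writer has gone would mean that poll() still
does not report the EOF on that system.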

Bruce