From owner-freebsd-arch@FreeBSD.ORG  Tue Jan 10 12:04:27 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6C8A4106566B;
	Tue, 10 Jan 2012 12:04:27 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail01.syd.optusnet.com.au (mail01.syd.optusnet.com.au
	[211.29.132.182])
	by mx1.freebsd.org (Postfix) with ESMTP id BC8958FC08;
	Tue, 10 Jan 2012 12:04:25 +0000 (UTC)
Received: from c211-30-171-136.carlnfd1.nsw.optusnet.com.au
	(c211-30-171-136.carlnfd1.nsw.optusnet.com.au [211.30.171.136])
	by mail01.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q0AC4Img016496
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Tue, 10 Jan 2012 23:04:19 +1100
Date: Tue, 10 Jan 2012 23:04:18 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Bruce Evans <brde@optusnet.com.au>
In-Reply-To: <20120110153807.H943@besplex.bde.org>
Message-ID: <20120110211510.T1676@besplex.bde.org>
References: <CACfq093o9iVZKxCj58OR2hpCLDYTUTdxg_re_bEMYn2SrNrLCQ@mail.gmail.com>
	<20120110005155.S2378@besplex.bde.org>
	<CACfq09225iMYLe6p8jSiVhsDw_rqTyEHsvPdtZXLrQYT0-skzg@mail.gmail.com>
	<20120110153807.H943@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: flo@freebsd.org, Giovanni Trematerra <gianni@freebsd.org>,
	Attilio Rao <attilio@freebsd.org>,
	Konstantin Belousov <kib@freebsd.org>, freebsd-arch@freebsd.org,
	jilles@freebsd.org
Subject: Re: pipe/fifo code merged.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 10 Jan 2012 12:04:27 -0000

On Tue, 10 Jan 2012, Bruce Evans wrote:

> I think you don't want me to read the patch, since I would see too much
> detail starting with style bugs.  Anyway..
> ...

One more set of details.

% -static int
% -fifo_poll_f(struct file *fp, int events, struct ucred *cred, struct thread *td)
% -{
% -	struct fifoinfo *fip;
% -	struct file filetmp;
% -	int levents, revents = 0;
% -
% -	fip = fp->f_data;
% -	levents = events &
% -	    (POLLIN | POLLINIGNEOF | POLLPRI | POLLRDNORM | POLLRDBAND);
% -	if ((fp->f_flag & FREAD) && levents) {
% -		filetmp.f_data = fip->fi_readsock;
% -		filetmp.f_cred = cred;
% -		mtx_lock(&fifo_mtx);
% -		if (fp->f_seqcount == fip->fi_wgen)
% -			levents |= POLLINIGNEOF;
% -		mtx_unlock(&fifo_mtx);
% -		revents |= soo_poll(&filetmp, levents, cred, td);
% -	}
% -	levents = events & (POLLOUT | POLLWRNORM | POLLWRBAND);
% -	if ((fp->f_flag & FWRITE) && levents) {
% -		filetmp.f_data = fip->fi_writesock;
% -		filetmp.f_cred = cred;
% -		revents |= soo_poll(&filetmp, levents, cred, td);
% -	}
% -	return (revents);
% -}

This was reasonably clean.  My version is cleaner:
- POLLIGNEOF is an old mistake of mine.  I tried to kill it, but kib@
   propagated it to sys_pipe.c too, where it has survived another release
   or two.  In my version, I still have it in the call to soo_poll() but
   don't have it in the `levents = events & ...' mask.  Thus it is a
   pure kernel flag, and acts the same as your isfifo flag -- it tells
   the socket layer to do something unusual because this is a fifo.  It
   is not needed any more, since the pipe layer is close to the fifo
   layer so it can just do something unusual.  It can determine whether
   the pipe is a fifo without passing around flags (the flag should be
   in pipe_state).
- My version is missing the FREAD and FWRITE checks.  These seem to be
   necessary, but I think they don't belong at this level.  Also, the
   error handling for them seems quite broken (nonexistent).  I think
   POLLERR is supposed to be returned for attempts to poll for an
   impossible condition, but the FREAD and FWRITE checks give a return
   of 0.  And returning 0 is much worse than returning success, since
   it will cause at least poll() to block() when it should return,
   Here is the commit that added these checks:

% ----------------------------
% revision 1.118
% date: 2005/09/12 10:16:18;  author: rwatson;  state: Exp;  lines: +2 -2
% Only poll the fifo for read events if the fifo is attached to a readable
% file descriptor.  Otherwise, the read end of a fifo might return that it
% is writable (which it isn't).

But it should return (with POLLERR).  This is an error condition and
should be detected.

POSIX is fuzzy about this.  It only says that POLLERR is for when an error
occurred.  It defines the POLLNVAL error clearly as meaning that the fd
is invalid.  Well that is not so clear.  A non-open fd is clearly invalid.
This is handled in upper layers.  Polling for a direction that can't work
can be considered as an invalid fd too, unless "invalid" has its technical
meaning.  Linux-2.6.10 sets POLLERR for reading from a pipe or fifo with
no readers, and has an XXX comment saying that most Unices don't do this
for fifos.  This seems wrong to me, and FreeBSD doesn't do it for any of
pipes, fifos or sockets.  But for pipes, there is tricky EOF handling
associated with this condition.  I can't see anywhere where Linux gives
this based on the open mode.

% 
% Only poll the fifo for write events if the fifo attached to a writable
% file descriptor.  Otherwise, the write end of a fifo might return that
% it is readable (which it isn't).

Seems to be necessary too.  I can't see anywhere where Linux returns
POLLERR for i/o errors or unwritable files.

% 
% In the event that a file is FREAD|FWRITE (which is allowed by POSIX, but
% has undefined behavior), we poll for both.
% 
% MFC after:	3 days
% ----------------------------

select() is interestingly different than poll().  It can't return POLLERR.
Thus, the old broken behaviour gave the best close to possible behaviour
for select() at the usual level.  The POLLERR's should make it return
success, and the false successes in the kernel would have done the same.
Only cases where there were no false successes in the kernel were broken.

% @@ -1326,58 +1549,66 @@ pipe_poll(fp, events, active_cred, td)
%  	struct ucred *active_cred;
%  	struct thread *td;
%  {
% -	struct pipe *rpipe = fp->f_data;
% +	struct pipeinfo *pip = fp->f_data;
% +	struct pipe *rpipe;
%  	struct pipe *wpipe;
%  	int revents = 0;
%  #ifdef MAC
%  	int error;
%  #endif
% 
% -	wpipe = rpipe->pipe_peer;
% +	rpipe = pip->pi_rpipe;
% +	wpipe = pip->pi_wpipe->pipe_peer;
%  	PIPE_LOCK(rpipe);
%  #ifdef MAC
%  	error = mac_pipe_check_poll(active_cred, rpipe->pipe_pair);
%  	if (error)
% -		goto locked_error;
% +		return (0);

Seems to be broken.  The unlock is now missing.

I don't like defaults set by initializations in declarations 'revents
= 0'.  Both the default and the return of 0 here seem to be wrong.
This is an error condition, so I think POLLERR should be returned, as
about.  Otherwise, poll() will probably block.  And the block is not
just transient, at least in the above since the error condition can
never go away.  You will only be saved from blocking forever if there
is success on some other file descriptor or event.

%  #endif
% -	if (events & (POLLIN | POLLRDNORM))
% -		if ((rpipe->pipe_state & PIPE_DIRECTW) ||
% -		    (rpipe->pipe_buffer.cnt > 0))
% -			revents |= events & (POLLIN | POLLRDNORM);
% +	if (fp->f_flag & FREAD) {
% +		if (events & (POLLIN | POLLRDNORM))
% +			if ((rpipe->pipe_state & PIPE_DIRECTW) ||
% +			    (rpipe->pipe_buffer.cnt > 0))
% +				revents |= events & (POLLIN | POLLRDNORM);

The change in fifos_vnops.c was done cleanly by adding the FREAD check
to the events mask check.  With fifos now polled here, it is needed
(modulo bugs) here too.  But here it makes the important changes for
fifos, if any, unreadable by indenting everything.

% -	if (events & (POLLOUT | POLLWRNORM))
% -		if (wpipe->pipe_present != PIPE_ACTIVE ||
% -		    (wpipe->pipe_state & PIPE_EOF) ||
% -		    (((wpipe->pipe_state & PIPE_DIRECTW) == 0) &&
% -		     ((wpipe->pipe_buffer.size - wpipe->pipe_buffer.cnt) >= PIPE_BUF ||
% -			 wpipe->pipe_buffer.size == 0)))
% -			revents |= events & (POLLOUT | POLLWRNORM);
% +		PIPE_UNLOCK(rpipe);
% +		if (fifo_iseof(fp))
% +			events |= POLLINIGNEOF;
% +		PIPE_LOCK(rpipe);
% 
% -	if ((events & POLLINIGNEOF) == 0) {
% -		if (rpipe->pipe_state & PIPE_EOF) {
% -			revents |= (events & (POLLIN | POLLRDNORM));
% -			if (wpipe->pipe_present != PIPE_ACTIVE ||
% -			    (wpipe->pipe_state & PIPE_EOF))
% -				revents |= POLLHUP;
% +		if ((events & POLLINIGNEOF) == 0) {
% +			if (rpipe->pipe_state & PIPE_EOF) {
% +				revents |= (events & (POLLIN | POLLRDNORM));
% +				if (wpipe->pipe_present != PIPE_ACTIVE ||
% +				    (wpipe->pipe_state & PIPE_EOF))
% +					revents |= POLLHUP;
% +			}
%  		}
%  	}
% +	if (fp->f_flag & FWRITE)
% +		if (events & (POLLOUT | POLLWRNORM))
% +			if (wpipe->pipe_present != PIPE_ACTIVE ||
% +			    (wpipe->pipe_state & PIPE_EOF) || 
% +		        (((wpipe->pipe_state & PIPE_DIRECTW) == 0) &&
% +		        ((wpipe->pipe_buffer.size - wpipe->pipe_buffer.cnt) >=
% +				PIPE_BUF || wpipe->pipe_buffer.size == 0)))
% +				revents |= events & (POLLOUT | POLLWRNORM);
% 
%  	if (revents == 0) {
% -		if (events & (POLLIN | POLLRDNORM)) {
% -			selrecord(td, &rpipe->pipe_sel);
% -			if (SEL_WAITING(&rpipe->pipe_sel))
% -				rpipe->pipe_state |= PIPE_SEL;
% -		}
% +		if (fp->f_flag & FREAD)
% +			if (events & (POLLIN | POLLRDNORM)) {
% +				selrecord(td, &rpipe->pipe_sel);
% +				if (SEL_WAITING(&rpipe->pipe_sel))
% +					rpipe->pipe_state |= PIPE_SEL;
% +			}
% 
% -		if (events & (POLLOUT | POLLWRNORM)) {
% -			selrecord(td, &wpipe->pipe_sel);
% -			if (SEL_WAITING(&wpipe->pipe_sel))
% -				wpipe->pipe_state |= PIPE_SEL;
% -		}
% +		if (fp->f_flag & FWRITE)
% +			if (events & (POLLOUT | POLLWRNORM)) {
% +				selrecord(td, &wpipe->pipe_sel);
% +				if (SEL_WAITING(&wpipe->pipe_sel))
% +					wpipe->pipe_state |= PIPE_SEL;
% +			}
%  	}
% -#ifdef MAC
% -locked_error:
% -#endif
%  	PIPE_UNLOCK(rpipe);
% 
%  	return (revents);

It seems that not much really changed here.  To avoid indentation and
fix bugs, the FREAD and FWRITE checks should be done up front.  I think
they can be done before locking and mac checking.  Something like:

 	if ((fp->f_flag & FREAD) && (events & (POLLIN | POLLRDNORM))
 		return (POLLERR);
 	if ((fp->f_flag & FWRITE) && (events & (POLLOUT | POLLWRNORM))
 		return (POLLERR);
 	if (events & POLLINIGNEOF)
 		return (POLLER);	/* try to kill this too */

Since the diff for pipe_poll() was unreadable, here it is again with
the old lines removed.  A few more problems are now obvious:

% @@ -1326,58 +1549,66 @@ pipe_poll(fp, events, active_cred, td)
%  	struct ucred *active_cred;
%  	struct thread *td;
%  {
% +	struct pipeinfo *pip = fp->f_data;
% +	struct pipe *rpipe;
%  	struct pipe *wpipe;
%  	int revents = 0;
%  #ifdef MAC
%  	int error;
%  #endif
% 
% +	rpipe = pip->pi_rpipe;
% +	wpipe = pip->pi_wpipe->pipe_peer;
%  	PIPE_LOCK(rpipe);
%  #ifdef MAC
%  	error = mac_pipe_check_poll(active_cred, rpipe->pipe_pair);
%  	if (error)
% +		return (0);
%  #endif
% +	if (fp->f_flag & FREAD) {
% +		if (events & (POLLIN | POLLRDNORM))
% +			if ((rpipe->pipe_state & PIPE_DIRECTW) ||
% +			    (rpipe->pipe_buffer.cnt > 0))
% +				revents |= events & (POLLIN | POLLRDNORM);
%

This style bug (extra blank line) was common in old code.  It helps make
the diffs unreadable too.

% +		PIPE_UNLOCK(rpipe);
% +		if (fifo_iseof(fp))
% +			events |= POLLINIGNEOF;
% +		PIPE_LOCK(rpipe);

This is new code (needed to force POLLIGNEOF for fifos).

It is a layering violation to call the fifo code for non-fifos here.
fifo_iseof() handles this internally by checking fp->vnode->v_fifoinfo.
The pipe layer should know if it is dealing with a fifo in a better way
than that.

I don't like unlocking in the middle in general, and here it gives
races.  We will miss setting POLLIN | POLLRDNORM for certain changes
if they weren't set earlier and the state changed while unlocked.  Why
unlock anyway or lock in fifo_iseof()?  Only fi_seqcount == fi_wgen
is checked under the lock there.  Races in that check are probably
just as harmless as races here.  And locking doesn't even prevent them,
since if fi_seqcount or fi_wgen can change underneath us, they can also
change just after we check them.  They rarely change compared with the
buffer count raced with above.

% 
% +		if ((events & POLLINIGNEOF) == 0) {
% +			if (rpipe->pipe_state & PIPE_EOF) {
% +				revents |= (events & (POLLIN | POLLRDNORM));
% +				if (wpipe->pipe_present != PIPE_ACTIVE ||
% +				    (wpipe->pipe_state & PIPE_EOF))
% +					revents |= POLLHUP;
% +			}

This is old code, reindented.  It was not needed, since it used to
just check for the POLLINIGNEOF mistake in the user events.  Now it
is needed to give the modified (POLLINIGNEOF) semantics from the kernel
flag for fifos.  It is much uglier than the corresponding code in the
old fifo_poll_f().  That begins with putting the relevant user events
in levents.  So that it doesn't have to repeat the long mask expressions.
Well, that's about the limits of the cleanups.  Something like the
above is still needed to give the semantics change.

The socket layer still has code that corresponds exactly to the above.
It is now not needed, since it now only supports the POLLINIGNEOF
mistake in the user events.  One copy of this code is bad enough.

Bruce