From owner-freebsd-arch@FreeBSD.ORG Tue Jan 10 12:04:27 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6C8A4106566B; Tue, 10 Jan 2012 12:04:27 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail01.syd.optusnet.com.au (mail01.syd.optusnet.com.au [211.29.132.182]) by mx1.freebsd.org (Postfix) with ESMTP id BC8958FC08; Tue, 10 Jan 2012 12:04:25 +0000 (UTC) Received: from c211-30-171-136.carlnfd1.nsw.optusnet.com.au (c211-30-171-136.carlnfd1.nsw.optusnet.com.au [211.30.171.136]) by mail01.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q0AC4Img016496 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 10 Jan 2012 23:04:19 +1100 Date: Tue, 10 Jan 2012 23:04:18 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans In-Reply-To: <20120110153807.H943@besplex.bde.org> Message-ID: <20120110211510.T1676@besplex.bde.org> References: <20120110005155.S2378@besplex.bde.org> <20120110153807.H943@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: flo@freebsd.org, Giovanni Trematerra , Attilio Rao , Konstantin Belousov , freebsd-arch@freebsd.org, jilles@freebsd.org Subject: Re: pipe/fifo code merged. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 Jan 2012 12:04:27 -0000 On Tue, 10 Jan 2012, Bruce Evans wrote: > I think you don't want me to read the patch, since I would see too much > detail starting with style bugs. Anyway.. > ... One more set of details. % -static int % -fifo_poll_f(struct file *fp, int events, struct ucred *cred, struct thread *td) % -{ % - struct fifoinfo *fip; % - struct file filetmp; % - int levents, revents = 0; % - % - fip = fp->f_data; % - levents = events & % - (POLLIN | POLLINIGNEOF | POLLPRI | POLLRDNORM | POLLRDBAND); % - if ((fp->f_flag & FREAD) && levents) { % - filetmp.f_data = fip->fi_readsock; % - filetmp.f_cred = cred; % - mtx_lock(&fifo_mtx); % - if (fp->f_seqcount == fip->fi_wgen) % - levents |= POLLINIGNEOF; % - mtx_unlock(&fifo_mtx); % - revents |= soo_poll(&filetmp, levents, cred, td); % - } % - levents = events & (POLLOUT | POLLWRNORM | POLLWRBAND); % - if ((fp->f_flag & FWRITE) && levents) { % - filetmp.f_data = fip->fi_writesock; % - filetmp.f_cred = cred; % - revents |= soo_poll(&filetmp, levents, cred, td); % - } % - return (revents); % -} This was reasonably clean. My version is cleaner: - POLLIGNEOF is an old mistake of mine. I tried to kill it, but kib@ propagated it to sys_pipe.c too, where it has survived another release or two. In my version, I still have it in the call to soo_poll() but don't have it in the `levents = events & ...' mask. Thus it is a pure kernel flag, and acts the same as your isfifo flag -- it tells the socket layer to do something unusual because this is a fifo. It is not needed any more, since the pipe layer is close to the fifo layer so it can just do something unusual. It can determine whether the pipe is a fifo without passing around flags (the flag should be in pipe_state). - My version is missing the FREAD and FWRITE checks. These seem to be necessary, but I think they don't belong at this level. Also, the error handling for them seems quite broken (nonexistent). I think POLLERR is supposed to be returned for attempts to poll for an impossible condition, but the FREAD and FWRITE checks give a return of 0. And returning 0 is much worse than returning success, since it will cause at least poll() to block() when it should return, Here is the commit that added these checks: % ---------------------------- % revision 1.118 % date: 2005/09/12 10:16:18; author: rwatson; state: Exp; lines: +2 -2 % Only poll the fifo for read events if the fifo is attached to a readable % file descriptor. Otherwise, the read end of a fifo might return that it % is writable (which it isn't). But it should return (with POLLERR). This is an error condition and should be detected. POSIX is fuzzy about this. It only says that POLLERR is for when an error occurred. It defines the POLLNVAL error clearly as meaning that the fd is invalid. Well that is not so clear. A non-open fd is clearly invalid. This is handled in upper layers. Polling for a direction that can't work can be considered as an invalid fd too, unless "invalid" has its technical meaning. Linux-2.6.10 sets POLLERR for reading from a pipe or fifo with no readers, and has an XXX comment saying that most Unices don't do this for fifos. This seems wrong to me, and FreeBSD doesn't do it for any of pipes, fifos or sockets. But for pipes, there is tricky EOF handling associated with this condition. I can't see anywhere where Linux gives this based on the open mode. % % Only poll the fifo for write events if the fifo attached to a writable % file descriptor. Otherwise, the write end of a fifo might return that % it is readable (which it isn't). Seems to be necessary too. I can't see anywhere where Linux returns POLLERR for i/o errors or unwritable files. % % In the event that a file is FREAD|FWRITE (which is allowed by POSIX, but % has undefined behavior), we poll for both. % % MFC after: 3 days % ---------------------------- select() is interestingly different than poll(). It can't return POLLERR. Thus, the old broken behaviour gave the best close to possible behaviour for select() at the usual level. The POLLERR's should make it return success, and the false successes in the kernel would have done the same. Only cases where there were no false successes in the kernel were broken. % @@ -1326,58 +1549,66 @@ pipe_poll(fp, events, active_cred, td) % struct ucred *active_cred; % struct thread *td; % { % - struct pipe *rpipe = fp->f_data; % + struct pipeinfo *pip = fp->f_data; % + struct pipe *rpipe; % struct pipe *wpipe; % int revents = 0; % #ifdef MAC % int error; % #endif % % - wpipe = rpipe->pipe_peer; % + rpipe = pip->pi_rpipe; % + wpipe = pip->pi_wpipe->pipe_peer; % PIPE_LOCK(rpipe); % #ifdef MAC % error = mac_pipe_check_poll(active_cred, rpipe->pipe_pair); % if (error) % - goto locked_error; % + return (0); Seems to be broken. The unlock is now missing. I don't like defaults set by initializations in declarations 'revents = 0'. Both the default and the return of 0 here seem to be wrong. This is an error condition, so I think POLLERR should be returned, as about. Otherwise, poll() will probably block. And the block is not just transient, at least in the above since the error condition can never go away. You will only be saved from blocking forever if there is success on some other file descriptor or event. % #endif % - if (events & (POLLIN | POLLRDNORM)) % - if ((rpipe->pipe_state & PIPE_DIRECTW) || % - (rpipe->pipe_buffer.cnt > 0)) % - revents |= events & (POLLIN | POLLRDNORM); % + if (fp->f_flag & FREAD) { % + if (events & (POLLIN | POLLRDNORM)) % + if ((rpipe->pipe_state & PIPE_DIRECTW) || % + (rpipe->pipe_buffer.cnt > 0)) % + revents |= events & (POLLIN | POLLRDNORM); The change in fifos_vnops.c was done cleanly by adding the FREAD check to the events mask check. With fifos now polled here, it is needed (modulo bugs) here too. But here it makes the important changes for fifos, if any, unreadable by indenting everything. % - if (events & (POLLOUT | POLLWRNORM)) % - if (wpipe->pipe_present != PIPE_ACTIVE || % - (wpipe->pipe_state & PIPE_EOF) || % - (((wpipe->pipe_state & PIPE_DIRECTW) == 0) && % - ((wpipe->pipe_buffer.size - wpipe->pipe_buffer.cnt) >= PIPE_BUF || % - wpipe->pipe_buffer.size == 0))) % - revents |= events & (POLLOUT | POLLWRNORM); % + PIPE_UNLOCK(rpipe); % + if (fifo_iseof(fp)) % + events |= POLLINIGNEOF; % + PIPE_LOCK(rpipe); % % - if ((events & POLLINIGNEOF) == 0) { % - if (rpipe->pipe_state & PIPE_EOF) { % - revents |= (events & (POLLIN | POLLRDNORM)); % - if (wpipe->pipe_present != PIPE_ACTIVE || % - (wpipe->pipe_state & PIPE_EOF)) % - revents |= POLLHUP; % + if ((events & POLLINIGNEOF) == 0) { % + if (rpipe->pipe_state & PIPE_EOF) { % + revents |= (events & (POLLIN | POLLRDNORM)); % + if (wpipe->pipe_present != PIPE_ACTIVE || % + (wpipe->pipe_state & PIPE_EOF)) % + revents |= POLLHUP; % + } % } % } % + if (fp->f_flag & FWRITE) % + if (events & (POLLOUT | POLLWRNORM)) % + if (wpipe->pipe_present != PIPE_ACTIVE || % + (wpipe->pipe_state & PIPE_EOF) || % + (((wpipe->pipe_state & PIPE_DIRECTW) == 0) && % + ((wpipe->pipe_buffer.size - wpipe->pipe_buffer.cnt) >= % + PIPE_BUF || wpipe->pipe_buffer.size == 0))) % + revents |= events & (POLLOUT | POLLWRNORM); % % if (revents == 0) { % - if (events & (POLLIN | POLLRDNORM)) { % - selrecord(td, &rpipe->pipe_sel); % - if (SEL_WAITING(&rpipe->pipe_sel)) % - rpipe->pipe_state |= PIPE_SEL; % - } % + if (fp->f_flag & FREAD) % + if (events & (POLLIN | POLLRDNORM)) { % + selrecord(td, &rpipe->pipe_sel); % + if (SEL_WAITING(&rpipe->pipe_sel)) % + rpipe->pipe_state |= PIPE_SEL; % + } % % - if (events & (POLLOUT | POLLWRNORM)) { % - selrecord(td, &wpipe->pipe_sel); % - if (SEL_WAITING(&wpipe->pipe_sel)) % - wpipe->pipe_state |= PIPE_SEL; % - } % + if (fp->f_flag & FWRITE) % + if (events & (POLLOUT | POLLWRNORM)) { % + selrecord(td, &wpipe->pipe_sel); % + if (SEL_WAITING(&wpipe->pipe_sel)) % + wpipe->pipe_state |= PIPE_SEL; % + } % } % -#ifdef MAC % -locked_error: % -#endif % PIPE_UNLOCK(rpipe); % % return (revents); It seems that not much really changed here. To avoid indentation and fix bugs, the FREAD and FWRITE checks should be done up front. I think they can be done before locking and mac checking. Something like: if ((fp->f_flag & FREAD) && (events & (POLLIN | POLLRDNORM)) return (POLLERR); if ((fp->f_flag & FWRITE) && (events & (POLLOUT | POLLWRNORM)) return (POLLERR); if (events & POLLINIGNEOF) return (POLLER); /* try to kill this too */ Since the diff for pipe_poll() was unreadable, here it is again with the old lines removed. A few more problems are now obvious: % @@ -1326,58 +1549,66 @@ pipe_poll(fp, events, active_cred, td) % struct ucred *active_cred; % struct thread *td; % { % + struct pipeinfo *pip = fp->f_data; % + struct pipe *rpipe; % struct pipe *wpipe; % int revents = 0; % #ifdef MAC % int error; % #endif % % + rpipe = pip->pi_rpipe; % + wpipe = pip->pi_wpipe->pipe_peer; % PIPE_LOCK(rpipe); % #ifdef MAC % error = mac_pipe_check_poll(active_cred, rpipe->pipe_pair); % if (error) % + return (0); % #endif % + if (fp->f_flag & FREAD) { % + if (events & (POLLIN | POLLRDNORM)) % + if ((rpipe->pipe_state & PIPE_DIRECTW) || % + (rpipe->pipe_buffer.cnt > 0)) % + revents |= events & (POLLIN | POLLRDNORM); % This style bug (extra blank line) was common in old code. It helps make the diffs unreadable too. % + PIPE_UNLOCK(rpipe); % + if (fifo_iseof(fp)) % + events |= POLLINIGNEOF; % + PIPE_LOCK(rpipe); This is new code (needed to force POLLIGNEOF for fifos). It is a layering violation to call the fifo code for non-fifos here. fifo_iseof() handles this internally by checking fp->vnode->v_fifoinfo. The pipe layer should know if it is dealing with a fifo in a better way than that. I don't like unlocking in the middle in general, and here it gives races. We will miss setting POLLIN | POLLRDNORM for certain changes if they weren't set earlier and the state changed while unlocked. Why unlock anyway or lock in fifo_iseof()? Only fi_seqcount == fi_wgen is checked under the lock there. Races in that check are probably just as harmless as races here. And locking doesn't even prevent them, since if fi_seqcount or fi_wgen can change underneath us, they can also change just after we check them. They rarely change compared with the buffer count raced with above. % % + if ((events & POLLINIGNEOF) == 0) { % + if (rpipe->pipe_state & PIPE_EOF) { % + revents |= (events & (POLLIN | POLLRDNORM)); % + if (wpipe->pipe_present != PIPE_ACTIVE || % + (wpipe->pipe_state & PIPE_EOF)) % + revents |= POLLHUP; % + } This is old code, reindented. It was not needed, since it used to just check for the POLLINIGNEOF mistake in the user events. Now it is needed to give the modified (POLLINIGNEOF) semantics from the kernel flag for fifos. It is much uglier than the corresponding code in the old fifo_poll_f(). That begins with putting the relevant user events in levents. So that it doesn't have to repeat the long mask expressions. Well, that's about the limits of the cleanups. Something like the above is still needed to give the semantics change. The socket layer still has code that corresponds exactly to the above. It is now not needed, since it now only supports the POLLINIGNEOF mistake in the user events. One copy of this code is bad enough. Bruce