Date: Fri, 3 Jan 2003 22:21:31 +1100 (EST) From: Bruce Evans <bde@zeta.org.au> To: Alfred Perlstein <bright@mu.org> Cc: Poul-Henning Kamp <phk@FreeBSD.org>, <cvs-committers@FreeBSD.org>, <cvs-all@FreeBSD.org> Subject: Re: cvs commit: src/sys/fs/fifofs fifo_vnops.c Message-ID: <20030103213903.I2945-100000@gamplex.bde.org> In-Reply-To: <20030102204913.GM26140@elvis.mu.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 2 Jan 2003, Alfred Perlstein wrote:
> 216 if (ap->a_mode & FWRITE) {
> 217 fip->fi_writers++;
> 218 if (fip->fi_writers == 1) {
> 219 fip->fi_readsock->so_state &= ~SS_CANTRCVMORE;
> 220 if (fip->fi_readers > 0) {
> 221 wakeup((caddr_t)&fip->fi_readers);
> 222 sorwakeup(fip->fi_writesock);
> 223 }
> 224 }
> 225 }
>
> This this code is confusing because we have a producer consumer
> relationship between readers and writers. I think that the problem
> is that in this section of code the wakeup should be on fi_writers.
Certainly not. There is only 1 writer (ourself), and we must wake up
the readers.
The bug may in sendmail, or a race with readers coming and going. It
can probably be localized by fixing the error handling in the timeout:
% while (fip->fi_readers == 0) {
% VOP_UNLOCK(vp, 0, td);
% /*
% * XXX: Some race I havn't located is solved
% * by timing out after a sec. Race seen when
% * sendmail hangs here during boot /phk
% */
% error = tsleep((caddr_t)&fip->fi_writers,
% PCATCH | PSOCK, "fifoow", hz);
% vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, td);
% if (error)
% goto bad;
% }
This should handle the case of a timeout specially. tlseep() returns
EWOULDBLOCK for timeouts. If we have a timeout then we should continue
so as to check for the loop condition (fip->fi_readers == 0). However,
this makes a race obvious: if a reader comes and goes while the writer
is in the above tsleep(), then the writer will never see the reader --
it will wake up normally and then go back to sleep, except with the
timeout it will wake up abnormally 1 second later and return the bogus
result EWOULDBLOCK (== EAGAIN, which can't happen for blocking writes).
If there is no reader within the first second, then the timeout just
breaks blocking writes to fifos.
I think the loop condition needs to involve a has-been-a-reader-since-
this-write-started flag, or the loop simply shouldn't exist (provided
the wakeup only occurs if there has been a reader or a signal; then
we know that we had a reader if tsleep() returns 0). SS_CANTSENDMORE
is a (negative logic) has-been-a-reader flag, but IIRC it is perfectly
unsuitable here because it must be cleared (negatively) on close so
that new writers aren't affected by previous readers.
Bruce
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe cvs-all" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20030103213903.I2945-100000>
