Date: Fri, 3 Jan 2003 22:21:31 +1100 (EST) From: Bruce Evans <bde@zeta.org.au> To: Alfred Perlstein <bright@mu.org> Cc: Poul-Henning Kamp <phk@FreeBSD.org>, <cvs-committers@FreeBSD.org>, <cvs-all@FreeBSD.org> Subject: Re: cvs commit: src/sys/fs/fifofs fifo_vnops.c Message-ID: <20030103213903.I2945-100000@gamplex.bde.org> In-Reply-To: <20030102204913.GM26140@elvis.mu.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 2 Jan 2003, Alfred Perlstein wrote: > 216 if (ap->a_mode & FWRITE) { > 217 fip->fi_writers++; > 218 if (fip->fi_writers == 1) { > 219 fip->fi_readsock->so_state &= ~SS_CANTRCVMORE; > 220 if (fip->fi_readers > 0) { > 221 wakeup((caddr_t)&fip->fi_readers); > 222 sorwakeup(fip->fi_writesock); > 223 } > 224 } > 225 } > > This this code is confusing because we have a producer consumer > relationship between readers and writers. I think that the problem > is that in this section of code the wakeup should be on fi_writers. Certainly not. There is only 1 writer (ourself), and we must wake up the readers. The bug may in sendmail, or a race with readers coming and going. It can probably be localized by fixing the error handling in the timeout: % while (fip->fi_readers == 0) { % VOP_UNLOCK(vp, 0, td); % /* % * XXX: Some race I havn't located is solved % * by timing out after a sec. Race seen when % * sendmail hangs here during boot /phk % */ % error = tsleep((caddr_t)&fip->fi_writers, % PCATCH | PSOCK, "fifoow", hz); % vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, td); % if (error) % goto bad; % } This should handle the case of a timeout specially. tlseep() returns EWOULDBLOCK for timeouts. If we have a timeout then we should continue so as to check for the loop condition (fip->fi_readers == 0). However, this makes a race obvious: if a reader comes and goes while the writer is in the above tsleep(), then the writer will never see the reader -- it will wake up normally and then go back to sleep, except with the timeout it will wake up abnormally 1 second later and return the bogus result EWOULDBLOCK (== EAGAIN, which can't happen for blocking writes). If there is no reader within the first second, then the timeout just breaks blocking writes to fifos. I think the loop condition needs to involve a has-been-a-reader-since- this-write-started flag, or the loop simply shouldn't exist (provided the wakeup only occurs if there has been a reader or a signal; then we know that we had a reader if tsleep() returns 0). SS_CANTSENDMORE is a (negative logic) has-been-a-reader flag, but IIRC it is perfectly unsuitable here because it must be cleared (negatively) on close so that new writers aren't affected by previous readers. Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe cvs-all" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20030103213903.I2945-100000>