From owner-cvs-all Fri Jan 3 3:21:38 2003 Delivered-To: cvs-all@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B308237B480; Fri, 3 Jan 2003 03:21:33 -0800 (PST) Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6A8B543E4A; Fri, 3 Jan 2003 03:21:32 -0800 (PST) (envelope-from bde@zeta.org.au) Received: from katana.zip.com.au (katana.zip.com.au [61.8.7.246]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id WAA21041; Fri, 3 Jan 2003 22:21:21 +1100 Date: Fri, 3 Jan 2003 22:21:31 +1100 (EST) From: Bruce Evans X-X-Sender: bde@gamplex.bde.org To: Alfred Perlstein Cc: Poul-Henning Kamp , , Subject: Re: cvs commit: src/sys/fs/fifofs fifo_vnops.c In-Reply-To: <20030102204913.GM26140@elvis.mu.org> Message-ID: <20030103213903.I2945-100000@gamplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-cvs-all@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Thu, 2 Jan 2003, Alfred Perlstein wrote: > 216 if (ap->a_mode & FWRITE) { > 217 fip->fi_writers++; > 218 if (fip->fi_writers == 1) { > 219 fip->fi_readsock->so_state &= ~SS_CANTRCVMORE; > 220 if (fip->fi_readers > 0) { > 221 wakeup((caddr_t)&fip->fi_readers); > 222 sorwakeup(fip->fi_writesock); > 223 } > 224 } > 225 } > > This this code is confusing because we have a producer consumer > relationship between readers and writers. I think that the problem > is that in this section of code the wakeup should be on fi_writers. Certainly not. There is only 1 writer (ourself), and we must wake up the readers. The bug may in sendmail, or a race with readers coming and going. It can probably be localized by fixing the error handling in the timeout: % while (fip->fi_readers == 0) { % VOP_UNLOCK(vp, 0, td); % /* % * XXX: Some race I havn't located is solved % * by timing out after a sec. Race seen when % * sendmail hangs here during boot /phk % */ % error = tsleep((caddr_t)&fip->fi_writers, % PCATCH | PSOCK, "fifoow", hz); % vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, td); % if (error) % goto bad; % } This should handle the case of a timeout specially. tlseep() returns EWOULDBLOCK for timeouts. If we have a timeout then we should continue so as to check for the loop condition (fip->fi_readers == 0). However, this makes a race obvious: if a reader comes and goes while the writer is in the above tsleep(), then the writer will never see the reader -- it will wake up normally and then go back to sleep, except with the timeout it will wake up abnormally 1 second later and return the bogus result EWOULDBLOCK (== EAGAIN, which can't happen for blocking writes). If there is no reader within the first second, then the timeout just breaks blocking writes to fifos. I think the loop condition needs to involve a has-been-a-reader-since- this-write-started flag, or the loop simply shouldn't exist (provided the wakeup only occurs if there has been a reader or a signal; then we know that we had a reader if tsleep() returns 0). SS_CANTSENDMORE is a (negative logic) has-been-a-reader flag, but IIRC it is perfectly unsuitable here because it must be cleared (negatively) on close so that new writers aren't affected by previous readers. Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe cvs-all" in the body of the message