From owner-freebsd-bugs@FreeBSD.ORG Mon Oct 13 10:57:58 2003 Return-Path: Delivered-To: freebsd-bugs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 90C7116A4BF; Mon, 13 Oct 2003 10:57:58 -0700 (PDT) Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7203543FBD; Mon, 13 Oct 2003 10:57:52 -0700 (PDT) (envelope-from bde@zeta.org.au) Received: from gamplex.bde.org (katana.zip.com.au [61.8.7.246]) by mailman.zeta.org.au (8.9.3p2/8.8.7) with ESMTP id DAA07663; Tue, 14 Oct 2003 03:57:40 +1000 Date: Tue, 14 Oct 2003 03:56:19 +1000 (EST) From: Bruce Evans X-X-Sender: bde@gamplex.bde.org To: Stefan Farfeleder In-Reply-To: <20031013101556.7FFCA482@frog.fafoe.narf.at> Message-ID: <20031014031151.V32155@gamplex.bde.org> References: <20031013101556.7FFCA482@frog.fafoe.narf.at> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: freebsd-bugs@freebsd.org cc: FreeBSD-gnats-submit@freebsd.org Subject: Re: kern/57945: [patch] Add locking to kqueue to make it MP-safe X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Oct 2003 17:57:58 -0000 On Mon, 13 Oct 2003, Stefan Farfeleder wrote: > >Description: > The current kqueue implementation does not seem to be MP-safe. The > kqueue facility does not having its own locks, I believe it was > intended to rely on Giant. As Giant gets locked down, kqueue was > apparently forgotten. The KNOTE() macro is spread over the whole > kernel, sometimes called with Giant hold, sometimes not. The function > knote_enqueue() (called via knote() and KNOTE_ACTIVATE()) inserts a node > into a TAILQ, this operation isn't atomic. If another processor > executes KNOTE() or kevent() at the same time, the queue might get > corrupted. > ... > >Fix: > I'm using patches similar to this one since about a year, the > kqueue-enabled make(1) runs without problems with it. Unfortunately the > mutex is (I'm not sure it has to be) acquired and released rather often > in the loop in kqueue_scan() to avoid deadlocks due to LORs. Maybe a > spin lock would increase the performance? (I didn't do any benchmarks.) The small critical sections seem to be mostly required. Spin mutexes are actually less efficient, since in the usual uncontested case they do the same things plus extra calls to critical_enter() and critical exit. > --- kqueue_lock.diff begins here --- > Index: src/sys/kern/kern_event.c > =================================================================== > RCS file: /usr/home/ncvs/src/sys/kern/kern_event.c,v > retrieving revision 1.60 > diff -u -r1.60 kern_event.c > --- src/sys/kern/kern_event.c 18 Jun 2003 18:16:39 -0000 1.60 > +++ src/sys/kern/kern_event.c 13 Oct 2003 00:35:42 -0000 > ... > @@ -704,15 +705,16 @@ > > start: > kevp = kq->kq_kev; > - s = splhigh(); > + mtx_lock(&kq->kq_mtx); > if (kq->kq_count == 0) { > if (timeout < 0) { > error = EWOULDBLOCK; > + mtx_unlock(&kq->kq_mtx); > } else { > kq->kq_state |= KQ_SLEEP; > - error = tsleep(kq, PSOCK | PCATCH, "kqread", timeout); > + error = msleep(kq, &kq->kq_mtx, PSOCK | PCATCH | PDROP, > + "kqread", timeout); > } > - splx(s); Not copying the spl locking seems to give a bug here. msleep() returns with the mutex held. > if (error == 0) > goto retry; > /* don't restart after signals... */ > @@ -728,20 +730,22 @@ > kn = TAILQ_FIRST(&kq->kq_head); > TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe); > if (kn == &marker) { > - splx(s); > + mtx_unlock(&kq->kq_mtx); > if (count == maxevents) > goto retry; > goto done; > } > + kq->kq_count--; > + mtx_unlock(&kq->kq_mtx); > if (kn->kn_status & KN_DISABLED) { > kn->kn_status &= ~KN_QUEUED; > - kq->kq_count--; > + mtx_lock(&kq->kq_mtx); > continue; > } I don't see any reason to unlock/lock here. There were no splx()/splhigh()'s. splhigh() is very global but kq_mtx is very local so keeping the critical sections short should be much less needed than before. > if ((kn->kn_flags & EV_ONESHOT) == 0 && > kn->kn_fop->f_event(kn, 0) == 0) { > kn->kn_status &= ~(KN_QUEUED | KN_ACTIVE); > - kq->kq_count--; > + mtx_lock(&kq->kq_mtx); > continue; > } > *kevp = kn->kn_kevent; Here dropping the mutex seems to be be necessary for calling the function, but the function call was locked by splhigh() before. BTW, what locks kn? Bruce