From owner-freebsd-current@FreeBSD.ORG Wed Sep 3 01:40:51 2008
Date: Tue, 2 Sep 2008 18:40:49 -0700
From: "Peter Wemm"
To: "John Baldwin"
Cc: Benjamin.Close@clearchain.com, attilio@freebsd.org, freebsd-current@freebsd.org, kib@freebsd.org, kevinxlinuz@163.com
Subject: Re: [BUG] I think sleepqueue need to be protected in sleepq_broadcast
In-Reply-To: <200809021608.57542.jhb@freebsd.org>
References: <200808230003.44081.jhb@freebsd.org> <48B6BC81.5060300@clearchain.com> <20080901.013117.74700691.Tor.Egge@cvsup.no.freebsd.org> <200809021608.57542.jhb@freebsd.org>
List-Id: Discussions about the use of FreeBSD-current

On Tue, Sep 2, 2008 at 1:08 PM, John Baldwin wrote:
> On Sunday 31 August 2008 09:31:17 pm Tor Egge wrote:
>>
>> sleepq_resume_thread() contains an ownership handover of sq if the resumed
>> thread is the last one blocked on the wait channel.
>> After the handover, sq is no longer protected by the sleep queue chain
>> lock and should no longer be accessed by sleepq_broadcast().
>>
>> Normally, when sleepq_broadcast() incorrectly accesses sq after the
>> handover, it will find the sq->sq_blocked queue to be empty, and the code
>> appears to work.
>>
>> If the last correctly woken thread manages to go to sleep again very
>> quickly on another wait channel, sleepq_broadcast() might incorrectly
>> determine that the sq->sq_blocked queue isn't empty, and start doing the
>> wrong thing.
>
> So disregard my earlier e-mail.  Here is a simple fix for the sleepq case:
>
> Index: subr_sleepqueue.c
> ===================================================================
> --- subr_sleepqueue.c	(revision 182679)
> +++ subr_sleepqueue.c	(working copy)
> @@ -779,7 +779,7 @@
>  sleepq_broadcast(void *wchan, int flags, int pri, int queue)
>  {
>  	struct sleepqueue *sq;
> -	struct thread *td;
> +	struct thread *td, *tdn;
>  	int wakeup_swapper;
>  
>  	CTR2(KTR_PROC, "sleepq_broadcast(%p, %d)", wchan, flags);
> @@ -793,8 +793,7 @@
>  
>  	/* Resume all blocked threads on the sleep queue. */
>  	wakeup_swapper = 0;
> -	while (!TAILQ_EMPTY(&sq->sq_blocked[queue])) {
> -		td = TAILQ_FIRST(&sq->sq_blocked[queue]);
> +	TAILQ_FOREACH_SAFE(td, &sq->sq_blocked[queue], td_slpq, tdn) {
>  		thread_lock(td);
>  		if (sleepq_resume_thread(sq, td, pri))
>  			wakeup_swapper = 1;
>
> This only uses 'sq' to fetch the head of the queue once up front.  It won't
> use it again once it has started waking up threads.

I don't know if it is the same problem, but mx2.freebsd.org, running today's
6.4-PRERELEASE just died with:

Sep 3 00:20:11 mx2 sshd[15333]: fatal: Read from socket failed: Connection resr
panic: Assertion td->td_flags & TDF_SINTR failed at ../../../kern/subr_sleepque5
cpuid = 2
KDB: enter: panic
FreeBSD 6.4-PRERELEASE #7: Tue Sep 2 19:43:27 UTC 2008

This was after about 3 hours of uptime.
It has previously run happily for months at a time before today's rebuild.

-- 
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete themselves
upon execution." -- Robert Sewell
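
For readers unfamiliar with the macro used in the quoted patch, here is a minimal userspace sketch of the TAILQ_FOREACH_SAFE iteration pattern. It is not the kernel code: struct waiter, the link field, resume_waiter() and broadcast() are hypothetical stand-ins for struct thread, td_slpq, sleepq_resume_thread() and sleepq_broadcast(). It is single-threaded, so it cannot reproduce the race described above; it only shows that the loop reads the queue head once and then walks via the saved next pointer while elements are being unlinked. The fallback macro definition is included on the assumption that some non-BSD <sys/queue.h> headers lack the _SAFE variants.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Fallback for <sys/queue.h> headers without the _SAFE variant
 * (assumption: mirrors the usual BSD definition). */
#ifndef TAILQ_FOREACH_SAFE
#define TAILQ_FOREACH_SAFE(var, head, field, tvar)              \
    for ((var) = TAILQ_FIRST((head));                           \
        (var) && ((tvar) = TAILQ_NEXT((var), field), 1);        \
        (var) = (tvar))
#endif

/* Hypothetical stand-in for struct thread. */
struct waiter {
    int id;
    int awake;
    TAILQ_ENTRY(waiter) link;   /* plays the role of td_slpq */
};

TAILQ_HEAD(waitq, waiter);

/* Hypothetical stand-in for sleepq_resume_thread():
 * unlink the waiter from the queue and mark it runnable. */
static void
resume_waiter(struct waitq *q, struct waiter *w)
{
    TAILQ_REMOVE(q, w, link);
    w->awake = 1;
}

/* Wake every waiter. The head of the queue is read once, when the
 * loop starts; afterwards the loop advances through the saved next
 * pointer (wn), so unlinking w inside the body is safe. */
static int
broadcast(struct waitq *q)
{
    struct waiter *w, *wn;
    int woken = 0;

    assert(q != NULL);
    TAILQ_FOREACH_SAFE(w, q, link, wn) {
        resume_waiter(q, w);
        woken++;
    }
    return (woken);
}
```

To try it, initialise a struct waitq with TAILQ_INIT(), append a few struct waiter entries with TAILQ_INSERT_TAIL(), and call broadcast(); afterwards the queue should be empty and every entry should have awake set.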