From owner-freebsd-current@FreeBSD.ORG Mon Sep 8 20:13:05 2008 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6B5E0106566B; Mon, 8 Sep 2008 20:13:05 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from server.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 127C48FC1A; Mon, 8 Sep 2008 20:13:04 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from zion.baldwin.cx (zion.baldwin.cx [IPv6:2001:470:1f11:75:2a0:d2ff:fe18:8b38]) (authenticated bits=0) by server.baldwin.cx (8.14.2/8.14.2) with ESMTP id m88KBvuJ095950; Mon, 8 Sep 2008 16:12:37 -0400 (EDT) (envelope-from jhb@freebsd.org) From: John Baldwin To: "Peter Wemm" Date: Mon, 8 Sep 2008 15:56:02 -0400 User-Agent: KMail/1.9.7 References: <200808230003.44081.jhb@freebsd.org> <200809021608.57542.jhb@freebsd.org> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200809081556.02732.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by milter-greylist-2.0.2 (server.baldwin.cx [IPv6:2001:470:1f11:75::1]); Mon, 08 Sep 2008 16:12:40 -0400 (EDT) X-Virus-Scanned: ClamAV 0.93.1/8162/Thu Sep 4 12:38:45 2008 on server.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.6 required=4.2 tests=AWL,BAYES_00,NO_RELAYS autolearn=ham version=3.1.3 X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx Cc: Benjamin.Close@clearchain.com, attilio@freebsd.org, freebsd-current@freebsd.org, kib@freebsd.org, kevinxlinuz@163.com Subject: Re: [BUG] I think sleepqueue need to be protected in sleepq_broadcast X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Sep 2008 20:13:05 -0000 On Tuesday 02 September 2008 09:40:49 pm Peter Wemm wrote: > On Tue, Sep 2, 2008 at 1:08 PM, John Baldwin wrote: > > On Sunday 31 August 2008 09:31:17 pm Tor Egge wrote: > >> sleepq_resume_thread() contains an ownership handover of sq if the > >> resumed thread is the last one blocked on the wait channel. After the > >> handover, sq > > > > is > > > >> no longer protected by the sleep queue chain lock and should no longer > >> be accessed by sleepq_broadcast(). > >> > >> Normally, when sleepq_broadcast() incorrectly accesses sq after the > > > > handover, > > > >> it will find the sq->sq_blocked queue to be empty, and the code appears > >> to work. > >> > >> If the last correctly woken thread manages to go to sleep again very > >> quickly > > > > on > > > >> another wait channel, sleepq_broadcast() might incorrectly determine > >> that > > > > the > > > >> sq->sq_blocked queue isn't empty, and start doing the wrong thing. > > > > So disregard my earlier e-mail. Here is a simple fix for the sleepq > > case: > > > > Index: subr_sleepqueue.c > > =================================================================== > > --- subr_sleepqueue.c (revision 182679) > > +++ subr_sleepqueue.c (working copy) > > @@ -779,7 +779,7 @@ > > sleepq_broadcast(void *wchan, int flags, int pri, int queue) > > { > > struct sleepqueue *sq; > > - struct thread *td; > > + struct thread *td, *tdn; > > int wakeup_swapper; > > > > CTR2(KTR_PROC, "sleepq_broadcast(%p, %d)", wchan, flags); > > @@ -793,8 +793,7 @@ > > > > /* Resume all blocked threads on the sleep queue. */ > > wakeup_swapper = 0; > > - while (!TAILQ_EMPTY(&sq->sq_blocked[queue])) { > > - td = TAILQ_FIRST(&sq->sq_blocked[queue]); > > + TAILQ_FOREACH_SAFE(td, &sq->sq_blocked[queue], td_slpq, tdn) { > > thread_lock(td); > > if (sleepq_resume_thread(sq, td, pri)) > > wakeup_swapper = 1; > > > > This only uses 'sq' to fetch the head of the queue once up front. It > > won't use it again once it has started waking up threads. > > I don't know if it is the same problem, but mx2.freebsd.org, running > today's 6.4-PRERELEASE just died with: > Sep 3 00:20:11 mx2 sshd[15333]: fatal: Read from socket failed: Connection > resr panic: Assertion td->td_flags & TDF_SINTR failed at > ../../../kern/subr_sleepque5 cpuid = 2 > KDB: enter: panic > FreeBSD 6.4-PRERELEASE #7: Tue Sep 2 19:43:27 UTC 2008 > This was after about 3 hours of uptime. It has previously run happily > for months at a time before today's rebuild. So I think what happened is that the thread was woken up while the sleepq chain was unlocked while the thread unlocks the sx lock. The code handles this fine already since the same race can happen when dropping the lock while checking for signals. However, in this case TDF_SINTR won't be true anymore. The assertion just needs to be updated. Try this: Index: subr_sleepqueue.c =================================================================== --- subr_sleepqueue.c (revision 182874) +++ subr_sleepqueue.c (working copy) @@ -382,7 +382,7 @@ CTR3(KTR_PROC, "sleepq catching signals: thread %p (pid %ld, %s)", (void *)td, (long)p->p_pid, p->p_comm); - MPASS(td->td_flags & TDF_SINTR); + MPASS((td->td_sleepqueue != NULL) ^ (td->td_flags & TDF_SINTR)); mtx_unlock_spin(&sc->sc_lock); /* See if there are any pending signals for this thread. */ -- John Baldwin