From: John Baldwin
To: Tor Egge
Cc: attilio@freebsd.org, kevinxlinuz@163.com, freebsd-current@freebsd.org, Benjamin.Close@clearchain.com, kib@freebsd.org
Date: Tue, 2 Sep 2008 16:08:56 -0400
Subject: Re: [BUG] I think sleepqueue need to be protected in sleepq_broadcast

On Sunday 31 August 2008 09:31:17 pm Tor Egge wrote:
>
> sleepq_resume_thread() contains an ownership handover of sq if the resumed
> thread is the last one blocked on the wait channel.  After the handover, sq is
> no longer protected by the sleep queue chain lock and should no longer be
> accessed by sleepq_broadcast().
>
> Normally, when sleepq_broadcast() incorrectly accesses sq after the handover,
> it will find the sq->sq_blocked queue to be empty, and the code appears to
> work.
>
> If the last correctly woken thread manages to go to sleep again very quickly on
> another wait channel, sleepq_broadcast() might incorrectly determine that the
> sq->sq_blocked queue isn't empty, and start doing the wrong thing.

So disregard my earlier e-mail.  Here is a simple fix for the sleepq case:

Index: subr_sleepqueue.c
===================================================================
--- subr_sleepqueue.c	(revision 182679)
+++ subr_sleepqueue.c	(working copy)
@@ -779,7 +779,7 @@
 sleepq_broadcast(void *wchan, int flags, int pri, int queue)
 {
 	struct sleepqueue *sq;
-	struct thread *td;
+	struct thread *td, *tdn;
 	int wakeup_swapper;
 
 	CTR2(KTR_PROC, "sleepq_broadcast(%p, %d)", wchan, flags);
@@ -793,8 +793,7 @@
 
 	/* Resume all blocked threads on the sleep queue. */
 	wakeup_swapper = 0;
-	while (!TAILQ_EMPTY(&sq->sq_blocked[queue])) {
-		td = TAILQ_FIRST(&sq->sq_blocked[queue]);
+	TAILQ_FOREACH_SAFE(td, &sq->sq_blocked[queue], td_slpq, tdn) {
 		thread_lock(td);
 		if (sleepq_resume_thread(sq, td, pri))
 			wakeup_swapper = 1;

This only uses 'sq' to fetch the head of the queue once up front.  It won't
use it again once it has started waking up threads.

> A similar (but probably much more difficult to trigger) issue is present with
> regards to thread_lock() and turnstiles.
>
> The caller of thread_lock() might have performed sufficient locking to ensure
> that the thread to be locked doesn't go away, but any turnstile spin lock
> pointed to by td->td_lock isn't protected.  Making turnstiles type stable
> (setting UMA_ZONE_NOFREE flag for turnstile_zone) should fix that issue.

Note that unlike the sleepq case, threads blocked on a turnstile are not made
runnable until all of them have been dequeued from the turnstile and each has
been assigned a new turnstile.  Only after all that is settled are the threads
made runnable in turnstile_unpend().  However, that doesn't fix the
thread_lock() race you describe (though it means the turnstile code is not
subject to the same exact race as the sleepq code above).  Making turnstiles
type-stable is indeed probably the only fix for this. :-/

-- 
John Baldwin
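
For anyone who wants to see the difference between the two loops in the sleepq
patch outside the kernel, here is a minimal userland sketch.  It is not the
kernel code: every toy_* name is made up for illustration, the ownership
handover is modelled as a plain flag, and there is no locking at all.  The
TAILQ_FOREACH_SAFE fallback is only there because glibc's <sys/queue.h> does
not provide that macro; it is written to match the BSD definition.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Fallback for systems whose <sys/queue.h> lacks the _SAFE variant. */
#ifndef TAILQ_FOREACH_SAFE
#define	TAILQ_FOREACH_SAFE(var, head, field, tvar)			\
	for ((var) = TAILQ_FIRST((head));				\
	    (var) != NULL && ((tvar) = TAILQ_NEXT((var), field), 1);	\
	    (var) = (tvar))
#endif

#define	TOY_MAX	8

struct toy_thread {
	int woken;
	TAILQ_ENTRY(toy_thread) link;
};
TAILQ_HEAD(toy_queue, toy_thread);

/* Hypothetical stand-in for a sleep queue; not the kernel structure. */
struct toy_sleepq {
	struct toy_queue blocked;
	int handed_over;	/* set once the last waiter has been removed */
	int stale_reads;	/* head accesses made after the handover */
};

/* Every head access by the broadcast loops goes through here. */
static struct toy_queue *
toy_head(struct toy_sleepq *sq)
{
	if (sq->handed_over)
		sq->stale_reads++;
	return (&sq->blocked);
}

/* Remove one waiter; removing the last one models the handover of sq. */
static void
toy_resume(struct toy_sleepq *sq, struct toy_thread *td)
{
	TAILQ_REMOVE(&sq->blocked, td, link);
	td->woken = 1;
	if (TAILQ_EMPTY(&sq->blocked))
		sq->handed_over = 1;
}

/* Old loop shape: re-reads the head after every wakeup. */
static void
toy_broadcast_recheck(struct toy_sleepq *sq)
{
	struct toy_thread *td;

	while (!TAILQ_EMPTY(toy_head(sq))) {
		td = TAILQ_FIRST(toy_head(sq));
		toy_resume(sq, td);
	}
}

/* New loop shape: reads the head once, then follows saved next pointers. */
static void
toy_broadcast_safe(struct toy_sleepq *sq)
{
	struct toy_thread *td, *tdn;

	TAILQ_FOREACH_SAFE(td, toy_head(sq), link, tdn)
		toy_resume(sq, td);
}

/* Queue nthreads waiters, wake them all, return the stale head accesses. */
static int
toy_run(int use_safe_loop, int nthreads)
{
	struct toy_thread threads[TOY_MAX];
	struct toy_sleepq sq;
	int i;

	assert(nthreads > 0 && nthreads <= TOY_MAX);
	TAILQ_INIT(&sq.blocked);
	sq.handed_over = 0;
	sq.stale_reads = 0;
	for (i = 0; i < nthreads; i++) {
		threads[i].woken = 0;
		TAILQ_INSERT_TAIL(&sq.blocked, &threads[i], link);
	}
	if (use_safe_loop)
		toy_broadcast_safe(&sq);
	else
		toy_broadcast_recheck(&sq);
	for (i = 0; i < nthreads; i++)
		assert(threads[i].woken);
	return (sq.stale_reads);
}
```

Calling toy_run(0, n) exercises the old loop shape and toy_run(1, n) the new
one.  In this toy the old shape touches the head one more time after the last
waiter is gone (the final TAILQ_EMPTY check), which is the access that becomes
unsafe once sq has been handed over; the new shape does not.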