From owner-freebsd-current@FreeBSD.ORG Mon Mar 1 15:09:38 2004 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id F1F6116A553 for ; Mon, 1 Mar 2004 15:09:37 -0800 (PST) Received: from mail5.speakeasy.net (mail5.speakeasy.net [216.254.0.205]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9FBC643D39 for ; Mon, 1 Mar 2004 15:09:35 -0800 (PST) (envelope-from jhb@FreeBSD.org) Received: (qmail 25178 invoked from network); 1 Mar 2004 23:09:35 -0000 Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) encrypted SMTP for ; 1 Mar 2004 23:09:35 -0000 Received: from 10.50.40.205 (gw1.twc.weather.com [216.133.140.1]) by server.baldwin.cx (8.12.10/8.12.10) with ESMTP id i21N9M2B076770; Mon, 1 Mar 2004 18:09:30 -0500 (EST) (envelope-from jhb@FreeBSD.org) From: John Baldwin To: "Danny J. Zerkel" Date: Mon, 1 Mar 2004 18:02:51 -0500 User-Agent: KMail/1.6 References: <404399A6.8080303@columbus.rr.com> In-Reply-To: <404399A6.8080303@columbus.rr.com> MIME-Version: 1.0 Content-Disposition: inline Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <200403011802.51028.jhb@FreeBSD.org> X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx cc: threads@FreeBSD.org cc: current@freebsd.org Subject: Re: Mozilla problems on current -- fix X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 01 Mar 2004 23:09:38 -0000 On Monday 01 March 2004 03:14 pm, Danny J. Zerkel wrote: > Looks like John's sys/kern/kern_thread.c 1.171 broke Mozilla (and > threading in general). He appears to have left out a test that > the sleeping thread is interruptable in one spot: > > --- kern_thread.c.orig Mon Mar 1 15:08:01 2004 > +++ kern_thread.c Mon Mar 1 14:26:30 2004 > @@ -646,7 +646,8 @@ > } else if (TD_ON_SLEEPQ(td2) && > ((td2->td_wchan == &kg->kg_completed) || > (td2->td_wchan == &p->p_siglist && > - (ku->ku_mflags & KMF_WAITSIGEVENT)))) { > + (ku->ku_mflags & KMF_WAITSIGEVENT))) && > + (td2->td_flags & TDF_SINTR)) { > sleepq_abort(td2); > } else { > ku->ku_flags |= KUF_DOUPCALL; > > This change fixes my Mozilla hangs and other panics on current. This should not matter. This is the only place that doesn't check TDF_SINTR, true, but it should always be set in these cases. First, look in kern_thread.c where we msleep() on these addresses (p_siglist and kg_completed): if (!(p->p_flag & P_SIGEVENT) && !(ku->ku_flags & KUF_DOUPCALL)) error = msleep(&p->p_siglist, &p->p_mtx, PPAUSE| PCATCH, "ksesigwait", (uap->timeout ? tvtohz(&tv) : 0)); p->p_flag &= ~P_SIGEVENT; Note that PCATCH is set. error = msleep(&kg->kg_completed, &p->p_mtx, PPAUSE|PCATCH, "kserel", (uap->timeout ? tvtohz(&tv) : 0)); kg->kg_upsleeps--; Again, PCATCH is set. Now onto kern_synch.c and msleep(): catch = priority & PCATCH; ... sq = sleepq_lookup(ident); ... sleepq_add(sq, ident, mtx, wmesg, 0); if (timo) sleepq_set_timeout(sq, ident, timo); if (catch) { sig = sleepq_catch_signals(ident); if (sig == 0 && !TD_ON_SLEEPQ(td)) { mtx_lock_spin(&sched_lock); td->td_flags &= ~TDF_SINTR; mtx_unlock_spin(&sched_lock); catch = 0; } } else sig = 0; ... sleepq_wait_foo(ident); Thus, we see that if PCATCH is set, we call sleepq_add() and then sleepq_catch_signals(). Bah, I see. Only the sleep queue lock is atomic for the whole operation and not sched_lock. While your patch might fix the assertion for this KSE case, there might be a larger set or problems with the race here, which is that sched_lock can be dropped in between td_wchan being set and TDF_SINTR being set. I.e, if you get an abort in between the sleepq_add() and sleepq_catch_signals() and just use sched_lock it will get lost when it shouldn't. For signals, I think the call to cursig() will still work ok. I'm not sure if sleepq_catch_signals() calls anything that checks for KUF_DOUPCALL. Let me check. Ugh, this code is too hard to follow. I have no idea if this is actually safe or not. :( I can commit the above as a band-aid I guess. I don't know why the KSE code doesn't just use a wakeup here, the sleeps are in top-level functions and not buried deep. -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org