Date: Mon, 1 Mar 2004 18:02:51 -0500 From: John Baldwin <jhb@FreeBSD.org> To: "Danny J. Zerkel" <dzerkel@columbus.rr.com> Cc: current@freebsd.org Subject: Re: Mozilla problems on current -- fix Message-ID: <200403011802.51028.jhb@FreeBSD.org> In-Reply-To: <404399A6.8080303@columbus.rr.com> References: <404399A6.8080303@columbus.rr.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Monday 01 March 2004 03:14 pm, Danny J. Zerkel wrote: > Looks like John's sys/kern/kern_thread.c 1.171 broke Mozilla (and > threading in general). He appears to have left out a test that > the sleeping thread is interruptable in one spot: > > --- kern_thread.c.orig Mon Mar 1 15:08:01 2004 > +++ kern_thread.c Mon Mar 1 14:26:30 2004 > @@ -646,7 +646,8 @@ > } else if (TD_ON_SLEEPQ(td2) && > ((td2->td_wchan == &kg->kg_completed) || > (td2->td_wchan == &p->p_siglist && > - (ku->ku_mflags & KMF_WAITSIGEVENT)))) { > + (ku->ku_mflags & KMF_WAITSIGEVENT))) && > + (td2->td_flags & TDF_SINTR)) { > sleepq_abort(td2); > } else { > ku->ku_flags |= KUF_DOUPCALL; > > This change fixes my Mozilla hangs and other panics on current. This should not matter. This is the only place that doesn't check TDF_SINTR, true, but it should always be set in these cases. First, look in kern_thread.c where we msleep() on these addresses (p_siglist and kg_completed): if (!(p->p_flag & P_SIGEVENT) && !(ku->ku_flags & KUF_DOUPCALL)) error = msleep(&p->p_siglist, &p->p_mtx, PPAUSE| PCATCH, "ksesigwait", (uap->timeout ? tvtohz(&tv) : 0)); p->p_flag &= ~P_SIGEVENT; Note that PCATCH is set. error = msleep(&kg->kg_completed, &p->p_mtx, PPAUSE|PCATCH, "kserel", (uap->timeout ? tvtohz(&tv) : 0)); kg->kg_upsleeps--; Again, PCATCH is set. Now onto kern_synch.c and msleep(): catch = priority & PCATCH; ... sq = sleepq_lookup(ident); ... sleepq_add(sq, ident, mtx, wmesg, 0); if (timo) sleepq_set_timeout(sq, ident, timo); if (catch) { sig = sleepq_catch_signals(ident); if (sig == 0 && !TD_ON_SLEEPQ(td)) { mtx_lock_spin(&sched_lock); td->td_flags &= ~TDF_SINTR; mtx_unlock_spin(&sched_lock); catch = 0; } } else sig = 0; ... sleepq_wait_foo(ident); Thus, we see that if PCATCH is set, we call sleepq_add() and then sleepq_catch_signals(). Bah, I see. Only the sleep queue lock is atomic for the whole operation and not sched_lock. While your patch might fix the assertion for this KSE case, there might be a larger set or problems with the race here, which is that sched_lock can be dropped in between td_wchan being set and TDF_SINTR being set. I.e, if you get an abort in between the sleepq_add() and sleepq_catch_signals() and just use sched_lock it will get lost when it shouldn't. For signals, I think the call to cursig() will still work ok. I'm not sure if sleepq_catch_signals() calls anything that checks for KUF_DOUPCALL. Let me check. Ugh, this code is too hard to follow. I have no idea if this is actually safe or not. :( I can commit the above as a band-aid I guess. I don't know why the KSE code doesn't just use a wakeup here, the sleeps are in top-level functions and not buried deep. -- John Baldwin <jhb@FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200403011802.51028.jhb>