Date: Tue, 23 Sep 1997 19:08:53 -0700 (PDT) From: Archie Cobbs <archie@whistle.com> To: freebsd-hackers@freebsd.org Subject: is this a bug? Message-ID: <199709240208.TAA04263@bubba.whistle.com>
next in thread | raw e-mail | index | archive | help
We're trying to track down a very elusive bug that involves processes ending up on two queues at once, or having runnable state SRUN and being on a sleep queue, ... etc.. Question: is there a bug in this code from tsleep() ? The code does this: 1. Puts process on a sleep queue 2. Calls CURSIG(), which can result in a context switch 3. Sets p->p_stat to SSLEEP It seems steps #2 and #3 are out of order... couldn't this result in a process being on a sleep queue yet having state SRUN? > #ifdef DIAGNOSTIC > if( p == NULL ) > panic("tsleep1"); > if (ident == NULL || p->p_stat != SRUN) > panic("tsleep"); > /* XXX This is not exhaustive, just the most common case */ > if ((p->p_procq.tqe_prev != NULL) && (*p->p_procq.tqe_prev == p)) > panic("sleeping process on run queue"); > #endif > p->p_wchan = ident; > p->p_wmesg = wmesg; > p->p_slptime = 0; > p->p_priority = priority & PRIMASK; > TAILQ_INSERT_TAIL(&slpque[LOOKUP(ident)], p, p_procq); > if (timo) > timeout(endtsleep, (void *)p, timo); > /* > * We put ourselves on the sleep queue and start our timeout > * before calling CURSIG, as we could stop there, and a wakeup > * or a SIGCONT (or both) could occur while we were stopped. > * A SIGCONT would cause us to be marked as SSLEEP > * without resuming us, thus we must be ready for sleep > * when CURSIG is called. If the wakeup happens while we're > * stopped, p->p_wchan will be 0 upon return from CURSIG. > */ > if (catch) { > p->p_flag |= P_SINTR; > if ((sig = CURSIG(p))) { > if (p->p_wchan) > unsleep(p); > p->p_stat = SRUN; > goto resume; > } > if (p->p_wchan == 0) { > catch = 0; > goto resume; > } > } else > sig = 0; > p->p_stat = SSLEEP; > p->p_stats->p_ru.ru_nvcsw++; > mi_switch(); > resume: > curpriority = p->p_usrpri; > splx(s); In fact, one panic() we got was the "wakeup_one" panic, where this test failed: > qp = &slpque[LOOKUP(ident)]; > > for (p = qp->tqh_first; p != NULL; p = p->p_procq.tqe_next) { > #ifdef DIAGNOSTIC > if (p->p_stat != SSLEEP && p->p_stat != SSTOP) > panic("wakeup_one"); > #endif Also: would there be any problems moving the "p->p_stat = SSLEEP" statement to before the "if (catch)" statement? We've tried it, and things seem to work, except for an occasional: calcru: negative time: -1318 usec which may or may not be related... Thanks, -Archie ___________________________________________________________________________ Archie Cobbs * Whistle Communications, Inc. * http://www.whistle.com
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199709240208.TAA04263>