Date: Tue, 23 Sep 1997 19:08:53 -0700 (PDT)
From: Archie Cobbs <archie@whistle.com>
To: freebsd-hackers@freebsd.org
Subject: is this a bug?
Message-ID: <199709240208.TAA04263@bubba.whistle.com>
We're trying to track down a very elusive bug in which processes
end up on two queues at once, or have state SRUN while sitting
on a sleep queue, etc.
Question: is there a bug in this code from tsleep()? The code does this:
1. Puts process on a sleep queue
2. Calls CURSIG(), which can result in a context switch
3. Sets p->p_stat to SSLEEP
It seems steps #2 and #3 are out of order... couldn't this result
in a process being on a sleep queue yet having state SRUN?
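The suspected window can be made concrete with a minimal user-space model. This is not kernel code: `struct mproc`, its `on_sleepq` flag, `sleepq_entry_ok()`, and `observer_sees_valid_state()` are hypothetical names, and the model simply assumes, as the question does, that a context switch inside CURSIG() lets some other code inspect the sleep queue before step 3 runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model: only the two pieces of state
 * the question is about. Not the real struct proc. */
enum pstat { SRUN, SSLEEP, SSTOP };

struct mproc {
    enum pstat p_stat;
    bool on_sleepq;         /* stands in for membership in slpque[] */
};

/* Same condition as the DIAGNOSTIC check in wakeup_one(): a process on a
 * sleep queue must be SSLEEP or SSTOP. */
static bool
sleepq_entry_ok(const struct mproc *p)
{
    return !p->on_sleepq || p->p_stat == SSLEEP || p->p_stat == SSTOP;
}

/* Steps 1-3 in the order listed above. Returns what an observer would
 * conclude if it inspected the queue during step 2 (assumed switch). */
static bool
observer_sees_valid_state(struct mproc *p)
{
    bool ok;

    p->on_sleepq = true;        /* 1. TAILQ_INSERT_TAIL(&slpque[...], ...) */
    ok = sleepq_entry_ok(p);    /* 2. CURSIG(p): assumed switch point */
    p->p_stat = SSLEEP;         /* 3. p->p_stat = SSLEEP */
    return ok;
}
```

Starting from `struct mproc p = { SRUN, false };`, `observer_sees_valid_state(&p)` returns false: at the simulated switch the process is queued but still SRUN. Whether a real switch can occur there with p_stat still SRUN is exactly the open question; the model only shows what an observer would see if it did.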
> #ifdef DIAGNOSTIC
>     if( p == NULL )
>         panic("tsleep1");
>     if (ident == NULL || p->p_stat != SRUN)
>         panic("tsleep");
>     /* XXX This is not exhaustive, just the most common case */
>     if ((p->p_procq.tqe_prev != NULL) && (*p->p_procq.tqe_prev == p))
>         panic("sleeping process on run queue");
> #endif
>     p->p_wchan = ident;
>     p->p_wmesg = wmesg;
>     p->p_slptime = 0;
>     p->p_priority = priority & PRIMASK;
>     TAILQ_INSERT_TAIL(&slpque[LOOKUP(ident)], p, p_procq);
>     if (timo)
>         timeout(endtsleep, (void *)p, timo);
>     /*
>      * We put ourselves on the sleep queue and start our timeout
>      * before calling CURSIG, as we could stop there, and a wakeup
>      * or a SIGCONT (or both) could occur while we were stopped.
>      * A SIGCONT would cause us to be marked as SSLEEP
>      * without resuming us, thus we must be ready for sleep
>      * when CURSIG is called. If the wakeup happens while we're
>      * stopped, p->p_wchan will be 0 upon return from CURSIG.
>      */
>     if (catch) {
>         p->p_flag |= P_SINTR;
>         if ((sig = CURSIG(p))) {
>             if (p->p_wchan)
>                 unsleep(p);
>             p->p_stat = SRUN;
>             goto resume;
>         }
>         if (p->p_wchan == 0) {
>             catch = 0;
>             goto resume;
>         }
>     } else
>         sig = 0;
>     p->p_stat = SSLEEP;
>     p->p_stats->p_ru.ru_nvcsw++;
>     mi_switch();
> resume:
>     curpriority = p->p_usrpri;
>     splx(s);
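The decision tsleep() makes once CURSIG() returns (for the `catch` case) can be summarized as a small pure function. This is only a sketch: `enum tsleep_path`, `path_after_cursig()`, and the `catch_sig` parameter (standing in for `catch`) are hypothetical, and the kernel side effects (unsleep(), setting p_stat, mi_switch()) appear only as comments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the three ways the quoted code can continue. */
enum tsleep_path {
    PATH_SIGNALED,  /* sig != 0: unsleep() if p_wchan set, p_stat = SRUN, goto resume */
    PATH_WOKEN,     /* p_wchan == 0 (woken while stopped): catch = 0, goto resume */
    PATH_SLEEP      /* p_stat = SSLEEP, mi_switch() */
};

/* catch_sig mirrors tsleep's `catch`; sig is CURSIG()'s result (0 when
 * catch_sig is 0); wchan is p->p_wchan after CURSIG() returns. */
static enum tsleep_path
path_after_cursig(int catch_sig, int sig, const void *wchan)
{
    if (catch_sig) {
        if (sig != 0)
            return PATH_SIGNALED;
        if (wchan == NULL)
            return PATH_WOKEN;
    }
    return PATH_SLEEP;
}
```

For example, `path_after_cursig(1, 0, NULL)` yields `PATH_WOKEN`, the case the block comment describes, where a wakeup arrived while the process was stopped and p_wchan was cleared.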
In fact, one panic() we got was the "wakeup_one" panic, where
this test failed:
>     qp = &slpque[LOOKUP(ident)];
>
>     for (p = qp->tqh_first; p != NULL; p = p->p_procq.tqe_next) {
> #ifdef DIAGNOSTIC
>         if (p->p_stat != SSLEEP && p->p_stat != SSTOP)
>             panic("wakeup_one");
> #endif
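The failing DIAGNOSTIC test can also be run as a standalone check over one sleep-queue bucket. In this sketch a plain singly linked list stands in for the TAILQ, and `struct mproc`, its `next` field, and `first_bad_sleeper()` are hypothetical; instead of panicking, it returns the first entry that would make wakeup_one() panic.

```c
#include <assert.h>
#include <stddef.h>

enum pstat { SRUN, SSLEEP, SSTOP };

/* Hypothetical stand-in for struct proc linked through p_procq. */
struct mproc {
    enum pstat p_stat;
    struct mproc *next;     /* plays the role of p->p_procq.tqe_next */
};

/* Walks one bucket and returns the first process that is neither SSLEEP
 * nor SSTOP (the wakeup_one() panic condition), or NULL if none is. */
static const struct mproc *
first_bad_sleeper(const struct mproc *head)
{
    for (const struct mproc *p = head; p != NULL; p = p->next)
        if (p->p_stat != SSLEEP && p->p_stat != SSTOP)
            return p;
    return NULL;
}
```

A bucket holding SSTOP, SRUN, SSLEEP entries in that order reports the SRUN entry; once it is SSLEEP the function returns NULL.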
Also: would there be any problems moving the "p->p_stat = SSLEEP"
statement to before the "if (catch)" statement?
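In a hypothetical user-space model (`struct mproc`, `on_sleepq`, `sleepq_entry_ok()`, and `reordered_window_ok()` are illustrative names, not kernel APIs), the proposed reordering closes the specific window the DIAGNOSTIC check looks at. The sketch assumes the signal path still does unsleep() followed by p_stat = SRUN, as in the quoted code. It checks only the sleep-queue invariant and does not model anything else tsleep() or wakeup() does with the process, so it says nothing about whether the change is safe overall.

```c
#include <assert.h>
#include <stdbool.h>

enum pstat { SRUN, SSLEEP, SSTOP };

/* Hypothetical model: only p_stat and sleep-queue membership. */
struct mproc {
    enum pstat p_stat;
    bool on_sleepq;         /* stands in for membership in slpque[] */
};

/* wakeup_one()'s DIAGNOSTIC condition, applied to one process. */
static bool
sleepq_entry_ok(const struct mproc *p)
{
    return !p->on_sleepq || p->p_stat == SSLEEP || p->p_stat == SSTOP;
}

/* Proposed order: set SSLEEP before "if (catch)", i.e. before CURSIG()
 * could switch. got_signal selects the signal path. Returns true if the
 * invariant holds both at the (assumed) switch point and afterwards. */
static bool
reordered_window_ok(struct mproc *p, bool got_signal)
{
    bool ok;

    p->on_sleepq = true;        /* TAILQ_INSERT_TAIL(...) */
    p->p_stat = SSLEEP;         /* moved ahead of if (catch) */
    ok = sleepq_entry_ok(p);    /* CURSIG(p) would run here */
    if (got_signal) {
        p->on_sleepq = false;   /* unsleep(p) */
        p->p_stat = SRUN;       /* p->p_stat = SRUN; goto resume */
    }
    return ok && sleepq_entry_ok(p);
}
```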
We've tried it, and things seem to work, except for an occasional
    calcru: negative time: -1318 usec
message, which may or may not be related...
Thanks,
-Archie
___________________________________________________________________________
Archie Cobbs * Whistle Communications, Inc. * http://www.whistle.com
