Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 25 Feb 2019 14:46:55 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Cc:        freebsd-hackers Hackers <freebsd-hackers@freebsd.org>
Subject:   head -r344018 based powerpc64 pmac_thermal hangup (stuck sleeping): some preliminary evidence 
Message-ID:  <40D1DDA1-10FB-4F2C-B38B-C7FED5795542@yahoo.com>

next in thread | raw e-mail | index | archive | help
I adjusted what KTR_PROC does to just show just some of its pid 26
(pmac_thermal) messages and adding some extra output as well. I'll
list some of that output later. I'll note that beyond pmac_thermal
the buf*daemon* threads also seem to be subject to being stuck
sleeping in (offsets are for a specific build of mine):

mi_switch+0x134 sleepq_switch+0x2ec sleepq_timedwait+0x48 _sleep+0x41c


So far for pmac_thermal I've seen that until the failing case:

sleepq_set_timeout_sbt was being given: sbt=3D=3D0xcccccbe0 pr=3D=3D0x0 =
flags=3D=3D0x100
and in turn was using: prec=3D=3D0xcccccbe flags=3D=3D0x501 (of course =
the used
td_sleeptimo varies).

[I note that 16*0xcccccbe =3D=3D 0xcccccbe0, the original sbt value,
not that I know yet if this matters.]

But the sequence leading to failures is different:

sleepq_set_timeout_sbt was given: sbt=3D=3D0xfffffed8 pr=3D=3D0x0 =
flags=3D=3D0x100
and in turn was using: prec=3D=3D0xfffffed flags=3D=3D0x501

[I note that 16*0xfffffed =3D=3D 0xfffffed0, so less than the original
sbt value, not that I know this matters at this point.]

For sbt=3D=3D0xfffffed8, the callout to sleepq_timeout ends up with =
values
like (a particular example):

td_sleeptimo=3D0x470d360fe5 sbinuptime=3D0x46c869f6aa

where the reporting code looks like:

static void
sleepq_timeout(void *arg)
{
        struct sleepqueue_chain *sc __unused;
        struct sleepqueue *sq;
        struct thread *td;
        void *wchan;
        int wakeup_swapper;
        sbintime_t sbut; // HACK!!!

        td =3D arg;
        wakeup_swapper =3D 0;
        if (26 =3D=3D td->td_proc->p_pid) // HACK!!!
        CTR3(KTR_PROC, "sleepq_timeout: thread %p (pid %ld, %s)",
            (void *)td, (long)td->td_proc->p_pid, (void *)td->td_name);
       =20
        thread_lock(td);
       =20
        sbut=3D sbinuptime(); // HACK!!!
        if (td->td_sleeptimo > sbut || td->td_sleeptimo =3D=3D 0) {
                /*
                 * The thread does not want a timeout (yet).
                 */
                if (26 =3D=3D td->td_proc->p_pid) // HACK!!!
                        CTR5(KTR_PROC, "sleepq_timeout thread not want =
timeout yet: thread %p (pid %ld, %s) td_sleeptimo=3D%jx sbinuptime=3D%jx",=

                            (void *)td, (long)td->td_proc->p_pid, (void =
*)td->td_name, (uintmax_t)td->td_sleeptimo, (uintmax_t)sbut);
. . .

So far sleepq_set_timeout_sbt being given sbt=3D=3D0xfffffed8 instead of
sbt=3D=3D0xcccccbe0 seems to be an accurate indicator of if the problem =
will
happen in sleepq_timeout. (But I've only a few examples so far.)

I'll note that the sleepq_timeout code for this case does not set up
another callout to itself for later and the sleep then continues
indefinitely.

I've not yet gotten into finding evidence for why the callout to
sleepq_timeout itself happens. Hopefully I can find some.


An example of some modified KTR_PROC output is:

493 (0xc000000001eb45a0:cpu3) 3278339934 =
/usr/src/sys/kern/subr_sleepqueue.c.1026: sleepq_timeout thread not want =
timeout yet: thread 0xc00000000a8315a0 (pid 26, pmac_thermal) =
td_sleeptimo=3D470d360fe5 sbinuptime=3D46c869f6aa
492 (0xc000000001eb45a0:cpu3) 3278339919 =
/usr/src/sys/kern/subr_sleepqueue.c.1015: sleepq_timeout: thread =
0xc00000000a8315a0 (pid 26, pmac_thermal)
491 (0xc00000000a8315a0:cpu3) 3253979006 =
/usr/src/sys/kern/kern_synch.c.435: mi_switch: old thread 100078 =
(td_sched 0xc00000000a831ad8, pid 26, pmac_thermal)
490 (0xc00000000a8315a0:cpu3) 3253978972 =
/usr/src/sys/kern/subr_sleepqueue.c.408: sleepq_set_timeout_sbt: =
sbt=3Dfffffed8 pr=3D0 flags=3D100 td_sleeptimo=3D470d360fe5 prec=3Dfffffed=
 flags=3D501
489 (0xc00000000a8315a0:cpu3) 3253978965 =
/usr/src/sys/kern/subr_sleepqueue.c.406: sleepq_set_timeout_sbt =
installing sleepq_timeout: thread 0xc00000000a8315a0 (pid 26, =
pmac_thermal)
488 (0xc00000000a8315a0:cpu3) 3253978940 =
/usr/src/sys/kern/kern_synch.c.180: sleep: thread 100078 (pid 26, =
pmac_thermal) on pmac_therm (0xc0000000015082c3)
487 (0xc00000000a8315a0:cpu3) 3253978890 =
/usr/src/sys/kern/subr_sleepqueue.c.630: sleepq resume: thread =
0xc00000000a8315a0 (pid 26, pmac_thermal)
486 (0xc00000000a8315a0:cpu3) 3253978883 =
/usr/src/sys/kern/kern_synch.c.445: mi_switch: new thread 100078 =
(td_sched 0xc00000000a831ad8, pid 26, pmac_thermal)
485 (0xc000000001f6e000:cpu2) 3253978598 =
/usr/src/sys/kern/subr_sleepqueue.c.836: sleepq_wakeup: thread =
0xc00000000a8315a0 (pid 26, pmac_thermal)
484 (0xc000000001f6e000:cpu2) 3253978580 =
/usr/src/sys/kern/subr_sleepqueue.c.988: sleepq_remove_matching calling =
sleepq_resume_thread: thread 0xc00000000a8315a0 (pid 26, pmac_thermal)
483 (0xc00000000a8315a0:cpu3) 3252269011 =
/usr/src/sys/kern/kern_synch.c.435: mi_switch: old thread 100078 =
(td_sched 0xc00000000a831ad8, pid 26, pmac_thermal)
482 (0xc00000000a8315a0:cpu3) 3252268968 =
/usr/src/sys/kern/subr_sleepqueue.c.408: sleepq_set_timeout_sbt: =
sbt=3Dcccccbe0 pr=3D0 flags=3D100 td_sleeptimo=3D46ccf486f8 prec=3Dcccccbe=
 flags=3D501
481 (0xc00000000a8315a0:cpu3) 3252268962 =
/usr/src/sys/kern/subr_sleepqueue.c.406: sleepq_set_timeout_sbt =
installing sleepq_timeout: thread 0xc00000000a8315a0 (pid 26, =
pmac_thermal)
480 (0xc00000000a8315a0:cpu3) 3252268935 =
/usr/src/sys/kern/kern_synch.c.180: sleep: thread 100078 (pid 26, =
pmac_thermal) on smu (0xe000000087fd1670)
479 (0xc00000000a8315a0:cpu3) 3252268793 =
/usr/src/sys/kern/subr_sleepqueue.c.630: sleepq resume: thread =
0xc00000000a8315a0 (pid 26, pmac_thermal)
478 (0xc00000000a8315a0:cpu3) 3252268778 =
/usr/src/sys/kern/kern_synch.c.445: mi_switch: new thread 100078 =
(td_sched 0xc00000000a831ad8, pid 26, pmac_thermal)
477 (0xc000000001f6e000:cpu2) 3252268391 =
/usr/src/sys/kern/subr_sleepqueue.c.836: sleepq_wakeup: thread =
0xc00000000a8315a0 (pid 26, pmac_thermal)
476 (0xc000000001f6e000:cpu2) 3252268385 =
/usr/src/sys/kern/subr_sleepqueue.c.988: sleepq_remove_matching calling =
sleepq_resume_thread: thread 0xc00000000a8315a0 (pid 26, pmac_thermal)
475 (0xc00000000a8315a0:cpu2) 3250546514 =
/usr/src/sys/kern/kern_synch.c.435: mi_switch: old thread 100078 =
(td_sched 0xc00000000a831ad8, pid 26, pmac_thermal)
474 (0xc00000000a8315a0:cpu2) 3250546468 =
/usr/src/sys/kern/subr_sleepqueue.c.408: sleepq_set_timeout_sbt: =
sbt=3Dcccccbe0 pr=3D0 flags=3D100 td_sleeptimo=3D46bfa4a7cc prec=3Dcccccbe=
 flags=3D501
473 (0xc00000000a8315a0:cpu2) 3250546462 =
/usr/src/sys/kern/subr_sleepqueue.c.406: sleepq_set_timeout_sbt =
installing sleepq_timeout: thread 0xc00000000a8315a0 (pid 26, =
pmac_thermal)
472 (0xc00000000a8315a0:cpu2) 3250546436 =
/usr/src/sys/kern/kern_synch.c.180: sleep: thread 100078 (pid 26, =
pmac_thermal) on smu (0xe000000087fd1670)
471 (0xc00000000a8315a0:cpu2) 3250546295 =
/usr/src/sys/kern/subr_sleepqueue.c.630: sleepq resume: thread =
0xc00000000a8315a0 (pid 26, pmac_thermal)
470 (0xc00000000a8315a0:cpu2) 3250546286 =
/usr/src/sys/kern/kern_synch.c.445: mi_switch: new thread 100078 =
(td_sched 0xc00000000a831ad8, pid 26, pmac_thermal)
469 (0xc000000001f6e000:cpu0) 3250545941 =
/usr/src/sys/kern/subr_sleepqueue.c.836: sleepq_wakeup: thread =
0xc00000000a8315a0 (pid 26, pmac_thermal)
468 (0xc000000001f6e000:cpu0) 3250545934 =
/usr/src/sys/kern/subr_sleepqueue.c.988: sleepq_remove_matching calling =
sleepq_resume_thread: thread 0xc00000000a8315a0 (pid 26, pmac_thermal)
. . .


=3D=3D=3D
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?40D1DDA1-10FB-4F2C-B38B-C7FED5795542>