Date: Mon, 25 Feb 2019 15:49:56 -0800 From: Mark Millard <marklmi@yahoo.com> To: FreeBSD PowerPC ML <freebsd-ppc@freebsd.org> Cc: freebsd-hackers Hackers <freebsd-hackers@freebsd.org> Subject: Re: head -r344018 based powerpc64 pmac_thermal hangup (stuck sleeping): some preliminary evidence [not as uniform as I initially saw] Message-ID: <AADCB7B3-C970-48B4-B20A-F6F78D848F86@yahoo.com> In-Reply-To: <40D1DDA1-10FB-4F2C-B38B-C7FED5795542@yahoo.com> References: <40D1DDA1-10FB-4F2C-B38B-C7FED5795542@yahoo.com>
next in thread | previous in thread | raw e-mail | index | archive | help
[I've now seen examples of sbt=3D=3D0xfffffed8 that did not lead to
a hangup.]
On 2019-Feb-25, at 14:46, Mark Millard <marklmi@yahoo.com> wrote:
> I adjusted what KTR_PROC does to just show just some of its pid 26
> (pmac_thermal) messages and adding some extra output as well. I'll
> list some of that output later. I'll note that beyond pmac_thermal
> the buf*daemon* threads also seem to be subject to being stuck
> sleeping in (offsets are for a specific build of mine):
>=20
> mi_switch+0x134 sleepq_switch+0x2ec sleepq_timedwait+0x48 _sleep+0x41c
>=20
>=20
> So far for pmac_thermal I've seen that until the failing case:
>=20
> sleepq_set_timeout_sbt was being given: sbt=3D=3D0xcccccbe0 pr=3D=3D0x0 =
flags=3D=3D0x100
> and in turn was using: prec=3D=3D0xcccccbe flags=3D=3D0x501 (of course =
the used
> td_sleeptimo varies).
>=20
> [I note that 16*0xcccccbe =3D=3D 0xcccccbe0, the original sbt value,
> not that I know yet if this matters.]
>=20
> But the sequence leading to failures is different:
I've now seen examples of sbt=3D=3D0xfffffed8 that did not lead to
a hangup. So it is not a reliable predictor of the hang-up in
sleep. I'm trying to see if I can observe a failure with
different value.
> sleepq_set_timeout_sbt was given: sbt=3D=3D0xfffffed8 pr=3D=3D0x0 =
flags=3D=3D0x100
> and in turn was using: prec=3D=3D0xfffffed flags=3D=3D0x501
>=20
> [I note that 16*0xfffffed =3D=3D 0xfffffed0, so less than the original
> sbt value, not that I know this matters at this point.]
>=20
> For sbt=3D=3D0xfffffed8, the callout to sleepq_timeout ends up with =
values
> like (a particular example):
>=20
> td_sleeptimo=3D0x470d360fe5 sbinuptime=3D0x46c869f6aa
>=20
> where the reporting code looks like:
>=20
> static void
> sleepq_timeout(void *arg)
> {
> struct sleepqueue_chain *sc __unused;
> struct sleepqueue *sq;
> struct thread *td;
> void *wchan;
> int wakeup_swapper;
> sbintime_t sbut; // HACK!!!
>=20
> td =3D arg;
> wakeup_swapper =3D 0;
> if (26 =3D=3D td->td_proc->p_pid) // HACK!!!
> CTR3(KTR_PROC, "sleepq_timeout: thread %p (pid %ld, %s)",
> (void *)td, (long)td->td_proc->p_pid, (void *)td->td_name);
>=20
> thread_lock(td);
>=20
> sbut=3D sbinuptime(); // HACK!!!
> if (td->td_sleeptimo > sbut || td->td_sleeptimo =3D=3D 0) {
> /*
> * The thread does not want a timeout (yet).
> */
> if (26 =3D=3D td->td_proc->p_pid) // HACK!!!
> CTR5(KTR_PROC, "sleepq_timeout thread not want =
timeout yet: thread %p (pid %ld, %s) td_sleeptimo=3D%jx sbinuptime=3D%jx",=
> (void *)td, (long)td->td_proc->p_pid, (void =
*)td->td_name, (uintmax_t)td->td_sleeptimo, (uintmax_t)sbut);
> . . .
>=20
> So far sleepq_set_timeout_sbt being given sbt=3D=3D0xfffffed8 instead =
of
> sbt=3D=3D0xcccccbe0 seems to be an accurate indicator of if the =
problem will
> happen in sleepq_timeout. (But I've only a few examples so far.)
>=20
I've now seen examples of sbt=3D=3D0xfffffed8 that did not lead to
a hangup. So it is not a reliable predictor of the hang-up in
sleep.
It looks like the values are sometimes more varied than I'd
seen before as well.
> I'll note that the sleepq_timeout code for this case does not set up
> another callout to itself for later and the sleep then continues
> indefinitely.
>=20
> I've not yet gotten into finding evidence for why the callout to
> sleepq_timeout itself happens. Hopefully I can find some.
>=20
>=20
> An example of some modified KTR_PROC output is:
>=20
> . . .
=3D=3D=3D
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?AADCB7B3-C970-48B4-B20A-F6F78D848F86>
