Date: Mon, 25 Feb 2019 15:49:56 -0800
From: Mark Millard <marklmi@yahoo.com>
To: FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Cc: freebsd-hackers Hackers <freebsd-hackers@freebsd.org>
Subject: Re: head -r344018 based powerpc64 pmac_thermal hangup (stuck sleeping): some preliminary evidence [not as uniform as I initially saw]
Message-ID: <AADCB7B3-C970-48B4-B20A-F6F78D848F86@yahoo.com>
In-Reply-To: <40D1DDA1-10FB-4F2C-B38B-C7FED5795542@yahoo.com>
References: <40D1DDA1-10FB-4F2C-B38B-C7FED5795542@yahoo.com>
[I've now seen examples of sbt==0xfffffed8 that did not lead to a hangup.]

On 2019-Feb-25, at 14:46, Mark Millard <marklmi@yahoo.com> wrote:

> I adjusted what KTR_PROC does to show just some of its pid 26
> (pmac_thermal) messages and to add some extra output as well. I'll
> list some of that output later. I'll note that beyond pmac_thermal
> the buf*daemon* threads also seem to be subject to being stuck
> sleeping in (offsets are for a specific build of mine):
>
> mi_switch+0x134 sleepq_switch+0x2ec sleepq_timedwait+0x48 _sleep+0x41c
>
>
> So far for pmac_thermal I've seen that until the failing case:
>
> sleepq_set_timeout_sbt was being given: sbt==0xcccccbe0 pr==0x0 flags==0x100
> and in turn was using: prec==0xcccccbe flags==0x501 (of course the used
> td_sleeptimo varies).
>
> [I note that 16*0xcccccbe == 0xcccccbe0, the original sbt value,
> not that I know yet if this matters.]
>
> But the sequence leading to failures is different:

I've now seen examples of sbt==0xfffffed8 that did not lead to a hangup.
So it is not a reliable predictor of the hang-up in sleep. I'm trying
to see if I can observe a failure with a different value.

> sleepq_set_timeout_sbt was given: sbt==0xfffffed8 pr==0x0 flags==0x100
> and in turn was using: prec==0xfffffed flags==0x501
>
> [I note that 16*0xfffffed == 0xfffffed0, so less than the original
> sbt value, not that I know this matters at this point.]
>
> For sbt==0xfffffed8, the callout to sleepq_timeout ends up with values
> like (a particular example):
>
> td_sleeptimo=0x470d360fe5 sbinuptime=0x46c869f6aa
>
> where the reporting code looks like:
>
> static void
> sleepq_timeout(void *arg)
> {
> 	struct sleepqueue_chain *sc __unused;
> 	struct sleepqueue *sq;
> 	struct thread *td;
> 	void *wchan;
> 	int wakeup_swapper;
> 	sbintime_t sbut; // HACK!!!
>
> 	td = arg;
> 	wakeup_swapper = 0;
> 	if (26 == td->td_proc->p_pid) // HACK!!!
> 		CTR3(KTR_PROC, "sleepq_timeout: thread %p (pid %ld, %s)",
> 		    (void *)td, (long)td->td_proc->p_pid, (void *)td->td_name);
>
> 	thread_lock(td);
>
> 	sbut = sbinuptime(); // HACK!!!
> 	if (td->td_sleeptimo > sbut || td->td_sleeptimo == 0) {
> 		/*
> 		 * The thread does not want a timeout (yet).
> 		 */
> 		if (26 == td->td_proc->p_pid) // HACK!!!
> 			CTR5(KTR_PROC, "sleepq_timeout thread not want timeout yet: thread %p (pid %ld, %s) td_sleeptimo=%jx sbinuptime=%jx",
> 			    (void *)td, (long)td->td_proc->p_pid, (void *)td->td_name, (uintmax_t)td->td_sleeptimo, (uintmax_t)sbut);
> . . .
>
> So far sleepq_set_timeout_sbt being given sbt==0xfffffed8 instead of
> sbt==0xcccccbe0 seems to be an accurate indicator of whether the problem
> will happen in sleepq_timeout. (But I've only a few examples so far.)
>

I've now seen examples of sbt==0xfffffed8 that did not lead to a hangup.
So it is not a reliable predictor of the hang-up in sleep. It looks like
the values are sometimes more varied than I'd seen before as well.

> I'll note that the sleepq_timeout code for this case does not set up
> another callout to itself for later and the sleep then continues
> indefinitely.
>
> I've not yet gotten into finding evidence for why the callout to
> sleepq_timeout itself happens. Hopefully I can find some.
>
>
> An example of some modified KTR_PROC output is:
>
> . . .

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)