Date:      Wed, 19 Feb 2020 15:17:41 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject:   head -r358094 and -r357948 vs. powerpc64 or 32-bit powerpc multi-socket TB (time) mismatches: lots of temporary system hangs
Message-ID:  <F5C60F3D-9C5D-45B0-A525-576AC4E3CECD@yahoo.com>
References:  <F5C60F3D-9C5D-45B0-A525-576AC4E3CECD.ref@yahoo.com>

[The below is mostly about normal FreeBSD, without any patching
to make TB (time base) values approximately respect cause->effect
ordering across sockets/cores.]

Head -r358094 checked in the fix for head -r357549 breaking booting
on (some?) powerpc machines, such as PowerMac G5 dual-socket ones.

This means one can again grab artifact.ci kernels and test them,
for example. (For me, that avoids my patches being involved.)

That in turn exposes kib's -r357948 check-in:

QUOTE
Consolidate read code for timecounters and fix possible overflow in
  bintime()/binuptime().
END QUOTE

With this change, mismatched power/powerpc TB values across
sockets/cores cause the system to temporarily hang/wait: the overflow
handling is triggered and longer times end up being involved.
"Temporary" here is not necessarily momentary: a hang can last
minutes, and there may be only minutes or less between such hangs.
The hangs can start before the login prompt is reached or while
typing in the username to log in.

I see this both on multi-socket PowerMac G4s and multi-socket PowerMac
G5s using the matching -r358094 kernels from artifact.ci .

It is the same multi-socket PowerMac behavior that I saw for trial
versions of kib's patch back around 2019-Mar/Apr or so, when I
tested without a patch for the TB values. (So I was expecting such a
result from such a check-in.)


I've not (yet?) figured out how to fit a variant of my TB patch into
the code base as requested, or how to cover things like suspend/resume
(if there is such) for multi-core powerpc systems that also get the TB
value relationship problem. (I only fairly recently learned that the
TB value relationship issue is not historically limited to PowerMacs.)
Without the long stretch of available time that I had when I came
up with the existing patch (weeks back then), it is not clear how soon
I would have a more general and more acceptable patch for
the FreeBSD code base.

My test context also does not span lots of sockets/cores or NUMA
variability in memory access timing --or any suspend/resume contexts.
Even for what I did cover, I'm not sure how well it generalizes for
such issues.


I, of course, normally run with my existing PowerMac patch. So far
I've not seen problems from -r357948 in that context on the
G4s or G5s (but I have seen the problem when not using such a patch).

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
