Date: Fri, 9 Sep 2016 13:21:31 -0700
From: Mark Millard <markmi@dsl-only.net>
To: Krzysztof Parzyszek <kristof@swissmail.org>
Cc: Jukka Ukkonen <jau789@gmail.com>, freebsd-ppc@freebsd.org
Subject: Re: PowerMac G5 hangs/crashes on boot: 10.2, 11.0-RCx
Message-ID: <0A9EB3C7-F430-4F82-9B09-632754BB82C8@dsl-only.net>
In-Reply-To: <db0aa91b-aa79-689a-e901-437e18b49b81@swissmail.org>
References: <6ad00a2d-4213-18b8-7974-534aa3758837@swissmail.org> <E90BB066-47C9-4626-BE6C-5D15ECA0E4EE@gmail.com> <db0aa91b-aa79-689a-e901-437e18b49b81@swissmail.org>
On 2016-Sep-9, at 11:36 AM, Krzysztof Parzyszek <kristof at swissmail.org> wrote:

> On 9/9/2016 6:35 AM, Jukka Ukkonen wrote:
>>
>> The story apparently goes such that the interrupt code shown can be
>> pretty much anything. The interrupts might simply be enabled way before
>> the system is ready to handle them.
>
> I've had similar issues for quite some time. Previous releases would boot only sometimes, otherwise I'd be getting a hang or a crash. The frequency of the boot problems seems to increase dramatically when I boot from the hard-drive, but with 11 it has never booted correctly.
>
> I wasn't the only one seeing this type of a problem and I remember seeing a thread about it a while back. Mark Millard reported it, and someone has tracked it down to some register getting (unexpectedly) clobbered by the open firmware. I was hoping this had been fixed, but it seems that things have only gotten worse... :(
>
> CCing Mark---maybe he will know more about this.
>
> -Krzysztof

Unfortunately relative to powerpc and powerpc64: I've not had powerpc or powerpc64 access since very early 2016-June and will not for a few more weeks. (And, yes, the context is PowerMac's specifically.)

So I've done no testing of whether my personal kernel hack (that made the PowerMac G5's boot reliably in my use) helps in any more modern FreeBSD variants. It is unlikely that I'll get to that point before October sometime. Until then I'll not be much direct help.

I'm the one that isolated memory and register corruption examples on PowerMac G5's before identifying my specific hack that I used to avoid them.
Beyond my reporting the hack in the lists I did submit a bugzilla report documenting what change made the observed difference in boot reliability (in the older context, anyway):

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=205458 (from 2015-Dec-20)

It reports as the technique:

> The change is in ofw_sprg_prepare of sys/powerpc/ofw/ofw_machdep.c and could look something like (presented in a form to show new/PowerMacG5-Specific code and old general code):
>
> #ifdef POWERMAC_G5_SPECIFIC_BUILD
>         __asm __volatile("mfsprg0 %0\n\t"
>                          "mtsprg1 %1\n\t"
>                          "mtsprg2 %2\n\t"
>                          "mtsprg3 %3\n\t"
>                          : "=&r"(ofw_sprg0_save)
>                          : "r"(ofmsr[2]),
>                            "r"(ofmsr[3]),
>                            "r"(ofmsr[4]));
> #else
>         // The historical code:
>         __asm __volatile("mfsprg0 %0\n\t"
>                          "mtsprg0 %1\n\t"
>                          "mtsprg1 %2\n\t"
>                          "mtsprg2 %3\n\t"
>                          "mtsprg3 %4\n\t"
>                          : "=&r"(ofw_sprg0_save)
>                          : "r"(ofmsr[1]),
>                            "r"(ofmsr[2]),
>                            "r"(ofmsr[3]),
>                            "r"(ofmsr[4]));
> #endif
>
> In other words: for PowerMac G5's omit the mtsprg0 from ofmsr[1]: leave the register as it already is instead of resetting it. The value in ofmsr[1] is inappropriate to the context. I deliberately kept the change minimal and left in all other code related to the register.

All the evidence for this hack is observational. I've never figured out a reasonable way to find out what Apple's openfirmware does with the register involved and in what contexts. I wish I had better evidence for what is going on without the hack. The type of evidence that I have makes this purely a hack for now, even if it has a theory of operation justification (that is not known yet).

But as for the degree of observations: in isolating this I did well over 10,000 failing boots (spread over months, although not continuous activity). Frequently I'd have to try booting over a dozen times in a row before it would make it through. That is part of why the total is so large.
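For readers without PowerPC hardware, the difference between the two asm variants in the quoted bug report can be modeled in portable C. This is only an illustrative sketch, not kernel code: `sprg_model`, `ofw_sprg_prepare_model`, and the `g5_specific` flag are hypothetical names, a plain array stands in for the real SPRG0 through SPRG3 registers, and `uintptr_t` stands in for the kernel's register type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a plain array stands in for SPRG0..SPRG3. */
typedef struct {
    uintptr_t sprg[4];
} sprg_model;

/*
 * Models the register shuffle in the quoted ofw_sprg_prepare snippet.
 * Returns the saved SPRG0 value (the role played by ofw_sprg0_save).
 * g5_specific != 0 models POWERMAC_G5_SPECIFIC_BUILD: SPRG0 is left alone.
 */
uintptr_t
ofw_sprg_prepare_model(sprg_model *cpu, const uintptr_t ofmsr[5],
    int g5_specific)
{
    assert(cpu != NULL && ofmsr != NULL);

    uintptr_t sprg0_save = cpu->sprg[0];    /* mfsprg0 %0 */

    if (!g5_specific)
        cpu->sprg[0] = ofmsr[1];            /* mtsprg0: historical code only */
    cpu->sprg[1] = ofmsr[2];                /* mtsprg1 */
    cpu->sprg[2] = ofmsr[3];                /* mtsprg2 */
    cpu->sprg[3] = ofmsr[4];                /* mtsprg3 */

    return sprg0_save;
}
```

In this model, calling the function with `g5_specific` set leaves `sprg[0]` holding whatever value was there before the call, while the historical path replaces it with `ofmsr[1]`. Both paths return the original SPRG0 value and load SPRG1 through SPRG3 identically, which matches the "minimal change" described in the report.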
After the hack I've not had any such failing boots show up, but I boot far less frequently since I do not need to force a reboot. (I always buildworld buildkernel from source and my source has the hack.) I've no post-early-2016-June evidence relative to the hack.

The lists have more information from when I investigated the issue, such as the memory and register corruptions that I observed prior to isolating the small change. But it is a mess to go through those notes in any detail. I'm not likely to do so without a strong motivation.

I've no evidence that the change would be appropriate outside a PowerMac G5 at all. This alone would keep FreeBSD from adopting it in a generic build (even if there was a PowerMac G5 theory of operation justification known). The submittal only suggested having a pre-made hook for manually building from source for a PowerMac G5.

Part of the issue is that I do not know a way to identify the context as a PowerMac G5 context without use of openfirmware. Any use of openfirmware to figure that out would re-create the problem as far as I can tell. It appears that the build needs to be PowerMac G5 specific to avoid the problem.

I will note that I've never needed or used the hack on PowerMac G4's or a PowerMac G3. But, again, my evidence ends in early-2016-June.

===
Mark Millard
markmi at dsl-only.net