From owner-freebsd-ppc@freebsd.org Fri Sep 9 20:48:16 2016
From: Mark Millard
Date: Fri, 9 Sep 2016 13:21:31 -0700
To: Krzysztof Parzyszek
Cc: Jukka Ukkonen, freebsd-ppc@freebsd.org
Subject: Re: PowerMac G5 hangs/crashes on boot: 10.2, 11.0-RCx
Message-Id: <0A9EB3C7-F430-4F82-9B09-632754BB82C8@dsl-only.net>
References: <6ad00a2d-4213-18b8-7974-534aa3758837@swissmail.org>
List-Id: Porting FreeBSD to the PowerPC

On 2016-Sep-9, at 11:36 AM, Krzysztof Parzyszek wrote:

>
> On 9/9/2016 6:35 AM, Jukka Ukkonen wrote:
>>
>> The story apparently goes such that the interrupt code shown can be
>> pretty much anything. The interrupts might simply be enabled way before
>> the system is ready to handle them.
>
> I've had similar issues for quite some time. Previous releases would
> boot only sometimes, otherwise I'd be getting a hang or a crash. The
> frequency of the boot problems seems to increase dramatically when I
> boot from the hard-drive, but with 11 it has never booted correctly.
>
> I wasn't the only one seeing this type of a problem and I remember
> seeing a thread about it a while back. Mark Millard reported it, and
> someone has tracked it down to some register getting (unexpectedly)
> clobbered by the open firmware. I was hoping this had been fixed, but
> it seems that things have only gotten worse... :(
>
> CCing Mark---maybe he will know more about this.
>
> -Krzysztof

Unfortunately, relative to powerpc and powerpc64: I've not had powerpc
or powerpc64 access since very early 2016-June and will not for a few
more weeks. (And, yes, the context is PowerMacs specifically.)

So I've done no testing of whether my personal kernel hack (that made
the PowerMac G5s boot reliably in my use) helps in any more modern
FreeBSD variants. It is unlikely that I'll get to that point before
sometime in October. Until then I'll not be much direct help.

I'm the one that isolated memory and register corruption examples on
PowerMac G5s before identifying the specific hack that I used to avoid
them.
Beyond my reporting the hack in the lists I did submit a bugzilla report
documenting what change made the observed difference in boot reliability
(in the older context, anyway):

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=205458 (from 2015-Dec-20)

It reports as the technique:

> The change is in ofw_sprg_prepare of sys/powerpc/ofw/ofw_machdep.c and
> could look something like (presented in a form to show
> new/PowerMacG5-Specific code and old general code):
>
> #ifdef POWERMAC_G5_SPECIFIC_BUILD
>         __asm __volatile("mfsprg0 %0\n\t"
>                          "mtsprg1 %1\n\t"
>                          "mtsprg2 %2\n\t"
>                          "mtsprg3 %3\n\t"
>                          : "=&r"(ofw_sprg0_save)
>                          : "r"(ofmsr[2]),
>                            "r"(ofmsr[3]),
>                            "r"(ofmsr[4]));
> #else
>         // The historical code:
>         __asm __volatile("mfsprg0 %0\n\t"
>                          "mtsprg0 %1\n\t"
>                          "mtsprg1 %2\n\t"
>                          "mtsprg2 %3\n\t"
>                          "mtsprg3 %4\n\t"
>                          : "=&r"(ofw_sprg0_save)
>                          : "r"(ofmsr[1]),
>                            "r"(ofmsr[2]),
>                            "r"(ofmsr[3]),
>                            "r"(ofmsr[4]));
> #endif
>
> In other words: for PowerMac G5's omit the mtsprg0 from ofmsr[1]:
> leave the register as it already is instead of resetting it. The value
> in ofmsr[1] is inappropriate to the context. I deliberately kept the
> change minimal and left in all other code related to the register.

All the evidence for this hack is observational. I've never figured out
a reasonable way to find out what Apple's openfirmware does with the
register involved and in what contexts. I wish I had better evidence for
what is going on without the hack. The type of evidence that I have
makes this purely a hack for now, even if it has a theory of operation
justification (that is not known yet).

But as for the degree of observations: in isolating this I did well over
10,000 failing boots (spread over months, although not continuous
activity). Frequently I'd have to try booting over a dozen times in a
row before it would make it through. That is part of why the total is so
large.
After the hack I've not had any such failing boots, but I boot far less
frequently since I do not need to force a reboot. (I always buildworld
buildkernel from source and my source has the hack.)

I've no post-early-2016-June evidence relative to the hack.

The lists have more information from when I investigated the issue, such
as the memory and register corruptions that I observed prior to
isolating the small change. But it is a mess to go through those notes
in any detail, and I'm not likely to do so without a strong motivation.

I've no evidence that the change would be appropriate outside a PowerMac
G5 at all. This alone would keep FreeBSD from adopting it in a generic
build (even if there was a PowerMac G5 theory of operation justification
known). The submittal only suggested having a pre-made hook for manually
building from source for a PowerMac G5.

Part of the issue is that I do not know a way to identify the context as
a PowerMac G5 context without use of openfirmware. Any use of
openfirmware to figure that out would re-create the problem as far as I
can tell. It appears that the build needs to be PowerMac G5 specific to
avoid the problem.

I will note that I've never needed or used the hack on PowerMac G4s or
a PowerMac G3. But, again, my evidence ends in early-2016-June.

===
Mark Millard
markmi at dsl-only.net