From: Mark Millard
Date: Sun, 12 Oct 2014 17:53:49 -0700
To: FreeBSD PowerPC ML, Nathan Whitehorn
Cc: Justin Hibbits
Subject: My PowerMac G5's no longer crash at boot: PowerMac G5 specific ofwcall changes with justifying evidence
Message-Id: <76F704FD-BB74-4439-8318-DB4C167B420F@dsl-only.net>

NOTE: I make no claim that any of the hacks to ofwcall below are appropriate code for FreeBSD's general context. I only claim that they seem to make the specific PowerMac G5 problem go away and give solid evidence for at least some of what is going on (justifying the investigative and testing hacks), and so give evidence for an appropriate, more general FreeBSD solution.

The big issue is: the PowerMac G5 openfirmware does not always preserve the %r1 value (the stack pointer contents) that it is initially given, at least when the early "before copyright" crash problem is happening, but possibly at other times as well.

I had the following investigative code in ofwcall, snapshotting the value of %r1 before and after openfirmware's code is used:

	lis	%r4,openfirmware_entry@ha
	ld	%r4,openfirmware_entry@l(%r4)
	...
	mr	%r17,%r1	/* ADDED HACK TO RECORD %r1 before... */
	/* Finally, branch to OF */
	mtctr	%r4
	bctrl
	mr	%r18,%r1	/* ADDED HACK TO RECORD %r1 after... */

Then the DDB "show registers" output from the crash (which I had hacked in) shows these values instead of the zeros that those registers otherwise always display, in addition to what show registers has always shown for r1. The results were like the following example for every such crash:

	r17 = 0xC31400 ofwstk+0xfe0
	r18 = 0xd24450
	r1  = 0xd24450

Because of that %r1 value, the later code such as:

	/* Reload stack pointer and MSR from the OFW stack */
	ld	%r6,24(%r1)
	ld	%r2,16(%r1)
	ld	%r1,8(%r1)

gets garbage-in/garbage-out results, including %r6 holding values like 0xbc0568 instead of the saved MSR value that is to be restored later: 0x9000000000001032.
So one PowerMac G5 specific hack involved in my working-boots context is to force the original %r1 value to be used (based on %r17 being a before-call copy, similar to the above):

	ld	%r6,24(%r17)
	ld	%r2,16(%r17)
	ld	%r1,8(%r17)

But the exception report from DDB has had problems, in part because sprg0 still has the openfirmware value at that time even though the exception is after openfirmware returned (the wrong value ends up in the register that GET_CPUINFO() fills). So I hacked in a before-exception restore of FreeBSD's sprg0 inside ofwcall to make the exception handler code have that much FreeBSD context available at the exception (if it occurs, anyway). This was really just to help with information gathering, although I have not tested with only the %r17 changes.

So overall, my PowerMac G5 specific hacking changes the ofwcall code as follows (based on what was reported above):

root@FBSDG5M1:~ # svnlite diff /usr/src/sys/powerpc/ofw/ofwcall64.S
Index: /usr/src/sys/powerpc/ofw/ofwcall64.S
===================================================================
--- /usr/src/sys/powerpc/ofw/ofwcall64.S	(revision 272558)
+++ /usr/src/sys/powerpc/ofw/ofwcall64.S	(working copy)
@@ -52,6 +52,12 @@
 GLOBAL(rtas_entry)
 	.llong	0			/* RTAS entry point */
 
+	/* HACK: part of having sprg0 in place for trap */
+ofwsprg0save:
+	.space	8 /* sizeof(register_t) */
+GLOBAL(ofw_sprg0_save)
+	.llong	0
+
 /*
  * Open Firmware Real-mode Entry Point. This is a huge pain.
  */
@@ -97,6 +103,10 @@
 	lis	%r4,openfirmware_entry@ha
 	ld	%r4,openfirmware_entry@l(%r4)
 
+	/* HACK: part of having FreeBSD's sprg0 in place for the exception problem */
+	lis	%r14,ofw_sprg0_save@ha
+	ld	%r14,ofw_sprg0_save@l(%r14)
+
 	/*
 	 * Set the MSR to the OF value. This has the side effect of disabling
 	 * exceptions, which is important for the next few steps.
@@ -123,14 +133,27 @@
 	stw	%r5,4(%r1)
 	stw	%r5,0(%r1)
 
+	/* HACK: part of having FreeBSD's sprg0 in place for the exception problem */
+	lis	%r6,ofwsprg0save@ha
+	std	%r14,ofwsprg0save@l(%r6)
+
+	/* HACK: part of IGNORING the later %r1 value from openfirmware */
+	mr	%r17,%r1
+
 	/* Finally, branch to OF */
 	mtctr	%r4
 	bctrl
 
+	/* HACK: part of having FreeBSD's sprg0 in place for the exception problem */
+	lis	%r6,ofwsprg0save@ha
+	ld	%r6,ofwsprg0save@l(%r6)
+	mtsprg0	%r6
+
 	/* Reload stack pointer and MSR from the OFW stack */
-	ld	%r6,24(%r1)
-	ld	%r2,16(%r1)
-	ld	%r1,8(%r1)
+	/* HACKED to ignore the %r1 value that results from openfirmware's call */
+	ld	%r6,24(%r17)
+	ld	%r2,16(%r17)
+	ld	%r1,8(%r17)
 
 	/* Now set the real MSR */
 	mtmsrd	%r6

This results in no crashes happening so far in my testing, not even on the 16 GByte RAM machine that crashed so much.

NOTE: ofw_machdep.c was changed to use "extern register_t ofw_sprg0_save;" to match the above.

I still have ps3 disabled in GENERIC64 so that I can also have the sc options in GENERIC64. The DDB and GDB options are still present as well.

I also still have my hack to force a DDB script that does show registers and shows the ofwcall history information that I hacked in, even for the very early crashes before input is possible. Not that I'm now getting such executions of the script. (A before-possible-crash backtrace is also shown by the added code. That still shows up.)

I'll probably next switch to reverting the DDB related code changes and removing the DDB/GDB options, and see how that goes.

===
Mark Millard
markmi at dsl-only.net