Message-ID: <543B3828.8070806@freebsd.org>
Date: Sun, 12 Oct 2014 19:25:44 -0700
From: Nathan Whitehorn
To: Mark Millard, FreeBSD PowerPC ML
Cc: Justin Hibbits
Subject: Re: My PowerMac G5's no longer crash at boot: PowerMac G5 specific ofwcall changes with justifying evidence

Interesting. If OF is changing the value of r1, there must be some
problem with the ABI thunk the 64-bit kernel uses or a problem with trap
handlers. This is obviously not systematic if loader and the kernel up
to that point have no problems. Does a 32-bit kernel have the same
problems on your hardware? That would test whether it is the ABI
translation.
-Nathan

On 10/12/14 17:53, Mark Millard wrote:
> NOTE: I make no claim that any of the below hacks for ofwcall are
> appropriate code for FreeBSD's general context. I only claim that they
> seem to make the specific PowerMac G5 problem go away, give solid
> evidence for at least some of what is going on (justifying the
> investigative and testing hacks), and so give evidence for an
> appropriate, more general FreeBSD solution.
>
>
> The big issue is: The PowerMac G5 openfirmware does not always preserve
> the %r1 value (the stack pointer contents) that it is initially given,
> at least when the early "before copyright" crash problem is happening,
> but possibly at other times as well.
>
> I had the following investigative code in ofwcall, snapshotting the
> value of %r1 before and after openfirmware's code is used:
>
> lis %r4,openfirmware_entry@ha
> ld %r4,openfirmware_entry@l(%r4)
> ...
> mr %r17,%r1 /* ADDED HACK TO RECORD %r1 before...
> /* Finally, branch to OF */
> mtctr %r4
> bctrl
> mr %r18,%r1 /* ADDED HACK TO RECORD %r1 after...
>
> Then the DDB "show registers" output from the crash (which I had hacked
> in) would show these values instead of the zeros it otherwise always
> displays, in addition to what "show registers" has always shown for r1.
>
> The results were like the following example for every such crash:
>
> r17 = 0xC31400 ofwstk+0xfe0
> r18 = 0xd24450
> r1 = 0xd24450
>
> Because of that %r1 value, later code such as:
>
> /* Reload stack pointer and MSR from the OFW stack */
> ld %r6,24(%r1)
> ld %r2,16(%r1)
> ld %r1,8(%r1)
>
> gets garbage-in/garbage-out results. For example, %r6 gets values like
> 0xbc0568 instead of the saved MSR value that is later to be restored:
> 0x9000000000001032.
>
> So one PowerMac G5 specific hack involved in my working-boots context
> is to force the original %r1 value to be used (based on %r17 being a
> before-call copy, similar to the above):
>
> ld %r6,24(%r17)
> ld %r2,16(%r17)
> ld %r1,8(%r17)
>
> But the exception report from DDB has had problems, in part because
> sprg0 still holds the openfirmware value at that point, even though the
> exception happens after openfirmware has returned (the wrong value ends
> up in the register used by GET_CPUINFO()). So I hacked in a
> before-exception restore of FreeBSD's sprg0 inside ofwcall, so that the
> exception handler code has at least that much FreeBSD context available
> (if an exception occurs, anyway). This was really just to help with
> information gathering, although I have not tested with only the %r17
> changes.
>
> So overall, the PowerMac G5 specific hacks to the ofwcall code are as
> follows (based on what was reported above):
>
> root@FBSDG5M1:~ # svnlite diff /usr/src/sys/powerpc/ofw/ofwcall64.S
> Index: /usr/src/sys/powerpc/ofw/ofwcall64.S
> ===================================================================
> --- /usr/src/sys/powerpc/ofw/ofwcall64.S (revision 272558)
> +++ /usr/src/sys/powerpc/ofw/ofwcall64.S (working copy)
> @@ -52,6 +52,12 @@
> GLOBAL(rtas_entry)
> .llong 0 /* RTAS entry point */
>
> + /* HACK: part of having sprg0 in place for trap */
> +ofwsprg0save:
> + .space 8 /* sizeof(register_t) */
> +GLOBAL(ofw_sprg0_save)
> + .llong 0
> +
> /*
> * Open Firmware Real-mode Entry Point. This is a huge pain.
> */
> @@ -97,6 +103,10 @@
> lis %r4,openfirmware_entry@ha
> ld %r4,openfirmware_entry@l(%r4)
>
> + /* HACK: part of having FreeBSD's sprg0 in place for the exception problem */
> + lis %r14,ofw_sprg0_save@ha
> + ld %r14,ofw_sprg0_save@l(%r14)
> +
> /*
> * Set the MSR to the OF value. This has the side effect of disabling
> * exceptions, which is important for the next few steps.
> @@ -123,14 +133,27 @@
> stw %r5,4(%r1)
> stw %r5,0(%r1)
>
> + /* HACK: part of having FreeBSD's sprg0 in place for the exception problem */
> + lis %r6,ofwsprg0save@ha
> + std %r14,ofwsprg0save@l(%r6)
> +
> + /* HACK: part of IGNORING the later %r1 value from openfirmware */
> + mr %r17,%r1
> +
> /* Finally, branch to OF */
> mtctr %r4
> bctrl
>
> + /* HACK: part of having FreeBSD's sprg0 in place for the exception problem */
> + lis %r6,ofwsprg0save@ha
> + ld %r6,ofwsprg0save@l(%r6)
> + mtsprg0 %r6
> +
> /* Reload stack pointer and MSR from the OFW stack */
> - ld %r6,24(%r1)
> - ld %r2,16(%r1)
> - ld %r1,8(%r1)
> + /* HACKED to ignore the %r1 value that results from openfirmware's call */
> + ld %r6,24(%r17)
> + ld %r2,16(%r17)
> + ld %r1,8(%r17)
>
> /* Now set the real MSR */
> mtmsrd %r6
>
> So far this has resulted in no crashes in my testing, not even on the
> 16 GByte RAM machine that crashed so much.
>
> NOTE: ofw_machdep.c was changed to use "extern register_t
> ofw_sprg0_save;" to match the above.
>
> I still have ps3 disabled in GENERIC64 so that I can also have the sc
> options in GENERIC64. The DDB and GDB options are still present as
> well.
>
> I also still have my hack to force a DDB script that does "show
> registers" and shows the ofwcall history information that I hacked in,
> even for the very early crashes before input is possible. Not that I'm
> now getting any executions of the script. (The added code also shows a
> backtrace taken before the point of a possible crash. That still shows
> up.)
>
> I'll probably next switch to reverting the DDB related code changes and
> to removing the DDB/GDB options and see how that goes.
>
>
> ===
> Mark Millard
> markmi at dsl-only.net
>
>