Date:      Sat, 28 Feb 2015 07:10:59 -0800
From:      Mark Millard <markmi@dsl-only.net>
To:        FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>, Nathan Whitehorn <nwhitehorn@freebsd.org>
Subject:   Large RAM Powermac G5 powerpc64 openfirmware entry point use: I seem to have removed the major/frequent boot problem(s)
Message-ID:  <EF1EC459-BC46-42D5-9C3D-ABCBEF90AC44@dsl-only.net>

I have removed my re-try logic that I had in ofwcall because other
changes I've made got rid of the %r1/%r3 corruption issue that I used to
have so frequently when booting PowerMac G5s that have lots of RAM (8G,
12G, or 16G).

Also I'm no longer getting the specific .got area corruption that
authnone code was failing on when it used its %r2 to access various
things when I built with VERBOSE_SYSINIT. (It is hard to have evidence
that no corruption is happening anywhere.)

I've done multiple "make -j 8 buildworld kernel && make installworld"
runs under the changes and have updated to 10.1-STABLE -r279201
under these changes. I've done this both with and without
VERBOSE_SYSINIT and the like. I've done a half dozen build/install,
reboot, and use sequences.

I've booted and used both a PowerMac11,2 and a PowerMac7,2 from the same
SSD. (I do not have access to other PowerMac G5 variants.) (I've done no
investigation of analogous points yet for powerpc PowerMacs: only
powerpc64 usage of G5s.)

[I would not claim that this addresses the less frequent boot failure
tracebacks that I've reported in the past: just the most frequent one
and the recent .got corruption evidence.]



So what did I do? It was limited to openfirmware_core, ofw_sprg_prepare,
and ofwcall. The bias was to simplify the context involved in my
investigation, not to produce production FreeBSD code
structured/partitioned for general use. The below is not in the order of
investigation but is a description of the net result. I do not claim
that all the eliminations involved were necessary but I had less code to
worry about this way.

0) I've looked at some of the openfirmware entry point code to see what
appeared to be needed vs. not needed.

1) I've looked at some of the BootX code and what it compiles to, in
order to see an example of what was sufficient in that context.

For sys/powerpc/ofw/ofwcall64.S...

2) Despite working in/with/on 10.1-STABLE I started from ofwcall64.S
from a recent 11.0-CURRENT, from after the powerpc64 kernel-relocation
changes.
    (And I used the matching sys/powerpc/include/asm.h update so
ofwcall64.S would compile in 10.1-STABLE.)

3) I removed things that I found to not be necessary: ofwstk and its use
(openfirmware does its own storage management for such things), use of
ofwmsr storage (msr being just locally in registers for my code,
SPRG<?>'s not directly used), saving and restoring unused non-volatile
registers (openfirmware does 64-bit save/restore of the non-volatile
registers that it uses). BootX also does nothing special for %r1 when
calling openfirmware.

4) I added/kept what was needed (possibly expressed differently):
forcing 32-bit mode before and putting msr back to its original status
after, putting back %r2's TOC value (openfirmware uses Darwin's rule
that %r2 is volatile/non-dedicated --no TOC use-- and it replaces r2's
value), sign extend %r3.

5) In ofwcall I followed all the ABI rules that I've learned of, both
FreeBSD powerpc64 and Darwin ABI rules (e.g., the %r12 indirect call
address rule). This includes covering %r1, cr, lr, %r14, %r15
saves/restores back to caller's status. (%r14 and %r15 were used in my
ofwcall.)

6) My ofwcall does increment a count stored in memory if it sees a
changed %r1 when openfirmware returns. This allows checking if it ever
happened via kgdb. It also records the most recent such new %r1 value.
This part of the code I consider a temporary evidence-gathering hack in
case problems eventually happen.
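
The register-level effects of items 4 and 6 can be modeled in plain C. This is only a sketch under stated assumptions, not the ofwcall64.S code itself: the function and variable names are hypothetical, and the only architectural fact relied on is that MSR[SF], the most significant MSR bit, selects 64-bit mode on powerpc64.

```c
#include <stdint.h>

/* MSR[SF] is the most significant MSR bit on 64-bit PowerPC: 1 selects 64-bit mode. */
#define MSR_SF (UINT64_C(1) << 63)

/* Hypothetical names for the evidence values described in item 6. */
static uint64_t ofw_r1_changes;    /* how many times %r1 came back changed */
static uint64_t ofw_r1_last_seen;  /* the most recent changed %r1 value */

/* Item 4: the MSR value to enter openfirmware with (32-bit mode, SF cleared). */
static uint64_t msr_for_ofw(uint64_t msr)
{
    return msr & ~MSR_SF;
}

/* Item 4: sign extend the 32-bit result that comes back in %r3. */
static uint64_t sign_extend_r3(uint64_t r3)
{
    uint64_t low = r3 & UINT64_C(0xffffffff);
    /* Flip bit 31, then subtract it back: bit 31 is propagated upward. */
    return (low ^ UINT64_C(0x80000000)) - UINT64_C(0x80000000);
}

/* Item 6: record evidence if %r1 differs across the openfirmware call. */
static void note_r1(uint64_t r1_before, uint64_t r1_after)
{
    if (r1_after != r1_before) {
        ofw_r1_changes++;
        ofw_r1_last_seen = r1_after;
    }
}
```

In this model the caller would save the original msr, switch to msr_for_ofw(msr) for the call, restore the saved msr afterwards, and then apply sign_extend_r3() and note_r1() to the returned register values.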

For sys/powerpc/ofw/ofw_machdep.c ...

7) In ofw_sprg_prepare I eliminated the unnecessary SPRG<?> register
handling, leaving only SPRG0 (for later restoring required FreeBSD
context) and SPRG3 (for restoring required Openfirmware context).

8) In/for openfirmware_core I eliminated all the trap vector
save/restore calls. If openfirmware needs any of its exception vector
code it seems to patch/unpatch as necessary. BootX also does not context
switch the vectors but just leaves its own in place. I also eliminated
the extern save_trap_init declaration and the save_trap_of storage.
Because ofwcall no longer had ofwmsr involved, I removed ofwmsr's extern
status so it would provide the definition for the link. (I did not
change the size or the positions assigned for the used parts of ofwmsr.)
Much of this was commenting-out activity.

(Note: I chose to ignore that the trap vector save/restore calls have
code to disable the activity based on ofw_real_mode's value, and to make
it locally obvious that no such code was in use for my testing.)

9) I did not eliminate the now-otherwise-unused save_trap_init storage
or the ofw_save_trap_vec call from powerpc_init in
sys/powerpc/aim/machdep.c. But I could have --at which point
ofw_save_trap_vec could also have been eliminated. Also I'm not
reporting on temporary evidence/investigation code that I had at various
times but have removed over time.


The result was that the frequent boot crashes disappeared and I've no
evidence so far of %r1/%r3 corruption or the .got corruption that I'd
reported on earlier.



My openfirmware_core, ofw_sprg_prepare, and ofwcall are too PowerMac G5
context-specific for general FreeBSD use. But they should point to
a now-known way to improve booting for PowerMac G5s.




Other/supporting notes:

I finally figured out that the ddb code for x/i is incomplete for
powerpc64.

For example DDB's x/i calls mtmsrd an Illegal Instruction.

One can see this if you have it decode ofwcall from memory and compare
to the source code.

Because of this the decoding of the first instructions for the
openfirmware entry point (0xff846d78 starting address on the G5
Quad-Core) is:

   or      r2,r0,r2,
   addis   r2,r0,-0x49
   ori     r2,r2,0xf000 /* so %r2:=0xFFB7F000: fixed address (32-bit mode). */
   std     r1,r2,0x8,   /* %r1 saved to have a special, separate copy */
   std     r0,r2,0x10,  /* more saves to fixed locations */
   mfspr   r0,lr
   std     r0,r2,0x120, /* more saves to fixed locations: return address */
   mfmsr   r1
   std     r1,r2,0x108, /* more saves to fixed locations: msr */
   rldicl  r1,r1,0,1    /* clears the most significant %r1 bit */
   0x7c200164 (actually mtmsrd %r1) /* forces 32-bit mode */
   isync
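
The raw word 0x7c200164 in the listing can be checked by hand against the X-form instruction layout. The following is a minimal sketch of such a check, not DDB's disassembler code: the struct and function names are hypothetical, and it assumes the Power ISA encoding of mtmsrd as primary opcode 31 with extended opcode 178 (mtmsr being 31 with extended opcode 146).

```c
#include <stdint.h>

/* The X-form fields that matter here (hypothetical struct name). */
struct x_form {
    unsigned opcd;  /* primary opcode: the top 6 bits */
    unsigned rs;    /* source register: the next 5 bits */
    unsigned xo;    /* extended opcode: 10 bits above the lowest bit */
};

static struct x_form decode_x_form(uint32_t insn)
{
    struct x_form f;

    f.opcd = insn >> 26;
    f.rs = (insn >> 21) & 0x1f;
    f.xo = (insn >> 1) & 0x3ff;
    return f;
}

/* mtmsrd: primary opcode 31, extended opcode 178. */
static int is_mtmsrd(uint32_t insn)
{
    struct x_form f = decode_x_form(insn);

    return f.opcd == 31 && f.xo == 178;
}
```

Applied to 0x7c200164 this gives primary opcode 31, RS 1, and extended opcode 178, which is consistent with reading the word as mtmsrd %r1.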

Unfortunately for 64-bit mode at the start: %r2=0xFF...FFB7F000 and
std r1,r2,0x8, ends up rejecting the effective address. Apple does not
force 32-bit mode until a little later in the above. Thus ofwcall does
need to force 32-bit mode and return it back to normal, despite the msr
and other save/restore code that openfirmware has.
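
The address arithmetic behind this can be worked through in C. This is a sketch under stated assumptions: the helper names are hypothetical, the addis with r0 as the base is treated as loading the sign-extended, shifted immediate, and 32-bit mode is modeled as using only the low 32 bits of the computed effective address.

```c
#include <stdint.h>

/* addis rD,0,imm: the immediate is sign extended and shifted left 16 bits. */
static uint64_t load_shifted_imm(int16_t imm)
{
    return (uint64_t)((int64_t)imm * 65536);
}

/* ori rD,rS,imm: the 16-bit immediate is zero extended and OR'd in. */
static uint64_t or_imm(uint64_t rs, uint16_t imm)
{
    return rs | imm;
}

/* Effective address of a base+displacement access; sf models MSR[SF]. */
static uint64_t effective_address(uint64_t base, int16_t disp, int sf)
{
    uint64_t ea = base + (uint64_t)(int64_t)disp;

    if (!sf)
        ea &= UINT64_C(0xffffffff);  /* 32-bit mode: only the low 32 bits count */
    return ea;
}
```

With imm = -0x49 and then 0xf000, %r2 comes out as 0xFFFFFFFFFFB7F000. The std at offset 0x8 then targets 0xFFFFFFFFFFB7F008 with sf set but 0xFFB7F008 with sf clear, which is the difference described above.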

(I've not explored trying to set up a mapping for the involved 64-bit
effective address range in order to allow the translation of the 64-bit
addresses in openfirmware's %r2 above. With that it might be that 64-bit
mode could be left in place.)

As far as I can tell this %r2 value and its use is the only reason that
FreeBSD needs to establish 32-bit mode before the call into
openfirmware.

%r2 needs to be restored after openfirmware returns since openfirmware
is using the Darwin ABI where %r2 is volatile and non-dedicated (no
use of TOCs), as can be seen above.


===
Mark Millard
markmi at dsl-only.net



