From: Mark Millard
Date: Sun, 19 Oct 2014 00:43:11 -0700
Subject: Re: My PowerMac G5's no longer crash at boot: PowerMac G5 specific ofwcall changes with justifying evidence [current workaround]
To: Nathan Whitehorn
Cc: Justin Hibbits, FreeBSD PowerPC ML
Message-Id: <0EAE6493-FF6B-4F90-8C7B-F32A62DBD6B7@dsl-only.net>
In-Reply-To: <0CEC8978-E208-4F57-8481-DD9C321EF673@dsl-only.net>

Short of extracting and analyzing the openfirmware code and its behavior
directly, I've run out of ideas for investigating the %r1 and %r3
corruptions during openfirmware calls on the PowerMac G5's.

So my next investigative direction will probably be to hack %r1 and %r3
validation into the powerpc/GENERIC ofwcall 32 bit code and have it
report if it finds anything odd. This may take a while for me to get to,
and it will take some time to conclude that nothing is being found if
nothing is found.

I believe that, given the known problems and observed %r1 and %r3
corruptions, the FreeBSD ofwcall code for powerpc64 on PowerMacs would
be safer if ofwcall was changed to have the following properties (at
least on/for powerpc64 PowerMacs):

A) Check if %r3 ends up neither 0 nor -1 and, if so, change it to -1 for
what is returned overall. In other words: do not presume things are okay
with other information returned other ways (fields of the struct pointed
to by the argument) unless the returned openfirmware status in %r3 is
exactly zero. Otherwise have the openfirmware error indicator (-1)
returned from ofwcall.

[Do all openfirmware's have the one's complement Boolean style return
values (0 vs. -1) that PowerMac G5's seem to have? If not, the code above
would fail to be very general.]

B) Similarly, check whether %r1 had a net-change (a corruption) and, if
so, restore the known/recorded before-value and set %r3 to -1 so that a
failure status is returned to the code calling ofwcall.

C) Possibly have one automatic retry of the openfirmware call if (A) or
(B) type problems happen, before having such a failure (-1) return.
Re-setup %r1 and %r3 first for such a retry if one is attempted. Handle
retry-failure as in (A) and (B) above.

[This comes from my investigation only finding one-time-failures in the
sequence of ofwcall's: after a failure, later calls from the same boot
sequence and until shutdown worked without observed corruptions of %r1
or %r3.]

D) As paranoia for now: have a general bias toward not depending on most
registers being preserved across the openfirmware call, since bad
register values are part of the observed problem. Probably be biased to
mostly use the registers that ofwcall already explicitly saves and
restores (non-volatile registers that openfirmware should also
explicitly save and restore), but use separate storage to save and then
recover values across any calls into openfirmware.

However, such changes would mean that such PowerMac builds would not be
generic FreeBSD code unless such things were tolerable for the other
powerpc64 contexts that use ofwcall from ofwcall64.S.

My code for this below certainly qualifies as a personal hack based on
information specific to PowerMac G5's. I have also left in place the
early restore of the FreeBSD sprg0 value that allowed the original
exception to have a proper value to use during my investigations. (Those
specific exceptions should no longer be possible in my code.) I've got
ofw_sprg0_save being accessible and used from both ofw_machdep.c and
ofwcall64.S because of leaving this paranoia item in place.
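As a rough illustration of properties (A) through (C), here is a minimal C sketch of the validate-and-retry-once logic. It is not the ofwcall64.S code: the names ofw_call_checked, ofw_entry_fn, flaky_entry and broken_entry are hypothetical, and the stack pointer (%r1) is modelled as a plain variable passed by pointer so that a corruption can be simulated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the firmware entry point: returns the status
 * that would be in %r3 and may modify *sp, which models %r1. */
typedef int32_t (*ofw_entry_fn)(void *args, uint32_t *sp);

/* (A): only 0 and -1 are treated as plausible firmware status values. */
static bool ofw_status_plausible(int32_t r3)
{
	return r3 == 0 || r3 == -1;
}

/* Call the firmware, validating %r1 and %r3 afterwards, with one retry (C).
 * On persistent corruption, force the recorded stack pointer and return -1. */
int32_t ofw_call_checked(ofw_entry_fn entry, void *args, uint32_t *sp)
{
	const uint32_t saved_sp = *sp;	/* recorded before-value of %r1 */

	for (int attempt = 0; attempt < 2; attempt++) {
		*sp = saved_sp;		/* re-setup %r1 before each attempt */
		int32_t r3 = entry(args, sp);

		/* (B): %r1 must show no net change; (A): %r3 must be 0 or -1 */
		if (*sp == saved_sp && ofw_status_plausible(r3))
			return r3;
	}

	*sp = saved_sp;			/* force the known-good %r1 */
	return -1;			/* openfirmware failure indicator */
}

/* Simulated firmware that corrupts both values on its first call only,
 * mirroring the one-time failures described above. */
int flaky_calls;

int32_t flaky_entry(void *args, uint32_t *sp)
{
	(void)args;
	if (flaky_calls++ == 0) {
		*sp += 16;
		return 0x1234;
	}
	return 0;
}

/* Simulated firmware that always returns an implausible status. */
int32_t broken_entry(void *args, uint32_t *sp)
{
	(void)args;
	(void)sp;
	return 0x1234;
}
```

With flaky_entry the first attempt is rejected and the retry's status is returned; with broken_entry both attempts are rejected and the caller sees -1 with the recorded stack pointer back in place.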
I also have DDB/GDB option additions in GENERIC64 and ddb hacks such
that early crashes tend to "bt; show registers" before hanging. (There
is also the PS3 disable and the addition of sc.)

My context is still 10.1-RC1 based. /etc/make.conf has
WITH_DEBUG_FILES= , WITHOUT_CLANG= , WITH_DEBUG= , and WORKDIRPREFIX
assigned. I tend to have verbose_loading="YES" in /boot/loader.conf .
kern.vty depends on which video hardware is involved. Panic dumps are
effectively disabled because the dump attempts larger dma transfers than
are actually supported: that size relationship ends up reported instead.

root@FBSDG5M1:/usr/home/markmi # svnlite diff /usr/src/sys/
Index: /usr/src/sys/ddb/db_main.c
===================================================================
--- /usr/src/sys/ddb/db_main.c	(revision 272558)
+++ /usr/src/sys/ddb/db_main.c	(working copy)
@@ -46,6 +46,9 @@
 #include
 #include
 
+/* HACK: part of dealing with lack of input for early boot time */
+#include
+
 SYSCTL_NODE(_debug, OID_AUTO, ddb, CTLFLAG_RW, 0, "DDB settings");
 
 static dbbe_init_f db_init;
@@ -210,6 +213,9 @@
 	watchpt = IS_WATCHPOINT_TRAP(type, code);
 
 	if (db_stop_at_pc(&bkpt)) {
+		/* HACK: part of early boot handling: no input possible */
+		db_disable_pager();
+
 		if (db_inst_count) {
 			db_printf("After %d instructions (%d loads, %d stores),\n",
 			    db_inst_count, db_load_count, db_store_count);
Index: /usr/src/sys/ddb/db_script.c
===================================================================
--- /usr/src/sys/ddb/db_script.c	(revision 272558)
+++ /usr/src/sys/ddb/db_script.c	(working copy)
@@ -319,10 +319,25 @@
 {
 	char scriptname[DB_MAXSCRIPTNAME];
 
+	/* HACK!!! : Additional lines to force a basic default script to exist.
+	 * Will dump information even if ddb input is not available for early crash.
+	 * Used to get more information about PowerMac G5 "before Copyright" hangs.
+	 */
+	struct ddb_script *dsp = db_script_lookup(DB_SCRIPT_KDBENTER_DEFAULT);
+	if (!dsp) db_script_set(DB_SCRIPT_KDBENTER_DEFAULT, "bt; show registers");
+
 	snprintf(scriptname, sizeof(scriptname), "%s.%s",
 	    DB_SCRIPT_KDBENTER_PREFIX, eventname);
 	if (db_script_exec(scriptname, 0) == ENOENT)
 		(void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0);
+
+	/* HACK!!! : Additional lines to always use the default script,
+	 * even if scriptname existed and was executed.
+	 * Will dump information even if ddb input is not available for early crash.
+	 * Used to get more information about PowerMac G5 "before Copyright" hangs.
+	 */
+	else
+		(void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0);
 }
 
 /*-
Index: /usr/src/sys/powerpc/conf/GENERIC64
===================================================================
--- /usr/src/sys/powerpc/conf/GENERIC64	(revision 272558)
+++ /usr/src/sys/powerpc/conf/GENERIC64	(working copy)
@@ -28,7 +28,7 @@
 
 # Platform support
 options 	POWERMAC		#NewWorld Apple PowerMacs
-options 	PS3			#Sony Playstation 3
+#options 	PS3			#Sony Playstation 3 HACK!!! to allow sc
 options 	MAMBO			#IBM Mambo Full System Simulator
 options 	PSERIES			#PAPR-compliant systems (e.g. IBM p)
 
@@ -76,6 +76,12 @@
 # Debugging support. Always need this:
 options 	KDB			# Enable kernel debugger support.
 options 	KDB_TRACE		# Print a stack trace for a panic.
+options 	DDB			# HACK!!! to dump early crash info
+options 	GDB			# HACK!!! ...
+#options 	KTR
+#options 	KTR_MASK=KTR_TRAP
+#options 	KTR_CPUMASK=0xF
+#options 	KTR_VERBOSE
 
 # Make an SMP-capable kernel by default
 options 	SMP			# Symmetric MultiProcessor Kernel
@@ -115,6 +121,14 @@
 device		vt			# Core console driver
 device		kbdmux
 
+# HACK!!! to allow sc for 2560x1440 display on Radeon X1950 that vt mishandled
+# syscons is a console driver, resembling an SCO console
+device		sc
+#device		kbdmux			# HACK: already listed by vt
+options 	SC_OFWFB		# OFW frame buffer
+options 	SC_DFLT_FONT		# compile font in
+makeoptions	SC_DFLT_FONT=cp437
+
 # Serial (COM) ports
 device		scc
 device		uart
Index: /usr/src/sys/powerpc/ofw/ofw_machdep.c
===================================================================
--- /usr/src/sys/powerpc/ofw/ofw_machdep.c	(revision 272558)
+++ /usr/src/sys/powerpc/ofw/ofw_machdep.c	(working copy)
@@ -94,6 +94,11 @@
 /*
  * Saved SPRG0-3 from OpenFirmware. Will be restored prior to the callback.
  */
+/* HACK: ofw_sprg0_save storage defined in ofwcall
+ * for use in very early FreeBSD sprg0 restore
+ * as part of ready-for-possible-exception parania.
+ */
+extern register_t ofw_sprg0_save;
 
 static __inline void
Index: /usr/src/sys/powerpc/ofw/ofwcall64.S
===================================================================
--- /usr/src/sys/powerpc/ofw/ofwcall64.S	(revision 272558)
+++ /usr/src/sys/powerpc/ofw/ofwcall64.S	(working copy)
@@ -52,6 +52,20 @@
 GLOBAL(rtas_entry)
 	.llong	0	/* RTAS entry point */
 
+	/* HACK: part of dealing with openfirmware %r1, %r3 corruptions */
+ofw_entry_addr:		/* accessed under ofw msr */
+	.space	4
+ofw_r1_for_retry:	/* accessed under ofw msr */
+	.space	4
+ofw_r3_for_retry:	/* accessed under ofw msr */
+	.space	4
+
+	/* HACK: part of having FreeBSD sprg0 in place for potential exceptions */
+ofwsprg0save:		/* accessed under ofw msr */
+	.space	8	/* sizeof(register_t) */
+GLOBAL(ofw_sprg0_save)	/* accessed under FreeBSD msr */
+	.llong	0
+
 /*
  * Open Firmware Real-mode Entry Point. This is a huge pain.
  */
@@ -90,50 +104,121 @@
 	std	%r30,192(%r1)
 	std	%r31,200(%r1)
 
+	/* HACK: Avoid depending much on preserved registers
+	 * and be biased to use the ones saved above
+	 */
+
 	/* Record the old MSR */
-	mfmsr	%r6
+	mfmsr	%r14
 
 	/* read client interface handler */
-	lis	%r4,openfirmware_entry@ha
-	ld	%r4,openfirmware_entry@l(%r4)
+	lis	%r15,openfirmware_entry@ha
+	ld	%r15,openfirmware_entry@l(%r15)
 
+	/* HACK: part of having FreeBSD's sprg0 in place for exceptions.
+	 * Parania code at this point since corrupted %r1 values are
+	 * avoided by forcing the before-openfirmware value.
+	 */
+	lis	%r16,ofw_sprg0_save@ha
+	ld	%r16,ofw_sprg0_save@l(%r16)
+
 	/*
 	 * Set the MSR to the OF value. This has the side effect of disabling
 	 * exceptions, which is important for the next few steps.
+	 * NOTE: The call chain may well have already disabled such in FreeBSD's
+	 * msr.
 	 */
 
-	lis	%r5,ofmsr@ha
-	ld	%r5,ofmsr@l(%r5)
-	mtmsrd	%r5
+	lis	%r17,ofmsr@ha
+	ld	%r17,ofmsr@l(%r17)
+	mtmsrd	%r17
 	isync
 
 	/*
 	 * Set up OF stack. This needs to be accessible in real mode and
 	 * use the 32-bit ABI stack frame format. The pointer to the current
-	 * kernel stack is placed at the very top of the stack along with
-	 * the old MSR so we can get them back later.
+	 * kernel stack is placed below the effective ofw-stack along with the
+	 * active FreeBSD TOC and FreeBSD MSR so we can get them back later.
 	 */
-	mr	%r5,%r1
+	mr	%r18,%r1
 	lis	%r1,(ofwstk+OFWSTKSZ-32)@ha
 	addi	%r1,%r1,(ofwstk+OFWSTKSZ-32)@l
-	std	%r5,8(%r1)	/* Save real stack pointer */
-	std	%r2,16(%r1)	/* Save old TOC */
-	std	%r6,24(%r1)	/* Save old MSR */
-	li	%r5,0
-	stw	%r5,4(%r1)
-	stw	%r5,0(%r1)
+	std	%r18,8(%r1)	/* Save FreeBSD stack pointer */
+	std	%r2,16(%r1)	/* Save FreeBSD TOC */
+	std	%r14,24(%r1)	/* Save FreeBSD MSR */
+	li	%r19,0
+	stw	%r19,4(%r1)
+	stw	%r19,0(%r1)
 
+	/* HACK: Avoid depending much on preserved registers */
+
+	/* HACK: recording openfirmware entry address for use in possible retry */
+	lis	%r20,ofw_entry_addr@ha
+	stw	%r15,ofw_entry_addr@l(%r20)
+
+	/* HACK: recording %r1 before openfirmware for use in possible retry
+	 * and also for testing for corruption (net-change)
+	 */
+	lis	%r21,ofw_r1_for_retry@ha
+	stw	%r1,ofw_r1_for_retry@l(%r21)
+
+	/* HACK: recording %r3 before openfirmware for use in possible retry */
+	lis	%r22,ofw_r3_for_retry@ha
+	stw	%r3,ofw_r3_for_retry@l(%r22)
+
+	/* HACK: part of having FreeBSD's sprg0 in place for exceptions.
+	 * Parania code at this point since corrupted %r1 values are
+	 * avoided by forcing the before-openfirmware value.
+	 */
+	lis	%r23,ofwsprg0save@ha
+	std	%r16,ofwsprg0save@l(%r23)
+
 	/* Finally, branch to OF */
-	mtctr	%r4
+	mtctr	%r15
 	bctrl
 
-	/* Reload stack pointer and MSR from the OFW stack */
-	ld	%r6,24(%r1)
+	/* HACK: check if %r1 was corrupted (had a net-change) */
+	lis	%r21,ofw_r1_for_retry@ha
+	lwz	%r24,ofw_r1_for_retry@l(%r21)
+	cmpw	%r24,%r1
+	bne	2f /* stack pointer corrupted so go retry once */
+
+	/* HACK: %r1 okay but check %r3 for being 0 or -1 vs. anything else */
+	xoris	%r25,%r3,0
+	cmpw	%r25,%r3
+	bne	2f /* %r3 was neither 0 nor -1 so corruption: go retry once */
+
+	/* HACK: here both %r1 and %r3 appeared to be okay:
+	 * so sequential flow was for "no problems"
+	 */
+
+1:	/* HACK status: continue/return from whatever status,
+	 * trying to get back cleanly to the FreeBSD context
+	 */
+
+	/* HACK: part of having FreeBSD's sprg0 in place for any exception
+	 * during return.
+	 * Parania code at this point since corrupted %r1 values are
+	 * avoided by forcing the before-openfirmware value.
+	 * NOTE: Calling code also deals with this but too late for the
+	 * original exceptions after openfirmware returned to this code.
+	 */
+	lis	%r23,ofwsprg0save@ha
+	ld	%r16,ofwsprg0save@l(%r23)
+	mtsprg0	%r16
+
+	/* Reload FreeBSD stack pointer and MSR
+	 * from the bottom of the (i.e., below the effective) OFW stack
+	 *
+	 * HACK note: %r1 may have been forced to the before-openfirmware value
+	 * (to avoid garbage results and the resulting exceptions)
+	 */
+	ld	%r26,24(%r1)
 	ld	%r2,16(%r1)
 	ld	%r1,8(%r1)
 
-	/* Now set the real MSR */
-	mtmsrd	%r6
+	/* Now set the FreeBSD MSR */
+	mtmsrd	%r26
 	isync
 
 	/* Sign-extend the return value from OF */
@@ -168,6 +253,43 @@
 	mtlr	%r0
 	blr
 
+/* HACK: code for %r1 and/or %r3 corruption's single-retry */
+/* Still under openfirmware's msr, sprg0, stack values */
+
+2:	/* HACK: corruption observed so retry, restoring %r1 and %r3 first */
+	lis	%r20,ofw_entry_addr@ha
+	lwz	%r15,ofw_entry_addr@l(%r20)
+	lis	%r21,ofw_r1_for_retry@ha
+	lwz	%r1,ofw_r1_for_retry@l(%r21)
+	lis	%r22,ofw_r3_for_retry@ha
+	lwz	%r3,ofw_r3_for_retry@l(%r22)
+	mtctr	%r15
+	bctrl
+
+	/* HACK: check if %r1 was corrupted (had a net-change) */
+	lis	%r21,ofw_r1_for_retry@ha
+	lwz	%r24,ofw_r1_for_retry@l(%r21)
+	cmpw	%r24,%r1
+	bne	3f /* retry corrupted %r1
+		    * so go give up with %r3 being -1 and %r1 forced-good
+		    */
+
+	/* HACK: %r1 okay but check %r3 for being 0 or -1 vs. anything else */
+	xoris	%r25,%r3,0
+	cmpw	%r25,%r3
+	beq	1b /* %r3 also was 0 or -1 so no corruption observed on retry
+		    * so go do a normal return
+		    */
+
+3:	/* Either %r1 had a net change after retry
+	 * or %r3 was not one of 0,-1 after retry
+	 * so force %r1 and have %r3 be -1 then go return
+	 */
+	lis	%r21,ofw_r1_for_retry@ha
+	lwz	%r1,ofw_r1_for_retry@l(%r21)
+	li	%r3,-1 /* the openfirmware failure return value */
+	b	1b
+
 /*
  * RTAS 32-bit Entry Point. Similar to the OF one, but simpler (no separate
  * stack)

===
Mark Millard
markmi at dsl-only.net