From owner-freebsd-ppc@FreeBSD.ORG Thu Sep 25 20:19:40 2014 Return-Path: Delivered-To: freebsd-ppc@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9DE3523B for ; Thu, 25 Sep 2014 20:19:40 +0000 (UTC) Received: from asp.reflexion.net (outbound-242.asp.reflexion.net [69.84.129.242]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id A1A23A48 for ; Thu, 25 Sep 2014 20:19:38 +0000 (UTC) Received: (qmail 21733 invoked from network); 25 Sep 2014 20:19:36 -0000 Received: from unknown (HELO mail-cs-03.app.dca.reflexion.local) (10.81.19.3) by 0 (rfx-qmail) with SMTP; 25 Sep 2014 20:19:36 -0000 Received: by mail-cs-03.app.dca.reflexion.local (Reflexion email security v7.30.7) with SMTP; Thu, 25 Sep 2014 16:19:36 -0400 (EDT) Received: (qmail 14895 invoked from network); 25 Sep 2014 20:17:52 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (DHE-RSA-AES256-SHA encrypted) SMTP; 25 Sep 2014 20:17:52 -0000 X-No-Relay: not in my network X-No-Relay: not in my network X-No-Relay: not in my network Received: from [192.168.1.8] (c-98-246-178-138.hsd1.or.comcast.net [98.246.178.138]) by iron2.pdx.net (Postfix) with ESMTPSA id 9CEB11C4057; Thu, 25 Sep 2014 13:17:47 -0700 (PDT) Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\)) Subject: Re: lr=u_trap+0x10 and srr0=k_trap+0x28 for "stopped at 0 illegal instruction 0" before-copyright hang on PowerMac G5's From: Mark Millard In-Reply-To: Date: Thu, 25 Sep 2014 13:17:50 -0700 Message-Id: <0DF8A9EC-C81C-4E15-9420-6831BA7D5F8E@dsl-only.net> References: <1118046C-0FF7-49FC-82DA-DB9A7A310991@dsl-only.net> <2ED3DB50-B985-4382-8FF2-3B44E7E65453@dsl-only.net> <6D729F43-662A-429E-9503-0148EC3250B1@dsl-only.net> <72535F89-3942-45A6-B351-7F746209ED9F@dsl-only.net> <0703EF26-6E33-4446-9273-BBFD0CB72893@dsl-only.net> <37575F94-763C-43BF-8DD9-F648F4A7C09F@dsl-only.net> <5422E513.6010806@freebsd.org> <1C02D0D4-14B8-465F-B493-4D3A64E4C35C@dsl-only.net> To: Nathan Whitehorn X-Mailer: Apple Mail (2.1878.6) Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: Justin Hibbits , FreeBSD PowerPC ML X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 25 Sep 2014 20:19:40 -0000 The "before copyright" hang/exception is during the first openfirmware = "peer" after "quiesce". The ofw_restore_trap_vec(save_trap_init) = completes fine, the ofwcall(args) is made but it does not return = normally. Ignoring the ofwcall's from before quiesce, the sequence of ofwcall's = is: quiesce finddevice parent getprop getprop getprop finddevice getprop instance-to-package getproplen finddevice getprop getprop peer And when the boot fails before the copyright that ofwcall for peer ends = up resulting in the register dump with no register pointing to the = kernel's normal stack area. I still have no clue what is happening during peer. = ofw_restore_trap_vec(save_trap_init) is being called and is returning = before ofwcall is used. For all I know some uses of peer could require = not being quiesce'd in order for peer to be reliable. In the form of my display indicating what executed the text reported = ends in: ^ where the ^ indicates the stage that last completed in the call sequence = inside openfirmware_core. This information is displayed by the x/s ofw_name_history in the automatically created default script for DDB. I read the sequence = backwards from the end marker (here ^), following the wraparound if = there is that much text and if I care to go back that far. FreeBSD FBSDG5M1 10.1-BETA2 FreeBSD 10.1-BETA2 #11 r271944M: Thu Sep 25 = 12:14:05 PDT 2014 root@FBSDG5M1:/usr/obj/usr/src/sys/GENERIC64 = powerpc My current hacks to get this information are: Index: /usr/src/sys/ddb/db_script.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- /usr/src/sys/ddb/db_script.c (revision 271944) +++ /usr/src/sys/ddb/db_script.c (working copy) @@ -319,10 +319,25 @@ { char scriptname[DB_MAXSCRIPTNAME]; =20 + /* HACK!!! : Additional lines to force a basic default script to = exist. + * Will dump information even if ddb input is not available for = early crash. + * Used to get more information about PowerMac G5 "before = Copyright" hangs. + */ + struct ddb_script *dsp =3D = db_script_lookup(DB_SCRIPT_KDBENTER_DEFAULT); + if (!dsp) db_script_set(DB_SCRIPT_KDBENTER_DEFAULT, "show = registers; bt; x/s ofw_name_history"); + snprintf(scriptname, sizeof(scriptname), "%s.%s", DB_SCRIPT_KDBENTER_PREFIX, eventname); if (db_script_exec(scriptname, 0) =3D=3D ENOENT) (void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0); + + /* HACK!!! : Additional lines to always use the default script, + * even if scriptname existed and was executed. + * Will dump information even if ddb input is not available for = early crash. + * Used to get more information about PowerMac G5 "before = Copyright" hangs. + */ + else + (void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0); } =20 /*- Index: /usr/src/sys/powerpc/conf/GENERIC64 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- /usr/src/sys/powerpc/conf/GENERIC64 (revision 271944) +++ /usr/src/sys/powerpc/conf/GENERIC64 (working copy) @@ -76,6 +76,8 @@ # Debugging support. Always need this: options KDB # Enable kernel debugger = support. options KDB_TRACE # Print a stack trace for a = panic. +options DDB +options GDB =20 # Make an SMP-capable kernel by default options SMP # Symmetric MultiProcessor = Kernel Index: /usr/src/sys/powerpc/ofw/ofw_machdep.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- /usr/src/sys/powerpc/ofw/ofw_machdep.c (revision 271944) +++ /usr/src/sys/powerpc/ofw/ofw_machdep.c (working copy) @@ -324,6 +324,12 @@ openfirmware(&args); } =20 +/* Part of HACK to have record of ofw call names */ +#define ofw_name_history_record_size 256 +char ofw_name_history[ofw_name_history_record_size+1] =3D {}; /* = Initially: automatically '\0' filled */ +char * ofw_name_history_pos =3D ofw_name_history; +/* End Part of HACK */ + static int openfirmware_core(void *args) { @@ -330,6 +336,42 @@ int result; register_t oldmsr; =20 + { /* HACK to have record of ofw call names */ + struct argtype_prefix { + cell_t name; + }; + + char *name =3D (char*) (uintptr_t) (((struct = argtype_prefix*)args)->name); +=20 + int i; + + *ofw_name_history_pos =3D '<'; + + for(i=3D0; (*name) && i!=3D20; i++) { + ofw_name_history_pos++; + if (ofw_name_history_pos =3D=3D = &ofw_name_history[ofw_name_history_record_size]) { + ofw_name_history_pos =3D = ofw_name_history; + } + *ofw_name_history_pos =3D *name; + + name++; + } + + ofw_name_history_pos++; + if (ofw_name_history_pos =3D=3D = &ofw_name_history[ofw_name_history_record_size]) { + ofw_name_history_pos =3D ofw_name_history; + } + *ofw_name_history_pos =3D '>'; + + ofw_name_history_pos++; + if (ofw_name_history_pos =3D=3D = &ofw_name_history[ofw_name_history_record_size]) { + ofw_name_history_pos =3D ofw_name_history; + } + *ofw_name_history_pos =3D '@'; + + ofw_name_history[ofw_name_history_record_size] =3D '\0'; = /* Paranoia */ + } /* HACK end */ + /* * Turn off exceptions - we really don't want to end up * anywhere unexpected with PCPU set to something strange @@ -337,14 +379,22 @@ */ oldmsr =3D intr_disable(); =20 + *ofw_name_history_pos =3D '#'; /* HACK */ + ofw_sprg_prepare(); =20 + *ofw_name_history_pos =3D '$'; /* HACK */ + /* Save trap vectors */ ofw_save_trap_vec(save_trap_of); =20 + *ofw_name_history_pos =3D '%'; /* HACK */ + /* Restore initially saved trap vectors */ ofw_restore_trap_vec(save_trap_init); =20 + *ofw_name_history_pos =3D '^'; /* HACK */ + #if defined(AIM) && !defined(__powerpc64__) /* * Clear battable[] translations @@ -357,13 +407,21 @@ =20 result =3D ofwcall(args); =20 + *ofw_name_history_pos =3D '&'; /* HACK */ + /* Restore trap vecotrs */ ofw_restore_trap_vec(save_trap_of); =20 + *ofw_name_history_pos =3D '*'; /* HACK */ + ofw_sprg_restore(); =20 + *ofw_name_history_pos =3D '~'; /* HACK */ + intr_restore(oldmsr); =20 + *ofw_name_history_pos =3D '!'; /* HACK */ + return (result); } =3D=3D=3D Mark Millard markmi at dsl-only.net On Sep 25, 2014, at 3:46 AM, Mark Millard = wrote: One source code oddity that I notice is the following mixed use of = ofw_real_mode: always tested vs. never tested (#if 0 ... #endif) ... > /* > * Saved SPRG0-3 from OpenFirmware. Will be restored prior to the = callback. > */ > register_t ofw_sprg0_save; >=20 > static __inline void > ofw_sprg_prepare(void) > { > if (ofw_real_mode) > return; >=20 > /* > * Assume that interrupt are disabled at this point, or > * SPRG1-3 could be trashed > */ > __asm __volatile("mfsprg0 %0\n\t" > "mtsprg0 %1\n\t" > "mtsprg1 %2\n\t" > "mtsprg2 %3\n\t" > "mtsprg3 %4\n\t" > : "=3D&r"(ofw_sprg0_save) > : "r"(ofmsr[1]), > "r"(ofmsr[2]), > "r"(ofmsr[3]), > "r"(ofmsr[4])); > } > =20 > static __inline void > ofw_sprg_restore(void) > { > #if 0 > if (ofw_real_mode) > return; > #endif >=20 > /* > * Note that SPRG1-3 contents are irrelevant. They are scratch > * registers used in the early portion of trap handling when > * interrupts are disabled. > * > * PCPU data cannot be used until this routine is called ! > */ > __asm __volatile("mtsprg0 %0" :: "r"(ofw_sprg0_save)); > } It would seem that for ofw_real_mode !=3D 0 that ofw_sprg_prepare would = never set up ofw_sprg0_save (via mfsprg0) for the later = ofw_sprg_restore's always-executed mtsprg0 that is based on = ofw_sprg0_save. register_t seems to trace back to __int64_t --and that would leave = ofw_sprg0_save initialized to zero as a global and that would have to be = okay as the SPRG0 value to restore in such a case. (I have not tracked = down what any of the per-processor values for SPRG0 are/should-be.) =3D=3D=3D Mark Millard markmi at dsl-only.net On Sep 25, 2014, at 2:12 AM, Mark Millard = wrote: The register dump that has no kernel stack addresses in any registers = does have register contents suggesting a ofwcall use, matching up = reasonably with the code I looked at that is related to ofwcall. ofwcall = is only reached via openfirmware_core from what I can tell. (If there = are other paths into openfirmware than via ofwcall then the register = dump suggests that they are not in use around the crash.) And openfirmware_core has logic for exception vector swapping, going = both directions: > static int > openfirmware_core(void *args) > { > int result; > register_t oldmsr; > =20 > /* > * Turn off exceptions - we really don't want to end up > * anywhere unexpected with PCPU set to something strange > * or the stack pointer wrong. > */ > oldmsr =3D intr_disable(); > =20 > ofw_sprg_prepare(); > =20 > /* Save trap vectors */ > ofw_save_trap_vec(save_trap_of); > =20 > /* Restore initially saved trap vectors */ > ofw_restore_trap_vec(save_trap_init); > =20 > #if defined(AIM) && !defined(__powerpc64__) > /* > * Clear battable[] translations > */ > if (!(cpu_features & PPC_FEATURE_64)) > __asm __volatile("mtdbatu 2, %0\n" > "mtdbatu 3, %0" : : "r" (0)); > isync(); > #endif >=20 > result =3D ofwcall(args); >=20 > /* Restore trap vecotrs */ > ofw_restore_trap_vec(save_trap_of); >=20 > ofw_sprg_restore(); >=20 > intr_restore(oldmsr); >=20 > return (result); > } In turn openfirmware_core is used only by ofw_rendezvous_dispatch and in = turn that is used only by openfirmware: only PCPU_GET(cpuid) =3D=3D 0 = does the above. save_trap_init is initialized by powerpc_init using = ofw_save_trap_vec. [Note that ofw_restore_trap_vec uses __syncicache which does not use = dcbf after the bcopy but instead uses dcbst: That is part of what lead = my investigation into the distinction --and so to my more overall dcbst = vs. dcbf use questions after proving dcbf would not be sufficient for a = fix to the specific boot issue.] Unless the initialization of save_trap_init ends up with the wrong = contents for openfirmware it would appear that the exception vectors are = kept tracking by the above code. But the above does assume that the = openfirmware vectors are unchanged after save_trap_init is initialized: = there is no attempt at tracking of any potential updates to the = openfirmware exception vectors. I would infer then that after ofw_restore_trap_vec(save_trap_of) is = executed is when the exception that DDB reports happened: That is when = FreeBSD's exception vectors are again in place. But a stack pointer into = the kernel stack is not then in place in any register (based on DDB's = register dump): stack handling is messed up already by the point of the = reported exception. And that may actually be why an illegal instruction = at address zero was reached: an incorrect stack context used to get an = address to execute at. =3D=3D=3D Mark Millard markmi at dsl-only.net On Sep 24, 2014, at 8:36 AM, Nathan Whitehorn wrote: There shouldn't be any exceptions at that point, nested or otherwise. = What I suspect is happening is that Open Firmware has turned them on for = some bizarre reason, taken one, and ended up in the kernel's handlers = but with the Open Firmware environment. Saving and restoring the OF = interrupt vectors would be a possible solution; flattening the device = tree in loader so that the kernel doesn't call Open Firmware at all = would be another. I think Justin may have tried the first at some point. -Nathan On 09/24/14 02:04, Mark Millard wrote: > Now that I've had a kernel/boot crash with a successful DDB bt and = show registers (a different submittal) it makes for a good = comparison/contrast with what DDB reports for this "before copyright" = crash. >=20 > Something unique to the "before copyright" context is... >=20 > No registers are reported to have values that point into the range = between tmpstk and esym. >=20 > In other words: There is no valid stack pointer reported as far as I = can tell. r1 has the value 0 instead of being a handling a valid stack = address. tmpstk=3D0xbd7000 and esym=3D0xbdb000 (example for one of my = WITH_DEBUG_FILES=3D and options DDB and GDB builds of 10.1-BETA2). That = at least gives a ball park on the range to expect for pointing into the = stack even with some build variation. >=20 > It leaves me wondering if the DDB report is for a nested exception = handling. That could explain why lr points to u_trap+0x10 and srr0 = points to k_trap+0x28 when normally srr0 would point to the the failing = instruction (or the instruction after) and lr to where that routine = would normally return to. >=20 > The register values that are reported for my 10.1-BETA2 builds that = crash before the copyright notice are: >=20 > r0: 0 > r1: 0 > r2: 0xc81538 vop_unlock_desc > r3: 0xd18868 > r4: 0x894b58 > r5: 0 > r6: 0xc1dee0 M_AUDITBSM > r7: 0xe3f818 ofw_real_mode > r8: 0x1 > r9: 0xe0f580 __pcpu > r10: 0x1c35ec0 > r11: 0 > r12: 0x10000000 > r13: 0xdbb290 thread0 (Note: another submittal has this mistyped as = 0xdbb290.) > r14-r19: all 0 > r20: 0x10c1000 > r21: 0x4 > r22: 0x180abd4 > r23: 0x1803a28 > r24: 0xc000000000008760 > r25: 0xcc89b8 smp_no... > r26: 0xcea108 ofw_rend... > r27: 0x894b58 ofwcall+0xa8 > r28: 0x894b58 ofwcall+0xa8 > r29: 2400022 > r30: 9000000000001032 > r31: 0xbb7d38 >=20 > srr0: 0x102720 k_trap+0x28 > srr1: 9000000000001032 > lr: 0x1026f0 u_trap+0x10 > ctr: 0xff846d78 > cr: 2000deb0 > xer: 0 > dar: f...d50 (lots of f's) > dsisr: 42000000 >=20 >=20 >=20 >=20 >=20 >=20 > =3D=3D=3D > Mark Millard > markmi at dsl-only.net >=20 > On Sep 20, 2014, at 3:42 PM, Mark Millard = wrote: >=20 > [I corrected the SSR0 in the subject to be SRR0.] >=20 > I did miss a register in my list (it matched the shown r30 value). And = it turns out to probably be very important to interpreting what the = "show registers" is reporting: >=20 > SRR1: 0x9000000000001032 >=20 > But bits 43-46 of SRR1 are supposed to indicate which type of Program = Exception, using a single binary 1 to so. No such 1's are present. >=20 > Illegal instruction would have been bit 44 being 1. (PowerPC has the = upper bit numbered zero and increases from there.) >=20 > So the ddb "show registers" is apparently not reporting the status as = of when the "stopped at 0 illegal instruction 0" happened. Thus other = things are also likely not from that exact time frame. >=20 >=20 >=20 > And I misinterpreted the LR value status: The LR value was just left = over from the restore_kernsrs returning when it finished. Execution then = flowed into k_trap. Nothing unusual involved. >=20 >=20 >=20 >=20 >=20 > =3D=3D=3D > Mark Millard > markmi@dsl-only.net >=20 > On Sep 18, 2014, at 8:57 PM, Mark Millard wrote: >=20 > I modified DDB to automatically "show registers" even at the early = "before Copyright" crash time. The end of this note will show the = /usr/src/sys/ddb/db_script.c diff for the hack. While I also had DDB bt, = the bt does not actually print a back trace for this context. (It might = for others.) >=20 > The registers give interesting context despite the lack of a back = trace. I do not know if it will be sufficient to be of much immediate = help if someone used the information to start looking at the problem. >=20 > I'll start with register lr: 0x1026f0 u_trap+0x10. >=20 > /usr/src/sys/powerpc/aim/trap_subr64.S has: >=20 > s_trap: > bf 17,k_trap /* branch if PSL_PR is false = */ > GET_CPUINFO(%r1) > u_trap: > ld %r1,PC_CURPCB(%r1) > mr %r27,%r28 /* Save LR, r29 */ > mtsprg2 %r29 > bl restore_kernsrs /* enable kernel mapping */ > mfsprg2 %r29 > mr %r28,%r27 >=20 > /* > * Now the common trap catching code. > */ > k_trap: > FRAME_SETUP(PC_TEMPSAVE) > /* Call C interrupt dispatcher: */ > trapagain: >=20 > and so this appears to indicate a pending return to execute the = "mfsprg2 %r29" after "bl restore_kernsrs", which indicates that = restore_kernsrs should be active. >=20 > But register srr0 indicates: 0x102720 k_trap+0x28. (So apparently in = FRAME_SETUP(PC_TEMPSAVE) someplace.) >=20 > So it appears to me that the processor got to the k_trap code during = the supposed restore_kernsrs time frame. (But I'm no expert at these = sorts of things or for the processor.) >=20 > I'll list the other register values: >=20 > r0: 0 > r1: 0 > r2: 0xc1be80 M_AUDITBSM > r3: 0xb16138 > r4: 0x8926e8 .ofwcall+0xa8 > r5: 0 > r6: 0xbb5f90 > r7: 0xe3d118 ofw_real_mode > r8: 0x1 > r9: 0xe0ce80 __pcpu > r10: 0x1c35ec9 > r11: 0 > r12: 0x10000000 > r13: db890 thread0 > r14-r19: all 0 > r20: 0x10bc000 > r21: 0x4 > r22: 0x1801db4 > r23: 0x1803a28 > r24: 0xc000000000008760 > r25: 0xcc6908 smp_no_rendevous_barrier > r26: 0xec79e0 ofw_rendezvous_dispatch (yep one has v and the other zv) > r27: 0x8926e8 .ofwcall+0xa8 > r28: 0x8926e8 .ofwcall+0xa8 (yep: same value) > r29: 0x24000022 > r30: 0x9000000000001032 > r31: 0xc7f488 vop_unlock_desc >=20 > ctr: 0xff846d78 > cr: 0x2000d7b0 > xer: 0 > dar: 0xfffffffffffffd50 > dsisr: 0x42000000 >=20 > (Hopefully this manual transcription from the screen display is = complete --and also accurate for what it does present.) >=20 >=20 >=20 >=20 > The personal HACK to /usr/src/sys/ddb/db_script.c's = db_script_kdbenter(...) to have it show registers and try bt... >=20 > $ cd /usr/src/sys/ddb/ > $ svnlite diff . > Index: db_script.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- db_script.c (revision 271610) > +++ db_script.c (working copy) > @@ -319,10 +319,25 @@ > { > char scriptname[DB_MAXSCRIPTNAME]; > =20 > + /* HACK!!! : Additional lines to force a basic default script to = exist. > + * Will dump information even if ddb input is not available for = early crash. > + * Used to get more information about PowerMac G5 "before Copyright" = hangs. > + */ > + struct ddb_script *dsp =3D = db_script_lookup(DB_SCRIPT_KDBENTER_DEFAULT); > + if (!dsp) db_script_set(DB_SCRIPT_KDBENTER_DEFAULT, "show registers; = bt"); > + > snprintf(scriptname, sizeof(scriptname), "%s.%s", > DB_SCRIPT_KDBENTER_PREFIX, eventname); > if (db_script_exec(scriptname, 0) =3D=3D ENOENT) > (void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0); > + > + /* HACK!!! : Additional lines to always use the default script, > + * even if scriptname existed and was executed. > + * Will dump information even if ddb input is not available for = early crash. > + * Used to get more information about PowerMac G5 "before Copyright" = hangs. > + */ > + else > + (void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0); > } > =20 > /*- >=20 >=20 >=20 > =3D=3D=3D > Mark Millard > markmi at dsl-only.net >=20 > On Sep 16, 2014, at 9:28 PM, Mark Millard = wrote: >=20 > In part I sent directly to you because of a past exchange (July-27) = where you had written: >=20 >> Nathan and I both speculate that it's >> dropping into Open Firmware (we make extensive use of OFW), and then >> messing something up, taking a page fault or something. >=20 > The specific text that I report and its uniformity when it is produced = seems to add a little information beyond a speculated "page fault or = something" and so might eventually help a little. As I understand the = text it is reporting execution reaching address zero without any prior = un-handled exceptions or other such that would stop it. A corrupted = stack (pointer) so a bad return address or some such? I'd guess there = are no explicit jumps to address zero so I expect that indirection is = likely involved, with the content for the indirection messed up. >=20 > I really wish that I had a logic analyzer configuration for this. I've = not found a way to make the failing context visible so far and the extra = way of looking at things might have helped. >=20 >=20 >=20 >=20 > =3D=3D=3D > Mark Millard > markmi@dsl-only.net >=20 > On Sep 16, 2014, at 8:28 PM, Justin Hibbits = wrote: >=20 > Hi mark, >=20 > I see this on my G5, and I think it's due to the amount of RAM in the = machine. More than 4gb seems to confuse open firmware when called by = FreeBSD. There is some effort to remove the need of the callbacks but = thus far it's not far along. The good news is that after it boots it's = solid except when switching vtys, buy earlier this year or last year I = added a sysctl hack to disable the call into open firmware on vty switch = (don't recall offhand and not at my computer right now, but if you grep = the sysctl output for reset and ofw you can find it). >=20 > -Justin >=20 > On Sep 16, 2014 8:01 PM, "Mark Millard" wrote: > I've now spent time with rebooting and power-off/power-on for all 3 = PowerMac G5's (one PowerMac7,2 and two PowerMac11,2's) and all 3 get the >=20 >> GDB: no debug ports present >> KDB: debugger backends: DDB >> KDB: current backend: DDB >> [ thread pid -1 tid 1006665719 ] >> Stopped at 0: illegal instruction 0 >> db> >=20 > when they fail just before the Copyright notice would normally be = displayed. None fail any earlier. At that spot none have failed any = other way. It is the same SSD in all 3. (Happens with other SSD's as = well.) Overall there is a mix of Radeon and NVIDIA display boards. = Besides the SSD use and RAM upgrades the rest is stock equipment. scons = used, not vt. (I've yet to try vt.) >=20 > Seeing a failure after the Copyright notice as been fairly rare in all = my experiments from when I started last April or so. The ones that I've = noted had Data Storage Interrupt reported. So far no examples of the = above have been reported after the Copyright notice. So I'd guess that = they are separate issues. Of course it seems that only in the last few = days would I have seen the above sort of thing if it did happen after = the Copyright notice: The prior history does not count for judgements = about that. >=20 > =3D=3D=3D > Mark Millard > markmi at dsl-only.net >=20 > On Sep 16, 2014, at 8:15 AM, Mark Millard wrote: >=20 > Using 10.1-BETA1 I added "options DDB" and "options GDB" to = powerpc64's GENERIC64. (I also used WITH_DEBUG_FILES=3D, WITHOUT_CLANG=3D,= and WITH_DEBUG=3D in /etc/make.conf.) So buildworld, kernel was = basically just set up to have more of a debugging context around = (including for any ports builds). >=20 > The result was new information about the PowerMac G5 boot hangups: The = screen is no longer blank when the G5 is hung up without there being a = Copyright notice yet. It says... >=20 >> GDB: no debug ports present >> KDB: debugger backends: DDB >> KDB: current backend: DDB >> [ thread pid -1 tid 1006665719 ] >> Stopped at 0: illegal instruction 0 >> db> >=20 > (I had no ability to input at that point.) Normally the Copyright = notice would have displayed instead of "[...]" and what follows. (I do = not claim to have all the spacing, capitalization, and such correct = above.) >=20 > That text is constant from hang to hang when it hangs just before it = would normally output the Copyright notice: The numbers do not vary, = much less the other text. It has never failed until after the two KDB = messages are present. So far I've only tested one PowerMac G5, booting = over and over for a few hours. >=20 >=20 >=20 > (I do not claim to be set up for remote kernel debugging. I just = decided to let GDB go along for the ride when I added DDB.) >=20 > =3D=3D=3D > Mark Millard > markmi at dsl-only.net >=20 >=20 >=20 >=20 >=20 >=20