From owner-freebsd-ppc@FreeBSD.ORG Thu Sep 25 21:09:04 2014 Return-Path: Delivered-To: freebsd-ppc@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id BD5C4C1B for ; Thu, 25 Sep 2014 21:09:04 +0000 (UTC) Received: from d.mail.sonic.net (d.mail.sonic.net [64.142.111.50]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 93189FEC for ; Thu, 25 Sep 2014 21:09:04 +0000 (UTC) Received: from aurora.physics.berkeley.edu (aurora.Physics.Berkeley.EDU [128.32.117.67]) (authenticated bits=0) by d.mail.sonic.net (8.14.9/8.14.9) with ESMTP id s8PL8tHA003153 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT); Thu, 25 Sep 2014 14:08:55 -0700 Message-ID: <54248467.4050900@freebsd.org> Date: Thu, 25 Sep 2014 14:08:55 -0700 From: Nathan Whitehorn User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.0 MIME-Version: 1.0 To: Mark Millard Subject: Re: lr=u_trap+0x10 and srr0=k_trap+0x28 for "stopped at 0 illegal instruction 0" before-copyright hang on PowerMac G5's References: <1118046C-0FF7-49FC-82DA-DB9A7A310991@dsl-only.net> <2ED3DB50-B985-4382-8FF2-3B44E7E65453@dsl-only.net> <6D729F43-662A-429E-9503-0148EC3250B1@dsl-only.net> <72535F89-3942-45A6-B351-7F746209ED9F@dsl-only.net> <0703EF26-6E33-4446-9273-BBFD0CB72893@dsl-only.net> <37575F94-763C-43BF-8DD9-F648F4A7C09F@dsl-only.net> <5422E513.6010806@freebsd.org> <1C02D0D4-14B8-465F-B493-4D3A64E4C35C@dsl-only.net> <0DF8A9EC-C81C-4E15-9420-6831BA7D5F8E@dsl-only.net> In-Reply-To: <0DF8A9EC-C81C-4E15-9420-6831BA7D5F8E@dsl-only.net> X-Sonic-CAuth: UmFuZG9tSVbF5MVs0UxQKkX/oWeYrjK+vugZ1Wk4Y1FWoYisuNKFCSiukDi5OT5/o8xkc8VmLF/Qt0gA85k65l51LViLce7TZkz9nIiiURI= X-Sonic-ID: C;LHbkKPhE5BGeZQDu5Qupew== M;jBIQKfhE5BGeZQDu5Qupew== X-Spam-Flag: Unknown X-Sonic-Spam-Details: not scanned (too big) by cerberusd Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: Justin Hibbits , FreeBSD PowerPC ML X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 25 Sep 2014 21:09:04 -0000 Can you comment out the call to quiesce? It may not be necessary on your system. -Nathan On 09/25/14 13:17, Mark Millard wrote: > The "before copyright" hang/exception is during the first openfirmware > "peer" after "quiesce". The ofw_restore_trap_vec(save_trap_init) > completes fine, the ofwcall(args) is made but it does not return normally. > > Ignoring the ofwcall's from before quiesce, the sequence of ofwcall's is: > > quiesce > finddevice > parent > getprop > getprop > getprop > finddevice > getprop > instance-to-package > getproplen > finddevice > getprop > getprop > peer > > And when the boot fails before the copyright that ofwcall for peer > ends up resulting in the register dump with no register pointing to > the kernel's normal stack area. > > I still have no clue what is happening during peer. > ofw_restore_trap_vec(save_trap_init) is being called and is returning > before ofwcall is used. For all I know some uses of peer could require > not being quiesce'd in order for peer to be reliable. > > In the form of my display indicating what executed the text reported > ends in: > > ^ > > where the ^ indicates the stage that last completed in the call > sequence inside openfirmware_core. This information is displayed by the > > x/s ofw_name_history > > in the automatically created default script for DDB. I read the > sequence backwards from the end marker (here ^), following the > wraparound if there is that much text and if I care to go back that far. > > FreeBSD FBSDG5M1 10.1-BETA2 FreeBSD 10.1-BETA2 #11 r271944M: Thu Sep > 25 12:14:05 PDT 2014 root@FBSDG5M1:/usr/obj/usr/src/sys/GENERIC64 powerpc > > My current hacks to get this information are: > > Index: /usr/src/sys/ddb/db_script.c > =================================================================== > --- /usr/src/sys/ddb/db_script.c(revision 271944) > +++ /usr/src/sys/ddb/db_script.c(working copy) > @@ -319,10 +319,25 @@ > { > char scriptname[DB_MAXSCRIPTNAME]; > > > +/* HACK!!! : Additional lines to force a basic default script to exist. > +* Will dump information even if ddb input is not available for early > crash. > +* Used to get more information about PowerMac G5 "before Copyright" > hangs. > +*/ > +struct ddb_script *dsp = db_script_lookup(DB_SCRIPT_KDBENTER_DEFAULT); > +if (!dsp) db_script_set(DB_SCRIPT_KDBENTER_DEFAULT, "show registers; > bt; x/s ofw_name_history"); > + > snprintf(scriptname, sizeof(scriptname), "%s.%s", > DB_SCRIPT_KDBENTER_PREFIX, eventname); > if (db_script_exec(scriptname, 0) == ENOENT) > (void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0); > + > +/* HACK!!! : Additional lines to always use the default script, > +* even if scriptname existed and was executed. > +* Will dump information even if ddb input is not available for early > crash. > +* Used to get more information about PowerMac G5 "before Copyright" > hangs. > +*/ > +else > +(void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0); > } > > > /*- > Index: /usr/src/sys/powerpc/conf/GENERIC64 > =================================================================== > --- /usr/src/sys/powerpc/conf/GENERIC64(revision 271944) > +++ /usr/src/sys/powerpc/conf/GENERIC64(working copy) > @@ -76,6 +76,8 @@ > # Debugging support. Always need this: > options KDB# Enable kernel debugger support. > options KDB_TRACE# Print a stack trace for a panic. > +options DDB > +options GDB > > > # Make an SMP-capable kernel by default > options SMP# Symmetric MultiProcessor Kernel > Index: /usr/src/sys/powerpc/ofw/ofw_machdep.c > =================================================================== > --- /usr/src/sys/powerpc/ofw/ofw_machdep.c(revision 271944) > +++ /usr/src/sys/powerpc/ofw/ofw_machdep.c(working copy) > @@ -324,6 +324,12 @@ > openfirmware(&args); > } > > > +/* Part of HACK to have record of ofw call names */ > +#define ofw_name_history_record_size 256 > +char ofw_name_history[ofw_name_history_record_size+1] = {}; /* > Initially: automatically '\0' filled */ > +char * ofw_name_history_pos = ofw_name_history; > +/* End Part of HACK */ > + > static int > openfirmware_core(void *args) > { > @@ -330,6 +336,42 @@ > intresult; > register_toldmsr; > > > +{ /* HACK to have record of ofw call names */ > +struct argtype_prefix { > +cell_t name; > +}; > + > +char *name = (char*) (uintptr_t) (((struct argtype_prefix*)args)->name); > + > +int i; > + > +*ofw_name_history_pos = '<'; > + > +for(i=0; (*name) && i!=20; i++) { > +ofw_name_history_pos++; > +if (ofw_name_history_pos == > &ofw_name_history[ofw_name_history_record_size]) { > +ofw_name_history_pos = ofw_name_history; > +} > +*ofw_name_history_pos = *name; > + > +name++; > +} > + > +ofw_name_history_pos++; > +if (ofw_name_history_pos == > &ofw_name_history[ofw_name_history_record_size]) { > +ofw_name_history_pos = ofw_name_history; > +} > +*ofw_name_history_pos = '>'; > + > +ofw_name_history_pos++; > +if (ofw_name_history_pos == > &ofw_name_history[ofw_name_history_record_size]) { > +ofw_name_history_pos = ofw_name_history; > +} > +*ofw_name_history_pos = '@'; > + > +ofw_name_history[ofw_name_history_record_size] = '\0'; /* Paranoia */ > +} /* HACK end */ > + > /* > * Turn off exceptions - we really don't want to end up > * anywhere unexpected with PCPU set to something strange > @@ -337,14 +379,22 @@ > */ > oldmsr = intr_disable(); > > > +*ofw_name_history_pos = '#'; /* HACK */ > + > ofw_sprg_prepare(); > > > +*ofw_name_history_pos = '$'; /* HACK */ > + > /* Save trap vectors */ > ofw_save_trap_vec(save_trap_of); > > > +*ofw_name_history_pos = '%'; /* HACK */ > + > /* Restore initially saved trap vectors */ > ofw_restore_trap_vec(save_trap_init); > > > +*ofw_name_history_pos = '^'; /* HACK */ > + > #if defined(AIM) && !defined(__powerpc64__) > /* > * Clear battable[] translations > @@ -357,13 +407,21 @@ > > > result = ofwcall(args); > > > +*ofw_name_history_pos = '&'; /* HACK */ > + > /* Restore trap vecotrs */ > ofw_restore_trap_vec(save_trap_of); > > > +*ofw_name_history_pos = '*'; /* HACK */ > + > ofw_sprg_restore(); > > > +*ofw_name_history_pos = '~'; /* HACK */ > + > intr_restore(oldmsr); > > > +*ofw_name_history_pos = '!'; /* HACK */ > + > return (result); > } > > > > > > === > Mark Millard > markmi at dsl-only.net > > On Sep 25, 2014, at 3:46 AM, Mark Millard > wrote: > > One source code oddity that I notice is the following mixed use of > ofw_real_mode: always tested vs. never tested (#if 0 ... #endif) ... > >> /* >> * Saved SPRG0-3 from OpenFirmware. Will be restored prior to the >> callback. >> */ >> register_t ofw_sprg0_save; >> >> static __inline void >> ofw_sprg_prepare(void) >> { >> if (ofw_real_mode) >> return; >> >> /* >> * Assume that interrupt are disabled at this point, or >> * SPRG1-3 could be trashed >> */ >> __asm __volatile("mfsprg0 %0\n\t" >> "mtsprg0 %1\n\t" >> "mtsprg1 %2\n\t" >> "mtsprg2 %3\n\t" >> "mtsprg3 %4\n\t" >> : "=&r"(ofw_sprg0_save) >> : "r"(ofmsr[1]), >> "r"(ofmsr[2]), >> "r"(ofmsr[3]), >> "r"(ofmsr[4])); >> } >> >> static __inline void >> ofw_sprg_restore(void) >> { >> #if 0 >> if (ofw_real_mode) >> return; >> #endif >> >> /* >> * Note that SPRG1-3 contents are irrelevant. They are scratch >> * registers used in the early portion of trap handling when >> * interrupts are disabled. >> * >> * PCPU data cannot be used until this routine is called ! >> */ >> __asm __volatile("mtsprg0 %0" :: "r"(ofw_sprg0_save)); >> } > > It would seem that for ofw_real_mode != 0 that ofw_sprg_prepare would > never set up ofw_sprg0_save (via mfsprg0) for the later > ofw_sprg_restore's always-executed mtsprg0 that is based on > ofw_sprg0_save. > > register_t seems to trace back to __int64_t --and that would leave > ofw_sprg0_save initialized to zero as a global and that would have to > be okay as the SPRG0 value to restore in such a case. (I have not > tracked down what any of the per-processor values for SPRG0 > are/should-be.) > > > > > > > === > Mark Millard > markmi at dsl-only.net > > On Sep 25, 2014, at 2:12 AM, Mark Millard > wrote: > > The register dump that has no kernel stack addresses in any registers > does have register contents suggesting a ofwcall use, matching up > reasonably with the code I looked at that is related to ofwcall. > ofwcall is only reached via openfirmware_core from what I can tell. > (If there are other paths into openfirmware than via ofwcall then the > register dump suggests that they are not in use around the crash.) > > And openfirmware_core has logic for exception vector swapping, going > both directions: > >> static int >> openfirmware_core(void *args) >> { >> int result; >> register_t oldmsr; >> >> /* >> * Turn off exceptions - we really don't want to end up >> * anywhere unexpected with PCPU set to something strange >> * or the stack pointer wrong. >> */ >> oldmsr = intr_disable(); >> >> ofw_sprg_prepare(); >> >> /* Save trap vectors */ >> ofw_save_trap_vec(save_trap_of); >> >> /* Restore initially saved trap vectors */ >> ofw_restore_trap_vec(save_trap_init); >> >> #if defined(AIM) && !defined(__powerpc64__) >> /* >> * Clear battable[] translations >> */ >> if (!(cpu_features & PPC_FEATURE_64)) >> __asm __volatile("mtdbatu 2, %0\n" >> "mtdbatu 3, %0" : : "r" (0)); >> isync(); >> #endif >> >> result = ofwcall(args); >> >> /* Restore trap vecotrs */ >> ofw_restore_trap_vec(save_trap_of); >> >> ofw_sprg_restore(); >> >> intr_restore(oldmsr); >> >> return (result); >> } > > In turn openfirmware_core is used only by ofw_rendezvous_dispatch and > in turn that is used only by openfirmware: only PCPU_GET(cpuid) == 0 > does the above. save_trap_init is initialized by powerpc_init using > ofw_save_trap_vec. > > [Note that ofw_restore_trap_vec uses __syncicache which does not use > dcbf after the bcopy but instead uses dcbst: That is part of what lead > my investigation into the distinction --and so to my more overall > dcbst vs. dcbf use questions after proving dcbf would not be > sufficient for a fix to the specific boot issue.] > > Unless the initialization of save_trap_init ends up with the wrong > contents for openfirmware it would appear that the exception vectors > are kept tracking by the above code. But the above does assume that > the openfirmware vectors are unchanged after save_trap_init is > initialized: there is no attempt at tracking of any potential updates > to the openfirmware exception vectors. > > I would infer then that after ofw_restore_trap_vec(save_trap_of) is > executed is when the exception that DDB reports happened: That is when > FreeBSD's exception vectors are again in place. But a stack pointer > into the kernel stack is not then in place in any register (based on > DDB's register dump): stack handling is messed up already by the point > of the reported exception. And that may actually be why an illegal > instruction at address zero was reached: an incorrect stack context > used to get an address to execute at. > > > > > === > Mark Millard > markmi at dsl-only.net > > On Sep 24, 2014, at 8:36 AM, Nathan Whitehorn freebsd.org > wrote: > > There shouldn't be any exceptions at that point, nested or otherwise. > What I suspect is happening is that Open Firmware has turned them on > for some bizarre reason, taken one, and ended up in the kernel's > handlers but with the Open Firmware environment. Saving and restoring > the OF interrupt vectors would be a possible solution; flattening the > device tree in loader so that the kernel doesn't call Open Firmware at > all would be another. I think Justin may have tried the first at some > point. > -Nathan > > On 09/24/14 02:04, Mark Millard wrote: >> Now that I've had a kernel/boot crash with a successful DDB bt and >> show registers (a different submittal) it makes for a good >> comparison/contrast with what DDB reports for this "before copyright" >> crash. >> >> Something unique to the "before copyright" context is... >> >> No registers are reported to have values that point into the range >> between tmpstk and esym. >> >> In other words: There is no valid stack pointer reported as far as I >> can tell. r1 has the value 0 instead of being a handling a valid >> stack address. tmpstk=0xbd7000 and esym=0xbdb000 (example for one of >> my WITH_DEBUG_FILES= and options DDB and GDB builds of 10.1-BETA2). >> That at least gives a ball park on the range to expect for pointing >> into the stack even with some build variation. >> >> It leaves me wondering if the DDB report is for a nested exception >> handling. That could explain why lr points to u_trap+0x10 and srr0 >> points to k_trap+0x28 when normally srr0 would point to the the >> failing instruction (or the instruction after) and lr to where that >> routine would normally return to. >> >> The register values that are reported for my 10.1-BETA2 builds that >> crash before the copyright notice are: >> >> r0: 0 >> r1: 0 >> r2: 0xc81538 vop_unlock_desc >> r3: 0xd18868 >> r4: 0x894b58 >> r5: 0 >> r6: 0xc1dee0 M_AUDITBSM >> r7: 0xe3f818 ofw_real_mode >> r8: 0x1 >> r9: 0xe0f580 __pcpu >> r10: 0x1c35ec0 >> r11: 0 >> r12: 0x10000000 >> r13: 0xdbb290 thread0 (Note: another submittal has this mistyped as >> 0xdbb290.) >> r14-r19: all 0 >> r20: 0x10c1000 >> r21: 0x4 >> r22: 0x180abd4 >> r23: 0x1803a28 >> r24: 0xc000000000008760 >> r25: 0xcc89b8 smp_no... >> r26: 0xcea108 ofw_rend... >> r27: 0x894b58 ofwcall+0xa8 >> r28: 0x894b58 ofwcall+0xa8 >> r29: 2400022 >> r30: 9000000000001032 >> r31: 0xbb7d38 >> >> srr0: 0x102720 k_trap+0x28 >> srr1: 9000000000001032 >> lr: 0x1026f0 u_trap+0x10 >> ctr: 0xff846d78 >> cr: 2000deb0 >> xer: 0 >> dar: f...d50 (lots of f's) >> dsisr: 42000000 >> >> >> >> >> >> >> === >> Mark Millard >> markmi at dsl-only.net >> >> On Sep 20, 2014, at 3:42 PM, Mark Millard > > wrote: >> >> [I corrected the SSR0 in the subject to be SRR0.] >> >> I did miss a register in my list (it matched the shown r30 value). >> And it turns out to probably be very important to interpreting what >> the "show registers" is reporting: >> >> SRR1: 0x9000000000001032 >> >> But bits 43-46 of SRR1 are supposed to indicate which type of Program >> Exception, using a single binary 1 to so. No such 1's are present. >> >> Illegal instruction would have been bit 44 being 1. (PowerPC has the >> upper bit numbered zero and increases from there.) >> >> So the ddb "show registers" is apparently not reporting the status as >> of when the "stopped at 0 illegal instruction 0" happened. Thus other >> things are also likely not from that exact time frame. >> >> >> >> And I misinterpreted the LR value status: The LR value was just left >> over from the restore_kernsrs returning when it finished. Execution >> then flowed into k_trap. Nothing unusual involved. >> >> >> >> >> >> === >> Mark Millard >> markmi@dsl-only.net >> >> On Sep 18, 2014, at 8:57 PM, Mark Millard > > wrote: >> >> I modified DDB to automatically "show registers" even at the early >> "before Copyright" crash time. The end of this note will show the >> /usr/src/sys/ddb/db_script.c diff for the hack. While I also had DDB >> bt, the bt does not actually print a back trace for this context. (It >> might for others.) >> >> The registers give interesting context despite the lack of a back >> trace. I do not know if it will be sufficient to be of much immediate >> help if someone used the information to start looking at the problem. >> >> I'll start with register lr: 0x1026f0 u_trap+0x10. >> >> /usr/src/sys/powerpc/aim/trap_subr64.S has: >> >> s_trap: >> bf 17,k_trap /* branch if PSL_PR is false */ >> GET_CPUINFO(%r1) >> u_trap: >> ld %r1,PC_CURPCB(%r1) >> mr %r27,%r28 /* Save LR, r29 */ >> mtsprg2 %r29 >> bl restore_kernsrs /* enable kernel mapping */ >> mfsprg2 %r29 >> mr %r28,%r27 >> >> /* >> * Now the common trap catching code. >> */ >> k_trap: >> FRAME_SETUP(PC_TEMPSAVE) >> /* Call C interrupt dispatcher: */ >> trapagain: >> >> and so this appears to indicate a pending return to execute the >> "mfsprg2 %r29" after "bl restore_kernsrs", which indicates that >> restore_kernsrs should be active. >> >> But register srr0 indicates: 0x102720 k_trap+0x28. (So apparently in >> FRAME_SETUP(PC_TEMPSAVE) someplace.) >> >> So it appears to me that the processor got to the k_trap code during >> the supposed restore_kernsrs time frame. (But I'm no expert at these >> sorts of things or for the processor.) >> >> I'll list the other register values: >> >> r0: 0 >> r1: 0 >> r2: 0xc1be80 M_AUDITBSM >> r3: 0xb16138 >> r4: 0x8926e8 .ofwcall+0xa8 >> r5: 0 >> r6: 0xbb5f90 >> r7: 0xe3d118 ofw_real_mode >> r8: 0x1 >> r9: 0xe0ce80 __pcpu >> r10: 0x1c35ec9 >> r11: 0 >> r12: 0x10000000 >> r13: db890 thread0 >> r14-r19: all 0 >> r20: 0x10bc000 >> r21: 0x4 >> r22: 0x1801db4 >> r23: 0x1803a28 >> r24: 0xc000000000008760 >> r25: 0xcc6908 smp_no_rendevous_barrier >> r26: 0xec79e0 ofw_rendezvous_dispatch (yep one has v and the other zv) >> r27: 0x8926e8 .ofwcall+0xa8 >> r28: 0x8926e8 .ofwcall+0xa8 (yep: same value) >> r29: 0x24000022 >> r30: 0x9000000000001032 >> r31: 0xc7f488 vop_unlock_desc >> >> ctr: 0xff846d78 >> cr: 0x2000d7b0 >> xer: 0 >> dar: 0xfffffffffffffd50 >> dsisr: 0x42000000 >> >> (Hopefully this manual transcription from the screen display is >> complete --and also accurate for what it does present.) >> >> >> >> >> The personal HACK to /usr/src/sys/ddb/db_script.c's >> db_script_kdbenter(...) to have it show registers and try bt... >> >> $ cd /usr/src/sys/ddb/ >> $ svnlite diff . >> Index: db_script.c >> =================================================================== >> --- db_script.c(revision 271610) >> +++ db_script.c(working copy) >> @@ -319,10 +319,25 @@ >> { >> char scriptname[DB_MAXSCRIPTNAME]; >> >> +/* HACK!!! : Additional lines to force a basic default script to exist. >> +* Will dump information even if ddb input is not available for early >> crash. >> +* Used to get more information about PowerMac G5 "before Copyright" >> hangs. >> +*/ >> +struct ddb_script *dsp = db_script_lookup(DB_SCRIPT_KDBENTER_DEFAULT); >> +if (!dsp) db_script_set(DB_SCRIPT_KDBENTER_DEFAULT, "show registers; >> bt"); >> + >> snprintf(scriptname, sizeof(scriptname), "%s.%s", >> DB_SCRIPT_KDBENTER_PREFIX, eventname); >> if (db_script_exec(scriptname, 0) == ENOENT) >> (void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0); >> + >> +/* HACK!!! : Additional lines to always use the default script, >> +* even if scriptname existed and was executed. >> +* Will dump information even if ddb input is not available for early >> crash. >> +* Used to get more information about PowerMac G5 "before Copyright" >> hangs. >> +*/ >> +else >> +(void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0); >> } >> >> /*- >> >> >> >> === >> Mark Millard >> markmi at dsl-only.net >> >> On Sep 16, 2014, at 9:28 PM, Mark Millard > > wrote: >> >> In part I sent directly to you because of a past exchange (July-27) >> where you had written: >> >>> Nathan and I both speculate that it's >>> dropping into Open Firmware (we make extensive use of OFW), and then >>> messing something up, taking a page fault or something. >> >> The specific text that I report and its uniformity when it is >> produced seems to add a little information beyond a speculated "page >> fault or something" and so might eventually help a little. As I >> understand the text it is reporting execution reaching address zero >> without any prior un-handled exceptions or other such that would stop >> it. A corrupted stack (pointer) so a bad return address or some such? >> I'd guess there are no explicit jumps to address zero so I expect >> that indirection is likely involved, with the content for the >> indirection messed up. >> >> I really wish that I had a logic analyzer configuration for this. >> I've not found a way to make the failing context visible so far and >> the extra way of looking at things might have helped. >> >> >> >> >> === >> Mark Millard >> markmi@dsl-only.net >> >> On Sep 16, 2014, at 8:28 PM, Justin Hibbits > > wrote: >> >> Hi mark, >> >> I see this on my G5, and I think it's due to the amount of RAM in the >> machine. More than 4gb seems to confuse open firmware when called by >> FreeBSD. There is some effort to remove the need of the callbacks but >> thus far it's not far along. The good news is that after it boots >> it's solid except when switching vtys, buy earlier this year or last >> year I added a sysctl hack to disable the call into open firmware on >> vty switch (don't recall offhand and not at my computer right now, >> but if you grep the sysctl output for reset and ofw you can find it). >> >> -Justin >> >> On Sep 16, 2014 8:01 PM, "Mark Millard" > > wrote: >> >> I've now spent time with rebooting and power-off/power-on for all >> 3 PowerMac G5's (one PowerMac7,2 and two PowerMac11,2's) and all >> 3 get the >> >>> GDB: no debug ports present >>> KDB: debugger backends: DDB >>> KDB: current backend: DDB >>> [ thread pid -1 tid 1006665719 ] >>> Stopped at 0: illegal instruction 0 >>> db> >> >> when they fail just before the Copyright notice would normally be >> displayed. None fail any earlier. At that spot none have failed >> any other way. It is the same SSD in all 3. (Happens with other >> SSD's as well.) Overall there is a mix of Radeon and NVIDIA >> display boards. Besides the SSD use and RAM upgrades the rest is >> stock equipment. scons used, not vt. (I've yet to try vt.) >> >> Seeing a failure after the Copyright notice as been fairly rare >> in all my experiments from when I started last April or so. The >> ones that I've noted had Data Storage Interrupt reported. So far >> no examples of the above have been reported after the Copyright >> notice. So I'd guess that they are separate issues. Of course it >> seems that only in the last few days would I have seen the above >> sort of thing if it did happen after the Copyright notice: The >> prior history does not count for judgements about that. >> >> === >> Mark Millard >> markmi at dsl-only.net >> >> On Sep 16, 2014, at 8:15 AM, Mark Millard > > wrote: >> >> Using 10.1-BETA1 I added "options DDB" and "options GDB" to >> powerpc64's GENERIC64. (I also used WITH_DEBUG_FILES=, >> WITHOUT_CLANG=, and WITH_DEBUG= in /etc/make.conf.) So >> buildworld, kernel was basically just set up to have more of a >> debugging context around (including for any ports builds). >> >> The result was new information about the PowerMac G5 boot >> hangups: The screen is no longer blank when the G5 is hung up >> without there being a Copyright notice yet. It says... >> >>> GDB: no debug ports present >>> KDB: debugger backends: DDB >>> KDB: current backend: DDB >>> [ thread pid -1 tid 1006665719 ] >>> Stopped at 0: illegal instruction 0 >>> db> >> >> (I had no ability to input at that point.) Normally the Copyright >> notice would have displayed instead of "[...]" and what follows. >> (I do not claim to have all the spacing, capitalization, and such >> correct above.) >> >> That text is constant from hang to hang when it hangs just before >> it would normally output the Copyright notice: The numbers do not >> vary, much less the other text. It has never failed until after >> the two KDB messages are present. So far I've only tested one >> PowerMac G5, booting over and over for a few hours. >> >> >> >> (I do not claim to be set up for remote kernel debugging. I just >> decided to let GDB go along for the ride when I added DDB.) >> >> === >> Mark Millard >> markmi at dsl-only.net >> >> >> >> >> >> > > > >