Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 25 Sep 2014 14:08:55 -0700
From:      Nathan Whitehorn <nwhitehorn@freebsd.org>
To:        Mark Millard <markmi@dsl-only.net>
Cc:        Justin Hibbits <chmeeedalf@gmail.com>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject:   Re: lr=u_trap+0x10 and srr0=k_trap+0x28 for "stopped at 0 illegal instruction 0" before-copyright hang on PowerMac G5's
Message-ID:  <54248467.4050900@freebsd.org>
In-Reply-To: <0DF8A9EC-C81C-4E15-9420-6831BA7D5F8E@dsl-only.net>
References:  <1118046C-0FF7-49FC-82DA-DB9A7A310991@dsl-only.net> <2ED3DB50-B985-4382-8FF2-3B44E7E65453@dsl-only.net> <CAHSQbTBXxrgXQdNeCs=C5wJaT_bmh9FU836O6VnJDbJuqCUujw@mail.gmail.com> <6D729F43-662A-429E-9503-0148EC3250B1@dsl-only.net> <72535F89-3942-45A6-B351-7F746209ED9F@dsl-only.net> <0703EF26-6E33-4446-9273-BBFD0CB72893@dsl-only.net> <37575F94-763C-43BF-8DD9-F648F4A7C09F@dsl-only.net> <5422E513.6010806@freebsd.org> <1C02D0D4-14B8-465F-B493-4D3A64E4C35C@dsl-only.net> <FDA60573-A9BD-4793-8273-22E960805CFF@dsl-only.net> <0DF8A9EC-C81C-4E15-9420-6831BA7D5F8E@dsl-only.net>

next in thread | previous in thread | raw e-mail | index | archive | help
Can you comment out the call to quiesce? It may not be necessary on your 
system.
-Nathan

On 09/25/14 13:17, Mark Millard wrote:
> The "before copyright" hang/exception is during the first openfirmware 
> "peer" after "quiesce". The ofw_restore_trap_vec(save_trap_init) 
> completes fine, the ofwcall(args) is made but it does not return normally.
>
> Ignoring the ofwcall's from before quiesce, the sequence of ofwcall's is:
>
> quiesce
> finddevice
> parent
> getprop
> getprop
> getprop
> finddevice
> getprop
> instance-to-package
> getproplen
> finddevice
> getprop
> getprop
> peer
>
> And when the boot fails before the copyright that ofwcall for peer 
> ends up resulting in the register dump with no register pointing to 
> the kernel's normal stack area.
>
> I still have no clue what is happening during peer. 
> ofw_restore_trap_vec(save_trap_init) is being called and is returning 
> before ofwcall is used. For all I know some uses of peer could require 
> not being quiesce'd in order for peer to be reliable.
>
> In the form of my display indicating what executed the text reported 
> ends in:
>
> <peer>^
>
> where the ^ indicates the stage that last completed in the call 
> sequence inside openfirmware_core. This information is displayed by the
>
> x/s ofw_name_history
>
> in the automatically created default script for DDB. I read the 
> sequence backwards from the end marker (here ^), following the 
> wraparound if there is that much text and if I care to go back that far.
>
> FreeBSD FBSDG5M1 10.1-BETA2 FreeBSD 10.1-BETA2 #11 r271944M: Thu Sep 
> 25 12:14:05 PDT 2014 root@FBSDG5M1:/usr/obj/usr/src/sys/GENERIC64  powerpc
>
> My current hacks to get this information are:
>
> Index: /usr/src/sys/ddb/db_script.c
> ===================================================================
> --- /usr/src/sys/ddb/db_script.c(revision 271944)
> +++ /usr/src/sys/ddb/db_script.c(working copy)
> @@ -319,10 +319,25 @@
>  {
> char scriptname[DB_MAXSCRIPTNAME];
>
>
> +/* HACK!!! : Additional lines to force a basic default script to exist.
> +* Will dump information even if ddb input is not available for early 
> crash.
> +* Used to get more information about PowerMac G5 "before Copyright" 
> hangs.
> +*/
> +struct ddb_script *dsp = db_script_lookup(DB_SCRIPT_KDBENTER_DEFAULT);
> +if (!dsp) db_script_set(DB_SCRIPT_KDBENTER_DEFAULT, "show registers; 
> bt; x/s ofw_name_history");
> +
> snprintf(scriptname, sizeof(scriptname), "%s.%s",
> DB_SCRIPT_KDBENTER_PREFIX, eventname);
> if (db_script_exec(scriptname, 0) == ENOENT)
> (void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0);
> +
> +/* HACK!!! : Additional lines to always use the default script,
> +*       even if scriptname existed and was executed.
> +* Will dump information even if ddb input is not available for early 
> crash.
> +* Used to get more information about PowerMac G5 "before Copyright" 
> hangs.
> +*/
> +else
> +(void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0);
>  }
>
>
>  /*-
> Index: /usr/src/sys/powerpc/conf/GENERIC64
> ===================================================================
> --- /usr/src/sys/powerpc/conf/GENERIC64(revision 271944)
> +++ /usr/src/sys/powerpc/conf/GENERIC64(working copy)
> @@ -76,6 +76,8 @@
>  # Debugging support.  Always need this:
>  options KDB# Enable kernel debugger support.
>  options KDB_TRACE# Print a stack trace for a panic.
> +options DDB
> +options GDB
>
>
>  # Make an SMP-capable kernel by default
>  options SMP# Symmetric MultiProcessor Kernel
> Index: /usr/src/sys/powerpc/ofw/ofw_machdep.c
> ===================================================================
> --- /usr/src/sys/powerpc/ofw/ofw_machdep.c(revision 271944)
> +++ /usr/src/sys/powerpc/ofw/ofw_machdep.c(working copy)
> @@ -324,6 +324,12 @@
> openfirmware(&args);
>  }
>
>
> +/* Part of HACK to have record of ofw call names */
> +#define ofw_name_history_record_size 256
> +char ofw_name_history[ofw_name_history_record_size+1] = {}; /* 
> Initially: automatically '\0' filled */
> +char * ofw_name_history_pos = ofw_name_history;
> +/* End Part of HACK */
> +
>  static int
>  openfirmware_core(void *args)
>  {
> @@ -330,6 +336,42 @@
> intresult;
> register_toldmsr;
>
>
> +{ /* HACK to have record of ofw call names */
> +struct argtype_prefix {
> +cell_t name;
> +};
> +
> +char *name = (char*) (uintptr_t) (((struct argtype_prefix*)args)->name);
> +
> +int i;
> +
> +*ofw_name_history_pos = '<';
> +
> +for(i=0; (*name) && i!=20; i++) {
> +ofw_name_history_pos++;
> +if (ofw_name_history_pos == 
> &ofw_name_history[ofw_name_history_record_size]) {
> +ofw_name_history_pos = ofw_name_history;
> +}
> +*ofw_name_history_pos = *name;
> +
> +name++;
> +}
> +
> +ofw_name_history_pos++;
> +if (ofw_name_history_pos == 
> &ofw_name_history[ofw_name_history_record_size]) {
> +ofw_name_history_pos = ofw_name_history;
> +}
> +*ofw_name_history_pos = '>';
> +
> +ofw_name_history_pos++;
> +if (ofw_name_history_pos == 
> &ofw_name_history[ofw_name_history_record_size]) {
> +ofw_name_history_pos = ofw_name_history;
> +}
> +*ofw_name_history_pos = '@';
> +
> +ofw_name_history[ofw_name_history_record_size] = '\0'; /* Paranoia */
> +} /* HACK end */
> +
> /*
> * Turn off exceptions - we really don't want to end up
> * anywhere unexpected with PCPU set to something strange
> @@ -337,14 +379,22 @@
> */
> oldmsr = intr_disable();
>
>
> +*ofw_name_history_pos = '#'; /* HACK */
> +
> ofw_sprg_prepare();
>
>
> +*ofw_name_history_pos = '$'; /* HACK */
> +
> /* Save trap vectors */
> ofw_save_trap_vec(save_trap_of);
>
>
> +*ofw_name_history_pos = '%'; /* HACK */
> +
> /* Restore initially saved trap vectors */
> ofw_restore_trap_vec(save_trap_init);
>
>
> +*ofw_name_history_pos = '^'; /* HACK */
> +
>  #if defined(AIM) && !defined(__powerpc64__)
> /*
> * Clear battable[] translations
> @@ -357,13 +407,21 @@
>
>
> result = ofwcall(args);
>
>
> +*ofw_name_history_pos = '&'; /* HACK */
> +
> /* Restore trap vecotrs */
> ofw_restore_trap_vec(save_trap_of);
>
>
> +*ofw_name_history_pos = '*'; /* HACK */
> +
> ofw_sprg_restore();
>
>
> +*ofw_name_history_pos = '~'; /* HACK */
> +
> intr_restore(oldmsr);
>
>
> +*ofw_name_history_pos = '!'; /* HACK */
> +
> return (result);
>  }
>
>
>
>
>
> ===
> Mark Millard
> markmi at dsl-only.net <http://dsl-only.net>;
>
> On Sep 25, 2014, at 3:46 AM, Mark Millard <markmi at dsl-only.net 
> <http://dsl-only.net>>; wrote:
>
> One source code oddity that I notice is the following mixed use of 
> ofw_real_mode: always tested vs. never tested (#if 0 ... #endif) ...
>
>> /*
>>  * Saved SPRG0-3 from OpenFirmware. Will be restored prior to the 
>> callback.
>>  */
>> register_t      ofw_sprg0_save;
>>
>> static __inline void
>> ofw_sprg_prepare(void)
>> {
>>         if (ofw_real_mode)
>>                 return;
>>
>>         /*
>>          * Assume that interrupt are disabled at this point, or
>>          * SPRG1-3 could be trashed
>>          */
>>         __asm __volatile("mfsprg0 %0\n\t"
>>                          "mtsprg0 %1\n\t"
>>                          "mtsprg1 %2\n\t"
>>                          "mtsprg2 %3\n\t"
>>                          "mtsprg3 %4\n\t"
>>                          : "=&r"(ofw_sprg0_save)
>>                          : "r"(ofmsr[1]),
>>                          "r"(ofmsr[2]),
>>                          "r"(ofmsr[3]),
>>                          "r"(ofmsr[4]));
>> }
>>
>> static __inline void
>> ofw_sprg_restore(void)
>> {
>> #if 0
>>         if (ofw_real_mode)
>>                 return;
>> #endif
>>
>>         /*
>>          * Note that SPRG1-3 contents are irrelevant. They are scratch
>>          * registers used in the early portion of trap handling when
>>          * interrupts are disabled.
>>          *
>>          * PCPU data cannot be used until this routine is called !
>>          */
>>         __asm __volatile("mtsprg0 %0" :: "r"(ofw_sprg0_save));
>> }
>
> It would seem that for ofw_real_mode != 0 that ofw_sprg_prepare would 
> never set up ofw_sprg0_save (via mfsprg0) for the later 
> ofw_sprg_restore's always-executed mtsprg0 that is based on 
> ofw_sprg0_save.
>
> register_t seems to trace back to __int64_t --and that would leave 
> ofw_sprg0_save initialized to zero as a global and that would have to 
> be okay as the SPRG0 value to restore in such a case. (I have not 
> tracked down what any of the per-processor values for SPRG0 
> are/should-be.)
>
>
>
>
>
>
> ===
> Mark Millard
> markmi at dsl-only.net <http://dsl-only.net/>;
>
> On Sep 25, 2014, at 2:12 AM, Mark Millard <markmi at dsl-only.net 
> <http://dsl-only.net/>>; wrote:
>
> The register dump that has no kernel stack addresses in any registers 
> does have register contents suggesting a ofwcall use, matching up 
> reasonably with the code I looked at that is related to ofwcall. 
> ofwcall is only reached via openfirmware_core from what I can tell. 
> (If there are other paths into openfirmware than via ofwcall then the 
> register dump suggests that they are not in use around the crash.)
>
> And openfirmware_core has logic for exception vector swapping, going 
> both directions:
>
>> static int
>> openfirmware_core(void *args)
>> {
>>         int result;
>>         register_t oldmsr;
>>
>>         /*
>>          * Turn off exceptions - we really don't want to end up
>>          * anywhere unexpected with PCPU set to something strange
>>          * or the stack pointer wrong.
>>          */
>>         oldmsr = intr_disable();
>>
>> ofw_sprg_prepare();
>>
>>         /* Save trap vectors */
>> ofw_save_trap_vec(save_trap_of);
>>
>>         /* Restore initially saved trap vectors */
>> ofw_restore_trap_vec(save_trap_init);
>>
>> #if defined(AIM) && !defined(__powerpc64__)
>>         /*
>>          * Clear battable[] translations
>>          */
>>         if (!(cpu_features & PPC_FEATURE_64))
>>                 __asm __volatile("mtdbatu 2, %0\n"
>>       "mtdbatu 3, %0" : : "r" (0));
>>         isync();
>> #endif
>>
>>         result = ofwcall(args);
>>
>>         /* Restore trap vecotrs */
>> ofw_restore_trap_vec(save_trap_of);
>>
>> ofw_sprg_restore();
>>
>> intr_restore(oldmsr);
>>
>>         return (result);
>> }
>
> In turn openfirmware_core is used only by ofw_rendezvous_dispatch and 
> in turn that is used only by openfirmware: only PCPU_GET(cpuid) == 0 
> does the above. save_trap_init is initialized by powerpc_init using 
> ofw_save_trap_vec.
>
> [Note that ofw_restore_trap_vec uses __syncicache which does not use 
> dcbf after the bcopy but instead uses dcbst: That is part of what lead 
> my investigation into the distinction --and so to my more overall 
> dcbst vs. dcbf use questions after proving dcbf would not be 
> sufficient for a fix to the specific boot issue.]
>
> Unless the initialization of save_trap_init ends up with the wrong 
> contents for openfirmware it would appear that the exception vectors 
> are kept tracking by the above code. But the above does assume that 
> the openfirmware vectors are unchanged after save_trap_init is 
> initialized: there is no attempt at tracking of any potential updates 
> to the openfirmware exception vectors.
>
> I would infer then that after ofw_restore_trap_vec(save_trap_of) is 
> executed is when the exception that DDB reports happened: That is when 
> FreeBSD's exception vectors are again in place. But a stack pointer 
> into the kernel stack is not then in place in any register (based on 
> DDB's register dump): stack handling is messed up already by the point 
> of the reported exception. And that may actually be why an illegal 
> instruction at address zero was reached: an incorrect stack context 
> used to get an address to execute at.
>
>
>
>
> ===
> Mark Millard
> markmi at dsl-only.net <http://dsl-only.net/>;
>
> On Sep 24, 2014, at 8:36 AM, Nathan Whitehorn <nwhitehorn at 
> freebsd.org <http://freebsd.org/>>; wrote:
>
> There shouldn't be any exceptions at that point, nested or otherwise. 
> What I suspect is happening is that Open Firmware has turned them on 
> for some bizarre reason, taken one, and ended up in the kernel's 
> handlers but with the Open Firmware environment. Saving and restoring 
> the OF interrupt vectors would be a possible solution; flattening the 
> device tree in loader so that the kernel doesn't call Open Firmware at 
> all would be another. I think Justin may have tried the first at some 
> point.
> -Nathan
>
> On 09/24/14 02:04, Mark Millard wrote:
>> Now that I've had a kernel/boot crash with a successful DDB bt and 
>> show registers (a different submittal) it makes for a good 
>> comparison/contrast with what DDB reports for this "before copyright" 
>> crash.
>>
>> Something unique to the "before copyright" context is...
>>
>> No registers are reported to have values that point into the range 
>> between tmpstk and esym.
>>
>> In other words: There is no valid stack pointer reported as far as I 
>> can tell. r1 has the value 0 instead of being a handling a valid 
>> stack address. tmpstk=0xbd7000 and esym=0xbdb000 (example for one of 
>> my WITH_DEBUG_FILES= and options DDB and GDB builds of 10.1-BETA2). 
>> That at least gives a ball park on the range to expect for pointing 
>> into the stack even with some build variation.
>>
>> It leaves me wondering if the DDB report is for a nested exception 
>> handling. That could explain why lr points to u_trap+0x10 and srr0 
>> points to k_trap+0x28 when normally srr0 would point to the the 
>> failing instruction (or the instruction after) and lr to where that 
>> routine would normally return to.
>>
>> The register values that are reported for my 10.1-BETA2 builds that 
>> crash before the copyright notice are:
>>
>> r0: 0
>> r1: 0
>> r2: 0xc81538 vop_unlock_desc
>> r3: 0xd18868
>> r4: 0x894b58
>> r5: 0
>> r6: 0xc1dee0 M_AUDITBSM
>> r7: 0xe3f818 ofw_real_mode
>> r8: 0x1
>> r9: 0xe0f580 __pcpu
>> r10: 0x1c35ec0
>> r11: 0
>> r12: 0x10000000
>> r13: 0xdbb290 thread0 (Note: another submittal has this mistyped as 
>> 0xdbb290.)
>> r14-r19: all 0
>> r20: 0x10c1000
>> r21: 0x4
>> r22: 0x180abd4
>> r23: 0x1803a28
>> r24: 0xc000000000008760
>> r25: 0xcc89b8 smp_no...
>> r26: 0xcea108 ofw_rend...
>> r27: 0x894b58 ofwcall+0xa8
>> r28: 0x894b58 ofwcall+0xa8
>> r29: 2400022
>> r30: 9000000000001032
>> r31: 0xbb7d38
>>
>> srr0: 0x102720 k_trap+0x28
>> srr1: 9000000000001032
>> lr: 0x1026f0 u_trap+0x10
>> ctr: 0xff846d78
>> cr: 2000deb0
>> xer: 0
>> dar: f...d50 (lots of f's)
>> dsisr: 42000000
>>
>>
>>
>>
>>
>>
>> ===
>> Mark Millard
>> markmi at dsl-only.net <http://dsl-only.net/>;
>>
>> On Sep 20, 2014, at 3:42 PM, Mark Millard <markmi at dsl-only.net 
>> <http://dsl-only.net/>>; wrote:
>>
>> [I corrected the SSR0 in the subject to be SRR0.]
>>
>> I did miss a register in my list (it matched the shown r30 value). 
>> And it turns out to probably be very important to interpreting what 
>> the "show registers" is reporting:
>>
>> SRR1: 0x9000000000001032
>>
>> But bits 43-46 of SRR1 are supposed to indicate which type of Program 
>> Exception, using a single binary 1 to so. No such 1's are present.
>>
>> Illegal instruction would have been bit 44 being 1. (PowerPC has the 
>> upper bit numbered zero and increases from there.)
>>
>> So the ddb "show registers" is apparently not reporting the status as 
>> of when the "stopped at 0 illegal instruction 0" happened. Thus other 
>> things are also likely not from that exact time frame.
>>
>>
>>
>> And I misinterpreted the LR value status: The LR value was just left 
>> over from the restore_kernsrs returning when it finished. Execution 
>> then flowed into k_trap. Nothing unusual involved.
>>
>>
>>
>>
>>
>> ===
>> Mark Millard
>> markmi@dsl-only.net <mailto:markmi@dsl-only.net>
>>
>> On Sep 18, 2014, at 8:57 PM, Mark Millard <markmi@dsl-only.net 
>> <mailto:markmi@dsl-only.net>> wrote:
>>
>> I modified DDB to automatically "show registers" even at the early 
>> "before Copyright" crash time. The end of this note will show the 
>> /usr/src/sys/ddb/db_script.c diff for the hack. While I also had DDB 
>> bt, the bt does not actually print a back trace for this context. (It 
>> might for others.)
>>
>> The registers give interesting context despite the lack of a back 
>> trace. I do not know if it will be sufficient to be of much immediate 
>> help if someone used the information to start looking at the problem.
>>
>> I'll start with register lr: 0x1026f0 u_trap+0x10.
>>
>> /usr/src/sys/powerpc/aim/trap_subr64.S has:
>>
>> s_trap:
>>         bf 17,k_trap               /* branch if PSL_PR is false */
>> GET_CPUINFO(%r1)
>> u_trap:
>>         ld %r1,PC_CURPCB(%r1)
>>         mr %r27,%r28               /* Save LR, r29 */
>>         mtsprg2 %r29
>>         bl restore_kernsrs         /* enable kernel mapping */
>>         mfsprg2 %r29
>>         mr %r28,%r27
>>
>> /*
>>  * Now the common trap catching code.
>>  */
>> k_trap:
>> FRAME_SETUP(PC_TEMPSAVE)
>> /* Call C interrupt dispatcher: */
>> trapagain:
>>
>> and so this appears to indicate a pending return to execute the 
>> "mfsprg2 %r29" after "bl restore_kernsrs", which indicates that 
>> restore_kernsrs should be active.
>>
>> But register srr0 indicates: 0x102720 k_trap+0x28. (So apparently in 
>> FRAME_SETUP(PC_TEMPSAVE) someplace.)
>>
>> So it appears to me that the processor got to the k_trap code during 
>> the supposed restore_kernsrs time frame. (But I'm no expert at these 
>> sorts of things or for the processor.)
>>
>> I'll list the other register values:
>>
>> r0:  0
>> r1:  0
>> r2:  0xc1be80 M_AUDITBSM
>> r3:  0xb16138
>> r4:  0x8926e8 .ofwcall+0xa8
>> r5:  0
>> r6:  0xbb5f90
>> r7:  0xe3d118 ofw_real_mode
>> r8:  0x1
>> r9:  0xe0ce80 __pcpu
>> r10: 0x1c35ec9
>> r11: 0
>> r12: 0x10000000
>> r13: db890 thread0
>> r14-r19: all 0
>> r20: 0x10bc000
>> r21: 0x4
>> r22: 0x1801db4
>> r23: 0x1803a28
>> r24: 0xc000000000008760
>> r25: 0xcc6908 smp_no_rendevous_barrier
>> r26: 0xec79e0 ofw_rendezvous_dispatch (yep one has v and the other zv)
>> r27: 0x8926e8 .ofwcall+0xa8
>> r28: 0x8926e8 .ofwcall+0xa8 (yep: same value)
>> r29: 0x24000022
>> r30: 0x9000000000001032
>> r31: 0xc7f488 vop_unlock_desc
>>
>> ctr: 0xff846d78
>> cr:  0x2000d7b0
>> xer: 0
>> dar: 0xfffffffffffffd50
>> dsisr: 0x42000000
>>
>> (Hopefully this manual transcription from the screen display is 
>> complete --and also accurate for what it does present.)
>>
>>
>>
>>
>> The personal HACK to /usr/src/sys/ddb/db_script.c's 
>> db_script_kdbenter(...) to have it show registers and try bt...
>>
>> $ cd /usr/src/sys/ddb/
>> $ svnlite diff .
>> Index: db_script.c
>> ===================================================================
>> --- db_script.c(revision 271610)
>> +++ db_script.c(working copy)
>> @@ -319,10 +319,25 @@
>>  {
>> char scriptname[DB_MAXSCRIPTNAME];
>>
>> +/* HACK!!! : Additional lines to force a basic default script to exist.
>> +* Will dump information even if ddb input is not available for early 
>> crash.
>> +* Used to get more information about PowerMac G5 "before Copyright" 
>> hangs.
>> +*/
>> +struct ddb_script *dsp = db_script_lookup(DB_SCRIPT_KDBENTER_DEFAULT);
>> +if (!dsp) db_script_set(DB_SCRIPT_KDBENTER_DEFAULT, "show registers; 
>> bt");
>> +
>> snprintf(scriptname, sizeof(scriptname), "%s.%s",
>>   DB_SCRIPT_KDBENTER_PREFIX, eventname);
>> if (db_script_exec(scriptname, 0) == ENOENT)
>> (void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0);
>> +
>> +/* HACK!!! : Additional lines to always use the default script,
>> +*           even if scriptname existed and was executed.
>> +* Will dump information even if ddb input is not available for early 
>> crash.
>> +* Used to get more information about PowerMac G5 "before Copyright" 
>> hangs.
>> +*/
>> +else
>> +(void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0);
>>  }
>>
>>  /*-
>>
>>
>>
>> ===
>> Mark Millard
>> markmi at dsl-only.net <http://dsl-only.net/>;
>>
>> On Sep 16, 2014, at 9:28 PM, Mark Millard <markmi at dsl-only.net 
>> <http://dsl-only.net/>>; wrote:
>>
>> In part I sent directly to you because of a past exchange (July-27) 
>> where you had written:
>>
>>> Nathan and I both speculate that it's
>>> dropping into Open Firmware (we make extensive use of OFW), and then
>>> messing something up, taking a page fault or something.
>>
>> The specific text that I report and its uniformity when it is 
>> produced seems to add a little information beyond a speculated "page 
>> fault or something" and so might eventually help a little. As I 
>> understand the text it is reporting execution reaching address zero 
>> without any prior un-handled exceptions or other such that would stop 
>> it. A corrupted stack (pointer) so a bad return address or some such? 
>> I'd guess there are no explicit jumps to address zero so I expect 
>> that indirection is likely involved, with the content for the 
>> indirection messed up.
>>
>> I really wish that I had a logic analyzer configuration for this. 
>> I've not found a way to make the failing context visible so far and 
>> the extra way of looking at things might have helped.
>>
>>
>>
>>
>> ===
>> Mark Millard
>> markmi@dsl-only.net <mailto:markmi@dsl-only.net>
>>
>> On Sep 16, 2014, at 8:28 PM, Justin Hibbits <chmeeedalf@gmail.com 
>> <mailto:chmeeedalf@gmail.com>> wrote:
>>
>> Hi mark,
>>
>> I see this on my G5, and I think it's due to the amount of RAM in the 
>> machine. More than 4gb seems to confuse open firmware when called by 
>> FreeBSD. There is some effort to remove the need of the callbacks but 
>> thus far it's not far along. The good news is that after it boots 
>> it's solid except when switching vtys, buy earlier this year or last 
>> year I added a sysctl hack to disable the call into open firmware on 
>> vty switch (don't recall offhand and not at my computer right now, 
>> but if you grep the sysctl output for reset and ofw you can find it).
>>
>> -Justin
>>
>> On Sep 16, 2014 8:01 PM, "Mark Millard" <markmi@dsl-only.net 
>> <mailto:markmi@dsl-only.net>> wrote:
>>
>>     I've now spent time with rebooting and power-off/power-on for all
>>     3 PowerMac G5's (one PowerMac7,2 and two PowerMac11,2's) and all
>>     3 get the
>>
>>>     GDB: no debug ports present
>>>     KDB: debugger backends: DDB
>>>     KDB: current backend: DDB
>>>     [ thread pid -1 tid 1006665719 ]
>>>     Stopped at 0: illegal instruction 0
>>>     db>
>>
>>     when they fail just before the Copyright notice would normally be
>>     displayed. None fail any earlier. At that spot none have failed
>>     any other way. It is the same SSD in all 3. (Happens with other
>>     SSD's as well.) Overall there is a mix of Radeon and NVIDIA
>>     display boards. Besides the SSD use and RAM upgrades the rest is
>>     stock equipment. scons used, not vt. (I've yet to try vt.)
>>
>>     Seeing a failure after the Copyright notice as been fairly rare
>>     in all my experiments from when I started last April or so. The
>>     ones that I've noted had Data Storage Interrupt reported. So far
>>     no examples of the above have been reported after the Copyright
>>     notice. So I'd guess that they are separate issues. Of course it
>>     seems that only in the last few days would I have seen the above
>>     sort of thing if it did happen after the Copyright notice: The
>>     prior history does not count for judgements about that.
>>
>>     ===
>>     Mark Millard
>>     markmi at dsl-only.net <http://dsl-only.net/>;
>>
>>     On Sep 16, 2014, at 8:15 AM, Mark Millard <markmi@dsl-only.net
>>     <mailto:markmi@dsl-only.net>> wrote:
>>
>>     Using 10.1-BETA1 I added "options DDB" and "options GDB" to
>>     powerpc64's GENERIC64. (I also used WITH_DEBUG_FILES=,
>>     WITHOUT_CLANG=, and WITH_DEBUG= in /etc/make.conf.) So
>>     buildworld, kernel was basically just set up to have more of a
>>     debugging context around (including for any ports builds).
>>
>>     The result was new information about the PowerMac G5 boot
>>     hangups: The screen is no longer blank when the G5 is hung up
>>     without there being a Copyright notice yet. It says...
>>
>>>     GDB: no debug ports present
>>>     KDB: debugger backends: DDB
>>>     KDB: current backend: DDB
>>>     [ thread pid -1 tid 1006665719 ]
>>>     Stopped at 0: illegal instruction 0
>>>     db>
>>
>>     (I had no ability to input at that point.) Normally the Copyright
>>     notice would have displayed instead of "[...]" and what follows.
>>     (I do not claim to have all the spacing, capitalization, and such
>>     correct above.)
>>
>>     That text is constant from hang to hang when it hangs just before
>>     it would normally output the Copyright notice: The numbers do not
>>     vary, much less the other text. It has never failed until after
>>     the two KDB messages are present. So far I've only tested one
>>     PowerMac G5, booting over and over for a few hours.
>>
>>
>>
>>     (I do not claim to be set up for remote kernel debugging. I just
>>     decided to let GDB go along for the ride when I added DDB.)
>>
>>     ===
>>     Mark Millard
>>     markmi at dsl-only.net <http://dsl-only.net/>;
>>
>>
>>
>>
>>
>>
>
>
>
>




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?54248467.4050900>