Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 25 Sep 2014 14:41:29 -0700
From:      Mark Millard <markmi@dsl-only.net>
To:        Nathan Whitehorn <nwhitehorn@freebsd.org>
Cc:        Justin Hibbits <chmeeedalf@gmail.com>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject:   Re: lr=u_trap+0x10 and srr0=k_trap+0x28 for "stopped at 0 illegal instruction 0" before-copyright hang on PowerMac G5's
Message-ID:  <D7D2B0EF-7EEB-48E7-9485-990BACC709EE@dsl-only.net>
In-Reply-To: <54248467.4050900@freebsd.org>
References:  <1118046C-0FF7-49FC-82DA-DB9A7A310991@dsl-only.net> <2ED3DB50-B985-4382-8FF2-3B44E7E65453@dsl-only.net> <CAHSQbTBXxrgXQdNeCs=C5wJaT_bmh9FU836O6VnJDbJuqCUujw@mail.gmail.com> <6D729F43-662A-429E-9503-0148EC3250B1@dsl-only.net> <72535F89-3942-45A6-B351-7F746209ED9F@dsl-only.net> <0703EF26-6E33-4446-9273-BBFD0CB72893@dsl-only.net> <37575F94-763C-43BF-8DD9-F648F4A7C09F@dsl-only.net> <5422E513.6010806@freebsd.org> <1C02D0D4-14B8-465F-B493-4D3A64E4C35C@dsl-only.net> <FDA60573-A9BD-4793-8273-22E960805CFF@dsl-only.net> <0DF8A9EC-C81C-4E15-9420-6831BA7D5F8E@dsl-only.net> <54248467.4050900@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
The first boot after make -8 kernel without quiesce also died during =
peer, I'd guess the same one.

Looks like quiesce does not matter for the issue. (But it is handy for =
identifying which peer fails.)



=3D=3D=3D
Mark Millard
markmi at dsl-only.net

On Sep 25, 2014, at 2:08 PM, Nathan Whitehorn <nwhitehorn at =
freebsd.org> wrote:

Can you comment out the call to quiesce? It may not be necessary on your =
system.
-Nathan

On 09/25/14 13:17, Mark Millard wrote:
> The "before copyright" hang/exception is during the first openfirmware =
"peer" after "quiesce". The ofw_restore_trap_vec(save_trap_init) =
completes fine, the ofwcall(args) is made but it does not return =
normally.
>=20
> Ignoring the ofwcall's from before quiesce, the sequence of ofwcall's =
is:
>=20
> quiesce
> finddevice
> parent
> getprop
> getprop
> getprop
> finddevice
> getprop
> instance-to-package
> getproplen
> finddevice
> getprop
> getprop
> peer
>=20
> And when the boot fails before the copyright that ofwcall for peer =
ends up resulting in the register dump with no register pointing to the =
kernel's normal stack area.
>=20
> I still have no clue what is happening during peer. =
ofw_restore_trap_vec(save_trap_init) is being called and is returning =
before ofwcall is used. For all I know some uses of peer could require =
not being quiesce'd in order for peer to be reliable.
>=20
> In the form of my display indicating what executed the text reported =
ends in:
>=20
> <peer>^
>=20
> where the ^ indicates the stage that last completed in the call =
sequence inside openfirmware_core. This information is displayed by the
>=20
> x/s ofw_name_history
>=20
> in the automatically created default script for DDB. I read the =
sequence backwards from the end marker (here ^), following the =
wraparound if there is that much text and if I care to go back that far.
>=20
> FreeBSD FBSDG5M1 10.1-BETA2 FreeBSD 10.1-BETA2 #11 r271944M: Thu Sep =
25 12:14:05 PDT 2014     root@FBSDG5M1:/usr/obj/usr/src/sys/GENERIC64  =
powerpc
>=20
> My current hacks to get this information are:
>=20
> Index: /usr/src/sys/ddb/db_script.c
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> --- /usr/src/sys/ddb/db_script.c (revision 271944)
> +++ /usr/src/sys/ddb/db_script.c (working copy)
> @@ -319,10 +319,25 @@
>  {
>   char scriptname[DB_MAXSCRIPTNAME];
> =20
> + /* HACK!!! : Additional lines to force a basic default script to =
exist.
> +  * Will dump information even if ddb input is not available for =
early crash.
> +  * Used to get more information about PowerMac G5 "before Copyright" =
hangs.
> +  */
> + struct ddb_script *dsp =3D =
db_script_lookup(DB_SCRIPT_KDBENTER_DEFAULT);
> + if (!dsp) db_script_set(DB_SCRIPT_KDBENTER_DEFAULT, "show registers; =
bt; x/s ofw_name_history");
> +
>   snprintf(scriptname, sizeof(scriptname), "%s.%s",
>       DB_SCRIPT_KDBENTER_PREFIX, eventname);
>   if (db_script_exec(scriptname, 0) =3D=3D ENOENT)
>   (void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0);
> +
> + /* HACK!!! : Additional lines to always use the default script,
> +  *           even if scriptname existed and was executed.
> +  * Will dump information even if ddb input is not available for =
early crash.
> +  * Used to get more information about PowerMac G5 "before Copyright" =
hangs.
> +  */
> + else
> + (void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0);
>  }
> =20
>  /*-
> Index: /usr/src/sys/powerpc/conf/GENERIC64
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> --- /usr/src/sys/powerpc/conf/GENERIC64 (revision 271944)
> +++ /usr/src/sys/powerpc/conf/GENERIC64 (working copy)
> @@ -76,6 +76,8 @@
>  # Debugging support.  Always need this:
>  options   KDB # Enable kernel debugger support.
>  options   KDB_TRACE # Print a stack trace for a panic.
> +options   DDB
> +options   GDB
> =20
>  # Make an SMP-capable kernel by default
>  options   SMP # Symmetric MultiProcessor Kernel
> Index: /usr/src/sys/powerpc/ofw/ofw_machdep.c
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> --- /usr/src/sys/powerpc/ofw/ofw_machdep.c (revision 271944)
> +++ /usr/src/sys/powerpc/ofw/ofw_machdep.c (working copy)
> @@ -324,6 +324,12 @@
>   openfirmware(&args);
>  }
> =20
> +/* Part of HACK to have record of ofw call names */
> +#define ofw_name_history_record_size 256
> +char ofw_name_history[ofw_name_history_record_size+1] =3D {}; /* =
Initially: automatically '\0' filled */
> +char * ofw_name_history_pos =3D ofw_name_history;
> +/* End Part of HACK */
> +
>  static int
>  openfirmware_core(void *args)
>  {
> @@ -330,6 +336,42 @@
>   int result;
>   register_t oldmsr;
> =20
> + { /* HACK to have record of ofw call names */
> + struct argtype_prefix {
> + cell_t name;
> + };
> +
> + char *name =3D (char*) (uintptr_t) (((struct =
argtype_prefix*)args)->name);
> +=20
> + int i;
> +
> + *ofw_name_history_pos =3D '<';
> +
> + for(i=3D0; (*name) && i!=3D20; i++) {
> + ofw_name_history_pos++;
> + if (ofw_name_history_pos =3D=3D =
&ofw_name_history[ofw_name_history_record_size]) {
> + ofw_name_history_pos =3D ofw_name_history;
> + }
> + *ofw_name_history_pos =3D *name;
> +
> + name++;
> + }
> +
> + ofw_name_history_pos++;
> + if (ofw_name_history_pos =3D=3D =
&ofw_name_history[ofw_name_history_record_size]) {
> + ofw_name_history_pos =3D ofw_name_history;
> + }
> + *ofw_name_history_pos =3D '>';
> +
> + ofw_name_history_pos++;
> + if (ofw_name_history_pos =3D=3D =
&ofw_name_history[ofw_name_history_record_size]) {
> + ofw_name_history_pos =3D ofw_name_history;
> + }
> + *ofw_name_history_pos =3D '@';
> +
> + ofw_name_history[ofw_name_history_record_size] =3D '\0'; /* Paranoia =
*/
> + } /* HACK end */
> +
>   /*
>    * Turn off exceptions - we really don't want to end up
>    * anywhere unexpected with PCPU set to something strange
> @@ -337,14 +379,22 @@
>    */
>   oldmsr =3D intr_disable();
> =20
> + *ofw_name_history_pos =3D '#'; /* HACK */
> +
>   ofw_sprg_prepare();
> =20
> + *ofw_name_history_pos =3D '$'; /* HACK */
> +
>   /* Save trap vectors */
>   ofw_save_trap_vec(save_trap_of);
> =20
> + *ofw_name_history_pos =3D '%'; /* HACK */
> +
>   /* Restore initially saved trap vectors */
>   ofw_restore_trap_vec(save_trap_init);
> =20
> + *ofw_name_history_pos =3D '^'; /* HACK */
> +
>  #if defined(AIM) && !defined(__powerpc64__)
>   /*
>    * Clear battable[] translations
> @@ -357,13 +407,21 @@
> =20
>   result =3D ofwcall(args);
> =20
> + *ofw_name_history_pos =3D '&'; /* HACK */
> +
>   /* Restore trap vecotrs */
>   ofw_restore_trap_vec(save_trap_of);
> =20
> + *ofw_name_history_pos =3D '*'; /* HACK */
> +
>   ofw_sprg_restore();
> =20
> + *ofw_name_history_pos =3D '~'; /* HACK */
> +
>   intr_restore(oldmsr);
> =20
> + *ofw_name_history_pos =3D '!'; /* HACK */
> +
>   return (result);
>  }
>=20
>=20
>=20
>=20
>=20
> =3D=3D=3D
> Mark Millard
> markmi at dsl-only.net
>=20
> On Sep 25, 2014, at 3:46 AM, Mark Millard <markmi at dsl-only.net> =
wrote:
>=20
> One source code oddity that I notice is the following mixed use of =
ofw_real_mode: always tested vs. never tested (#if 0 ... #endif) ...
>=20
>> /*
>>  * Saved SPRG0-3 from OpenFirmware. Will be restored prior to the =
callback.
>>  */
>> register_t      ofw_sprg0_save;
>>=20
>> static __inline void
>> ofw_sprg_prepare(void)
>> {
>>         if (ofw_real_mode)
>>                 return;
>>=20
>>         /*
>>          * Assume that interrupt are disabled at this point, or
>>          * SPRG1-3 could be trashed
>>          */
>>         __asm __volatile("mfsprg0 %0\n\t"
>>                          "mtsprg0 %1\n\t"
>>                          "mtsprg1 %2\n\t"
>>                          "mtsprg2 %3\n\t"
>>                          "mtsprg3 %4\n\t"
>>                          : "=3D&r"(ofw_sprg0_save)
>>                          : "r"(ofmsr[1]),
>>                          "r"(ofmsr[2]),
>>                          "r"(ofmsr[3]),
>>                          "r"(ofmsr[4]));
>> }
>>        =20
>> static __inline void
>> ofw_sprg_restore(void)
>> {
>> #if 0
>>         if (ofw_real_mode)
>>                 return;
>> #endif
>>=20
>>         /*
>>          * Note that SPRG1-3 contents are irrelevant. They are =
scratch
>>          * registers used in the early portion of trap handling when
>>          * interrupts are disabled.
>>          *
>>          * PCPU data cannot be used until this routine is called !
>>          */
>>         __asm __volatile("mtsprg0 %0" :: "r"(ofw_sprg0_save));
>> }
>=20
> It would seem that for ofw_real_mode !=3D 0 that ofw_sprg_prepare =
would never set up ofw_sprg0_save (via mfsprg0) for the later =
ofw_sprg_restore's always-executed mtsprg0 that is based on =
ofw_sprg0_save.
>=20
> register_t seems to trace back to __int64_t --and that would leave =
ofw_sprg0_save initialized to zero as a global and that would have to be =
okay as the SPRG0 value to restore in such a case. (I have not tracked =
down what any of the per-processor values for SPRG0 are/should-be.)
>=20
>=20
>=20
>=20
>=20
>=20
> =3D=3D=3D
> Mark Millard
> markmi at dsl-only.net
>=20
> On Sep 25, 2014, at 2:12 AM, Mark Millard <markmi at dsl-only.net> =
wrote:
>=20
> The register dump that has no kernel stack addresses in any registers =
does have register contents suggesting a ofwcall use, matching up =
reasonably with the code I looked at that is related to ofwcall. ofwcall =
is only reached via openfirmware_core from what I can tell. (If there =
are other paths into openfirmware than via ofwcall then the register =
dump suggests that they are not in use around the crash.)
>=20
> And openfirmware_core has logic for exception vector swapping, going =
both directions:
>=20
>> static int
>> openfirmware_core(void *args)
>> {
>>         int             result;
>>         register_t      oldmsr;
>> =20
>>         /*
>>          * Turn off exceptions - we really don't want to end up
>>          * anywhere unexpected with PCPU set to something strange
>>          * or the stack pointer wrong.
>>          */
>>         oldmsr =3D intr_disable();
>> =20
>>         ofw_sprg_prepare();
>> =20
>>         /* Save trap vectors */
>>         ofw_save_trap_vec(save_trap_of);
>> =20
>>         /* Restore initially saved trap vectors */
>>         ofw_restore_trap_vec(save_trap_init);
>> =20
>> #if defined(AIM) && !defined(__powerpc64__)
>>         /*
>>          * Clear battable[] translations
>>          */
>>         if (!(cpu_features & PPC_FEATURE_64))
>>                 __asm __volatile("mtdbatu 2, %0\n"
>>                                  "mtdbatu 3, %0" : : "r" (0));
>>         isync();
>> #endif
>>=20
>>         result =3D ofwcall(args);
>>=20
>>         /* Restore trap vecotrs */
>>         ofw_restore_trap_vec(save_trap_of);
>>=20
>>         ofw_sprg_restore();
>>=20
>>         intr_restore(oldmsr);
>>=20
>>         return (result);
>> }
>=20
> In turn openfirmware_core is used only by ofw_rendezvous_dispatch and =
in turn that is used only by openfirmware: only PCPU_GET(cpuid) =3D=3D 0 =
does the above. save_trap_init is initialized by powerpc_init using =
ofw_save_trap_vec.
>=20
> [Note that ofw_restore_trap_vec uses __syncicache which does not use =
dcbf after the bcopy but instead uses dcbst: That is part of what lead =
my investigation into the distinction --and so to my more overall dcbst =
vs. dcbf use questions after proving dcbf would not be sufficient for a =
fix to the specific boot issue.]
>=20
> Unless the initialization of save_trap_init ends up with the wrong =
contents for openfirmware it would appear that the exception vectors are =
kept tracking by the above code. But the above does assume that the =
openfirmware vectors are unchanged after save_trap_init is initialized: =
there is no attempt at tracking of any potential updates to the =
openfirmware exception vectors.
>=20
> I would infer then that after ofw_restore_trap_vec(save_trap_of) is =
executed is when the exception that DDB reports happened: That is when =
FreeBSD's exception vectors are again in place. But a stack pointer into =
the kernel stack is not then in place in any register (based on DDB's =
register dump): stack handling is messed up already by the point of the =
reported exception. And that may actually be why an illegal instruction =
at address zero was reached: an incorrect stack context used to get an =
address to execute at.
>=20
>=20
>=20
>=20
> =3D=3D=3D
> Mark Millard
> markmi at dsl-only.net
>=20
> On Sep 24, 2014, at 8:36 AM, Nathan Whitehorn <nwhitehorn at =
freebsd.org> wrote:
>=20
> There shouldn't be any exceptions at that point, nested or otherwise. =
What I suspect is happening is that Open Firmware has turned them on for =
some bizarre reason, taken one, and ended up in the kernel's handlers =
but with the Open Firmware environment. Saving and restoring the OF =
interrupt vectors would be a possible solution; flattening the device =
tree in loader so that the kernel doesn't call Open Firmware at all =
would be another. I think Justin may have tried the first at some point.
> -Nathan
>=20
> On 09/24/14 02:04, Mark Millard wrote:
>> Now that I've had a kernel/boot crash with a successful DDB bt and =
show registers (a different submittal) it makes for a good =
comparison/contrast with what DDB reports for this "before copyright" =
crash.
>>=20
>> Something unique to the "before copyright" context is...
>>=20
>> No registers are reported to have values that point into the range =
between tmpstk and esym.
>>=20
>> In other words: There is no valid stack pointer reported as far as I =
can tell. r1 has the value 0 instead of being a handling a valid stack =
address. tmpstk=3D0xbd7000 and esym=3D0xbdb000 (example for one of my =
WITH_DEBUG_FILES=3D and options DDB and GDB builds of 10.1-BETA2). That =
at least gives a ball park on the range to expect for pointing into the =
stack even with some build variation.
>>=20
>> It leaves me wondering if the DDB report is for a nested exception =
handling. That could explain why lr points to u_trap+0x10 and srr0 =
points to k_trap+0x28 when normally srr0 would point to the the failing =
instruction (or the instruction after) and lr to where that routine =
would normally return to.
>>=20
>> The register values that are reported for my 10.1-BETA2 builds that =
crash before the copyright notice are:
>>=20
>> r0: 0
>> r1: 0
>> r2: 0xc81538 vop_unlock_desc
>> r3: 0xd18868
>> r4: 0x894b58
>> r5: 0
>> r6: 0xc1dee0 M_AUDITBSM
>> r7: 0xe3f818 ofw_real_mode
>> r8: 0x1
>> r9: 0xe0f580 __pcpu
>> r10: 0x1c35ec0
>> r11: 0
>> r12: 0x10000000
>> r13: 0xdbb290 thread0 (Note: another submittal has this mistyped as =
0xdbb290.)
>> r14-r19: all 0
>> r20: 0x10c1000
>> r21: 0x4
>> r22: 0x180abd4
>> r23: 0x1803a28
>> r24: 0xc000000000008760
>> r25: 0xcc89b8 smp_no...
>> r26: 0xcea108 ofw_rend...
>> r27: 0x894b58 ofwcall+0xa8
>> r28: 0x894b58 ofwcall+0xa8
>> r29: 2400022
>> r30: 9000000000001032
>> r31: 0xbb7d38
>>=20
>> srr0: 0x102720 k_trap+0x28
>> srr1: 9000000000001032
>> lr: 0x1026f0 u_trap+0x10
>> ctr: 0xff846d78
>> cr: 2000deb0
>> xer: 0
>> dar: f...d50 (lots of f's)
>> dsisr: 42000000
>>=20
>>=20
>>=20
>>=20
>>=20
>>=20
>> =3D=3D=3D
>> Mark Millard
>> markmi at dsl-only.net
>>=20
>> On Sep 20, 2014, at 3:42 PM, Mark Millard <markmi at dsl-only.net> =
wrote:
>>=20
>> [I corrected the SSR0 in the subject to be SRR0.]
>>=20
>> I did miss a register in my list (it matched the shown r30 value). =
And it turns out to probably be very important to interpreting what the =
"show registers" is reporting:
>>=20
>> SRR1: 0x9000000000001032
>>=20
>> But bits 43-46 of SRR1 are supposed to indicate which type of Program =
Exception, using a single binary 1 to so. No such 1's are present.
>>=20
>> Illegal instruction would have been bit 44 being 1. (PowerPC has the =
upper bit numbered zero and increases from there.)
>>=20
>> So the ddb "show registers" is apparently not reporting the status as =
of when the "stopped at 0 illegal instruction 0" happened. Thus other =
things are also likely not from that exact time frame.
>>=20
>>=20
>>=20
>> And I misinterpreted the LR value status: The LR value was just left =
over from the restore_kernsrs returning when it finished. Execution then =
flowed into k_trap. Nothing unusual involved.
>>=20
>>=20
>>=20
>>=20
>>=20
>> =3D=3D=3D
>> Mark Millard
>> markmi@dsl-only.net
>>=20
>> On Sep 18, 2014, at 8:57 PM, Mark Millard <markmi@dsl-only.net> =
wrote:
>>=20
>> I modified DDB to automatically "show registers" even at the early =
"before Copyright" crash time. The end of this note will show the =
/usr/src/sys/ddb/db_script.c diff for the hack. While I also had DDB bt, =
the bt does not actually print a back trace for this context. (It might =
for others.)
>>=20
>> The registers give interesting context despite the lack of a back =
trace. I do not know if it will be sufficient to be of much immediate =
help if someone used the information to start looking at the problem.
>>=20
>> I'll start with register lr: 0x1026f0 u_trap+0x10.
>>=20
>> /usr/src/sys/powerpc/aim/trap_subr64.S has:
>>=20
>> s_trap:
>>         bf      17,k_trap               /* branch if PSL_PR is false =
*/
>>         GET_CPUINFO(%r1)
>> u_trap:
>>         ld      %r1,PC_CURPCB(%r1)
>>         mr      %r27,%r28               /* Save LR, r29 */
>>         mtsprg2 %r29
>>         bl      restore_kernsrs         /* enable kernel mapping */
>>         mfsprg2 %r29
>>         mr      %r28,%r27
>>=20
>> /*
>>  * Now the common trap catching code.
>>  */
>> k_trap:
>>         FRAME_SETUP(PC_TEMPSAVE)
>> /* Call C interrupt dispatcher: */
>> trapagain:
>>=20
>> and so this appears to indicate a pending return to execute the =
"mfsprg2 %r29" after "bl restore_kernsrs", which indicates that =
restore_kernsrs should be active.
>>=20
>> But register srr0 indicates: 0x102720 k_trap+0x28. (So apparently in =
FRAME_SETUP(PC_TEMPSAVE) someplace.)
>>=20
>> So it appears to me that the processor got to the k_trap code during =
the supposed restore_kernsrs time frame. (But I'm no expert at these =
sorts of things or for the processor.)
>>=20
>> I'll list the other register values:
>>=20
>> r0:  0
>> r1:  0
>> r2:  0xc1be80 M_AUDITBSM
>> r3:  0xb16138
>> r4:  0x8926e8 .ofwcall+0xa8
>> r5:  0
>> r6:  0xbb5f90
>> r7:  0xe3d118 ofw_real_mode
>> r8:  0x1
>> r9:  0xe0ce80 __pcpu
>> r10: 0x1c35ec9
>> r11: 0
>> r12: 0x10000000
>> r13: db890    thread0
>> r14-r19: all 0
>> r20: 0x10bc000
>> r21: 0x4
>> r22: 0x1801db4
>> r23: 0x1803a28
>> r24: 0xc000000000008760
>> r25: 0xcc6908 smp_no_rendevous_barrier
>> r26: 0xec79e0 ofw_rendezvous_dispatch (yep one has v and the other =
zv)
>> r27: 0x8926e8 .ofwcall+0xa8
>> r28: 0x8926e8 .ofwcall+0xa8 (yep: same value)
>> r29: 0x24000022
>> r30: 0x9000000000001032
>> r31: 0xc7f488 vop_unlock_desc
>>=20
>> ctr: 0xff846d78
>> cr:  0x2000d7b0
>> xer: 0
>> dar: 0xfffffffffffffd50
>> dsisr: 0x42000000
>>=20
>> (Hopefully this manual transcription from the screen display is =
complete --and also accurate for what it does present.)
>>=20
>>=20
>>=20
>>=20
>> The personal HACK to /usr/src/sys/ddb/db_script.c's =
db_script_kdbenter(...) to have it show registers and try bt...
>>=20
>> $ cd /usr/src/sys/ddb/
>> $ svnlite diff .
>> Index: db_script.c
>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>> --- db_script.c (revision 271610)
>> +++ db_script.c (working copy)
>> @@ -319,10 +319,25 @@
>>  {
>>   char scriptname[DB_MAXSCRIPTNAME];
>> =20
>> + /* HACK!!! : Additional lines to force a basic default script to =
exist.
>> +  * Will dump information even if ddb input is not available for =
early crash.
>> +  * Used to get more information about PowerMac G5 "before =
Copyright" hangs.
>> +  */
>> + struct ddb_script *dsp =3D =
db_script_lookup(DB_SCRIPT_KDBENTER_DEFAULT);
>> + if (!dsp) db_script_set(DB_SCRIPT_KDBENTER_DEFAULT, "show =
registers; bt");
>> +
>>   snprintf(scriptname, sizeof(scriptname), "%s.%s",
>>       DB_SCRIPT_KDBENTER_PREFIX, eventname);
>>   if (db_script_exec(scriptname, 0) =3D=3D ENOENT)
>>   (void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0);
>> +
>> + /* HACK!!! : Additional lines to always use the default script,
>> +  *           even if scriptname existed and was executed.
>> +  * Will dump information even if ddb input is not available for =
early crash.
>> +  * Used to get more information about PowerMac G5 "before =
Copyright" hangs.
>> +  */
>> + else
>> + (void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0);
>>  }
>> =20
>>  /*-
>>=20
>>=20
>>=20
>> =3D=3D=3D
>> Mark Millard
>> markmi at dsl-only.net
>>=20
>> On Sep 16, 2014, at 9:28 PM, Mark Millard <markmi at dsl-only.net> =
wrote:
>>=20
>> In part I sent directly to you because of a past exchange (July-27) =
where you had written:
>>=20
>>> Nathan and I both speculate that it's
>>> dropping into Open Firmware (we make extensive use of OFW), and then
>>> messing something up, taking a page fault or something.
>>=20
>> The specific text that I report and its uniformity when it is =
produced seems to add a little information beyond a speculated "page =
fault or something" and so might eventually help a little. As I =
understand the text it is reporting execution reaching address zero =
without any prior un-handled exceptions or other such that would stop =
it. A corrupted stack (pointer) so a bad return address or some such? =
I'd guess there are no explicit jumps to address zero so I expect that =
indirection is likely involved, with the content for the indirection =
messed up.
>>=20
>> I really wish that I had a logic analyzer configuration for this. =
I've not found a way to make the failing context visible so far and the =
extra way of looking at things might have helped.
>>=20
>>=20
>>=20
>>=20
>> =3D=3D=3D
>> Mark Millard
>> markmi@dsl-only.net
>>=20
>> On Sep 16, 2014, at 8:28 PM, Justin Hibbits <chmeeedalf@gmail.com> =
wrote:
>>=20
>> Hi mark,
>>=20
>> I see this on my G5, and I think it's due to the amount of RAM in the =
machine. More than 4gb seems to confuse open firmware when called by =
FreeBSD. There is some effort to remove the need of the callbacks but =
thus far it's not far along. The good news is that after it boots it's =
solid except when switching vtys, buy earlier this year or last year I =
added a sysctl hack to disable the call into open firmware on vty switch =
(don't recall offhand and not at my computer right now, but if you grep =
the sysctl output for reset and ofw you can find it).
>>=20
>> -Justin
>>=20
>> On Sep 16, 2014 8:01 PM, "Mark Millard" <markmi@dsl-only.net> wrote:
>> I've now spent time with rebooting and power-off/power-on for all 3 =
PowerMac G5's (one PowerMac7,2 and two PowerMac11,2's) and all 3 get the
>>=20
>>> GDB: no debug ports present
>>> KDB: debugger backends: DDB
>>> KDB: current backend: DDB
>>> [ thread pid -1 tid 1006665719 ]
>>> Stopped at 0: illegal instruction 0
>>> db>
>>=20
>> when they fail just before the Copyright notice would normally be =
displayed. None fail any earlier. At that spot none have failed any =
other way. It is the same SSD in all 3. (Happens with other SSD's as =
well.) Overall there is a mix of Radeon and NVIDIA display boards. =
Besides the SSD use and RAM upgrades the rest is stock equipment. scons =
used, not vt. (I've yet to try vt.)
>>=20
>> Seeing a failure after the Copyright notice as been fairly rare in =
all my experiments from when I started last April or so. The ones that =
I've noted had Data Storage Interrupt reported. So far no examples of =
the above have been reported after the Copyright notice. So I'd guess =
that they are separate issues. Of course it seems that only in the last =
few days would I have seen the above sort of thing if it did happen =
after the Copyright notice: The prior history does not count for =
judgements about that.
>>=20
>> =3D=3D=3D
>> Mark Millard
>> markmi at dsl-only.net
>>=20
>> On Sep 16, 2014, at 8:15 AM, Mark Millard <markmi@dsl-only.net> =
wrote:
>>=20
>> Using 10.1-BETA1 I added "options DDB" and "options GDB" to =
powerpc64's GENERIC64. (I also used WITH_DEBUG_FILES=3D, WITHOUT_CLANG=3D,=
 and WITH_DEBUG=3D in /etc/make.conf.) So buildworld, kernel was =
basically just set up to have more of a debugging context around =
(including for any ports builds).
>>=20
>> The result was new information about the PowerMac G5 boot hangups: =
The screen is no longer blank when the G5 is hung up without there being =
a Copyright notice yet. It says...
>>=20
>>> GDB: no debug ports present
>>> KDB: debugger backends: DDB
>>> KDB: current backend: DDB
>>> [ thread pid -1 tid 1006665719 ]
>>> Stopped at 0: illegal instruction 0
>>> db>
>>=20
>> (I had no ability to input at that point.) Normally the Copyright =
notice would have displayed instead of "[...]" and what follows. (I do =
not claim to have all the spacing, capitalization, and such correct =
above.)
>>=20
>> That text is constant from hang to hang when it hangs just before it =
would normally output the Copyright notice: The numbers do not vary, =
much less the other text. It has never failed until after the two KDB =
messages are present. So far I've only tested one PowerMac G5, booting =
over and over for a few hours.
>>=20
>>=20
>>=20
>> (I do not claim to be set up for remote kernel debugging. I just =
decided to let GDB go along for the ride when I added DDB.)
>>=20
>> =3D=3D=3D
>> Mark Millard
>> markmi at dsl-only.net
>>=20
>>=20
>>=20
>>=20
>>=20
>>=20
>=20
>=20
>=20
>=20





Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?D7D2B0EF-7EEB-48E7-9485-990BACC709EE>