Date:      Wed, 24 Sep 2014 02:04:56 -0700
From:      Mark Millard <markmi@dsl-only.net>
To:        FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>, Nathan Whitehorn <nwhitehorn@freebsd.org>
Cc:        Justin Hibbits <chmeeedalf@gmail.com>
Subject:   Re: lr=u_trap+0x10 and srr0=k_trap+0x28 for "stopped at 0 illegal instruction 0" before-copyright hang on PowerMac G5's
Message-ID:  <37575F94-763C-43BF-8DD9-F648F4A7C09F@dsl-only.net>
In-Reply-To: <0703EF26-6E33-4446-9273-BBFD0CB72893@dsl-only.net>
References:  <1118046C-0FF7-49FC-82DA-DB9A7A310991@dsl-only.net> <2ED3DB50-B985-4382-8FF2-3B44E7E65453@dsl-only.net> <CAHSQbTBXxrgXQdNeCs=C5wJaT_bmh9FU836O6VnJDbJuqCUujw@mail.gmail.com> <6D729F43-662A-429E-9503-0148EC3250B1@dsl-only.net> <72535F89-3942-45A6-B351-7F746209ED9F@dsl-only.net> <0703EF26-6E33-4446-9273-BBFD0CB72893@dsl-only.net>

Now that I've had a kernel/boot crash with a successful DDB bt and show
registers (a different submittal) it makes for a good
comparison/contrast with what DDB reports for this "before copyright"
crash.

Something unique to the "before copyright" context is...

No registers are reported to have values that point into the range
between tmpstk and esym.

In other words: There is no valid stack pointer reported as far as I can
tell. r1 has the value 0 instead of holding a valid stack address.
tmpstk=0xbd7000 and esym=0xbdb000 (example for one of my
WITH_DEBUG_FILES= and options DDB and GDB builds of 10.1-BETA2). That
at least gives a ballpark for the range to expect for pointers into the
stack, even with some build variation.

It leaves me wondering if the DDB report is for nested exception
handling. That could explain why lr points to u_trap+0x10 and srr0
points to k_trap+0x28 when normally srr0 would point to the failing
instruction (or the instruction after) and lr to where that routine
would normally return to.

The register values that are reported for my 10.1-BETA2 builds that
crash before the copyright notice are:

r0: 0
r1: 0
r2: 0xc81538 vop_unlock_desc
r3: 0xd18868
r4: 0x894b58
r5: 0
r6: 0xc1dee0 M_AUDITBSM
r7: 0xe3f818 ofw_real_mode
r8: 0x1
r9: 0xe0f580 __pcpu
r10: 0x1c35ec0
r11: 0
r12: 0x10000000
r13: 0xdbb290 thread0 (Note: another submittal has this mistyped as 0xdbb290.)
r14-r19: all 0
r20: 0x10c1000
r21: 0x4
r22: 0x180abd4
r23: 0x1803a28
r24: 0xc000000000008760
r25: 0xcc89b8 smp_no...
r26: 0xcea108 ofw_rend...
r27: 0x894b58 ofwcall+0xa8
r28: 0x894b58 ofwcall+0xa8
r29: 2400022
r30: 9000000000001032
r31: 0xbb7d38

srr0: 0x102720 k_trap+0x28
srr1: 9000000000001032
lr: 0x1026f0 u_trap+0x10
ctr: 0xff846d78
cr: 2000deb0
xer: 0
dar: f...d50 (lots of f's)
dsisr: 42000000
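
The claim that nothing points between tmpstk and esym can be re-checked mechanically. The following is a minimal Python sketch, not part of the original report: the dictionary simply transcribes the values listed above, the helper name points_into_stack is made up for illustration, and treating the range as half-open is an assumption. r29 and r30 are left out because the list shows them without a 0x prefix, and the elided dar value is omitted as well.

```python
# Bounds quoted in the message for one 10.1-BETA2 debug build.
TMPSTK = 0xbd7000
ESYM = 0xbdb000

# Register values transcribed from the list above (zero-valued registers
# other than r1, and the small constants in r8/r21, are not repeated).
REGISTERS = {
    "r1": 0x0,
    "r2": 0xc81538,
    "r3": 0xd18868,
    "r4": 0x894b58,
    "r6": 0xc1dee0,
    "r7": 0xe3f818,
    "r9": 0xe0f580,
    "r10": 0x1c35ec0,
    "r12": 0x10000000,
    "r13": 0xdbb290,
    "r20": 0x10c1000,
    "r22": 0x180abd4,
    "r23": 0x1803a28,
    "r24": 0xc000000000008760,
    "r25": 0xcc89b8,
    "r26": 0xcea108,
    "r27": 0x894b58,
    "r28": 0x894b58,
    "r31": 0xbb7d38,
    "srr0": 0x102720,
    "lr": 0x1026f0,
    "ctr": 0xff846d78,
}


def points_into_stack(value, low=TMPSTK, high=ESYM):
    """Return True when value lies in the half-open range [low, high)."""
    return low <= value < high


# Names of all registers whose value falls inside the expected stack range.
candidates = sorted(name for name, value in REGISTERS.items()
                    if points_into_stack(value))
print(candidates)  # prints [] for the values above
```

The closest value is r31 (0xbb7d38), which still falls below tmpstk.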






===
Mark Millard
markmi at dsl-only.net

On Sep 20, 2014, at 3:42 PM, Mark Millard <markmi at dsl-only.net> wrote:

[I corrected the SSR0 in the subject to be SRR0.]

I did miss a register in my list (it matched the shown r30 value). And
it turns out to probably be very important to interpreting what the
"show registers" is reporting:

SRR1: 0x9000000000001032

But bits 43-46 of SRR1 are supposed to indicate which type of Program
Exception, using a single binary 1 to do so. No such 1's are present.

Illegal instruction would have been bit 44 being 1. (PowerPC has the
upper bit numbered zero and increases from there.)

So the ddb "show registers" is apparently not reporting the status as of
when the "stopped at 0 illegal instruction 0" happened. Thus other
things are also likely not from that exact time frame.
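
The bit test can be reproduced with a few lines of Python. This is a sketch only, not part of the original report: the helper name ibm_bit is made up for illustration, and apart from bit 44 (named in the message) the labels for bits 43, 45 and 46 are filled in from a reading of the Power ISA and should be treated as an assumption.

```python
SRR1 = 0x9000000000001032  # value reported by "show registers"


def ibm_bit(value, bit, width=64):
    """Return bit `bit` of `value`, numbering the most significant bit as 0."""
    return (value >> (width - 1 - bit)) & 1


# Program-exception reason bits in SRR1. Only bit 44 is named in the
# message; the other labels are assumptions taken from the Power ISA.
PROGRAM_REASON_BITS = {
    43: "floating-point enabled exception",
    44: "illegal instruction",
    45: "privileged instruction",
    46: "trap",
}

# Collect the label of every reason bit that is set in SRR1.
reasons = [name for bit, name in PROGRAM_REASON_BITS.items()
           if ibm_bit(SRR1, bit)]
print(reasons)  # prints [] : none of bits 43-46 is set
```

With the most significant bit numbered 0, bit 44 corresponds to the mask 0x80000, which is clear in 0x9000000000001032.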



And I misinterpreted the LR value status: The LR value was just left
over from the restore_kernsrs returning when it finished. Execution then
flowed into k_trap. Nothing unusual involved.





===
Mark Millard
markmi@dsl-only.net

On Sep 18, 2014, at 8:57 PM, Mark Millard <markmi@dsl-only.net> wrote:

I modified DDB to automatically "show registers" even at the early
"before Copyright" crash time. The end of this note will show the
/usr/src/sys/ddb/db_script.c diff for the hack. While I also had DDB bt,
the bt does not actually print a back trace for this context. (It might
for others.)

The registers give interesting context despite the lack of a back trace.
I do not know if it will be sufficient to be of much immediate help if
someone used the information to start looking at the problem.

I'll start with register lr: 0x1026f0 u_trap+0x10.

/usr/src/sys/powerpc/aim/trap_subr64.S has:

s_trap:
        bf      17,k_trap               /* branch if PSL_PR is false */
        GET_CPUINFO(%r1)
u_trap:
        ld      %r1,PC_CURPCB(%r1)
        mr      %r27,%r28               /* Save LR, r29 */
        mtsprg2 %r29
        bl      restore_kernsrs         /* enable kernel mapping */
        mfsprg2 %r29
        mr      %r28,%r27

/*
 * Now the common trap catching code.
 */
k_trap:
        FRAME_SETUP(PC_TEMPSAVE)
/* Call C interrupt dispatcher: */
trapagain:

and so this appears to indicate a pending return to execute the
"mfsprg2 %r29" after "bl restore_kernsrs", which indicates that
restore_kernsrs should be active.

But register srr0 indicates: 0x102720 k_trap+0x28. (So apparently in
FRAME_SETUP(PC_TEMPSAVE) someplace.)

So it appears to me that the processor got to the k_trap code during the
supposed restore_kernsrs time frame. (But I'm no expert at these sorts
of things or for the processor.)
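
The symbol+offset readings can be checked against the listing with simple arithmetic. The sketch below is not part of the original report. It assumes that every PowerPC instruction is 4 bytes and that each of the six lines shown under u_trap assembles to exactly one instruction; the variable names are illustrative only.

```python
INSN_BYTES = 4  # fixed PowerPC instruction size

LR, LR_OFFSET = 0x1026f0, 0x10      # lr:   u_trap+0x10
SRR0, SRR0_OFFSET = 0x102720, 0x28  # srr0: k_trap+0x28

# Recover the symbol addresses from the reported values.
u_trap = LR - LR_OFFSET
k_trap = SRR0 - SRR0_OFFSET

# Instructions between the u_trap and k_trap labels, as listed above.
u_trap_body = [
    "ld %r1,PC_CURPCB(%r1)",
    "mr %r27,%r28",
    "mtsprg2 %r29",
    "bl restore_kernsrs",
    "mfsprg2 %r29",
    "mr %r28,%r27",
]

# The instruction that lr points at, found by its offset into u_trap.
lr_insn = u_trap_body[LR_OFFSET // INSN_BYTES]

print(hex(u_trap), hex(k_trap))  # prints 0x1026e0 0x1026f8
print(lr_insn)                   # prints mfsprg2 %r29
print(k_trap - u_trap == len(u_trap_body) * INSN_BYTES)  # prints True
```

Under those assumptions the two addresses are mutually consistent: k_trap starts exactly six instructions after u_trap, and lr holds the address of the instruction that follows "bl restore_kernsrs", which is the value a completed call to restore_kernsrs would leave behind.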

I'll list the other register values:

r0:  0
r1:  0
r2:  0xc1be80 M_AUDITBSM
r3:  0xb16138
r4:  0x8926e8 .ofwcall+0xa8
r5:  0
r6:  0xbb5f90
r7:  0xe3d118 ofw_real_mode
r8:  0x1
r9:  0xe0ce80 __pcpu
r10: 0x1c35ec9
r11: 0
r12: 0x10000000
r13: db890    thread0
r14-r19: all 0
r20: 0x10bc000
r21: 0x4
r22: 0x1801db4
r23: 0x1803a28
r24: 0xc000000000008760
r25: 0xcc6908 smp_no_rendevous_barrier
r26: 0xec79e0 ofw_rendezvous_dispatch (yep one has v and the other zv)
r27: 0x8926e8 .ofwcall+0xa8
r28: 0x8926e8 .ofwcall+0xa8 (yep: same value)
r29: 0x24000022
r30: 0x9000000000001032
r31: 0xc7f488 vop_unlock_desc

ctr: 0xff846d78
cr:  0x2000d7b0
xer: 0
dar: 0xfffffffffffffd50
dsisr: 0x42000000

(Hopefully this manual transcription from the screen display is complete
-- and also accurate for what it does present.)




The personal HACK to /usr/src/sys/ddb/db_script.c's
db_script_kdbenter(...) to have it show registers and try bt...

$ cd /usr/src/sys/ddb/
$ svnlite diff .
Index: db_script.c
===================================================================
--- db_script.c	(revision 271610)
+++ db_script.c	(working copy)
@@ -319,10 +319,25 @@
 {
 	char scriptname[DB_MAXSCRIPTNAME];
 
+	/* HACK!!! : Additional lines to force a basic default script to exist.
+	 * Will dump information even if ddb input is not available for early crash.
+	 * Used to get more information about PowerMac G5 "before Copyright" hangs.
+	 */
+	struct ddb_script *dsp = db_script_lookup(DB_SCRIPT_KDBENTER_DEFAULT);
+	if (!dsp) db_script_set(DB_SCRIPT_KDBENTER_DEFAULT, "show registers; bt");
+
 	snprintf(scriptname, sizeof(scriptname), "%s.%s",
 	    DB_SCRIPT_KDBENTER_PREFIX, eventname);
 	if (db_script_exec(scriptname, 0) == ENOENT)
 		(void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0);
+
+	/* HACK!!! : Additional lines to always use the default script,
+	 *           even if scriptname existed and was executed.
+	 * Will dump information even if ddb input is not available for early crash.
+	 * Used to get more information about PowerMac G5 "before Copyright" hangs.
+	 */
+	else
+		(void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0);
 }
 
 /*-



===
Mark Millard
markmi at dsl-only.net

On Sep 16, 2014, at 9:28 PM, Mark Millard <markmi at dsl-only.net> wrote:

In part I sent directly to you because of a past exchange (July-27)
where you had written:

> Nathan and I both speculate that it's
> dropping into Open Firmware (we make extensive use of OFW), and then
> messing something up, taking a page fault or something.

The specific text that I report and its uniformity when it is produced
seems to add a little information beyond a speculated "page fault or
something" and so might eventually help a little. As I understand the
text it is reporting execution reaching address zero without any prior
un-handled exceptions or other such that would stop it. A corrupted
stack (pointer) so a bad return address or some such? I'd guess there
are no explicit jumps to address zero so I expect that indirection is
likely involved, with the content for the indirection messed up.

I really wish that I had a logic analyzer configuration for this. I've
not found a way to make the failing context visible so far and the extra
way of looking at things might have helped.




===
Mark Millard
markmi@dsl-only.net

On Sep 16, 2014, at 8:28 PM, Justin Hibbits <chmeeedalf@gmail.com> wrote:

Hi Mark,

I see this on my G5, and I think it's due to the amount of RAM in the
machine. More than 4 GB seems to confuse Open Firmware when called by
FreeBSD. There is some effort to remove the need for the callbacks but
thus far it's not far along. The good news is that after it boots it's
solid except when switching vtys, but earlier this year or last year I
added a sysctl hack to disable the call into Open Firmware on vty switch
(I don't recall the name offhand and am not at my computer right now,
but if you grep the sysctl output for reset and ofw you can find it).

-Justin

On Sep 16, 2014 8:01 PM, "Mark Millard" <markmi@dsl-only.net> wrote:
I've now spent time with rebooting and power-off/power-on for all 3
PowerMac G5's (one PowerMac7,2 and two PowerMac11,2's) and all 3 get the

> GDB: no debug ports present
> KDB: debugger backends: DDB
> KDB: current backend: DDB
> [ thread pid -1 tid 1006665719 ]
> Stopped at 0: illegal instruction 0
> db>

when they fail just before the Copyright notice would normally be
displayed. None fail any earlier. At that spot none have failed any
other way. It is the same SSD in all 3. (Happens with other SSD's as
well.) Overall there is a mix of Radeon and NVIDIA display boards.
Besides the SSD use and RAM upgrades the rest is stock equipment. scons
used, not vt. (I've yet to try vt.)

Seeing a failure after the Copyright notice has been fairly rare in all
my experiments from when I started last April or so. The ones that I've
noted had Data Storage Interrupt reported. So far no examples of the
above have been reported after the Copyright notice. So I'd guess that
they are separate issues. Of course it seems that only in the last few
days would I have seen the above sort of thing if it did happen after
the Copyright notice: The prior history does not count for judgements
about that.

===
Mark Millard
markmi at dsl-only.net

On Sep 16, 2014, at 8:15 AM, Mark Millard <markmi@dsl-only.net> wrote:

Using 10.1-BETA1 I added "options DDB" and "options GDB" to powerpc64's
GENERIC64. (I also used WITH_DEBUG_FILES=, WITHOUT_CLANG=, and
WITH_DEBUG= in /etc/make.conf.) So buildworld, kernel was basically
just set up to have more of a debugging context around (including for
any ports builds).

The result was new information about the PowerMac G5 boot hangups: The
screen is no longer blank when the G5 is hung up without there being a
Copyright notice yet. It says...

> GDB: no debug ports present
> KDB: debugger backends: DDB
> KDB: current backend: DDB
> [ thread pid -1 tid 1006665719 ]
> Stopped at 0: illegal instruction 0
> db>

(I had no ability to input at that point.) Normally the Copyright notice
would have displayed instead of "[...]" and what follows. (I do not
claim to have all the spacing, capitalization, and such correct above.)

That text is constant from hang to hang when it hangs just before it
would normally output the Copyright notice: The numbers do not vary,
much less the other text. It has never failed until after the two KDB
messages are present. So far I've only tested one PowerMac G5, booting
over and over for a few hours.



(I do not claim to be set up for remote kernel debugging. I just decided
to let GDB go along for the ride when I added DDB.)

===
Mark Millard
markmi at dsl-only.net








