Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 27 May 2017 01:17:22 -0700
From:      Mark Millard <markmi@dsl-only.net>
To:        Justin Hibbits <jhibbits@FreeBSD.org>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>, freebsd-hackers@freebsd.org
Subject:   Re: A good backtrace from a head -r317820 powerpc random/periodic panic: execution of garbage at 0x0090a030 (in .hash section) [better bt; more]
Message-ID:  <CF7FA043-23DC-4012-B7D4-3A0E21BE924A@dsl-only.net>
In-Reply-To: <62BF8E69-E7E6-4C4F-AB33-38B03E903CDA@dsl-only.net>
References:  <1CE8346B-04F3-48AB-A3E9-6DF3B86B8D1A@dsl-only.net> <8C88BB6F-E747-42A1-9DDC-35EC6D865141@dsl-only.net> <62BF8E69-E7E6-4C4F-AB33-38B03E903CDA@dsl-only.net>

next in thread | previous in thread | raw e-mail | index | archive | help
[I suggest a patch this time.]

On 2017-May-27, at 12:42 AM, Mark Millard <markmi@dsl-only.net> wrote:

> [Top post of the answer to what is wrong. I
> have submitted 219589 for this.]
>=20
> TARGET_ARCH=3Dpowerpc64 got a fix to
> bugzilla 205458, avoiding inappropriate
> restoration of openfirmware's sprg0
> value.
>=20
> It turns out that TARGET_ARCH=3Dpowerpc
> needs to detect when it is running on
> powerpc64 and do the same thing if it
> is to avoid trashing memory and running
> unreliably on PowerMac G5's.
>=20
> The issue is that for powerpc64 it is
> inappropriate to restore the sprg0
> value to its openfirmware value. This
> is because the FreeBSD real-mode
> handling is involved instead of the
> openfirmware's original virtual mode,
> making openfirware's value simply
> inappropriate.
>=20
> Quoting Nathan W. from Comment
> #4 of 205458:
>=20
>> Where this explodes is if OF uses an unmapped SLB entry.
>> The SLB fault handler runs in real mode and refers to the
>> PCPU pointer in SPRG0, which blows up the kernel. Having
>> a value of SPRG0 that works for the kernel is less fatal
>> than preserving OF's value in this case.
>=20
> I know that part of the code does detect
> the powerpc64 context vs. not and does
> things differently to emulate being powerpc
> like on powerpc64 (such as limiting
> RAM use as a consequence).
>=20
> The powerpc64 vs. not status needs to be
> recorded and used to control a sprg0
> restoration choice: avoid restoring
> openfirmware's value on powerpc64;
> otherwise restore it.

I expect that the following patch would
fix the problem:

# svnlite diff /usr/src/sys/powerpc/ofw/ofw_machdep.c
Index: /usr/src/sys/powerpc/ofw/ofw_machdep.c
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
--- /usr/src/sys/powerpc/ofw/ofw_machdep.c	(revision 317820)
+++ /usr/src/sys/powerpc/ofw/ofw_machdep.c	(working copy)
@@ -147,7 +147,8 @@
 	 * PCPU data cannot be used until this routine is called !
 	 */
 #ifndef __powerpc64__
-	__asm __volatile("mtsprg0 %0" :: "r"(ofw_sprg0_save));
+	if (cpu_features & PPC_FEATURE_64 !=3D PPC_FEATURE_64)
+		__asm __volatile("mtsprg0 %0" :: "r"(ofw_sprg0_save));
 #endif
 }
 #endif

This is based on cpu_features already having had
PPC_FEATURE_64 masked in before this if things
are running on a PowerMac G5 or other powerpc64.


=3D=3D=3D
Mark Millard
markmi at dsl-only.net

[The original (better) panic evidence. . .]

On 2017-May-26, at 10:29 PM, Mark Millard <markmi@dsl-only.net> wrote:

> [Additional information that does not need to
> interlace with the prior material, so see
> after. A somewhat better backtrace reported
> by ddb. And so on.]
>=20
>=20
> On 2017-May-26, at 7:14 PM, Mark Millard <markmi at dsl-only.net> =
wrote:
>=20
>> I lucked out and got a vmcore.9 for a random
>> panic that I could manage to backtrace for
>> one of my test builds of -r317820. It appears
>> that not all that much happened before it got
>> the panic so much context was better preserved
>> this time.
>>=20
>> (I do not explore from ddb as I've had that
>> panic and mess up the dump just made by
>> replacing it. So this is a manual backtrace
>> from the debug.minidump=3D0 style vmcore.9
>> file. objdump was used on the
>> /boot/kernel/kernel to find code.)
>>=20
>> Being able to see the problem is very
>> sensitive to kernel memory layout. This
>> is why I'm sticking with -r317820 built
>> production style: the kind of context
>> the problem was first observed in.
>> Attempting a debug kernel build simply
>> did not repeat the problem for days
>> (vs. the usual hours for builds like
>> this).
>>=20
>>=20
>> So below is the backtrace:
>>=20
>> (I do not show what trap_fatal calls:
>> starting with trap_fatal and going toward
>> larger memory addresses. . .)
>>=20
>> [vmcore.9's
>> offset
>> in file
>> when no
>> 0x prefix]
>>=20
>> 013e83b0  df 5e 55 50 00 8f 34 5c  fa 50 05 af fa 50 05 af  =
|.^UP..4\.P...P..|
>> 0x008f3454 <trap+0x1228> mr      r3,r26
>> 0x008f3458 <trap+0x122c> bl      008f2030 <trap_fatal>
>> 0x008f345c <trap+0x1230> b       008f34c8 <trap+0x129c>
>>=20
>> 013e83c0  fa 50 05 af fa 50 05 af  fa 50 05 af fa 50 05 af  =
|.P...P...P...P..|
>> *
>> 013e83e0  df 5e 54 00 fa 50 05 af  fa 50 05 af fa 50 05 af  =
|.^T..P...P...P..|
>> 013e83f0  fa 50 05 af fa 50 05 af  00 d1 ca ac df 5e 54 00  =
|.P...P.......^T.|
>> 013e8400  df 5e 54 20 00 53 3a d0  fa 50 05 af fa 50 05 af  |.^T =
.S:..P...P..|
>> 013e8410  fa 50 05 af fa 50 05 af  00 d1 ca ac df 5e 54 20  =
|.P...P.......^T |
>> 013e8420  df 5e 54 d0 00 53 3a d0  00 00 00 00 00 00 00 00  =
|.^T..S:.........|
>> 013e8430  00 00 00 00 00 00 00 00  00 d1 ca ac 00 00 00 0f  =
|................|
>> 013e8440  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  =
|................|
>> *
>> 013e8460  00 00 00 54 7f ff ff ff  00 00 00 00 ff ff ff aa  =
|...T............|
>> 013e8470  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  =
|................|
>> *
>> 013e8490  00 d4 c4 5c 00 d4 c4 5c  00 17 99 dd 00 17 9a 5c  =
|...\...\.......\|
>> 013e84a0  00 00 17 9a 00 d4 c4 6c  df 5e 54 ec 00 00 00 54  =
|.......l.^T....T|
>> 013e84b0  df 5e 54 d0 7f ff ff ff  05 aa 36 c0 05 aa 39 e8  =
|.^T.......6...9.|
>> 013e84c0  00 00 00 00 00 00 00 00  00 d1 ca ac df 5e 54 d0  =
|.............^T.|
>> 013e84d0  df 5e 55 80 00 53 3a d0  05 91 d0 00 05 91 d3 28  =
|.^U..S:........(|
>> 013e84e0  df 5e 55 00 00 00 00 00  00 d1 ca ac 00 00 00 0f  =
|.^U.............|
>> 013e84f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  =
|................|
>> 013e8500  00 00 00 00 00 00 00 00  00 d4 bd ec 00 cb 98 98  =
|................|
>> 013e8510  00 d4 c4 5c 00 d4 c4 5c  00 17 99 f8 00 17 99 f8  =
|...\...\........|
>> 013e8520  00 00 17 99 fa 69 b8 79  00 17 9a 0a 00 00 17 99  =
|.....i.y........|
>> 013e8530  f8 1c 2f cc df 5e 55 88  01 47 d6 c0 01 43 a2 00  =
|../..^U..G...C..|
>> 013e8540  41 eb 30 00 0a 00 00 00  00 d2 6e 4c df 5e 55 50  =
|A.0.......nL.^UP|
>>=20
>> 013e8550  df 5e 55 80 00 8e 7d 40  df 5e 55 9c 00 00 00 28  =
|.^U...}@.^U....(|
>> 0x008e7d28 <powerpc_interrupt+0x184> mfmsr   r0
>> 0x008e7d2c <powerpc_interrupt+0x188> or      r0,r0,r9
>> 0x008e7d30 <powerpc_interrupt+0x18c> mtmsr   r0
>> 0x008e7d34 <powerpc_interrupt+0x190> isync
>> 0x008e7d38 <powerpc_interrupt+0x194> mr      r3,r25
>> 0x008e7d3c <powerpc_interrupt+0x198> bl      008f222c <trap>
>> 0x008e7d40 <powerpc_interrupt+0x19c> lwz     r11,0(r1)
>>=20
>> 013e8560  00 00 00 00 00 00 17 9a  06 40 c2 a8 01 43 a2 00  =
|.........@...C..|
>> 013e8570  41 eb 30 00 0a 00 00 00  00 00 00 00 00 08 10 32  =
|A.0............2|
>>=20
>>                                    r0          r1
>> 013e8580  df 5e 56 40 00 10 08 f8  00 00 00 04 df 5e 56 40  =
|.^V@.........^V@|
>> The struct trapframe starts is 0x 013e8588 in the
>> vmcore.9 file. The 0x00108f8 is as shown below:
>>=20
>> 0x001008ec <k_trap+0x118> isync
>> 0x001008f0 <trapagain> addi    r3,r1,8
>> 0x001008f4 <trapagain+0x4> bl      008e7ba4 <powerpc_interrupt>
>> 0x001008f8 <trapexit> mfmsr   r3
>> 0x001008fc <trapexit+0x4> andi.   r3,r3,32767
>>=20
>> 013e8590  01 47 d6 c0 00 00 00 28  01 47 c6 c0 00 00 00 04  =
|.G.....(.G......|
>>             r2          r3           r4         r5
>>=20
>> 013e85a0  00 00 00 04 00 00 00 0f  00 00 00 00 00 d4 c0 3c  =
|...............<|
>>             r6          r7           r8         r9
>>=20
>> 013e85b0  01 47 d6 c0 df 5e 56 80  14 3d 60 da 00 00 00 00  =
|.G...^V..=3D`.....|
>>            r10         r11          r12         r13
>>=20
>> 013e85c0  00 d4 bd ec 00 cb 98 98  00 d4 c4 5c 00 d4 c4 5c  =
|...........\...\|
>>            r14         r15          r16         r17
>>=20
>> 013e85d0  00 17 99 f8 00 17 99 f8  00 00 17 99 fa 69 b8 79  =
|.............i.y|
>>            r18         r19          r20         r21
>>=20
>> 013e85e0  00 17 9a 0a 00 00 17 99  f8 1c 2f cc 00 00 17 9a  =
|........../.....|
>>            r22         r32          r24         r25
>>=20
>> 013e85f0  06 40 c2 a8 01 43 a2 00  00 eb a7 80 01 47 d6 c0  =
|.@...C.......G..|
>>            r26         r27          r28         r29
>>=20
>> 013e8600  00 d1 ca ac df 5e 56 40  00 53 5a d0 20 00 90 44  =
|.....^V@.SZ. ..D|
>>            r30         r31           lr          cr
>>=20
>>            xer         ctr          srr0     srr1
>> 013e8610  00 00 00 00 00 00 00 00  00 90 a0 30 00 08 10 32  =
|...........0...2|
>> Note: objdump shows no code at 0x0090a030. 0x0090a030 is
>> inside what objdump -x reports as the section .hash (Idx
>> 2).
>>=20
>>            exc         dar         dsisr    (booke dbcer0)
>> 013e8620  00 00 07 00 41 eb 30 00  0a 00 00 00 01 47 c6 c0  =
|....A.0......G..|
>> The 0x00007000 above is the framep->exc (exception code)
>> (program in this case).
>>=20
>> 013e8630  00 eb a7 80 01 47 60 10  00 d1 ca ac df 5e 56 40  =
|.....G`......^V@|
>>=20
>> At this point the above does not match the below
>> part of the stack trace.
>>=20
>> The lr part of struct trapframe was: 0x00535ad0 so
>> showing around that:
>>=20
>> 00535ab8 <sched_affinity> stwu    r1,-32(r1)
>> 00535abc <sched_affinity+0x4> mflr    r0
>> 00535ac0 <sched_affinity+0x8> stw     r29,20(r1)
>> 00535ac4 <sched_affinity+0xc> stw     r30,24(r1)
>> 00535ac8 <sched_affinity+0x10> stw     r31,28(r1)
>> 00535acc <sched_affinity+0x14> stw     r0,36(r1)
>> 00535ad0 <sched_affinity+0x18> mr      r31,r1
>> 00535ad4 <sched_affinity+0x1c> mr      r29,r3
>>=20
>> Back to the stack backtrace. . .
>>=20
>> 013e8640  df 5e 56 80 00 53 59 dc  00 17 99 f8 00 17 99 f8  =
|.^V..SY.........|
>> 0x005359c8 <sched_add+0x18c> bl      008ea420 <spinlock_exit>
>> 0x005359cc <sched_add+0x190> mr      r3,r28
>> 0x005359d0 <sched_add+0x194> mr      r4,r27
>> 0x005359d4 <sched_add+0x198> mr      r5,r25
>> 0x005359d8 <sched_add+0x19c> bl      005356ec <tdq_add>
>> 0x005359dc <sched_add+0x1a0> mfsprg  r9,0
>>=20
>> I show around 0x005356ec
>> from the sched_add+0x19c bl that is above
>> because the routine is not referenced
>> in the stack tracce but the above indicates
>> that it should have been called:
>> 0x005356ec <tdq_add> stwu    r1,-32(r1)
>> 0x005356f0 <tdq_add+0x4> mflr    r0
>> 0x005356f4 <tdq_add+0x8> stw     r28,16(r1)
>> 0x005356f8 <tdq_add+0xc> stw     r29,20(r1)
>> 0x005356fc <tdq_add+0x10> stw     r30,24(r1)
>> 0x00535700 <tdq_add+0x14> stw     r31,28(r1)
>> 0x00535704 <tdq_add+0x18> stw     r0,36(r1)
>> . . .
>>=20
>> Back to the stack backtrace again. . .
>>=20
>> 013e8650  00 00 17 99 fa 69 b8 79  00 17 9a 0a 00 00 00 04  =
|.....i.y........|
>> 013e8660  f8 1c 2f cc 00 00 17 9a  06 40 c2 a8 01 43 a2 00  =
|../......@...C..|
>> 013e8670  01 47 c6 c0 01 47 60 10  00 d1 b4 30 df 5e 56 80  =
|.G...G`....0.^V.|
>>=20
>> 013e8680  df 5e 56 b0 00 4a 87 8c  00 d2 5b 10 00 00 00 04  =
|.^V..J....[.....|
>> 0x004a8780 <intr_event_schedule_thread+0xc4> mr      r3,r28
>> 0x004a8784 <intr_event_schedule_thread+0xc8> li      r4,4
>> 0x004a8788 <intr_event_schedule_thread+0xcc> bl      0053583c =
<sched_add>
>> 0x004a878c <intr_event_schedule_thread+0xd0> lwz     r9,0(r28)
>>=20
>> 013e8690  df 5e 56 b0 00 00 17 9a  06 40 c2 a8 01 43 a2 00  =
|.^V......@...C..|
>> 013e86a0  00 00 00 00 01 46 81 c0  00 d1 b4 30 df 5e 56 b0  =
|.....F.....0.^V.|
>>=20
>> 013e86b0  df 5e 56 f0 00 4a 97 0c  00 00 00 00 00 00 00 04  =
|.^V..J..........|
>> 0x004a9700 <swi_sched+0xa4> bl      005000ec <critical_exit>
>> 0x004a9704 <swi_sched+0xa8> mr      r3,r27
>> 0x004a9708 <swi_sched+0xac> bl      004a86bc =
<intr_event_schedule_thread>
>> 0x004a970c <swi_sched+0xb0> lwz     r11,0(r1)
>>=20
>> 013e86c0  df 5e 56 e0 01 47 d6 c0  01 47 d6 c0 01 45 4d 40  =
|.^V..G...G...EM@|
>> 013e86d0  df 5e 56 f0 00 8e a4 44  06 40 c2 a8 00 00 17 9a  =
|.^V....D.@......|
>> 013e86e0  78 00 00 00 00 e9 56 00  00 d1 c8 20 df 5e 56 f0  =
|x.....V.... .^V.|
>>=20
>> 013e86f0  df 5e 57 50 00 51 79 6c  df 5e 58 78 01 47 d6 c0  =
|.^WP.Qyl.^Xx.G..|
>> 0x00517960 <callout_process+0x420> lwz     r3,264(r29)
>> 0x00517964 <callout_process+0x424> li      r4,0
>> 0x00517968 <callout_process+0x428> bl      004a965c <swi_sched>
>> 0x0051796c <callout_process+0x42c> lwz     r11,0(r1)
>>=20
>> 013e8700  01 47 d7 b8 00 00 00 00  00 d1 ab 24 00 00 00 04  =
|.G.........$....|
>> 013e8710  00 c9 66 bc 00 c4 5d 48  00 c9 66 bc 00 d4 c5 3c  =
|..f...]H..f....<|
>> 013e8720  00 d0 53 00 00 eb a7 80  00 00 00 01 00 00 00 00  =
|..S.............|
>> 013e8730  df 5e 59 8c 00 00 00 00  df 5e 58 78 00 00 17 99  =
|.^Y......^Xx....|
>> 013e8740  f8 1c 2f cc d0 01 dd 00  00 d2 5b 10 df 5e 57 50  =
|../.......[..^WP|
>>=20
>> 013e8750  df 5e 57 a0 00 8a b2 70  df 5e 57 60 df 5e 57 60  =
|.^W....p.^W`.^W`|
>> 0x008ab264 <handleevents+0x2a4> mr      r3,r27
>> 0x008ab268 <handleevents+0x2a8> mr      r4,r28
>> 0x008ab26c <handleevents+0x2ac> bl      00517540 <callout_process>
>> 0x008ab270 <handleevents+0x2b0> li      r3,0
>>=20
>> 013e8760  df 5e 57 a0 df 5e 58 78  05 86 37 00 00 00 00 04  =
|.^W..^Xx..7.....|
>> 013e8770  00 00 00 00 05 9b f2 00  00 c9 66 bc 01 47 d6 c0  =
|..........f..G..|
>> 013e8780  df 5e 59 8c 00 f6 1d 10  00 00 17 99 f8 1c 2f cc  =
|.^Y.........../.|
>> 013e8790  d0 01 dd 00 d0 01 dd 30  00 d2 5b 10 df 5e 57 a0  =
|.......0..[..^W.|
>>=20
>> 013e87a0  df 5e 58 20 00 8a d1 10  00 d2 6e 5c df 5e 57 b0  |.^X =
......n\.^W.|
>> 0x008ad100 <timercb+0x4b8> mr      r3,r26
>> 0x008ad104 <timercb+0x4bc> mr      r4,r27
>> 0x008ad108 <timercb+0x4c0> li      r5,0
>> 0x008ad10c <timercb+0x4c4> bl      008aafc0 <handleevents>
>> 0x008ad110 <timercb+0x4c8> lwz     r11,0(r1)
>>=20
>> 013e87b0  df 5e 57 e0 00 4a 96 00  00 00 17 99 00 00 00 00  =
|.^W..J..........|
>> 013e87c0  f8 1c 2f cc 0a 3f a0 12  df 5e 58 78 05 86 37 00  =
|../..?...^Xx..7.|
>> 013e87d0  01 48 b1 00 05 86 37 80  00 d4 bd ec 00 cb 98 98  =
|.H....7.........|
>> 013e87e0  00 c9 66 bc 00 c4 5d 48  00 c9 66 bc 00 d4 c5 3c  =
|..f...]H..f....<|
>> 013e87f0  df 5e 59 e0 00 eb a7 80  00 c9 66 bc 01 47 d6 c0  =
|.^Y.......f..G..|
>> 013e8800  df 5e 59 8c df 5e 58 78  01 47 d6 c0 00 00 00 00  =
|.^Y..^Xx.G......|
>> 013e8810  00 f6 1d 10 00 00 00 01  00 d2 6b c8 df 5e 58 20  =
|..........k..^X |
>>=20
>> 013e8820  df 5e 58 40 00 8e 1e 48  00 00 00 00 00 eb a7 80  =
|.^X@...H........|
>> So 0x13e8820 from vmcore.9 has start of struct trapeframe.
>> The 0x8e1e48 is from:
>> 0x008e1e34 <decr_intr+0xe0> lwz     r0,56(r28)
>> 0x008e1e38 <decr_intr+0xe4> mtctr   r0
>> 0x008e1e3c <decr_intr+0xe8> mr      r3,r28
>> 0x008e1e40 <decr_intr+0xec> lwz     r4,64(r28)
>> 0x008e1e44 <decr_intr+0xf0> bctrl
>> 0x008e1e48 <decr_intr+0xf4> addi    r29,r29,-1
>>=20
>> 013e8830  01 47 d7 94 00 00 00 01  00 d2 6e 4c df 5e 58 40  =
|.G........nL.^X@|
>> 013e8840  df 5e 58 70 00 8e 7c 9c  00 d1 ca ac df 5e 58 50  =
|.^Xp..|......^XP|
>> 013e8850  00 cd f0 74 00 00 00 01  00 00 00 01 00 eb a7 80  =
|...t............|
>> 013e8860  41 eb 30 00 0a 00 00 00  00 00 00 00 00 00 90 32  =
|A.0............2|
>> 013e8870  df 5e 59 30 00 10 08 f8  00 04 90 32 df 5e 59 30  =
|.^Y0.......2.^Y0|
>> 013e8880  01 47 d6 c0 00 00 00 00  0d 0b bf e3 00 00 00 00  =
|.G..............|
>> 013e8890  0d 0b bf e3 00 19 eb 7c  00 00 00 00 00 00 00 44  =
|.......|.......D|
>> 013e88a0  01 fc a0 55 00 00 90 32  d0 01 dd 00 00 00 00 00  =
|...U...2........|
>> 013e88b0  00 d4 bd ec 00 cb 98 98  00 c9 66 bc 00 c4 5d 48  =
|..........f...]H|
>> 013e88c0  00 c9 66 bc 00 d4 c5 3c  df 5e 59 e0 00 eb a7 80  =
|..f....<.^Y.....|
>> 013e88d0  00 c9 66 bc 01 47 d6 c0  df 5e 59 8c 00 00 00 01  =
|..f..G...^Y.....|
>> 013e88e0  00 00 00 01 00 eb a7 80  00 00 00 00 00 8e 3b f8  =
|..............;.|
>>=20
>>                                     srr0
>> 013e88f0  00 d2 6b f0 df 5e 59 30  00 8e 3c 14 40 00 00 42  =
|..k..^Y0..<.@..B|
>> The 0x00833c14 from the trap frame is the srr0 (conceptual lr):
>> 0x008e3c04 <cpu_idle_60x+0xc> stw     r31,28(r1)
>> 0x008e3c08 <cpu_idle_60x+0x10> stw     r0,36(r1)
>> 0x008e3c0c <cpu_idle_60x+0x14> mr      r31,r1
>> 0x008e3c10 <cpu_idle_60x+0x18> bcl-    20,4*cr7+so,008e3c14 =
<cpu_idle_60x+0x1c>
>> 0x008e3c14 <cpu_idle_60x+0x1c> mflr    r30
>> 0x008e3c18 <cpu_idle_60x+0x20> lwz     r0,-32(r30)
>>=20
>> 013e8900  20 00 00 00 00 8e 3b f8  00 8e 3c 80 00 00 90 32  | =
.....;...<....2|
>>=20
>>            exc
>> 013e8910  00 00 09 00 41 eb 30 00  0a 00 00 00 00 00 00 00  =
|....A.0.........|
>> The 0x00009000 above is the framep->exc (exception code)
>> (decrementer in this case).
>>=20
>> 013e8920  eb 10 25 b5 55 7e c4 22  00 00 00 00 00 00 00 04  =
|..%.U~."........|
>>=20
>> 013e8930  df 5e 59 50 00 00 00 01  00 00 00 01 00 eb a7 80  =
|.^YP............|
>> (trap [above] means no lr filled in here)
>>=20
>> 013e8940  00 00 00 00 00 d4 ca 34  00 d2 6b f0 df 5e 59 50  =
|.......4..k..^YP|
>>=20
>> 013e8950  df 5e 59 70 00 8e 31 9c  00 00 00 02 00 eb a7 80  =
|.^Yp..1.........|
>> 0x008e318c <cpu_idle+0x48> bl      008ad8b8 <cpu_idleclock>
>> 0x008e3190 <cpu_idle+0x4c> lwz     r29,0(r29)
>> 0x008e3194 <cpu_idle+0x50> mtctr   r29
>> 0x008e3198 <cpu_idle+0x54> bctrl
>> 0x008e319c <cpu_idle+0x58> bl      008ad7b8 <cpu_activeclock>
>>=20
>> 013e8960  00 f2 d5 fc 00 00 00 01  00 d1 ca ac df 5e 59 70  =
|.............^Yp|
>>=20
>> 013e8970  df 5e 5a 50 00 53 6e 7c  fa 50 05 af fa 50 05 af  =
|.^ZP.Sn|.P...P..|
>> 0x00536e78 <sched_idletd+0x4d0> bl      008e3144 <cpu_idle>
>> 0x00536e7c <sched_idletd+0x4d4> stw     r28,136(r27)
>>=20
>>=20
>> 013e8980  fa 50 05 af fa 50 05 af  fa 50 05 af ff ff ff fe  =
|.P...P...P......|
>> 013e8990  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  =
|................|
>> 013e89a0  ff ff ff ff ff ff ff ff  ff ff ff ff fa 50 05 af  =
|.............P..|
>> 013e89b0  fa 50 05 af 00 00 00 02  ff ff ff ff 00 00 01 f0  =
|.P..............|
>> 013e89c0  ff ff ff fe ff ff ff ff  ff ff ff ff ff ff ff ff  =
|................|
>> 013e89d0  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  =
|................|
>> 013e89e0  ff ff ff fe ff ff ff ff  ff ff ff ff ff ff ff ff  =
|................|
>> 013e89f0  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  =
|................|
>> 013e8a00  df 5e 5a 20 fa 50 05 af  00 00 00 00 00 00 00 00  |.^Z =
.P..........|
>> 013e8a10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  =
|................|
>> *
>> 013e8a30  00 00 00 00 00 53 69 a8  df 5e 5a 98 00 00 00 00  =
|.....Si..^Z.....|
>> 013e8a40  01 47 96 e0 01 47 d6 c0  00 d1 b3 70 df 5e 5a 50  =
|.G...G.....p.^ZP|
>>=20
>> 013e8a50  df 5e 5a 80 00 4a 3c b4  df 5e 5a 60 df 5e 5a 60  =
|.^Z..J<..^Z`.^Z`|
>> 0x004a3ca0 <fork_exit+0xe4> bl      008ea420 <spinlock_exit>
>> 0x004a3ca4 <fork_exit+0xe8> mr      r3,r27
>> 0x004a3ca8 <fork_exit+0xec> mr      r4,r26
>> 0x004a3cac <fork_exit+0xf0> mtctr   r25
>> 0x004a3cb0 <fork_exit+0xf4> bctrl
>> 0x004a3cb4 <fork_exit+0xf8> lwz     r0,108(r28)
>>=20
>> 013e8a60  df 5e 5a 80 00 00 00 00  00 00 00 00 00 00 00 00  =
|.^Z.............|
>> 013e8a70  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  =
|................|
>>=20
>> 013e8a80  00 00 00 00 00 8f 18 d0  00 53 69 a8 00 00 00 00  =
|.........Si.....|
>> 0x008f18c0 <fork_trampoline> lwz     r3,8(r1)
>> 0x008f18c4 <fork_trampoline+0x4> lwz     r4,12(r1)
>> 0x008f18c8 <fork_trampoline+0x8> lwz     r5,16(r1)
>> 0x008f18cc <fork_trampoline+0xc> bl      004a3bbc <fork_exit>
>> 0x008f18d0 <fork_trampoline+0x10> addi    r1,r1,16
>> 0x008f18d4 <fork_trampoline+0x14> b       001008f8 <trapexit>
>>=20
>>=20
>> So that is an example context for the failure.
>>=20
>> It has taken weeks to get this. It may be some
>> time before I get another for comparison/contrast.
>>=20
>> And sticking with the same build vs. trying some more
>> to find a better way to get more evidence is not
>> obvious to me at this point.
>=20
> I should have mentioned that this is
> TARGET_ARCH=3Dpowerpc but used on a so-called PowerMac
> G5 "Quad Core".
>=20
> I've had 2 more examples, with the same 0x0090a030
> srr0 and such (vmcore.0 and vmcore.1) (I have added
> one more little block of code for detecting an earlier
> problem symptom not being currently seen so the build
> is slight different from the earlier report.)
>=20
> Looking around in vmcore.0 I find 3 examples of
> "00 90 a0 30" in areas overlapping with objdump -x's
> coverage. . .
>=20
> 00c77fd0  00 00 00 00 00 00 00 00  00 90 a0 30 00 08 10 32  =
|...........0...2|
>                                  [         ]
>=20
> (from sorted objdump -x output:)
> 00c775a8 l     O .data  00000dc0 seqprog
> 00c78368 l     O .data  0000000c seeprom_long_ewen
>=20
>                                  [         ]
> 00f65820  41 eb 30 00 0a 00 00 00  00 90 a0 30 00 08 10 32  =
|A.0........0...2|
> . . .
> 00f65870  00 90 a0 30 00 08 10 32  00 00 00 00 df 5d 30 00  =
|...0...2.....]0.|
>         [         ]
>=20
> (from sorted objdump -x output:)
> 00f64780 g     O .bss   00020000 __pcpu
> 00f84780 l     O .bss   00000004 ap_letgo
>=20
>=20
> For vmcore.1 I went ahead and tried exploring
> some with ddb --and had no new panics. . .
> (Hand transcriptions of pictures)
>=20
> fatal kernel trap:
> exception =3D 0x700 (program)
> srr0      =3D 0x90a030
> srr1      =3D 0x81032
> lr        =3D 0x535ad0
> curthread =3D 0x147d6c0
> pic =3D 11, comm =3D idle: cpu0
>=20
> [ thread pid 11 tid 100003 ]
> Stopped at _etext+0xb8fc: illegal instruction 0
> db> bt
> 0xdf5e55d0: at sched_wakeup+0xa4
> 0xdf5e55f0: at setrunnable+0x9c
> 0xdf5e5610: at sleepq_resume_thread+0x17c
> 0xdf5e5640: at sleepq_timeout+0xc8
> 0xdf5e5680: at softclock_call_cc+0x1f0
> 0xdf5e56f0: at callout_process+0x27c
> 0xdf5e57a0: at timercb+0x4c4
> 0xdf5e5820: at decr_intr+0xf0
> 0xdf5e5840: at powerpc_interrupt_0xf4
> 0xdf5e5870: at kernel DECR trap
>              by cpu_idle_60x+0x88 (so: srr0)
>              srr1=3D0x9032
>              r1  =3D0xdf5e5930
>              cr  =3D0x40000042
>              xer =3D0x20000000
>              ctr =3D0x8e3bf8
> saved LR(0xfffffffd) is invalid.
>=20
> db> show reg (but reformatted
> r0   =3D0x4
> r1   =3D0xdf5e5590
> r2   =3D0x147d6c0
> r3   =3D0x54       testppc64size+0x34
> r4   =3D0x591d000
> r5   =3D0
> r6   =3D0
> r7   =3D0xf
> r8   =3D0
> r9   =3D0xd4c03c   cold
> r10  =3D0x147d6c0
> r11  =3D0xdf5e55d0
> r12  =3D0
> r13  =3D0
> r14  =3D0xd4bdec   sdt_probe_func
> r15  =3D0xcb9898   std_lockstat___spin__release
> r16  =3D0xd4c45c   callwheelmask
> r17  =3D0xd4c45c   callwheelmask
> r18  =3D0x55925
> r19  =3D0x559a4
> r20  =3D0x559      dsmisssize+0x469
> r21  =3D0x591d000
> r22  =3D0x566430   sleeppq_timeout
> r23  =3D0x114      dsmisssize+0x24
> r24  =3D0
> r25  =3D0
> r26  =3D0x1
> r27  =3D0
> r28  =3D0xeba780   tdq_cpu
> r29  =3D0x147d6c0
> r30  =3D0xd1caac
> r31  =3D0xdf5e5590
> srr0 =3D0x90a030
> srr1 =3D0x81032
> lr   =3D0x535ad0   shed_affinity+0x18
> ctr  =3D0
> cr   =3D0x20009034
> xer  =3D0
> dar  =3D0x419df5d4
> dsisr=3D0x24000000
> _etext+0xb8fc: illegal instruction 0
>=20
>=20
> Just for completeness: acttrace also showed:
>=20
> Tracing command less pid 1144 tid 100150 td 0x5bc8a20 (CPU 3)
> 0xd25a59f0: at powrpc_dispatch_intr+0xc8
> 0xd25a5a20: at openpic_dispatch+0x90
> 0xd25a5a50: at powerpc_interrupt+0xc0
> 0xd25a5a80: at user EXI trap
>              by 0x181c68c (so: ssr0)
>              r1 =3D0xffffdb30
>              cr =3D0x44020624
>              xer=3D0
>              ctr=3D0x41989570
>=20
> Tracing command idle pid 11 tid 100004 td 0x147d360 (CPU 1)
> saved LR(0x4c) in invalid
>=20
> Tracing command idle pid 11 tid 100005 td 0x147d360 (CPU 2)
> saved LR(0x4c) in invalid





Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CF7FA043-23DC-4012-B7D4-3A0E21BE924A>