Date: Sat, 27 May 2017 01:17:22 -0700
From: Mark Millard <markmi@dsl-only.net>
To: Justin Hibbits <jhibbits@FreeBSD.org>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>, freebsd-hackers@freebsd.org
Cc: Nathan Whitehorn <nwhitehorn@freebsd.org>
Subject: Re: A good backtrace from a head -r317820 powerpc random/periodic panic: execution of garbage at 0x0090a030 (in .hash section) [better bt; more]
Message-ID: <CF7FA043-23DC-4012-B7D4-3A0E21BE924A@dsl-only.net>
In-Reply-To: <62BF8E69-E7E6-4C4F-AB33-38B03E903CDA@dsl-only.net>
References: <1CE8346B-04F3-48AB-A3E9-6DF3B86B8D1A@dsl-only.net> <8C88BB6F-E747-42A1-9DDC-35EC6D865141@dsl-only.net> <62BF8E69-E7E6-4C4F-AB33-38B03E903CDA@dsl-only.net>
[I suggest a patch this time.]

On 2017-May-27, at 12:42 AM, Mark Millard <markmi@dsl-only.net> wrote:

> [Top post of the answer to what is wrong. I
> have submitted 219589 for this.]
>
> TARGET_ARCH=powerpc64 got a fix to
> bugzilla 205458, avoiding inappropriate
> restoration of openfirmware's sprg0
> value.
>
> It turns out that TARGET_ARCH=powerpc
> needs to detect when it is running on
> powerpc64 and do the same thing if it
> is to avoid trashing memory and running
> unreliably on PowerMac G5's.
>
> The issue is that for powerpc64 it is
> inappropriate to restore the sprg0
> value to its openfirmware value. This
> is because the FreeBSD real-mode
> handling is involved instead of the
> openfirmware's original virtual mode,
> making openfirmware's value simply
> inappropriate.
>
> Quoting Nathan W. from Comment
> #4 of 205458:
>
>> Where this explodes is if OF uses an unmapped SLB entry.
>> The SLB fault handler runs in real mode and refers to the
>> PCPU pointer in SPRG0, which blows up the kernel. Having
>> a value of SPRG0 that works for the kernel is less fatal
>> than preserving OF's value in this case.
>
> I know that part of the code does detect
> the powerpc64 context vs. not and does
> things differently to emulate being powerpc
> like on powerpc64 (such as limiting
> RAM use as a consequence).
>
> The powerpc64 vs. not status needs to be
> recorded and used to control a sprg0
> restoration choice: avoid restoring
> openfirmware's value on powerpc64;
> otherwise restore it.
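
As a rough illustration of the restoration choice described in the quoted text (restore openfirmware's sprg0 only when not running on powerpc64), here is a small user-space sketch. The function name should_restore_ofw_sprg0 and the SKETCH_FEATURE_64 value are assumptions made up for the illustration, not the kernel's definitions. One thing worth keeping in mind for any such test: in C, != binds more tightly than &, so the mask operation needs its own parentheses.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed bit value, for illustration only. The kernel's own
 * PPC_FEATURE_64 definition is what a real change would use.
 */
#define SKETCH_FEATURE_64 0x40000000u

/*
 * Hypothetical helper: decide whether openfirmware's saved sprg0
 * value should be put back. On a 64-bit capable CPU the answer is
 * "no", because the kernel's own sprg0 (the PCPU pointer) must stay.
 */
static bool
should_restore_ofw_sprg0(uint32_t cpu_features)
{
	/* Parentheses around the mask: != has higher precedence than &. */
	return ((cpu_features & SKETCH_FEATURE_64) != SKETCH_FEATURE_64);
}
```

Without the inner parentheses the expression would parse as cpu_features & (SKETCH_FEATURE_64 != SKETCH_FEATURE_64), which is always zero.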

I expect that the following patch would fix the problem:

# svnlite diff /usr/src/sys/powerpc/ofw/ofw_machdep.c
Index: /usr/src/sys/powerpc/ofw/ofw_machdep.c
===================================================================
--- /usr/src/sys/powerpc/ofw/ofw_machdep.c	(revision 317820)
+++ /usr/src/sys/powerpc/ofw/ofw_machdep.c	(working copy)
@@ -147,7 +147,8 @@
 	 * PCPU data cannot be used until this routine is called !
 	 */
 #ifndef __powerpc64__
-	__asm __volatile("mtsprg0 %0" :: "r"(ofw_sprg0_save));
+	if ((cpu_features & PPC_FEATURE_64) != PPC_FEATURE_64)
+		__asm __volatile("mtsprg0 %0" :: "r"(ofw_sprg0_save));
 #endif
 }
 #endif

This is based on cpu_features already having had
PPC_FEATURE_64 masked in before this if things are
running on a PowerMac G5 or other powerpc64.

===
Mark Millard
markmi at dsl-only.net

[The original (better) panic evidence. . .]

On 2017-May-26, at 10:29 PM, Mark Millard <markmi@dsl-only.net> wrote:

> [Additional information that does not need to
> interlace with the prior material, so see
> after. A somewhat better backtrace reported
> by ddb. And so on.]
>
>
> On 2017-May-26, at 7:14 PM, Mark Millard <markmi at dsl-only.net> wrote:
>
>> I lucked out and got a vmcore.9 for a random
>> panic that I could manage to backtrace for
>> one of my test builds of -r317820. It appears
>> that not all that much happened before it got
>> the panic so much context was better preserved
>> this time.
>>
>> (I do not explore from ddb as I've had that
>> panic and mess up the dump just made by
>> replacing it. So this is a manual backtrace
>> from the debug.minidump=0 style vmcore.9
>> file. objdump was used on the
>> /boot/kernel/kernel to find code.)
>>
>> Being able to see the problem is very
>> sensitive to kernel memory layout.
This >> is why I'm sticking with -r317820 built >> production style: the kind of context >> the problem was first observed in. >> Attempting a debug kernel build simply >> did not repeat the problem for days >> (vs. the usual hours for builds like >> this). >>=20 >>=20 >> So below is the backtrace: >>=20 >> (I do not show what trap_fatal calls: >> starting with trap_fatal and going toward >> larger memory addresses. . .) >>=20 >> [vmcore.9's >> offset >> in file >> when no >> 0x prefix] >>=20 >> 013e83b0 df 5e 55 50 00 8f 34 5c fa 50 05 af fa 50 05 af = |.^UP..4\.P...P..| >> 0x008f3454 <trap+0x1228> mr r3,r26 >> 0x008f3458 <trap+0x122c> bl 008f2030 <trap_fatal> >> 0x008f345c <trap+0x1230> b 008f34c8 <trap+0x129c> >>=20 >> 013e83c0 fa 50 05 af fa 50 05 af fa 50 05 af fa 50 05 af = |.P...P...P...P..| >> * >> 013e83e0 df 5e 54 00 fa 50 05 af fa 50 05 af fa 50 05 af = |.^T..P...P...P..| >> 013e83f0 fa 50 05 af fa 50 05 af 00 d1 ca ac df 5e 54 00 = |.P...P.......^T.| >> 013e8400 df 5e 54 20 00 53 3a d0 fa 50 05 af fa 50 05 af |.^T = .S:..P...P..| >> 013e8410 fa 50 05 af fa 50 05 af 00 d1 ca ac df 5e 54 20 = |.P...P.......^T | >> 013e8420 df 5e 54 d0 00 53 3a d0 00 00 00 00 00 00 00 00 = |.^T..S:.........| >> 013e8430 00 00 00 00 00 00 00 00 00 d1 ca ac 00 00 00 0f = |................| >> 013e8440 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 = |................| >> * >> 013e8460 00 00 00 54 7f ff ff ff 00 00 00 00 ff ff ff aa = |...T............| >> 013e8470 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 = |................| >> * >> 013e8490 00 d4 c4 5c 00 d4 c4 5c 00 17 99 dd 00 17 9a 5c = |...\...\.......\| >> 013e84a0 00 00 17 9a 00 d4 c4 6c df 5e 54 ec 00 00 00 54 = |.......l.^T....T| >> 013e84b0 df 5e 54 d0 7f ff ff ff 05 aa 36 c0 05 aa 39 e8 = |.^T.......6...9.| >> 013e84c0 00 00 00 00 00 00 00 00 00 d1 ca ac df 5e 54 d0 = |.............^T.| >> 013e84d0 df 5e 55 80 00 53 3a d0 05 91 d0 00 05 91 d3 28 = |.^U..S:........(| >> 013e84e0 df 5e 55 00 00 00 00 00 00 
d1 ca ac 00 00 00 0f = |.^U.............| >> 013e84f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 = |................| >> 013e8500 00 00 00 00 00 00 00 00 00 d4 bd ec 00 cb 98 98 = |................| >> 013e8510 00 d4 c4 5c 00 d4 c4 5c 00 17 99 f8 00 17 99 f8 = |...\...\........| >> 013e8520 00 00 17 99 fa 69 b8 79 00 17 9a 0a 00 00 17 99 = |.....i.y........| >> 013e8530 f8 1c 2f cc df 5e 55 88 01 47 d6 c0 01 43 a2 00 = |../..^U..G...C..| >> 013e8540 41 eb 30 00 0a 00 00 00 00 d2 6e 4c df 5e 55 50 = |A.0.......nL.^UP| >>=20 >> 013e8550 df 5e 55 80 00 8e 7d 40 df 5e 55 9c 00 00 00 28 = |.^U...}@.^U....(| >> 0x008e7d28 <powerpc_interrupt+0x184> mfmsr r0 >> 0x008e7d2c <powerpc_interrupt+0x188> or r0,r0,r9 >> 0x008e7d30 <powerpc_interrupt+0x18c> mtmsr r0 >> 0x008e7d34 <powerpc_interrupt+0x190> isync >> 0x008e7d38 <powerpc_interrupt+0x194> mr r3,r25 >> 0x008e7d3c <powerpc_interrupt+0x198> bl 008f222c <trap> >> 0x008e7d40 <powerpc_interrupt+0x19c> lwz r11,0(r1) >>=20 >> 013e8560 00 00 00 00 00 00 17 9a 06 40 c2 a8 01 43 a2 00 = |.........@...C..| >> 013e8570 41 eb 30 00 0a 00 00 00 00 00 00 00 00 08 10 32 = |A.0............2| >>=20 >> r0 r1 >> 013e8580 df 5e 56 40 00 10 08 f8 00 00 00 04 df 5e 56 40 = |.^V@.........^V@| >> The struct trapframe starts is 0x 013e8588 in the >> vmcore.9 file. The 0x00108f8 is as shown below: >>=20 >> 0x001008ec <k_trap+0x118> isync >> 0x001008f0 <trapagain> addi r3,r1,8 >> 0x001008f4 <trapagain+0x4> bl 008e7ba4 <powerpc_interrupt> >> 0x001008f8 <trapexit> mfmsr r3 >> 0x001008fc <trapexit+0x4> andi. 
r3,r3,32767 >>=20 >> 013e8590 01 47 d6 c0 00 00 00 28 01 47 c6 c0 00 00 00 04 = |.G.....(.G......| >> r2 r3 r4 r5 >>=20 >> 013e85a0 00 00 00 04 00 00 00 0f 00 00 00 00 00 d4 c0 3c = |...............<| >> r6 r7 r8 r9 >>=20 >> 013e85b0 01 47 d6 c0 df 5e 56 80 14 3d 60 da 00 00 00 00 = |.G...^V..=3D`.....| >> r10 r11 r12 r13 >>=20 >> 013e85c0 00 d4 bd ec 00 cb 98 98 00 d4 c4 5c 00 d4 c4 5c = |...........\...\| >> r14 r15 r16 r17 >>=20 >> 013e85d0 00 17 99 f8 00 17 99 f8 00 00 17 99 fa 69 b8 79 = |.............i.y| >> r18 r19 r20 r21 >>=20 >> 013e85e0 00 17 9a 0a 00 00 17 99 f8 1c 2f cc 00 00 17 9a = |........../.....| >> r22 r32 r24 r25 >>=20 >> 013e85f0 06 40 c2 a8 01 43 a2 00 00 eb a7 80 01 47 d6 c0 = |.@...C.......G..| >> r26 r27 r28 r29 >>=20 >> 013e8600 00 d1 ca ac df 5e 56 40 00 53 5a d0 20 00 90 44 = |.....^V@.SZ. ..D| >> r30 r31 lr cr >>=20 >> xer ctr srr0 srr1 >> 013e8610 00 00 00 00 00 00 00 00 00 90 a0 30 00 08 10 32 = |...........0...2| >> Note: objdump shows no code at 0x0090a030. 0x0090a030 is >> inside what objdump -x reports as the section .hash (Idx >> 2). >>=20 >> exc dar dsisr (booke dbcer0) >> 013e8620 00 00 07 00 41 eb 30 00 0a 00 00 00 01 47 c6 c0 = |....A.0......G..| >> The 0x00007000 above is the framep->exc (exception code) >> (program in this case). >>=20 >> 013e8630 00 eb a7 80 01 47 60 10 00 d1 ca ac df 5e 56 40 = |.....G`......^V@| >>=20 >> At this point the above does not match the below >> part of the stack trace. >>=20 >> The lr part of struct trapframe was: 0x00535ad0 so >> showing around that: >>=20 >> 00535ab8 <sched_affinity> stwu r1,-32(r1) >> 00535abc <sched_affinity+0x4> mflr r0 >> 00535ac0 <sched_affinity+0x8> stw r29,20(r1) >> 00535ac4 <sched_affinity+0xc> stw r30,24(r1) >> 00535ac8 <sched_affinity+0x10> stw r31,28(r1) >> 00535acc <sched_affinity+0x14> stw r0,36(r1) >> 00535ad0 <sched_affinity+0x18> mr r31,r1 >> 00535ad4 <sched_affinity+0x1c> mr r29,r3 >>=20 >> Back to the stack backtrace. . . 
>>=20 >> 013e8640 df 5e 56 80 00 53 59 dc 00 17 99 f8 00 17 99 f8 = |.^V..SY.........| >> 0x005359c8 <sched_add+0x18c> bl 008ea420 <spinlock_exit> >> 0x005359cc <sched_add+0x190> mr r3,r28 >> 0x005359d0 <sched_add+0x194> mr r4,r27 >> 0x005359d4 <sched_add+0x198> mr r5,r25 >> 0x005359d8 <sched_add+0x19c> bl 005356ec <tdq_add> >> 0x005359dc <sched_add+0x1a0> mfsprg r9,0 >>=20 >> I show around 0x005356ec >> from the sched_add+0x19c bl that is above >> because the routine is not referenced >> in the stack tracce but the above indicates >> that it should have been called: >> 0x005356ec <tdq_add> stwu r1,-32(r1) >> 0x005356f0 <tdq_add+0x4> mflr r0 >> 0x005356f4 <tdq_add+0x8> stw r28,16(r1) >> 0x005356f8 <tdq_add+0xc> stw r29,20(r1) >> 0x005356fc <tdq_add+0x10> stw r30,24(r1) >> 0x00535700 <tdq_add+0x14> stw r31,28(r1) >> 0x00535704 <tdq_add+0x18> stw r0,36(r1) >> . . . >>=20 >> Back to the stack backtrace again. . . >>=20 >> 013e8650 00 00 17 99 fa 69 b8 79 00 17 9a 0a 00 00 00 04 = |.....i.y........| >> 013e8660 f8 1c 2f cc 00 00 17 9a 06 40 c2 a8 01 43 a2 00 = |../......@...C..| >> 013e8670 01 47 c6 c0 01 47 60 10 00 d1 b4 30 df 5e 56 80 = |.G...G`....0.^V.| >>=20 >> 013e8680 df 5e 56 b0 00 4a 87 8c 00 d2 5b 10 00 00 00 04 = |.^V..J....[.....| >> 0x004a8780 <intr_event_schedule_thread+0xc4> mr r3,r28 >> 0x004a8784 <intr_event_schedule_thread+0xc8> li r4,4 >> 0x004a8788 <intr_event_schedule_thread+0xcc> bl 0053583c = <sched_add> >> 0x004a878c <intr_event_schedule_thread+0xd0> lwz r9,0(r28) >>=20 >> 013e8690 df 5e 56 b0 00 00 17 9a 06 40 c2 a8 01 43 a2 00 = |.^V......@...C..| >> 013e86a0 00 00 00 00 01 46 81 c0 00 d1 b4 30 df 5e 56 b0 = |.....F.....0.^V.| >>=20 >> 013e86b0 df 5e 56 f0 00 4a 97 0c 00 00 00 00 00 00 00 04 = |.^V..J..........| >> 0x004a9700 <swi_sched+0xa4> bl 005000ec <critical_exit> >> 0x004a9704 <swi_sched+0xa8> mr r3,r27 >> 0x004a9708 <swi_sched+0xac> bl 004a86bc = <intr_event_schedule_thread> >> 0x004a970c <swi_sched+0xb0> lwz r11,0(r1) >>=20 >> 
013e86c0 df 5e 56 e0 01 47 d6 c0 01 47 d6 c0 01 45 4d 40 = |.^V..G...G...EM@| >> 013e86d0 df 5e 56 f0 00 8e a4 44 06 40 c2 a8 00 00 17 9a = |.^V....D.@......| >> 013e86e0 78 00 00 00 00 e9 56 00 00 d1 c8 20 df 5e 56 f0 = |x.....V.... .^V.| >>=20 >> 013e86f0 df 5e 57 50 00 51 79 6c df 5e 58 78 01 47 d6 c0 = |.^WP.Qyl.^Xx.G..| >> 0x00517960 <callout_process+0x420> lwz r3,264(r29) >> 0x00517964 <callout_process+0x424> li r4,0 >> 0x00517968 <callout_process+0x428> bl 004a965c <swi_sched> >> 0x0051796c <callout_process+0x42c> lwz r11,0(r1) >>=20 >> 013e8700 01 47 d7 b8 00 00 00 00 00 d1 ab 24 00 00 00 04 = |.G.........$....| >> 013e8710 00 c9 66 bc 00 c4 5d 48 00 c9 66 bc 00 d4 c5 3c = |..f...]H..f....<| >> 013e8720 00 d0 53 00 00 eb a7 80 00 00 00 01 00 00 00 00 = |..S.............| >> 013e8730 df 5e 59 8c 00 00 00 00 df 5e 58 78 00 00 17 99 = |.^Y......^Xx....| >> 013e8740 f8 1c 2f cc d0 01 dd 00 00 d2 5b 10 df 5e 57 50 = |../.......[..^WP| >>=20 >> 013e8750 df 5e 57 a0 00 8a b2 70 df 5e 57 60 df 5e 57 60 = |.^W....p.^W`.^W`| >> 0x008ab264 <handleevents+0x2a4> mr r3,r27 >> 0x008ab268 <handleevents+0x2a8> mr r4,r28 >> 0x008ab26c <handleevents+0x2ac> bl 00517540 <callout_process> >> 0x008ab270 <handleevents+0x2b0> li r3,0 >>=20 >> 013e8760 df 5e 57 a0 df 5e 58 78 05 86 37 00 00 00 00 04 = |.^W..^Xx..7.....| >> 013e8770 00 00 00 00 05 9b f2 00 00 c9 66 bc 01 47 d6 c0 = |..........f..G..| >> 013e8780 df 5e 59 8c 00 f6 1d 10 00 00 17 99 f8 1c 2f cc = |.^Y.........../.| >> 013e8790 d0 01 dd 00 d0 01 dd 30 00 d2 5b 10 df 5e 57 a0 = |.......0..[..^W.| >>=20 >> 013e87a0 df 5e 58 20 00 8a d1 10 00 d2 6e 5c df 5e 57 b0 |.^X = ......n\.^W.| >> 0x008ad100 <timercb+0x4b8> mr r3,r26 >> 0x008ad104 <timercb+0x4bc> mr r4,r27 >> 0x008ad108 <timercb+0x4c0> li r5,0 >> 0x008ad10c <timercb+0x4c4> bl 008aafc0 <handleevents> >> 0x008ad110 <timercb+0x4c8> lwz r11,0(r1) >>=20 >> 013e87b0 df 5e 57 e0 00 4a 96 00 00 00 17 99 00 00 00 00 = |.^W..J..........| >> 013e87c0 f8 1c 2f cc 0a 3f a0 12 df 
5e 58 78 05 86 37 00 = |../..?...^Xx..7.| >> 013e87d0 01 48 b1 00 05 86 37 80 00 d4 bd ec 00 cb 98 98 = |.H....7.........| >> 013e87e0 00 c9 66 bc 00 c4 5d 48 00 c9 66 bc 00 d4 c5 3c = |..f...]H..f....<| >> 013e87f0 df 5e 59 e0 00 eb a7 80 00 c9 66 bc 01 47 d6 c0 = |.^Y.......f..G..| >> 013e8800 df 5e 59 8c df 5e 58 78 01 47 d6 c0 00 00 00 00 = |.^Y..^Xx.G......| >> 013e8810 00 f6 1d 10 00 00 00 01 00 d2 6b c8 df 5e 58 20 = |..........k..^X | >>=20 >> 013e8820 df 5e 58 40 00 8e 1e 48 00 00 00 00 00 eb a7 80 = |.^X@...H........| >> So 0x13e8820 from vmcore.9 has start of struct trapeframe. >> The 0x8e1e48 is from: >> 0x008e1e34 <decr_intr+0xe0> lwz r0,56(r28) >> 0x008e1e38 <decr_intr+0xe4> mtctr r0 >> 0x008e1e3c <decr_intr+0xe8> mr r3,r28 >> 0x008e1e40 <decr_intr+0xec> lwz r4,64(r28) >> 0x008e1e44 <decr_intr+0xf0> bctrl >> 0x008e1e48 <decr_intr+0xf4> addi r29,r29,-1 >>=20 >> 013e8830 01 47 d7 94 00 00 00 01 00 d2 6e 4c df 5e 58 40 = |.G........nL.^X@| >> 013e8840 df 5e 58 70 00 8e 7c 9c 00 d1 ca ac df 5e 58 50 = |.^Xp..|......^XP| >> 013e8850 00 cd f0 74 00 00 00 01 00 00 00 01 00 eb a7 80 = |...t............| >> 013e8860 41 eb 30 00 0a 00 00 00 00 00 00 00 00 00 90 32 = |A.0............2| >> 013e8870 df 5e 59 30 00 10 08 f8 00 04 90 32 df 5e 59 30 = |.^Y0.......2.^Y0| >> 013e8880 01 47 d6 c0 00 00 00 00 0d 0b bf e3 00 00 00 00 = |.G..............| >> 013e8890 0d 0b bf e3 00 19 eb 7c 00 00 00 00 00 00 00 44 = |.......|.......D| >> 013e88a0 01 fc a0 55 00 00 90 32 d0 01 dd 00 00 00 00 00 = |...U...2........| >> 013e88b0 00 d4 bd ec 00 cb 98 98 00 c9 66 bc 00 c4 5d 48 = |..........f...]H| >> 013e88c0 00 c9 66 bc 00 d4 c5 3c df 5e 59 e0 00 eb a7 80 = |..f....<.^Y.....| >> 013e88d0 00 c9 66 bc 01 47 d6 c0 df 5e 59 8c 00 00 00 01 = |..f..G...^Y.....| >> 013e88e0 00 00 00 01 00 eb a7 80 00 00 00 00 00 8e 3b f8 = |..............;.| >>=20 >> srr0 >> 013e88f0 00 d2 6b f0 df 5e 59 30 00 8e 3c 14 40 00 00 42 = |..k..^Y0..<.@..B| >> The 0x00833c14 from the trap frame is the 
srr0 (conceptual lr): >> 0x008e3c04 <cpu_idle_60x+0xc> stw r31,28(r1) >> 0x008e3c08 <cpu_idle_60x+0x10> stw r0,36(r1) >> 0x008e3c0c <cpu_idle_60x+0x14> mr r31,r1 >> 0x008e3c10 <cpu_idle_60x+0x18> bcl- 20,4*cr7+so,008e3c14 = <cpu_idle_60x+0x1c> >> 0x008e3c14 <cpu_idle_60x+0x1c> mflr r30 >> 0x008e3c18 <cpu_idle_60x+0x20> lwz r0,-32(r30) >>=20 >> 013e8900 20 00 00 00 00 8e 3b f8 00 8e 3c 80 00 00 90 32 | = .....;...<....2| >>=20 >> exc >> 013e8910 00 00 09 00 41 eb 30 00 0a 00 00 00 00 00 00 00 = |....A.0.........| >> The 0x00009000 above is the framep->exc (exception code) >> (decrementer in this case). >>=20 >> 013e8920 eb 10 25 b5 55 7e c4 22 00 00 00 00 00 00 00 04 = |..%.U~."........| >>=20 >> 013e8930 df 5e 59 50 00 00 00 01 00 00 00 01 00 eb a7 80 = |.^YP............| >> (trap [above] means no lr filled in here) >>=20 >> 013e8940 00 00 00 00 00 d4 ca 34 00 d2 6b f0 df 5e 59 50 = |.......4..k..^YP| >>=20 >> 013e8950 df 5e 59 70 00 8e 31 9c 00 00 00 02 00 eb a7 80 = |.^Yp..1.........| >> 0x008e318c <cpu_idle+0x48> bl 008ad8b8 <cpu_idleclock> >> 0x008e3190 <cpu_idle+0x4c> lwz r29,0(r29) >> 0x008e3194 <cpu_idle+0x50> mtctr r29 >> 0x008e3198 <cpu_idle+0x54> bctrl >> 0x008e319c <cpu_idle+0x58> bl 008ad7b8 <cpu_activeclock> >>=20 >> 013e8960 00 f2 d5 fc 00 00 00 01 00 d1 ca ac df 5e 59 70 = |.............^Yp| >>=20 >> 013e8970 df 5e 5a 50 00 53 6e 7c fa 50 05 af fa 50 05 af = |.^ZP.Sn|.P...P..| >> 0x00536e78 <sched_idletd+0x4d0> bl 008e3144 <cpu_idle> >> 0x00536e7c <sched_idletd+0x4d4> stw r28,136(r27) >>=20 >>=20 >> 013e8980 fa 50 05 af fa 50 05 af fa 50 05 af ff ff ff fe = |.P...P...P......| >> 013e8990 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff = |................| >> 013e89a0 ff ff ff ff ff ff ff ff ff ff ff ff fa 50 05 af = |.............P..| >> 013e89b0 fa 50 05 af 00 00 00 02 ff ff ff ff 00 00 01 f0 = |.P..............| >> 013e89c0 ff ff ff fe ff ff ff ff ff ff ff ff ff ff ff ff = |................| >> 013e89d0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff = |................| >> 013e89e0 ff ff ff fe ff ff ff ff ff ff ff ff ff ff ff ff = |................| >> 013e89f0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff = |................| >> 013e8a00 df 5e 5a 20 fa 50 05 af 00 00 00 00 00 00 00 00 |.^Z = .P..........| >> 013e8a10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 = |................| >> * >> 013e8a30 00 00 00 00 00 53 69 a8 df 5e 5a 98 00 00 00 00 = |.....Si..^Z.....| >> 013e8a40 01 47 96 e0 01 47 d6 c0 00 d1 b3 70 df 5e 5a 50 = |.G...G.....p.^ZP| >>=20 >> 013e8a50 df 5e 5a 80 00 4a 3c b4 df 5e 5a 60 df 5e 5a 60 = |.^Z..J<..^Z`.^Z`| >> 0x004a3ca0 <fork_exit+0xe4> bl 008ea420 <spinlock_exit> >> 0x004a3ca4 <fork_exit+0xe8> mr r3,r27 >> 0x004a3ca8 <fork_exit+0xec> mr r4,r26 >> 0x004a3cac <fork_exit+0xf0> mtctr r25 >> 0x004a3cb0 <fork_exit+0xf4> bctrl >> 0x004a3cb4 <fork_exit+0xf8> lwz r0,108(r28) >>=20 >> 013e8a60 df 5e 5a 80 00 00 00 00 00 00 00 00 00 00 00 00 = |.^Z.............| >> 013e8a70 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 = |................| >>=20 >> 013e8a80 00 00 00 00 00 8f 18 d0 00 53 69 a8 00 00 00 00 = |.........Si.....| >> 0x008f18c0 <fork_trampoline> lwz r3,8(r1) >> 0x008f18c4 <fork_trampoline+0x4> lwz r4,12(r1) >> 0x008f18c8 <fork_trampoline+0x8> lwz r5,16(r1) >> 0x008f18cc <fork_trampoline+0xc> bl 004a3bbc <fork_exit> >> 0x008f18d0 <fork_trampoline+0x10> addi r1,r1,16 >> 0x008f18d4 <fork_trampoline+0x14> b 001008f8 <trapexit> >>=20 >>=20 >> So that is an example context for the failure. >>=20 >> It has taken weeks to get this. It may be some >> time before I get another for comparison/contrast. >>=20 >> And sticking with the same build vs. trying some more >> to find a better way to get more evidence is not >> obvious to me at this point. >=20 > I should have mentioned that this is > TARGET_ARCH=3Dpowerpc but used on a so-called PowerMac > G5 "Quad Core". 
>
> I've had 2 more examples, with the same 0x0090a030
> srr0 and such (vmcore.0 and vmcore.1) (I have added
> one more little block of code for detecting an earlier
> problem symptom not being currently seen so the build
> is slightly different from the earlier report.)
>
> Looking around in vmcore.0 I find 3 examples of
> "00 90 a0 30" in areas overlapping with objdump -x's
> coverage. . .
>
> 00c77fd0  00 00 00 00 00 00 00 00  00 90 a0 30 00 08 10 32  |...........0...2|
> [ ]
>
> (from sorted objdump -x output:)
> 00c775a8 l     O .data  00000dc0 seqprog
> 00c78368 l     O .data  0000000c seeprom_long_ewen
>
> [ ]
> 00f65820  41 eb 30 00 0a 00 00 00  00 90 a0 30 00 08 10 32  |A.0........0...2|
> . . .
> 00f65870  00 90 a0 30 00 08 10 32  00 00 00 00 df 5d 30 00  |...0...2.....]0.|
> [ ]
>
> (from sorted objdump -x output:)
> 00f64780 g     O .bss   00020000 __pcpu
> 00f84780 l     O .bss   00000004 ap_letgo
>
>
> For vmcore.1 I went ahead and tried exploring
> some with ddb --and had no new panics. . .
> (Hand transcriptions of pictures)
>
> fatal kernel trap:
>    exception = 0x700 (program)
>    srr0      = 0x90a030
>    srr1      = 0x81032
>    lr        = 0x535ad0
>    curthread = 0x147d6c0
>           pid = 11, comm = idle: cpu0
>
> [ thread pid 11 tid 100003 ]
> Stopped at _etext+0xb8fc: illegal instruction 0
> db> bt
> 0xdf5e55d0: at sched_wakeup+0xa4
> 0xdf5e55f0: at setrunnable+0x9c
> 0xdf5e5610: at sleepq_resume_thread+0x17c
> 0xdf5e5640: at sleepq_timeout+0xc8
> 0xdf5e5680: at softclock_call_cc+0x1f0
> 0xdf5e56f0: at callout_process+0x27c
> 0xdf5e57a0: at timercb+0x4c4
> 0xdf5e5820: at decr_intr+0xf0
> 0xdf5e5840: at powerpc_interrupt+0xf4
> 0xdf5e5870: at kernel DECR trap
>       by cpu_idle_60x+0x88 (so: srr0)
>       srr1=0x9032
>       r1  =0xdf5e5930
>       cr  =0x40000042
>       xer =0x20000000
>       ctr =0x8e3bf8
> saved LR(0xfffffffd) is invalid.
>
> db> show reg (but reformatted)
> r0   =0x4
> r1   =0xdf5e5590
> r2   =0x147d6c0
> r3   =0x54 testppc64size+0x34
> r4   =0x591d000
> r5   =0
> r6   =0
> r7   =0xf
> r8   =0
> r9   =0xd4c03c cold
> r10  =0x147d6c0
> r11  =0xdf5e55d0
> r12  =0
> r13  =0
> r14  =0xd4bdec sdt_probe_func
> r15  =0xcb9898 sdt_lockstat___spin__release
> r16  =0xd4c45c callwheelmask
> r17  =0xd4c45c callwheelmask
> r18  =0x55925
> r19  =0x559a4
> r20  =0x559 dsmisssize+0x469
> r21  =0x591d000
> r22  =0x566430 sleepq_timeout
> r23  =0x114 dsmisssize+0x24
> r24  =0
> r25  =0
> r26  =0x1
> r27  =0
> r28  =0xeba780 tdq_cpu
> r29  =0x147d6c0
> r30  =0xd1caac
> r31  =0xdf5e5590
> srr0 =0x90a030
> srr1 =0x81032
> lr   =0x535ad0 sched_affinity+0x18
> ctr  =0
> cr   =0x20009034
> xer  =0
> dar  =0x419df5d4
> dsisr=0x24000000
> _etext+0xb8fc: illegal instruction 0
>
>
> Just for completeness: acttrace also showed:
>
> Tracing command less pid 1144 tid 100150 td 0x5bc8a20 (CPU 3)
> 0xd25a59f0: at powerpc_dispatch_intr+0xc8
> 0xd25a5a20: at openpic_dispatch+0x90
> 0xd25a5a50: at powerpc_interrupt+0xc0
> 0xd25a5a80: at user EXI trap
>       by 0x181c68c (so: srr0)
>       r1 =0xffffdb30
>       cr =0x44020624
>       xer=0
>       ctr=0x41989570
>
> Tracing command idle pid 11 tid 100004 td 0x147d360 (CPU 1)
> saved LR(0x4c) is invalid
>
> Tracing command idle pid 11 tid 100005 td 0x147d360 (CPU 2)
> saved LR(0x4c) is invalid
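
For anyone wanting to repeat the manual backtrace quoted above: it follows the 32-bit PowerPC stack back chain, where the word at the base of each frame is the caller's r1 and the next word is the saved LR. The following is only a minimal sketch of that walk over a raw byte buffer, not the procedure actually used on vmcore.9. The names be32 and walk_back_chain, and the idea of passing in the kernel address of the buffer's first byte, are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian 32-bit word from a raw dump buffer. */
static uint32_t
be32(const uint8_t *p)
{
	return (((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
}

/*
 * Hypothetical helper: follow the back chain starting at stack
 * address sp. buf holds len bytes of stack memory and base is the
 * address that buf[0] had in the dumped kernel. Each visited
 * frame's saved LR word is stored in lr_out; the return value is
 * the number of frames visited (at most max).
 */
static size_t
walk_back_chain(const uint8_t *buf, size_t len, uint32_t base,
    uint32_t sp, uint32_t *lr_out, size_t max)
{
	size_t n = 0;

	while (n < max && sp >= base &&
	    (uint64_t)(sp - base) + 8 <= (uint64_t)len) {
		const uint8_t *frame = buf + (sp - base);
		uint32_t next = be32(frame);	/* back chain: caller's r1 */

		lr_out[n++] = be32(frame + 4);	/* LR save word */
		if (next <= sp)
			break;	/* chain must move to higher addresses */
		sp = next;
	}
	return (n);
}
```

Each saved LR found this way still has to be matched against objdump output by hand, and a trap frame in the middle of the stack (as in the dump above) interrupts the simple chain, so the sketch only covers the plain call-frame case.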