From owner-freebsd-ppc@freebsd.org Tue May 16 22:41:35 2017
From: Mark Millard
Subject: FYI: powerpc EXC_LAST==0x2f00 vs. EXC_DEBUG ==0x2f10 and such?
Message-Id: <0BC58DBC-AC9B-46F8-8F3A-1AEB90622BC4@dsl-only.net>
Date: Tue, 16 May 2017 15:41:27 -0700
To: FreeBSD PowerPC ML, Justin Hibbits
X-BeenThere: freebsd-ppc@freebsd.org
List-Id: Porting FreeBSD to the PowerPC

[Context:

I'm having problems with production-style kernel builds
for TARGET_ARCH=powerpc (used an old PowerMac G5 so-called
"Quad Core") getting occasional panics that involve
oddities like:

frame->exc == 0x903a64e

in the fatal kernel trap information on the console
display.

I'm not claiming the below is related but while looking
around to figure out how to investigate I ran into
what I report below. (The first 2 are from a block of
4xx / 85xx EXC_'s).
]

From /usr/src/sys/powerpc/include/trap.h :

#define EXC_DEBUG       0x2f10  /* Debug trap */
#define EXC_VECAST_E    0x2f20  /* Altivec Assist (Book-E) */

#define EXC_LAST        0x2f00  /* Last possible exception vector */

#define EXC_AST         0x3000  /* Fake AST vector */

/* Trap was in user mode */
#define EXC_USER        0x10000

And also:

/usr/src/sys/powerpc/ofw/ofw_machdep.c:char save_trap_init[0x2f00]; /* EXC_LAST */
/usr/src/sys/powerpc/ofw/ofw_machdep.c:char save_trap_of[0x2f00]; /* EXC_LAST */

/usr/src/sys/powerpc/include/profile.h:#define __PROFILE_VECTOR_TOP (EXC_LAST + 0x100)

These make it look like EXC_LAST and some literal
0x2f00's might be insufficient for some contexts.
If they are sufficient for those contexts some
notes about the relationships of the beyond-last
ones would seem appropriate.

The other powerpc-specific references for EXC_LAST
are below. Note the __syncicache ones and bcopy ones,
for example.

/usr/src/sys/powerpc/aim/mmu_oea.c: if (phys_avail[j] < EXC_LAST)
/usr/src/sys/powerpc/aim/mmu_oea.c: phys_avail[j] += EXC_LAST;
/usr/src/sys/powerpc/aim/mmu_oea64.c: if (phys_avail[j] < EXC_LAST)
/usr/src/sys/powerpc/aim/mmu_oea64.c: phys_avail[j] += EXC_LAST;
/usr/src/sys/powerpc/aim/aim_machdep.c: for (trap = EXC_RST; trap < EXC_LAST; trap += 0x20)
/usr/src/sys/powerpc/aim/aim_machdep.c: __syncicache(EXC_RSVD, EXC_LAST - EXC_RSVD);
/usr/src/sys/powerpc/powerpc/trap.c: { EXC_LAST, NULL }
/usr/src/sys/powerpc/powerpc/trap.c: for (pe = powerpc_exceptions; pe->vector != EXC_LAST; pe++) {
/usr/src/sys/powerpc/ofw/ofw_machdep.c: bcopy((void *)EXC_RST, save_trap_vec, EXC_LAST - EXC_RST);
/usr/src/sys/powerpc/ofw/ofw_machdep.c: bcopy(restore_trap_vec, (void *)EXC_RST, EXC_LAST - EXC_RST);
/usr/src/sys/powerpc/ofw/ofw_machdep.c: __syncicache(EXC_RSVD, EXC_LAST - EXC_RSVD);

===
Mark Millard
markmi at dsl-only.net

From owner-freebsd-ppc@freebsd.org Wed May 17 22:22:08 2017
Subject: Re: llvm FreeBSD powerpc ABI target bug fix: Re: [Bug 26519] Clang 4.0.0's "Target: powerpc-unknown-freebsd11.0" code generation is violating the SVR4 ABI (SEGV can result)
From: Mark Millard
In-Reply-To: <893ECA11-7C80-4D24-A496-92ADC7978A07@FreeBSD.org>
Date: Wed, 17 May 2017 15:22:00 -0700
Cc: FreeBSD Toolchain, FreeBSD PowerPC ML, Roman Divacky
Message-Id: <408D3509-3D62-4413-986B-6C1171FB6138@dsl-only.net>
References: <0103401A-CEEA-4992-A45E-E60EA151119B@dsl-only.net> <893ECA11-7C80-4D24-A496-92ADC7978A07@FreeBSD.org>
To: Dimitry Andric

I got a notice this morning of the latest fix for
covering the TARGET_ARCH=powerpc stack-handling
issues for llvm bugzilla 26519:

> Comment # 33 on bug 26519 from Krzysztof Parzyszek
> The fix has been committed in master in r303257.
>
> I opened PR33070 to request merging it into 4.0.1.
>
> You are receiving this mail because:
> • You reported the bug.

I've been using a version of the patch for some time
and for buildworld it appears that with it powerpc and
powerpc64 have a similar status: the one known area
not working is handling of thrown C++ exceptions
--for example the required dwarf information is
incomplete so programs crash.
(I have one powerpc64 patch in use that is not applied
upstream or in FreeBSD that is essential for the
powerpc64 status. See the later side notes for the
tiny patch.)

For buildkernel there is a difference for
TARGET_ARCH=powerpc vs. TARGET_ARCH=powerpc64 :

A) powerpc64 works for building and running the
   kernel and world on the old G5 PowerMacs.

B) powerpc FreeBSD on the same machines fails
   at the /sbin/init attempt and then gets an
   alignment exception. (I've not tried a G4
   or G3 yet.)

As of yet I've no clue why (B) is an issue.

Side notes:

Note 0:

The patch I was given a fair time ago that is required
for TARGET_ARCH=powerpc64 is:

# svnlite diff /usr/src/contrib/llvm/tools/
Index: /usr/src/contrib/llvm/tools/lld/ELF/Target.cpp
===================================================================
--- /usr/src/contrib/llvm/tools/lld/ELF/Target.cpp (revision 317820)
+++ /usr/src/contrib/llvm/tools/lld/ELF/Target.cpp (working copy)
@@ -1070,7 +1070,8 @@
 }
 
 PPC64TargetInfo::PPC64TargetInfo() {
-  PltRel = GotRel = R_PPC64_GLOB_DAT;
+  GotRel = R_PPC64_GLOB_DAT;
+  PltRel = R_PPC64_JMP_SLOT;
   RelativeRel = R_PPC64_RELATIVE;
   GotEntrySize = 8;

(Thanks to Roman Divacky.)

Note 1:

While a pure gcc 4.2.1 buildworld buildkernel with
a debug kernel is working for booting and using,
a production style kernel gets occasional panics
on the old G5 PowerMacs. (The powerpc64 builds
work fine on the same machines.) (I've not tried
any G4's or a G3 yet.)

It also appears that small changes in memory
layout details (from trying to get better
evidence) change the behavior/failure-mode/
details. I do not expect to find out what
is going on any time soon.

The same problems existed when buildworld was
via clang 4. A kernel from a clang buildkernel
does not get far enough for me to see what it
would do for the issue.

As this issue is more fundamental to general
operation it has been getting much of my
FreeBSD time.

===
Mark Millard
markmi at dsl-only.net

From owner-freebsd-ppc@freebsd.org Thu May 18 10:04:35 2017
From: Mark Millard
Subject: head -r317820: An example TARGET_ARCH=powerpc panic with interesting information reported (PowerMac G5 so-called "Quad Core" context)
Message-Id: <3D469253-A16F-4723-B459-38BE01FFB051@dsl-only.net>
Date: Thu, 18 May 2017 03:04:26 -0700
Cc: freebsd-hackers@freebsd.org
To: FreeBSD PowerPC ML

Context:

The context is a TARGET_ARCH=powerpc build used on an old
PowerMac G5 so-called "Quad Core". Currently I'm using
head -r317820 for investigating.

I've been having problems with production-style kernel
builds for TARGET_ARCH=powerpc getting occasional
fatal kernel traps --but the debug kernel works and does
not report anything odd. I see this even with pure gcc
4.2.1 based builds, not just my clang experiments. Also
all the TARGET_ARCH=powerpc64 builds seem to work fine
on the same machines.

All the trap reports indicate pid 11 and one of the cpu
idle threads, more cpu 0 than the others but not limited
to one cpu/thread.

Interesting evidence based on the trap frame:

I caught a data storage interrupt fatal kernel trap
from which the ddb information is interesting . . .

For the fatal kernel trap notice for pid 11's cpu0
thread (there is a "timeout stopping cpus"):

srr0 indicates 0x5358e0 which is:

sched_choose+0x100: lwz r30, -0x8(r11)

in the build then in use. But lr is listed as:

lr 0x5358d0, which is before srr0, lr pointing at:

sched_choose+0xf0: lwz r0,4(r11)

*** Does this ordering of lr vs. srr0 content make sense?
    Is it evidence of a problem? ***

The code in the area looks like:

   005358b8 li     r0,-1
   005358bc stb    r0,152(r29)
   005358c0 mfsprg r9,0
   005358c4 lwz    r28,4(r9)
   005358c8 mr     r3,r28
   005358cc lwz    r11,0(r1)
=> 005358d0 lwz    r0,4(r11)
   005358d4 mtlr   r0
   005358d8 lwz    r28,-16(r11)
   005358dc lwz    r29,-12(r11)
=> 005358e0 lwz    r30,-8(r11)
   005358e4 lwz    r31,-4(r11)
   005358e8 mr     r1,r11
   005358ec blr

The r3, r28, r11, r9, and r0 values in the trap
frame do not seem to match the code.

For example at 005358d0 the above would have r3=r28
but the "show reg" shows:

r3  0x0
r28 0x7c2803c6

The r1 value and vmcore.3 content indicate 0x0147d6c0 for
the later "lwz r28,-16(r11)".

"show reg" reported both of the following for the trap
frame:

r1  0xdf5e58c0
r11 0xdf5e58c0

which would indicate r11 not being from the result of
"lwz r11,0(r1)". vmcore.3 content indicates two things:

A) The trap frame shows the 0xdf5e58c0 for r11.
B) 0(r1) would have been 0xdf5e58e0 .

In vmcore.3 0xdf5e58e0 has an associated lr value next to
it: 0x5358ac, matching up with the return from the last bl
in sched_choose:

   005358a0 b      005358ac
   005358a4 mr     r4,r28
   005358a8 bl     00500010
   005358ac lbz    r0,622(r28)

So it looks as expected in vmcore.3 . Then either:

a) The trap frame is from prior to the "lwz r11,0(r1)"
   result.
b) The 0(r1) access was not based on coherent/up-to-date
   cache/memory contents and so filled r11 with junk.
c) The r11 value in the trap frame has been corrupted
   some other way.

The trap frame shows:

r9 0x1f8

But 0x1f8 is much smaller than what I see on a live
system for sprg0: more like 0xf65f00 . And the 4(r9)
would not get to a 0x7c2803c6 value from what I see
in vmcore.3 .

The trap frame's r0 0x4bfca6c1 does not fit with either
"li r0,-1" or the code:

=> 005358d0 lwz    r0,4(r11)
   005358d4 mtlr   r0

(The trap frame reports lr as 0x5358d0.) (r0's value only
exists in two places in vmcore.3, one being the trap frame.)

The vmcore.3 and r1 would suggest the result of:

   005358cc lwz    r11,0(r1)
=> 005358d0 lwz    r0,4(r11)
   005358d4 mtlr   r0
   005358d8 lwz    r28,-16(r11)
   005358dc lwz    r29,-12(r11)
=> 005358e0 lwz    r30,-8(r11)
   005358e4 lwz    r31,-4(r11)
   005358e8 mr     r1,r11

should be the sequence of register assignments:

r11 = 0xdf5e58e0
r0  = 0x00500460
lr  = 0x00500460
r28 = 0x0147d6c0
r29 = 0x00d4d7c8
r30 = 0x00d1d6c4
r31 = 0xdf5e58e0
r1  = 0xdf5e58e0

None of which show up in the trap frame. But even the
trap frame's 005358d0 would suggest the first of the
above.
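
The expected register values listed above follow mechanically from the epilogue once the stack words are known. The following is only a minimal sketch of that derivation over a tiny model of the stack, not anything taken from the kernel. The names `stack_words`, `load_word`, `epilogue_regs` and `run_epilogue` are hypothetical. The word values are the ones this message reports from vmcore.3 or implies through the expected register list; words the message does not mention are simply left as 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* r1 as reported in the trap frame; lowest address in this model. */
#define FRAME_BASE 0xdf5e58c0u

/* Hypothetical model of the 32-bit stack words from FRAME_BASE upward. */
static const uint32_t stack_words[] = {
    0xdf5e58e0u, /* 0xdf5e58c0: back chain, read by lwz r11,0(r1) */
    0x005358acu, /* 0xdf5e58c4: lr value seen next to it in vmcore.3 */
    0, 0,        /* 0xdf5e58c8, 0xdf5e58cc: not stated in the message */
    0x0147d6c0u, /* 0xdf5e58d0: -16(r11), expected r28 */
    0x00d4d7c8u, /* 0xdf5e58d4: -12(r11), expected r29 */
    0x00d1d6c4u, /* 0xdf5e58d8:  -8(r11), expected r30 */
    0xdf5e58e0u, /* 0xdf5e58dc:  -4(r11), expected r31 */
    0,           /* 0xdf5e58e0:   0(r11), not stated in the message */
    0x00500460u, /* 0xdf5e58e4:   4(r11), expected r0 and lr */
};

/* Load one aligned word from the modeled stack region. */
static uint32_t load_word(uint32_t addr)
{
    size_t index = (addr - FRAME_BASE) / 4;

    assert(addr >= FRAME_BASE && (addr & 3u) == 0);
    assert(index < sizeof(stack_words) / sizeof(stack_words[0]));
    return stack_words[index];
}

struct epilogue_regs {
    uint32_t r0, r1, r11, r28, r29, r30, r31, lr;
};

/* Apply the loads and moves from 005358cc through 005358e8 in order. */
static struct epilogue_regs run_epilogue(uint32_t r1)
{
    struct epilogue_regs regs = {0};

    regs.r1 = r1;
    regs.r11 = load_word(regs.r1);       /* lwz  r11,0(r1)    */
    regs.r0 = load_word(regs.r11 + 4);   /* lwz  r0,4(r11)    */
    regs.lr = regs.r0;                   /* mtlr r0           */
    regs.r28 = load_word(regs.r11 - 16); /* lwz  r28,-16(r11) */
    regs.r29 = load_word(regs.r11 - 12); /* lwz  r29,-12(r11) */
    regs.r30 = load_word(regs.r11 - 8);  /* lwz  r30,-8(r11)  */
    regs.r31 = load_word(regs.r11 - 4);  /* lwz  r31,-4(r11)  */
    regs.r1 = regs.r11;                  /* mr   r1,r11       */
    return regs;
}
```

Calling `run_epilogue(0xdf5e58c0u)` on this model reproduces the listed assignments, which makes it easy to compare them field by field against what "show reg" printed.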

As for the virtual address reported for the failure:

virtual address = 0x7c2803ba

There is only one register listed with a value near that:

r28 0x7c2803a6

(This value is in vmcore.3 something like 677 times.)

and: 0x7c2803ba - 0x7c2803a6 == 0x14 (20 decimal).

But it does not fit for the code getting to 0x7c2803ba
from r28.

0x7c2803ba only exists in 3 or so places in vmcore.3
(including the trap frame instance). I'm not sure how
0x7c2803ba ended up in the dar (or its spot in the trap
frame) in vmcore.3 .

Supporting detail:

Looking around in the vmcore.3 file showed that the area
I found with the proper back chain and lr links was
present (this was debug.minidump=0), complete with
having one of 3 instances of the failing virtual memory
address (0x7c2803ba) nearby --happening to be the dar
value from the trap frame when I looked in detail.

(Of course in vmcore.3 I'm seeing physical memory addresses
offset by the dump's header size for locations of memory.
So I'm learning what the mapping was by finding the region
with the content.)

Notes:

I have evidence that changes in the memory layout of
the kernel (such as by adding something to potentially
use in investigating) change the details of the failure
behavior.

I have had contexts where the failures would happen
when the PowerMac involved was booted but not in
active use. In other contexts I've had that not fail
but something like "find / -name dog -print | more"
leads to failure before completing. In the other
context such things completed just fine.

libkvm was never updated to deal with the powerpc
and powerpc64 relocatable kernel format changes.
On the PowerMac it seems to leave the kernel at
its default place but the format is ET_DYN. More
than just testing for ET_DYN is required in libkvm.
And it appears that for powerpc and powerpc64 it
never supported debug.minidump=0 . The open
Bugzilla 219153 reports this issue and has more
notes.
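
The counts quoted above (a value occurring "677 times" or in "3 or so places") come from searching the raw dump for 32-bit words. A minimal sketch of such a search over an in-memory copy of the file follows. The helper names `read_be32` and `count_be32` are hypothetical, and the sketch assumes the words of interest sit at 4-byte-aligned file offsets and are stored big-endian, as on these PowerMacs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one 32-bit big-endian word (powerpc byte order) from a byte buffer. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Count the 4-byte-aligned offsets in buf[0..len) that hold value. */
static size_t count_be32(const uint8_t *buf, size_t len, uint32_t value)
{
    size_t count = 0;

    assert(buf != NULL || len == 0);
    for (size_t off = 0; off + 4 <= len; off += 4) {
        if (read_be32(buf + off) == value)
            count++;
    }
    return count;
}
```

Recording the matching offsets instead of only counting them would give the file locations to map back to memory addresses.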

===
Mark Millard
markmi at dsl-only.net

From owner-freebsd-ppc@freebsd.org Sat May 20 04:48:53 2017
Subject: Re: TARGET_ARCH=powerpc head -r317820 production-style kernel: periodic panics always in pid=11 (the Idle threads)
From: Mark Millard
In-Reply-To: <831804AB-1BEB-40C7-BA8B-94DF07E314E5@dsl-only.net>
Date: Fri, 19 May 2017 21:42:08 -0700
Cc: FreeBSD PowerPC ML, FreeBSD Current
Message-Id: <1F50B6D9-4E41-4367-860D-E2A0E13AE661@dsl-only.net>
References: <831804AB-1BEB-40C7-BA8B-94DF07E314E5@dsl-only.net>
To: Justin Hibbits, Nathan Whitehorn

On 2017-May-9, at 2:00 PM, Mark Millard wrote:

. . .
> fatal kernel trap:
> exception = 0x903a64e (unknown)
> srr0 = 0x7ff760
> srr1 = 0xc1007c
> lr = 0x907f
> curthread = 0x147d6c0
> pid = 11, comm = idle: cpu0
> [ thread pid 11 tid 100003 ]
> Stopped at ffs_truncate+0x1080: stw r11, 0xf8(r31)
>
> 1 contains (cpu1 instead of cpu0, so different tid):
>
> fatal kernel trap:
> exception = 0x903a64e (unknown)
> srr0 = 0x7ff760
> srr1 = 0xc1007c
> lr = 0x907f
> curthread = 0x147d360
> pid = 11, comm = idle: cpu1
> [ thread pid 11 tid 100004 ]
> Stopped at ffs_truncate+0x1080: stw r11, 0xf8(r31)
>
> 1 contains:

I've discovered where to find the trapframe
in the vmcore.* files for these specific
examples with 0x903a64e as the exception
and such.

In the vmcore the memory image starts at
byte offset 0x1000.

To see the values reported, the only
place in the image file to start that
produces those values at the offsets
inside the powerpc trapframe is:

offset 0x1001 in the vmcore.* file.

So memory address 0x1 is being used
as the trapframe address when that
odd exception information is being
displayed. Yep: misaligned.

The decoding is not of the actual
trapframe: it is garbage that is
not to be believed.

Note: I lucked out after the above
and got a somewhat different odd
trap information that led to
actually getting a backtrace that
included the actual pid 11 cpu 1
kernel thread stack bt associated
with that odd information display.
I'll send a separate reply for that
information as it will take some
transcribing from camera pictures
and such.
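
The offset arithmetic above can be written down as a small check. This is only a sketch under stated assumptions: the 0x1000 image offset is the value given in this message for these full (debug.minidump=0) dumps, the 4-byte alignment requirement is an assumption about a 32-bit powerpc trapframe, and the names `vmcore_offset_to_addr` and `is_plausible_trapframe_addr` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* From this message: the memory image starts at this file offset. */
#define VMCORE_IMAGE_OFFSET 0x1000u

/* Assumption: a trapframe of 32-bit registers is at least word aligned. */
#define TRAPFRAME_ALIGN 4u

/* Map a vmcore file offset back to the memory address it holds. */
static uint64_t vmcore_offset_to_addr(uint64_t file_offset)
{
    assert(file_offset >= VMCORE_IMAGE_OFFSET);
    return file_offset - VMCORE_IMAGE_OFFSET;
}

/* A misaligned address cannot be a real trapframe under the assumption. */
static bool is_plausible_trapframe_addr(uint64_t addr)
{
    return (addr % TRAPFRAME_ALIGN) == 0;
}
```

With these helpers, file offset 0x1001 maps to address 0x1, which fails the alignment check, matching the conclusion that the displayed "trapframe" is garbage.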

===
Mark Millard
markmi at dsl-only.net

From owner-freebsd-ppc@freebsd.org Sat May 20 05:24:52 2017
In-Reply-To: <0BC58DBC-AC9B-46F8-8F3A-1AEB90622BC4@dsl-only.net>
References: <0BC58DBC-AC9B-46F8-8F3A-1AEB90622BC4@dsl-only.net>
From: Justin Hibbits
Date: Sat, 20 May 2017 00:24:50 -0500
Subject: Re: FYI: powerpc EXC_LAST==0x2f00 vs. EXC_DEBUG ==0x2f10 and such?
To: Mark Millard
Cc: FreeBSD PowerPC ML

On Tuesday, May 16, 2017, Mark Millard wrote:

> [Context:
>
> I'm having problems with production-style kernel builds
> for TARGET_ARCH=powerpc (used an old PowerMac G5 so-called
> "Quad Core") getting occasional panics that involve
> oddities like:
>
> frame->exc == 0x903a64e
>
> in the fatal kernel trap information on the console
> display.
>
> I'm not claiming the below is related but while looking
> around to figure out how to investigate I ran into
> what I report below. (The first 2 are from a block of
> 4xx / 85xx EXC_'s).
> ]
>
>
> From /usr/src/sys/powerpc/include/trap.h :
>
> #define EXC_DEBUG       0x2f10  /* Debug trap */
> #define EXC_VECAST_E    0x2f20  /* Altivec Assist (Book-E) */
>
> #define EXC_LAST        0x2f00  /* Last possible exception vector */
>
> #define EXC_AST         0x3000  /* Fake AST vector */
>
> /* Trap was in user mode */
> #define EXC_USER        0x10000
>
> And also:
>
> /usr/src/sys/powerpc/ofw/ofw_machdep.c:char save_trap_init[0x2f00]; /* EXC_LAST */
> /usr/src/sys/powerpc/ofw/ofw_machdep.c:char save_trap_of[0x2f00]; /* EXC_LAST */
>
> /usr/src/sys/powerpc/include/profile.h:#define __PROFILE_VECTOR_TOP (EXC_LAST + 0x100)
>
> These make it look like EXC_LAST and some literal
> 0x2f00's might be insufficient for some contexts.
>

Nope, EXC_LAST is correct as-is. It's the last possible exception vector
for AIM, as that uses physical pages at those addresses for the exception
vectors. Anything above EXC_LAST is an artificial exception. Now, it does
look odd, so I should move the EXC_DEBUG and EXC_VECAST_E down below the
EXC_LAST for sorting purposes.

- Justin

From owner-freebsd-ppc@freebsd.org Sat May 20 05:30:27 2017
Subject: Re: FYI: powerpc EXC_LAST==0x2f00 vs. EXC_DEBUG ==0x2f10 and such?
From: Mark Millard
Date: Fri, 19 May 2017 22:30:24 -0700
Cc: FreeBSD PowerPC ML
Message-Id: <44ACFC7B-445A-4648-B387-E31021A8D363@dsl-only.net>
References: <0BC58DBC-AC9B-46F8-8F3A-1AEB90622BC4@dsl-only.net>
To: Justin Hibbits

On 2017-May-19, at 10:24 PM, Justin Hibbits wrote:

On Tuesday, May 16, 2017, Mark Millard wrote:
> . . .From /usr/src/sys/powerpc/include/trap.h :
>
> #define EXC_DEBUG       0x2f10  /* Debug trap */
> #define EXC_VECAST_E    0x2f20  /* Altivec Assist (Book-E) */
>
> #define EXC_LAST        0x2f00  /* Last possible exception vector */
>
> #define EXC_AST         0x3000  /* Fake AST vector */
>
> /* Trap was in user mode */
> #define EXC_USER        0x10000
>
> And also:
>
> /usr/src/sys/powerpc/ofw/ofw_machdep.c:char save_trap_init[0x2f00]; /* EXC_LAST */
> /usr/src/sys/powerpc/ofw/ofw_machdep.c:char save_trap_of[0x2f00]; /* EXC_LAST */
>
> /usr/src/sys/powerpc/include/profile.h:#define __PROFILE_VECTOR_TOP (EXC_LAST + 0x100)
>
> These make it look like EXC_LAST and some literal
> 0x2f00's might be insufficient for some contexts.
>
> Nope EXC_LAST is correct as-is. It's the last possible exception vector
> for AIM, as that uses physical pages at those addresses for the exception
> vectors. Anything above EXC_LAST is an artificial exception. Now, it does
> look odd, so I should move the EXC_DEBUG and EXC_VECAST_E down below the
> EXC_LAST for sorting purposes.

Thanks for checking.

One other point:

save_trap_init[0x2f00] does not include 0x2500
save_trap_of[0x2f00] does not include 0x2500

But. . .

> #define EXC_LAST        0x2f00  /* Last possible exception vector */

indicates that 0x2500 is included.
Is it actually the count/bound and not the last possible?

===
Mark Millard
markmi at dsl-only.net

From owner-freebsd-ppc@freebsd.org Sat May 20 05:34:16 2017
Subject: Re: FYI: powerpc EXC_LAST==0x2f00 vs. EXC_DEBUG ==0x2f10 and such?
From: Mark Millard
In-Reply-To: <44ACFC7B-445A-4648-B387-E31021A8D363@dsl-only.net>
Date: Fri, 19 May 2017 22:34:13 -0700
Cc: FreeBSD PowerPC ML
Message-Id: <9A2218D5-4795-4A83-A257-61CA3EDBC776@dsl-only.net>
References: <0BC58DBC-AC9B-46F8-8F3A-1AEB90622BC4@dsl-only.net> <44ACFC7B-445A-4648-B387-E31021A8D363@dsl-only.net>
To: Justin Hibbits

On 2017-May-19, at 10:30 PM, Mark Millard wrote:

> On 2017-May-19, at 10:24 PM, Justin Hibbits wrote:
>
> On Tuesday, May 16, 2017, Mark Millard wrote:
>> . . .From /usr/src/sys/powerpc/include/trap.h :
>>
>> #define EXC_DEBUG       0x2f10  /* Debug trap */
>> #define EXC_VECAST_E    0x2f20  /* Altivec Assist (Book-E) */
>>
>> #define EXC_LAST        0x2f00  /* Last possible exception vector */
>>
>> #define EXC_AST         0x3000  /* Fake AST vector */
>>
>> /* Trap was in user mode */
>> #define EXC_USER        0x10000
>>
>> And also:
>>
>> /usr/src/sys/powerpc/ofw/ofw_machdep.c:char save_trap_init[0x2f00]; /* EXC_LAST */
>> /usr/src/sys/powerpc/ofw/ofw_machdep.c:char save_trap_of[0x2f00]; /* EXC_LAST */
>>
>> /usr/src/sys/powerpc/include/profile.h:#define __PROFILE_VECTOR_TOP (EXC_LAST + 0x100)
>>
>> These make it look like EXC_LAST and some literal
>> 0x2f00's might be insufficient for some contexts.
>>
>> Nope EXC_LAST is correct as-is. It's the last possible exception vector
>> for AIM, as that uses physical pages at those addresses for the exception
>> vectors. Anything above EXC_LAST is an artificial exception. Now, it does
>> look odd, so I should move the EXC_DEBUG and EXC_VECAST_E down below the
>> EXC_LAST for sorting purposes.
>
> Thanks for checking.
>
> One other point:
>
> save_trap_init[0x2f00] does not include 0x2500
> save_trap_of[0x2f00] does not include 0x2500

I meant "does not include 0x2f00" in both places.
[It is a day for taking things slowly. . .]

> But. . .
>
>> #define EXC_LAST        0x2f00  /* Last possible exception vector */
>
> indicates that 0x2500 is included. Is it actually
> the count/bound and not the last possible?

Same here: 0x2f00, not 0x2500.

===
Mark Millard
markmi at dsl-only.net

From owner-freebsd-ppc@freebsd.org Sat May 20 09:01:59 2017
Subject: Re: TARGET_ARCH=powerpc head -r317820 production-style kernel: periodic panics always in pid=11 (the Idle threads)
From: Mark Millard
In-Reply-To: <1F50B6D9-4E41-4367-860D-E2A0E13AE661@dsl-only.net>
Date: Sat, 20 May 2017 02:01:55 -0700
Cc: FreeBSD PowerPC ML, FreeBSD Current
Message-Id: <81F7A2D8-4C3A-426D-B957-9DC937006D88@dsl-only.net>
References: <831804AB-1BEB-40C7-BA8B-94DF07E314E5@dsl-only.net> <1F50B6D9-4E41-4367-860D-E2A0E13AE661@dsl-only.net>
To: Justin Hibbits, Nathan Whitehorn

On 2017-May-19, at 9:42 PM, Mark Millard wrote:

> On 2017-May-9, at 2:00 PM, Mark Millard wrote:
>
> . . .
>> fatal kernel trap:
>> exception = 0x903a64e (unknown)
>> srr0 = 0x7ff760
>> srr1 = 0xc1007c
>> lr = 0x907f
>> curthread = 0x147d6c0
>> pid = 11, comm = idle: cpu0
>> [ thread pid 11 tid 100003 ]
>> Stopped at ffs_truncate+0x1080: stw r11, 0xf8(r31)
>>
>> 1 contains (cpu1 instead of cpu0, so different tid):
>>
>> fatal kernel trap:
>> exception = 0x903a64e (unknown)
>> srr0 = 0x7ff760
>> srr1 = 0xc1007c
>> lr = 0x907f
>> curthread = 0x147d360
>> pid = 11, comm = idle: cpu1
>> [ thread pid 11 tid 100004 ]
>> Stopped at ffs_truncate+0x1080: stw r11, 0xf8(r31)
>>
>> 1 contains:
>
> I've discovered where to find the trapframe
> in the vmcore.* files for these specific
> examples with 0x903a64e as the exception
> and such.
>
> In the vmcore the memory image starts at
> byte offset 0x1000.
>
> To see the values reported, the only
> place in the image file to start that
> produces those values at the offsets
> inside the powerpc trapframe is:
>
> offset 0x1001 in the vmcore.* file.
>
> So memory address 0x1 is being used
> as the trapframe address when that
> odd exception information is being
> displayed. Yep: misaligned.
>
> The decoding is not of the actual
> trapframe: it is garbage that is
> not to be believed.
>
>
> Note: I lucked out after the above and
> got a somewhat different odd trap information
> that lead to actually getting a backtrace
> that included the actual pid 11 cpu 1 kernel
> thread stack bt associated with that odd
> information display.

Typo: That should have been "cpu 2".

> I'll send a separate reply for that information
> as it will take some transcribing from camera
> pictures and such.

As indicated, I got a different odd trap report that
gave a backtrace. . .

fatal user trap

   exception       = 0x4210000 (unknown)
   srr0            = 0xc1007c09
   srr1            = 0x3a64e80
   lr              = 0xc0807fc9
   curthread       = 0x147d000
          pid = 11, comm = idle: cpu 2

At this point it attempted db_print_loc_and_inst
and got another exception (at offset +0x60 in the
routine). So the backtrace shows both the consequences
of that and what led up to it: an EXI trap was
attempting to report trap frame information but was
using a bad address for the supposed frame.

The details of the backtrace:

panic: data storage interrupt trap
cpuid = 2
time = 145187154
KDB: stack backtrace
0xdf5ef2c0: at kdb_backtrace+0x5c
0xdf5ef3a0: at panic+0x54
0xdf5ef3f0: at trap_fatal+0x1cc
0xdf5ef420: at powerpc_interrupt+0x180
0xdf5ef5c0: kernel DSI read trap @ 0xc1007c09 by db_disasm+0x30:
            srr1=0x1032
            r1  =0xdf5ef6b0
            cr  =0x24009022
            xer =0
            ctr =0x1852cc
            sr  =0x40000000
0xdf5ef6b0: at 0x1007480
0xdf5ef6d0: at db_print_loc_and_inst+0x60
0xdf5ef700: at db_trap+0x104
0xdf5ef790: at kdb_trap+0x1bc
0xdf5ef810: at trap_fatal+0x1b0
0xdf5ef840: at trap+0x1184
0xdf5ef870: kernel EXI trap by cpu_idle_60x+0x88:
            srr1=0x1032
            r1  =0xdf5ef930
            cr  =0x40000042
            xer =0x20000000
            ctr =0x8e3bd8
saved LR(0x2) is invalid.

So an EXI trap was attempting to report a trap frame.
(Note: the LR's for pid 11 cpu threads normally
report an invalid LR in ddb.)

The actual EXI trapframe starts at 013f0878 in vmcore.5:

013f0870  df 5e f9 30 00 10 08 f8  00 04 90 32 df 5e f9 30  |.^.0.......2.^.0|
013f0880  01 47 d0 00 00 00 00 00  25 94 48 3f 00 00 00 00  |.G......%.H?....|
013f0890  25 94 48 3f 00 4a a9 c8  00 00 00 00 00 00 00 44  |%.H?.J.........D|
013f08a0  01 fc a0 55 00 00 90 32  df 5d 1d 00 00 00 00 00  |...U...2.]......|
013f08b0  00 d4 bd ec 00 cb 98 98  00 c9 66 bc 00 c4 5d 08  |..........f...].|
013f08c0  00 c9 66 bc 00 d4 c5 3c  df 5e f9 e0 00 eb a7 80  |..f....<.^......|
013f08d0  00 c9 66 bc 01 47 d0 00  df 5e f9 8c 00 00 00 06  |..f..G...^......|
013f08e0  00 00 00 06 00 eb b5 80  00 00 00 00 00 8e 3b d8  |..............;.|
013f08f0  00 d2 6b f0 df 5e f9 30  00 8e 3b f4 40 00 00 42  |..k..^.0..;.@..B|
013f0900  20 00 00 00 00 8e 3b d8  00 8e 3c 60 00 00 90 32  | .....;...<`...2|
013f0910  00 00 05 00 41 a1 d5 d4  42 00 00 00 00 00 00 00  |....A...B.......|

So:

r0    = 0x00049032
r1    = 0xdf5ef930
r2    = 0x0147d000
r3    = 0x00000000
r4    = 0x2594483f
r5    = 0x00000000
r6    = 0x2594483f
r7    = 0x004aa9c8
r8    = 0x00000000
r9    = 0x00000044
r10   = 0x01fca055
r11   = 0x00009032
r12   = 0xdf5d1d00
r13   = 0x00000000
r14   = 0x00d4bdec
r15   = 0x00cb9898
r16   = 0x00c966bc
r17   = 0x00c45d08
r18   = 0x00c966bc
r19   = 0x00d4c53c
r20   = 0xdf5ef9e0
r21   = 0x00eba780
r22   = 0x00c966bc
r23   = 0x0147d000
r24   = 0xdf5ef98c
r25   = 0x00000006 (this value shows up later in a bad spot)
r26   = 0x00000006 (this value shows up next to that)
r27   = 0x00ebb580
r28   = 0x00000000
r29   = 0x008e3bd8
r30   = 0x00d26bf0
r31   = 0xdf5ef930
lr    = 0x008e3bf4
cr    = 0x40000042
xer   = 0x20000000
ctr   = 0x008e3bd8
srr0  = 0x008e3c60
srr1  = 0x00009032
exc   = 0x00000500
dar   = 0x41a1d5d4
dsisr = 0x42000000

Other elements of the stack leading to this are:

013f0920  45 8b a7 b5 b1 fd 96 be  00 00 00 00 00 00
00 04  |E...............|
013f0930  df 5e f9 50 00 00 00 06  00 00 00 06 00 eb b5 80  |.^.P............|

(The second word above, 00 00 00 06, is an odd lr value: not
even a multiple of 4. It matches r25 in the trapframe. The
value repeats in the third word, which matches r26 in the
trapframe. Was the r25/r26 pair saved in those?)

013f0940  00 00 00 00 00 d4 ca 34  00 d2 6b f0 df 5e f9 50  |.......4..k..^.P|
013f0950  df 5e f9 70 00 8e 31 7c  00 00 00 02 00 eb b5 80  |.^.p..1|........|
013f0960  00 f2 d5 fc 00 00 00 06  00 d1 ca ac df 5e f9 70  |.............^.p|
013f0970  df 5e fa 50 00 53 6e 58  df 5e f9 80 00 00 00 00  |.^.P.SnX.^......|
. . .
013f0a50  df 5e fa 80 00 4a 3c b4  df 5e fa 60 fa 50 05 af  |.^...J<..^.`.P..|
013f0a60  df 5e fa 80 00 00 00 00  00 00 00 00 00 00 00 00  |.^..............|
013f0a70  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
013f0a80  00 00 00 00 00 8f 18 90  00 53 69 84 00 00 00 00  |.........Si.....|

From the stack backpointer/lr-value pairs (via objdump
on the kernel) I get:

00000006: ????????????????????
008e317c: cpu_idle+0x58
00536e58: scheduletd+0x4d4
004a3cb4: fork_exit+0xf8
008f1890: fork_trampoline+0x10

But cpu_idle+0x58 is the bl <cpu_activeclock> at the end of
the listing below, and the bctrl just before it (at 008e3178)
is what made the call.

008e3124 stwu    r1,-32(r1)
008e3128 mflr    r0
008e312c stw     r29,20(r1)
008e3130 stw     r30,24(r1)
008e3134 stw     r31,28(r1)
008e3138 stw     r0,36(r1)
008e313c mr      r31,r1
008e3140 bcl-    20,4*cr7+so,008e3144
008e3144 mflr    r30
008e3148 lwz     r0,-36(r30)
008e314c add     r30,r0,r30
008e3150 lwz     r29,-32768(r30)
008e3154 lwz     r0,0(r29)
008e3158 cmpwi   cr7,r0,0
008e315c beq-    cr7,008e3198
008e3160 cmpwi   cr7,r3,0
008e3164 bne-    cr7,008e3188
008e3168 bl      005002a4
008e316c bl      008ad894
008e3170 lwz     r29,0(r29)
008e3174 mtctr   r29
008e3178 bctrl
008e317c bl      008ad794

But ctr was reported as 0x8e3bd8, which is cpu_idle_60x.
(So no surprises so far.)
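
For anyone wanting to repeat the back-chain walk above without doing it by hand, here is a minimal C sketch. It is only an illustration: the word table is copied from the hex dump in this message, the names (dump_word, read_word, walk_back_chain) are made up for the example, and the constant VA_TO_DUMP_DELTA is an assumption inferred from this one dump (r1 = 0xdf5ef930 showing up at dump offset 013f0930). It is not a general vmcore address translation.

```c
#include <stddef.h>
#include <stdint.h>

/* One 32-bit big-endian word from the dump, keyed by its offset in vmcore.5. */
struct dump_word {
	uint32_t offset;
	uint32_t value;
};

/* Back-chain and LR-slot words copied from the hex dump in this message. */
static const struct dump_word dump[] = {
	{ 0x013f0930, 0xdf5ef950 }, { 0x013f0934, 0x00000006 },
	{ 0x013f0950, 0xdf5ef970 }, { 0x013f0954, 0x008e317c },
	{ 0x013f0970, 0xdf5efa50 }, { 0x013f0974, 0x00536e58 },
	{ 0x013f0a50, 0xdf5efa80 }, { 0x013f0a54, 0x004a3cb4 },
	{ 0x013f0a80, 0x00000000 }, { 0x013f0a84, 0x008f1890 },
};

/* Assumption: stack virtual address minus this delta gives the dump offset. */
#define VA_TO_DUMP_DELTA 0xde1ff000u

/* Look up one word by dump offset; returns 1 on success, 0 if not in the table. */
static int
read_word(uint32_t offset, uint32_t *out)
{
	size_t i;

	for (i = 0; i < sizeof(dump) / sizeof(dump[0]); i++) {
		if (dump[i].offset == offset) {
			*out = dump[i].value;
			return (1);
		}
	}
	return (0);
}

/*
 * Follow the back chain starting at stack pointer sp.  For each frame,
 * record the word at frame+4 (the LR slot) and move to the word at
 * frame+0 (the back pointer).  Stops at a zero back pointer, at a word
 * missing from the table, or after max frames.
 */
static size_t
walk_back_chain(uint32_t sp, uint32_t *lrs, size_t max)
{
	size_t n = 0;

	while (sp != 0 && n < max) {
		uint32_t off = sp - VA_TO_DUMP_DELTA;
		uint32_t next, lr;

		if (!read_word(off, &next) || !read_word(off + 4, &lr))
			break;
		lrs[n++] = lr;
		sp = next;
	}
	return (n);
}
```

Calling walk_back_chain(0xdf5ef930, lrs, 8) with the trapframe's r1 visits five frames and collects the same five lr values listed above. One possibly relevant detail: as the stw r0,36(r1) after stwu r1,-32(r1) in the listings shows, a function stores its return address into its caller's frame, so the LR slot of the innermost frame (the 0x00000006 at 013f0934) would only be written by something that cpu_idle_60x calls. That may be why it holds what looks like a leftover value.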

The code through the reported cpu_idle_60x+0x88 is:

008e3bd8 stwu    r1,-32(r1)
008e3bdc mflr    r0
008e3be0 stw     r30,24(r1)
008e3be4 stw     r31,28(r1)
008e3be8 stw     r0,36(r1)
008e3bec mr      r31,r1
008e3bf0 bcl-    20,4*cr7+so,008e3bf4
008e3bf4 mflr    r30
008e3bf8 lwz     r0,-32(r30)
008e3bfc add     r30,r0,r30
008e3c00 lwz     r9,-32756(r30)
008e3c04 lwz     r0,0(r9)
008e3c08 cmpwi   cr7,r0,0
008e3c0c beq-    cr7,008e3c78
008e3c10 mfmsr   r11
008e3c14 mfsprg  r0,7
008e3c18 rlwinm  r9,r0,16,16,31
008e3c1c cmpwi   cr7,r9,68
008e3c20 beq-    cr7,008e3c4c
008e3c24 cmplwi  cr7,r9,68
008e3c28 bgt-    cr7,008e3c40
008e3c2c cmpwi   cr7,r9,57
008e3c30 beq-    cr7,008e3c4c
008e3c34 cmpwi   cr7,r9,60
008e3c38 bne+    cr7,008e3c64
008e3c3c b       008e3c4c
008e3c40 addi    r0,r9,-32768
008e3c44 cmplwi  cr7,r0,4
008e3c48 bgt-    cr7,008e3c64
008e3c4c oris    r0,r11,4
008e3c50 dssall
008e3c54 sync
008e3c58 mtmsr   r0
008e3c5c isync
008e3c60 b       008e3c78

So that is the context for the EXI trap.

The "mtmsr r0" merged PSL_POW into the msr value
when it was originally not set. (Compare the r11
and r0 values in the trapframe.)

Why it would run without POW for so long and then
merge in POW I do not know. (Or maybe I just did
not find the code that turns POW back off on
occasion.)

Still, setting POW in the msr seems to be what is
unique to the failure context. Makes me wonder if
smu_doorbell_intr and its irq are somehow involved.

There is no obvious, local tie to r25 and r26 in/for
the spots identified earlier for the 013f0930 line of
the thread stack area.
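
The r11 vs. r0 comparison and the compare/branch ladder in the listing can be checked mechanically. The following is a small C sketch, not kernel code: the helper names (oris, pvr_version, takes_dssall_path) are invented for the illustration, and PSL_POW is written out as 0x00040000 on the assumption that it matches the kernel's definition of the MSR power-management bit.

```c
#include <stdint.h>

/* Assumed value of the MSR power-management (POW) bit. */
#define PSL_POW 0x00040000u

/* oris rD,rS,imm: OR a 16-bit immediate into the upper halfword of rS. */
static uint32_t
oris(uint32_t rs, uint16_t imm)
{
	return (rs | ((uint32_t)imm << 16));
}

/* rlwinm rA,rS,16,16,31: rotate left by 16 and keep the low 16 bits. */
static uint32_t
pvr_version(uint32_t pvr)
{
	return ((pvr >> 16) & 0xffffu);
}

/*
 * Mirror of the branch ladder between 008e3c1c and 008e3c48.
 * Returns 1 when control reaches 008e3c4c (oris/dssall/sync/mtmsr),
 * 0 when it branches to 008e3c64 instead.
 */
static int
takes_dssall_path(uint32_t vers)
{
	if (vers == 68)			/* cmpwi r9,68; beq */
		return (1);
	if (vers > 68)			/* cmplwi r9,68; bgt */
		return ((vers - 32768u) <= 4);	/* addi -32768; cmplwi 4 */
	return (vers == 57 || vers == 60);	/* cmpwi 57 / cmpwi 60 */
}
```

With the trapframe values, oris(0x00009032, 4) gives 0x00049032, which is exactly the saved r0, and the only bit that differs from r11 is PSL_POW. The saved r9 is 0x44 (68), the first value the ladder tests for, so the dssall/sync/mtmsr/isync sequence is the path that was taken.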

Interestingly, when I looked up POW in the context of
mtmsr use, what I found reported was:

Synchronization Required Prior: implementation dependent
Synchronization Required After: implementation dependent

The source code looks like:

static void
cpu_idle_60x(sbintime_t sbt)
{
        register_t msr;
        uint16_t vers;

        if (!powerpc_pow_enabled)
                return;

        msr = mfmsr();
        vers = mfpvr() >> 16;

#ifdef AIM
        switch (vers) {
        case IBM970:
        case IBM970FX:
        case IBM970MP:
        case MPC7447A:
        case MPC7448:
        case MPC7450:
        case MPC7455:
        case MPC7457:
                __asm __volatile("\
                            dssall; sync; mtmsr %0; isync"
                            :: "r"(msr | PSL_POW));
                break;
        default:
                powerpc_sync();
                mtmsr(msr | PSL_POW);
                isync();
                break;
        }
#endif
}

I've not yet figured out how to check that the details
are right here for the IBM970MP used via 32-bit FreeBSD.

===
Mark Millard
markmi at dsl-only.net