Date: Sun, 11 Jun 2017 16:02:53 -0700
From: Mark Millard <markmi@dsl-only.net>
To: Justin Hibbits <jhibbits@FreeBSD.org>, Nathan Whitehorn <nwhitehorn@freebsd.org>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>, freebsd-hackers@freebsd.org
Subject: Re: A different 32-bit powerpc head -r317820 panic on old PowerMac G5: dual backtraces from "timeout stopping cpus" (dump failed though): any comments?
Message-ID: <29CCA1EC-242D-42E7-97E9-6F2F67178DF3@dsl-only.net>
In-Reply-To: <1F1E52BD-375E-47CC-BF06-ECB1092121B4@dsl-only.net>
References: <D69CB244-69E2-4319-BD63-07BC7F763279@dsl-only.net> <1F1E52BD-375E-47CC-BF06-ECB1092121B4@dsl-only.net>
On 2017-Jun-6, at 11:09 AM, Mark Millard <markmi@dsl-only.net> wrote:

> . . .
> FYI: I'm currently doing an approximate
> binary search for localizing part of the panic problem.

This effort failed. More below, after a reminder of the technique as it stood when I started to try this.

> This is based on the classic panics that are instead
> from jumping to a non-code area. . .
>
> At a given point in my other experiments I was
> getting:
>
> srr0=0x90a0f0 etext+0xb8fc
>
> Adding (unused) code somewhat before that etext
> (so increasing etext) got:
>
> srr0=0x90a0f0 etext+0xb8a8
> (The additional code was larger than I now use.)
>
> But instead adding some code earlier (by around
> 0x100000 in this example) got:
>
> srr0=0x90a110 etext+0xb8fc
>
> So comparing to the starting conditions in
> each case:
>
> The bad-address accessed in one case stayed
> constant but the etext offset decreased: in essence
> the only thing that happened is etext increased
> (matching the offset decrease).
>
> In the other case the etext offset stayed constant
> but the bad-address and etext increased by the
> same amount.
>
> . . .
>
> Currently I'm adding code by adding:
>
> void HACKISH_EXTRA_CODE(void) {}
>
> to one .c file from /usr/src/sys/. . . based on which
> file gets to within a ballpark of a more accurate
> binary search position. (Large binary search
> jumps currently: I'm not being picky about where
> in the .c the addition is made yet.)

The search failed because the behavior and the failure modes changed depending on where HACKISH_EXTRA_CODE was added (across a very wide span of addresses where the code was tried). Overall I was unable to find a criterion for choosing between larger and smaller addresses at each step of the search, one that would home in on a boundary with two specific, distinct behaviors on either side.
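As an illustration only (this is not code from the experiments), the arithmetic behind the quoted srr0/etext comparison can be sketched in C. The names `struct sample`, `implied_etext`, and `classify` are hypothetical, and the sketch assumes each report has the form "srr0=ADDR etext+OFFSET", so that etext can be recovered as ADDR minus OFFSET.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record of one panic report: the faulting address (srr0)
 * and the offset past etext printed alongside it. */
struct sample {
    uint32_t srr0;
    uint32_t etext_off;
};

enum verdict {
    ADDR_FIXED,            /* same bad address, etext offset changed   */
    ADDR_MOVES_WITH_ETEXT, /* same etext offset, bad address changed   */
    UNCLEAR                /* neither pattern: no usable search signal */
};

/* Assumption: the report means srr0 == etext + offset,
 * so etext == srr0 - offset. */
uint32_t implied_etext(const struct sample *s)
{
    assert(s->srr0 >= s->etext_off);
    return s->srr0 - s->etext_off;
}

/* Compare a report taken after adding padding code against the baseline. */
enum verdict classify(const struct sample *base, const struct sample *s)
{
    if (s->srr0 == base->srr0 && s->etext_off != base->etext_off)
        return ADDR_FIXED;
    if (s->etext_off == base->etext_off && s->srr0 != base->srr0)
        return ADDR_MOVES_WITH_ETEXT;
    return UNCLEAR;
}
```

With the three reports quoted above, the implied etext values are 0x8fe7f4 (baseline), 0x8fe848 (code added shortly before etext, an increase of 0x54), and 0x8fe814 (code added around 0x100000 earlier, an increase of 0x20). The first comparison classifies as ADDR_FIXED and the second as ADDR_MOVES_WITH_ETEXT. A binary search needs each probe to land cleanly in one of those two classes; the failure described here corresponds to probes that did neither.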
Also, adding code that panics instead of accessing or changing inappropriate memory, for the cases seen in some of the failures, again changed the observed behavior: memory was no longer accessed or corrupted in the same way. So for the binary search I had to revert such extra problem-detection code. The behavior is very memory-layout dependent.

At this point I'm not hopeful of providing any better evidence than I have in my various prior list messages. I doubt anyone can pick anything out based on just those from the last several weeks. At most, if a code problem is noticed some other way, the reports might be checked against it: "would this now-identified code problem have possibly contributed to those reports?" (Even that use seems unlikely.)

===
Mark Millard
markmi at dsl-only.net