Date:      Thu, 13 Jun 2013 10:44:35 -0700
From:      Justin Hibbits <jhibbits@freebsd.org>
To:        Adam Martin <adamartin@freebsd.org>
Cc:        FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject:   Re: Strange panic on ppc64
Message-ID:  <CAHSQbTAV19xUasD7v_gw5wcLGoVvefyP5F%2BUbiGeX1=raiVrig@mail.gmail.com>
In-Reply-To: <CAJTQnqYaeBt690G0Nxv8gO1PpmTan3rXbARgTb-s63EEo_LiQQ@mail.gmail.com>
References:  <CAHSQbTAZTc9puGaH0rbhyY11s0%2BL0xGjSabK1kj65UMm1t7j3w@mail.gmail.com> <51AF6661.3060007@freebsd.org> <CAHSQbTBjza0u7nZf4z%2BxpTCcWj-TW-ZigV2-CZexuBOYQX5=3A@mail.gmail.com> <CAHSQbTCvFXDZPsOnmogc0FkZeMXwOP6h40F2kFUu2s6UmffyPw@mail.gmail.com> <51B345BE.5030905@freebsd.org> <CAHSQbTDnwne3KJWN7xjcUw4PhF-uiD4B-4y1Lf90Bfou-2Ppvw@mail.gmail.com> <51B4A389.4020607@freebsd.org> <CAHSQbTACtejaRKiG4qScSV_EdTC8y_k5Qghx_FYebWzstBP61g@mail.gmail.com> <51B5D28C.505@freebsd.org> <51B5D539.8050102@freebsd.org> <CAHSQbTCposTE1AwHS0Ov=FT4w8gNkgpE4x_7-cHhyzMDfZr5UA@mail.gmail.com> <CAHSQbTB6bXpqFM5n8FMmpbbfKik0szDvp9M6KfCWreXKHTaR1g@mail.gmail.com> <CAJTQnqYaeBt690G0Nxv8gO1PpmTan3rXbARgTb-s63EEo_LiQQ@mail.gmail.com>

On Thu, Jun 13, 2013 at 10:14 AM, Adam Martin <adamartin@freebsd.org> wrote:

> Could the way it enters OFW have a race, maybe?  Prints would put enough
> of a delay to ameliorate it.
>
> You are on SMP, and maybe the ofw entry code was only thought of as single
> core?
>
> Or maybe I'm off the mark as usual.  :-P
>
> --
> A
> On Jun 12, 2013 11:31 PM, "Justin Hibbits" <jhibbits@freebsd.org> wrote:
>
>> On Mon, Jun 10, 2013 at 9:20 PM, Justin Hibbits <jhibbits@freebsd.org> wrote:
>>
>> > On Mon, Jun 10, 2013 at 6:31 AM, Nathan Whitehorn <nwhitehorn@freebsd.org> wrote:
>> >
>> >> On 06/10/13 08:20, Nathan Whitehorn wrote:
>> >> > This is now getting interesting. Reading the tea leaves, what has
>> >> > happened is that the kernel has called into Open Firmware. Open Firmware
>> >> > has then crashed early on, before setting up its own trap handlers,
>> >> > which has then flung you back into FreeBSD's handlers with a totally
>> >> > bogus environment, causing a second panic, which then causes a *third*
>> >> > panic when trying to acquire a lock. It would be interesting to know
>> >> > what the OF environment looked like and what commands it was trying to
>> >> > execute (in r3), but that may be tricky to get...
>> >> > -Nathan
>> >> > _______________________________________________
>> >>
>> >> One other point: you can trace this pretty easily by just putting
>> >> something like:
>> >>
>> >> if (pmap_bootstrapped) printf("Open Firmware call %p\n", args);
>> >>
>> >> in the top of openfirmware(). If I understood the debugger output
>> >> correctly, something should be making a firmware call immediately before
>> >> the crash.
>> >>
>> >> As a random guess about what is happening, it is possible OF is trying
>> >> to allocate memory for itself. We just ignore the possibility that it
>> >> might want to do that at present, but that is not necessarily a good
>> >> assumption.
>> >> -Nathan
>> >>
>> >
>> > I added that, both on entry and exit.  I also have it printing out the
>> > name of the ofw call, since the first item is always a pointer to the
>> > name.  I'll be able to report more tomorrow.
>> >
>> > - Justin
>> >
>>
>> Since putting those printf()s in, my machine's been up for close to 48
>> hours without a hitch, and I've done a buildworld, plus a whole package
>> repo rebuild for my G4 (900 packages), 4 concurrent jobs (load average
>> regularly over 6).  I did see a ton of OF getprop calls for the first 12
>> hours of being up, but none since.
>>
>> I'll try some multiple concurrent buildworlds after poudriere finishes.
>> This is very odd.
>>
>> - Justin
>>
>
I'm thinking a race on OFW entry may well be the case.  When I get back from
my trip I'll replace the printf()s with a sync, and try exercising it again.
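
For reference, a rough userland sketch of what "a sync instead of the
printf()s" could look like.  Everything here is an assumption for
illustration: ofw_barrier(), ofw_call_with_sync(), fake_entry() and
ofw_call_count are made-up names, not the kernel's actual openfirmware()
path, and on non-PowerPC hosts the barrier falls back to the compiler's
generic full barrier so the sketch still builds.

```c
#include <stddef.h>

/* Hypothetical stand-in for the firmware entry point signature. */
typedef int (*ofw_entry_fn)(void *args);

/* Counts wrapped calls; only here so the sketch has something to observe. */
static unsigned long ofw_call_count;

/* Full memory barrier: "sync" on PowerPC, generic builtin elsewhere. */
static inline void
ofw_barrier(void)
{
#if defined(__powerpc__) || defined(__powerpc64__)
	__asm__ __volatile__("sync" ::: "memory");
#else
	__sync_synchronize();
#endif
}

/* Wrap a firmware call with a barrier on entry and on exit. */
static int
ofw_call_with_sync(ofw_entry_fn entry, void *args)
{
	int result;

	ofw_barrier();		/* where the entry printf() used to be */
	result = entry(args);
	ofw_barrier();		/* where the exit printf() used to be */
	ofw_call_count++;
	return (result);
}

/* Fake firmware entry for trying the wrapper out: fails on NULL args. */
static int
fake_entry(void *args)
{
	return (args == NULL ? -1 : 0);
}
```

If the panic stays away with only barriers in place, that would point at
ordering rather than at the delay the printf()s add; if it comes back, the
timing theory looks more likely.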

Oh, and I was wrong about there being no OFW 'getprop' calls since 12 hours
in.  I saw a bunch during the concurrent buildworlds; only while building
ports with poudriere did I see none, which is really bizarre.  I may also add
a sysctl to enable printing stack traces on entering OF.  Since the only call
made since going multiuser is getprop, it'd be nice to know what keeps
getting properties.
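
A minimal sketch of the gated tracing idea, written as plain userland C so
it can be tried outside the kernel.  The names are hypothetical:
ofw_trace_enabled stands in for the sysctl knob, struct ofw_args_hdr assumes
only that the first item of the argument block points to the call name (as
noted earlier in the thread), and ofw_trace_entry() formats the line rather
than printing it.  An in-kernel version would presumably hang the flag off a
sysctl and dump a stack trace at the same spot.

```c
#include <stdio.h>
#include <string.h>

/* Assumed layout: the first item of the args points to the service name. */
struct ofw_args_hdr {
	const char *name;
};

/* Stand-in for a sysctl-controlled knob; 0 means tracing is off. */
static int ofw_trace_enabled;

/*
 * Format a trace line for one firmware call into buf.
 * Returns the length snprintf() reports, or 0 if tracing is disabled
 * or the arguments are unusable.
 */
static int
ofw_trace_entry(const void *args, char *buf, size_t len)
{
	const struct ofw_args_hdr *hdr = args;

	if (!ofw_trace_enabled || hdr == NULL || hdr->name == NULL || len == 0)
		return (0);
	return (snprintf(buf, len, "Open Firmware call %s", hdr->name));
}
```

With the knob off this costs one branch per call, so it could stay compiled
in and only be switched on once the box is multiuser.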

- Justin


