From: Justin Hibbits
To: Adam Martin
Cc: FreeBSD PowerPC ML
Date: Thu, 13 Jun 2013 10:44:35 -0700
Subject: Re: Strange panic on ppc64

On Thu, Jun 13, 2013 at 10:14 AM, Adam Martin wrote:

> Could the way it enters OFW have a race, maybe? Prints would put enough
> of a delay to ameliorate it.
>
> You are on SMP, and maybe the ofw entry code was only thought of as single
> core?
>
> Or maybe I'm off the mark as usual. :-P
>
> --
> A
> On Jun 12, 2013 11:31 PM, "Justin Hibbits" wrote:
>
>> On Mon, Jun 10, 2013 at 9:20 PM, Justin Hibbits wrote:
>>
>> > On Mon, Jun 10, 2013 at 6:31 AM, Nathan Whitehorn <nwhitehorn@freebsd.org> wrote:
>> >
>> >> On 06/10/13 08:20, Nathan Whitehorn wrote:
>> >> > This is now getting interesting. Reading the tea leaves, what has
>> >> > happened is that the kernel has called into Open Firmware. Open Firmware
>> >> > has then crashed early on, before setting up its own trap handlers,
>> >> > which has then flung you back into FreeBSD's handlers with a totally
>> >> > bogus environment, causing a second panic, which then causes a *third*
>> >> > panic when trying to acquire a lock. It would be interesting to know
>> >> > what the OF environment looked like and what commands it was trying to
>> >> > execute (in r3), but that may be tricky to get...
>> >> > -Nathan
>> >>
>> >> One other point: you can trace this pretty easily by just putting
>> >> something like:
>> >>
>> >> if (pmap_bootstrapped) printf("Open Firmware call %p\n", args);
>> >>
>> >> in the top of openfirmware(). If I understood the debugger output
>> >> correctly, something should be making a firmware call immediately before
>> >> the crash.
>> >>
>> >> As a random guess about what is happening, it is possible OF is trying
>> >> to allocate memory for itself. We just ignore the possibility that it
>> >> might want to do that at present, but that is not necessarily a good
>> >> assumption.
>> >> -Nathan
>> >>
>> >
>> > I added that, both on entry and exit. I also have it printing out the
>> > name of the ofw call, since the first item is always a pointer to the
>> > name. I'll be able to report more tomorrow.
>> >
>> > - Justin
>> >
>>
>> Since putting those printf()s in, my machine's been up for close to 48
>> hours without a hitch, and I've done a buildworld, plus a whole package
>> repo rebuild for my G4 (900 packages), 4 concurrent jobs (load average
>> regularly over 6). I did see a ton of OF getprop calls for the first 12
>> hours of being up, but none since.
>>
>> I'll try some multiple concurrent buildworlds after poudriere finishes.
>> This is very odd.
>>
>> - Justin
>>

I'm thinking that may be the case. When I get back from my trip I'll
replace the printf()s with sync and try exercising it again.

Oh, and I was wrong about there being no OFW 'getprop' calls after the
first 12 hours. During the concurrent buildworlds I saw a bunch; only
while building ports with poudriere did I see none, which is really
bizarre.

I may also add a sysctl to enable printing stack traces on entry to OF.
Since the only call made since going multiuser is getprop, it'd be nice
to know what keeps getting properties.

- Justin
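Justin notes that the first item in the Open Firmware argument block is always a pointer to the service name, which is what makes logging the name on every call cheap. The user-space sketch below models an IEEE 1275 client-interface argument block (service name, argument count, return count, then the argument and return cells) and wraps a stand-in for openfirmware() so that it prints a message in the style of Nathan's suggestion, plus the service name, on entry and exit once a flag is set. The names struct ofw_args, fake_firmware(), traced_openfirmware(), and tracing_enabled are hypothetical (the flag plays the role of pmap_bootstrapped); this illustrates the technique and is not FreeBSD's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define OFW_MAX_CELLS 8

/* Hypothetical model of an IEEE 1275 client-interface argument block:
 * the first item points at the service name, then come the argument and
 * return counts, then the argument cells followed by the return cells. */
struct ofw_args {
    const char *name;
    uintptr_t nargs;
    uintptr_t nreturns;
    uintptr_t cells[OFW_MAX_CELLS];
};

/* Hypothetical firmware stand-in: only "getprop" succeeds, and it
 * stores a fake size (4 bytes) in the first return cell. */
static int fake_firmware(struct ofw_args *a)
{
    if (strcmp(a->name, "getprop") == 0 && a->nreturns > 0) {
        a->cells[a->nargs] = 4;
        return 0;
    }
    return -1;
}

/* Plays the role of pmap_bootstrapped: nothing is logged until set. */
static int tracing_enabled;
static unsigned long traced_calls;

/* Hypothetical wrapper around the firmware entry point: logs the
 * service name on entry and on exit, as described in the thread. */
static int traced_openfirmware(void *argp)
{
    struct ofw_args *a = argp;
    int rv;

    assert(a->nargs + a->nreturns <= OFW_MAX_CELLS);
    if (tracing_enabled) {
        printf("Open Firmware call %p (%s) entry\n", argp, a->name);
        traced_calls++;
    }
    rv = fake_firmware(a);
    if (tracing_enabled)
        printf("Open Firmware call %p (%s) exit, rv=%d\n", argp, a->name, rv);
    return rv;
}
```

With tracing_enabled still zero a call is silent; after setting it, each call prints an entry line and an exit line naming getprop. The wrapper leaves the firmware call itself untouched, but as Adam points out, the extra printing does change the timing around it.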
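Justin's proposed sysctl would print a stack trace on every entry to OF to find out what keeps fetching properties. A lighter-weight variant, swapped in here and sketched in user space, counts getprop calls per call site instead of printing a full trace each time. The knob ofw_trace_callers stands in for the proposed sysctl, and traced_getprop(), poll_driver_state(), and attach_driver() are hypothetical names. __builtin_return_address() is a GCC/Clang builtin. A kernel version would more likely use a SYSCTL_INT knob together with the stack(9) facilities, but that is an assumption, not something from the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CALLERS 16

/* Hypothetical knob standing in for a sysctl: when nonzero, every
 * getprop call records the address it will return to. */
static int ofw_trace_callers;

/* Histogram of distinct call sites that asked for a property. */
static struct {
    void *caller;
    unsigned long count;
} getprop_callers[MAX_CALLERS];
static int ncallers;

static void record_caller(void *caller)
{
    assert(caller != NULL);
    for (int i = 0; i < ncallers; i++) {
        if (getprop_callers[i].caller == caller) {
            getprop_callers[i].count++;
            return;
        }
    }
    if (ncallers < MAX_CALLERS) { /* sites past the table size are dropped */
        getprop_callers[ncallers].caller = caller;
        getprop_callers[ncallers].count = 1;
        ncallers++;
    }
}

/* Hypothetical getprop wrapper; the "property value" is just the length
 * of its name. noinline keeps __builtin_return_address(0) pointing into
 * the function that called it. */
__attribute__((noinline))
static int traced_getprop(const char *prop)
{
    if (ofw_trace_callers)
        record_caller(__builtin_return_address(0));
    return (int)strlen(prop);
}

/* Two hypothetical callers; the "+ 1" after the call rules out a tail
 * call, so each one records its own return address. */
__attribute__((noinline))
static int poll_driver_state(void)
{
    return traced_getprop("reg") + 1;
}

__attribute__((noinline))
static int attach_driver(void)
{
    return traced_getprop("compatible") + 1;
}

static void dump_getprop_callers(void)
{
    for (int i = 0; i < ncallers; i++)
        printf("getprop caller %p: %lu call(s)\n",
               getprop_callers[i].caller, getprop_callers[i].count);
}
```

With the knob set, calling poll_driver_state() three times and attach_driver() once leaves two histogram entries, with counts 3 and 1. The table holds raw return addresses, which still have to be mapped back to function names, for example with a debugger.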