From: Super Bisquit
To: Nathan Whitehorn
Cc: Justin Hibbits, FreeBSD PowerPC ML
Date: Mon, 10 Jun 2013 23:33:10 -0400
Subject: Re: Strange panic on ppc64
List-Id: Porting FreeBSD to the PowerPC

1. Is the Open Firmware located on the CPU itself or is it part of the
   logic board?
2. Would he be able to flash a newer version from a remote machine?

The answers to these two questions may help me and others in the near
future.

Thanks,
Desmond

On Mon, Jun 10, 2013 at 9:31 AM, Nathan Whitehorn wrote:

> On 06/10/13 08:20, Nathan Whitehorn wrote:
> > On 06/09/13 16:21, Justin Hibbits wrote:
> >> On Sun, Jun 9, 2013 at 8:47 AM, Nathan Whitehorn wrote:
> >>
> >> On 06/08/13 17:33, Justin Hibbits wrote:
> >>>
> >>>
> >>> On Sat, Jun 8, 2013 at 7:54 AM, Nathan Whitehorn wrote:
> >>>
> >>> On 06/08/13 09:21, Justin Hibbits wrote:
> >>>>
> >>>>
> >>>> On Wed, Jun 5, 2013 at 9:47 AM, Justin Hibbits wrote:
> >>>>
> >>>> Will do, when I get it panicking again.
> >>>>
> >>>> - Justin
> >>>>
> >>>> On Jun 5, 2013 9:46 AM, "Nathan Whitehorn" wrote:
> >>>>
> >>>> On 06/04/13 22:35, Justin Hibbits wrote:
> >>>>
> >>>> After a string of seemingly random hangs, I added invariants
> >>>> (but not witness) to my custom kernel config, and I get the
> >>>> following panic, recreated from a fuzzy cell phone picture:
> >>>>
> >>>>
> >>>> [thread pid -1 tid 1006665719 ]
> >>>> Stopped at 0: illegal instruction 0
> >>>> db> panic: mutex ohci1 owned at
> >>>> /usr/home/chmeee/freebsd/head/sys/dev/usb/usb_transfer.c:2280
> >>>> cpuid = 0
> >>>> Uptime: 9h8m1s
> >>>>
> >>>> ...
> >>>> panic: msleep1
> >>>> cpu = 0
> >>>> KDB: enter: panic
> >>>> [ thread pid -1 tid 100665719 ]
> >>>> ....
> >>>>
> >>>> The first question I have is how the hell it got such a strange
> >>>> PID/TID. Memory corruption is my guess: something is stomping on
> >>>> the pcpu or something. I think these hangs have only happened
> >>>> since I added a lot more memory (up to 12G from 4G; Andreas
> >>>> Tobler was seeing hangs as well), so it might be something in
> >>>> the moea64 pmap code, but that's pure speculation on my part.
> >>>> Then there are the other panic messages: the owned mutex and the
> >>>> panic in msleep1. I enabled more trace code, so hopefully the
> >>>> next time it panics I can collect better data.
> >>>>
> >>>> - Justin
> >>>> _______________________________________________
> >>>> freebsd-ppc@freebsd.org mailing list
> >>>> http://lists.freebsd.org/mailman/listinfo/freebsd-ppc
> >>>> To unsubscribe, send any mail to
> >>>> "freebsd-ppc-unsubscribe@freebsd.org"
> >>>>
> >>>>
> >>>> Could you post the output from show reg? It looks like it tried
> >>>> to jump to a null pointer there.
> >>>> -Nathan
> >>>>
> >>>>
> >>>> Well, it's hard to get that output, because I just hit that
> >>>> 'mutex owned' panic, and here's the backtrace:
> >>>
> >>> The mutex thing is spurious -- it was already panicking and then
> >>> panicked again trying to panic. Can you get the backtrace for the
> >>> original panic (it should be different) and the values of the
> >>> registers?
> >>> -Nathan
> >>>
> >>>
> >>> Here you go:
> >>>
> >>> [ thread pid -1 tid 1006665719 ]
> >>> Stopped at 0: illegal instruction 0
> >>> db:0:kdb.enter.default> show reg
> >>> r0    0
> >>> r1    0
> >>> r2    0xab63d0  M_MACTEMP
> >>> r3    0xbb12e0
> >>> r4    0x741f18  .ofwcall+0xa8
> >>> r5    0
> >>> r6    0xa4f1a8
> >>> r7    0x1
> >>> r8    0x1
> >>> r9    0xc10500  __pcpu
> >>> r10   0x1c35ec0
> >>> r11   0
> >>> r12   0x2000d032
> >>> r13   0x342eb000
> >>> r14   0x10014200
> >>> r15   0xffffffffffffcb58
> >>> r16   0x2
> >>> r17   0x2
> >>> r18   0xffffffffffffcb50
> >>> r19   0
> >>> r20   0xc000000013231478
> >>> r21   0xc00000014c0ce200
> >>> r22   0
> >>> r23   0x64  dbsize+0x10
> >>> r24   0xc00000014c0cdf70
> >>> r25   0xb62cb8  smp_no_rendevous_barrier
> >>> r26   0
> >>> r27   0x741f18  .ofwcall+0xa8
> >>> r28   0x741f18  .ofwcall+0xa8
> >>> r29   0x2000d032
> >>> r30   0x9000000000001032
> >>> r31   0xc0cad8  mac_labeled
> >>> srr0  0x102ca4  k_trap+0x28
> >>> srr1  0x9000000000001032
> >>> lr    0x102c74  u_trap+0x10
> >>> ctr   0xff846d78
> >>> cr    0x2000f1b0
> >>> xer   0
> >>> dar   0xfffffffffffffd60
> >>> dsisr 0x42000000
> >>> 0: illegal instruction 0
> >>> db:0:kdb.enter.default> bt
> >>> Tracing pid -1 tid 1006665719 td 0
> >>> (nothing)
> >> Well, that is all kinds of messed up. It appears to have halted
> >> while handling a userland trap due to an implicit branch caused by
> >> bad translations when it restores the kernel SRs. Could you see
> >> what 'show pcpu' does? Does that information look valid at all? I
> >> suspect it has become corrupted somehow.
> >> -Nathan
> >>
> >>
> >> Here's the full log from dconschat, from bootup to panic.
> >> Unfortunately, not everything I wanted to print would print, and I
> >> can't type anything once it panics, because it panics when reading
> >> the keyboard, so I have to add everything as a ddb enter script.
> >> Here's what I've added so far (it doesn't do everything, as you
> >> can see from the transcript):
> >>
> >> script kdb.enter.default=show reg; bt; show pcpu; ps; run lockinfo; alltrace; show all procs; show files; show malloc; show allchains
> >>
> >> - Justin
> > This is now getting interesting. Reading the tea leaves, what has
> > happened is that the kernel has called into Open Firmware. Open
> > Firmware has then crashed early on, before setting up its own trap
> > handlers, which has then flung you back into FreeBSD's handlers with
> > a totally bogus environment, causing a second panic, which then
> > causes a *third* panic when trying to acquire a lock. It would be
> > interesting to know what the OF environment looked like and what
> > commands it was trying to execute (in r3), but that may be tricky to
> > get...
> > -Nathan
>
> One other point: you can trace this pretty easily by just putting
> something like:
>
> if (pmap_bootstrapped) printf("Open Firmware call %p\n", args);
>
> at the top of openfirmware(). If I understood the debugger output
> correctly, something should be making a firmware call immediately
> before the crash.
>
> As a random guess about what is happening, it is possible OF is trying
> to allocate memory for itself. We just ignore the possibility that it
> might want to do that at present, but that is not necessarily a good
> assumption.
> -Nathan
>
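
For anyone following along, here is a rough userland sketch of the one-line trace quoted above. It is only a mock under stated assumptions: mock_openfirmware(), struct mock_ofw_args and traced_calls are hypothetical names made up for illustration, pmap_bootstrapped is modelled as a plain int, and nothing here reflects how the real openfirmware() in the FreeBSD tree is laid out.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel flag named in the thread. */
static int pmap_bootstrapped = 0;

/* Hypothetical counter so the sketch can be checked; not a kernel symbol. */
static int traced_calls = 0;

/* Hypothetical argument block; the real layout is not shown in the thread. */
struct mock_ofw_args {
	const char *name;
};

/*
 * Mock of the firmware entry point: print the argument pointer before the
 * "firmware call" so that the last line printed identifies the call that
 * was in flight when the machine went down.
 */
static int
mock_openfirmware(void *args)
{
	assert(args != NULL);

	/* Stay silent during early boot, as the quoted one-liner does. */
	if (pmap_bootstrapped) {
		printf("Open Firmware call %p\n", args);
		traced_calls++;
	}

	/* A real kernel would enter the firmware here; the mock just succeeds. */
	return (0);
}
```

Calling mock_openfirmware() with pmap_bootstrapped still 0 prints nothing; once the flag is set to 1, every call prints its argument pointer first. The idea is that the pointer logged last can then be compared with r3 in the register dump.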