Date:      Tue, 12 Nov 2013 13:13:28 -0800
From:      Justin Hibbits <jhibbits@freebsd.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject:   Re: Strange panic on ppc64
Message-ID:  <CAHSQbTCnFnVBtL%2B9VOn%2B9zNMJqt=cLyKB6AYjDzjHrddZq65ug@mail.gmail.com>
In-Reply-To: <20131112205142.GY59496@kib.kiev.ua>
References:  <CAHSQbTD6%2BDd-So88gSArTtpcA=w4D-GibGpoFLoHQuFPjUrKuA@mail.gmail.com> <20131112205142.GY59496@kib.kiev.ua>

On Tue, Nov 12, 2013 at 12:51 PM, Konstantin Belousov
<kostikbel@gmail.com> wrote:

> On Tue, Nov 12, 2013 at 08:32:31AM -0800, Justin Hibbits wrote:
> > The log is attached.  I'm not sure what exactly is going on here.  The
> > conditions were: building something on ZFS, while also accessing files
> > over NFS.  It seems each of those individually is fine, but doing both
> > brings my system down.  I _think_ the actual panic message (recursed on
> > non-recursive mutex) is a red herring, since it already trapped in the
> > kernel, twice.  Any clues?  It's 100% reproducible by me.
> >
> This does not seem related to NFS or ZFS proper.  What happens is
> that tc_windup(), executing in the interrupt context, decided to enter
> a debugger.  I am not sure why the debugger is entered.
>
> Apart from this, the situation is clear:
> the interrupt happens while the referenced mutex was owned. The debugger
> is entered, and tries to read a char from keyboard, which is USB. For
> USB to function, it has to access a lot of the kernel services, in
> particular, busdma, which, in turn, requires some pmap calls, and you
> end up accessing the same mutex.
>
> The bug there is that code executed from interrupt or debugger context
> must not lock mutexes, or generally, call into the top half of the kernel
> (the top half is now essentially the whole kernel).  I am not sure if
> USB could ever work in such a mode.
>

I discussed this with Nathan on IRC earlier.  You're right that it's not
related to NFS or ZFS, at least not directly.  It's actually most likely a
stack overflow: currently there are only 4 pages for the stack, so when
it takes the DECR trap it ends up blowing the stack.  This only becomes
evident because ZFS is very stack-hungry.  I'm upping the stack to 8 pages
and will test tonight.
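
To make the stack arithmetic concrete, here is a rough sketch of the
reasoning.  It assumes 4 KiB pages, and the helper names and byte counts
are purely illustrative, not measurements from this panic:

```c
#include <stddef.h>

/* Assumption for illustration: 4 KiB pages. */
#define PAGE_SIZE_BYTES 4096u

/* Total stack bytes available for a given number of stack pages. */
static size_t kstack_bytes(unsigned pages)
{
    return (size_t)pages * PAGE_SIZE_BYTES;
}

/*
 * Returns 1 if a call chain that already uses `used` bytes still leaves
 * room for a trap that needs `trap_cost` more bytes, otherwise 0.
 * Both byte counts are hypothetical inputs.
 */
static int trap_fits(unsigned pages, size_t used, size_t trap_cost)
{
    size_t total = kstack_bytes(pages);

    return used <= total && trap_cost <= total - used;
}
```

With these made-up numbers, a deep call chain using 15000 bytes has no
room left for a 2000-byte trap on 4 pages (16384 bytes), but fits easily
on 8 pages (32768 bytes).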

As for your assessment of the situation, you're spot on, and I have no idea
how to properly fix it.

- Justin


