From: Justin Hibbits
To: Konstantin Belousov
Cc: FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Date: Tue, 12 Nov 2013 13:51:51 -0800
Subject: Re: Strange panic on ppc64

On Nov 12, 2013 1:47 PM, "Konstantin Belousov" wrote:
>
> On Tue, Nov 12, 2013 at 01:13:28PM -0800, Justin Hibbits wrote:
> > On Tue, Nov 12, 2013 at 12:51 PM, Konstantin Belousov wrote:
> >
> > > On Tue, Nov 12, 2013 at 08:32:31AM -0800, Justin Hibbits wrote:
> > > > The log is attached. I'm not sure what exactly is going on here. The
> > > > conditions were: building something on ZFS while also accessing files
> > > > over NFS. Each of those individually seems fine, but doing both brings
> > > > my system down. I _think_ the actual panic message (recursed on
> > > > non-recursive mutex) is a red herring, since it had already trapped in
> > > > the kernel, twice. Any clues? It's 100% reproducible for me.
> > > >
> > >
> > > This does not seem related to NFS or ZFS proper. What happens is
> > > that tc_windup(), executing in interrupt context, decided to enter
> > > the debugger. I am not sure why the debugger is entered.
> > >
> > > Apart from this, the situation is clear: the interrupt happened while
> > > the referenced mutex was owned. The debugger is entered and tries to
> > > read a character from the keyboard, which is USB. For USB to function,
> > > it has to access a lot of kernel services, in particular busdma, which
> > > in turn requires some pmap calls, and you end up acquiring the same
> > > mutex.
> > >
> > > The bug there is that code executed from interrupt or debugger context
> > > must not lock mutexes or, more generally, call into the top half of the
> > > kernel (and the top half is now essentially the whole kernel). I am not
> > > sure USB could ever work in such a mode.
> > >
> >
> > I discussed this with Nathan on IRC earlier. You're right that it's not
> > related to NFS or ZFS, at least not directly. It's most likely a stack
> > overflow: there are currently only 4 pages of kernel stack, so when it
> > takes the DECR (decrementer) trap it ends up blowing the stack. This only
> > becomes evident because ZFS is very stack hungry. I'm raising the stack to
> > 8 pages and testing tonight.
> >
> > As for your assessment of the situation, you're spot on, and I have no
> > idea how to fix it properly.
>
> If it were a stack overflow, I would not expect to see the frames I talked
> about. The panic clearly states that you got a recursion on a mutex, and a
> sleepable mutex must not be locked from interrupt or debugger context.

Right, there are two issues. It entered the debugger because of a stack overflow, then panicked on a mutex recursion. I'm addressing the stack overflow.