From: Konstantin Belousov
To: Justin Hibbits
Cc: FreeBSD PowerPC ML
Date: Tue, 12 Nov 2013 23:46:55 +0200
Subject: Re: Strange panic on ppc64
Message-ID: <20131112214655.GZ59496@kib.kiev.ua>
References: <20131112205142.GY59496@kib.kiev.ua>

On Tue, Nov 12, 2013 at 01:13:28PM -0800, Justin Hibbits wrote:
> On Tue, Nov 12, 2013 at 12:51 PM, Konstantin Belousov
> wrote:
>
> > On Tue, Nov 12, 2013 at 08:32:31AM -0800, Justin Hibbits wrote:
> > > The log is attached. I'm not sure what exactly is going on here. The
> > > conditions were: building something on zfs, while also accessing files
> > > over NFS. It seems each of those individually is fine, but doing both
> > > brings my system down. I _think_ the actual panic message (recursed on
> > > non-recursive mutex) is a red herring, since it already trapped in the
> > > kernel, twice. Any clues? It's 100% reproducible by me.
> > >
> > This does not seem to be related to NFS or ZFS proper. What happens is
> > that tc_windup(), executing in interrupt context, decided to enter
> > the debugger. I am not sure why the debugger is entered.
> >
> > Apart from this, the situation is clear:
> > the interrupt happens while the referenced mutex is owned. The debugger
> > is entered and tries to read a char from the keyboard, which is USB. For
> > USB to function, it has to access a lot of kernel services, in
> > particular busdma, which in turn requires some pmap calls, and you
> > end up accessing the same mutex.
> >
> > The bug there is that code executed from interrupt or debugger context
> > must not lock mutexes or, more generally, call into the top half of the
> > kernel (and now the top half is essentially the whole kernel). I am not
> > sure if USB could ever work in such a mode.
> >
>
> I discussed this with Nathan on IRC earlier. You're right that it's not
> related to NFS or ZFS, at least not directly. It's actually most likely a
> stack overflow, since currently there are only 4 pages for the stack, so
> when it takes the DECR trap it ends up blowing the stack. This is only made
> evident because ZFS is very stack hungry. I'm upping the stack to 8 pages,
> and testing tonight.
>
> As for your assessment of the situation, you're spot on, and I have no idea
> how to properly fix it.

If this were a stack overflow, the backtrace would not show the frames I
talked about. The panic clearly states that you get a recursion on a mutex,
and a sleepable mutex must not be locked from interrupt or debugger context.