From owner-freebsd-current@FreeBSD.ORG Wed Apr 29 08:44:41 2009
Message-ID: <20090429104426.153917n5occcc5m0@webmail.leidinger.net>
Date: Wed, 29 Apr 2009 10:44:26 +0200
From:
Alexander Leidinger
To: Ben Kelly
Cc: freebsd-current@freebsd.org
Subject: Re: [patch] zfs livelock and thread priorities
References: <08D7DC2A-68BE-47B6-8D5D-5DE6B48F87E5@wanderview.com> <4D8E4457-89AA-4F19-9960-E090D3B8E319@wanderview.com>
In-Reply-To: <4D8E4457-89AA-4F19-9960-E090D3B8E319@wanderview.com>

Quoting Ben Kelly (from Tue, 28 Apr 2009 17:19:29 -0400):

>
> On Apr 28, 2009, at 4:52 PM, Ben Kelly wrote:
>
>> On Apr 28, 2009, at 2:11 PM, Artem Belevich wrote:
>>> My system had eventually deadlocked overnight, though it took much
>>> longer than before to reach that point.
>>>
>>> In the end I've got many many processes sleeping in zio_wait with no
>>> disk activity whatsoever.
>>> I'm not sure if that's the same issue or not.
>>>
>>> Here are stack traces for all processes -- http://pastebin.com/f364e1452
>>> I've got the core saved, so if you want me to dig out some more info,
>>> let me know if/how I could help.
>>
>> It looks like there is a possible deadlock between zfs_zget() and
>> zfs_zinactive(). They both acquire a lock via
>> ZFS_OBJ_HOLD_ENTER().
>> The zfs_zinactive() path can get called indirectly from within
>> zio_done(). The zfs_zget() can in turn block waiting for
>> zio_done()'s completion while holding the object lock.
>>
>> The following patch might help:
>>
>> http://www.wanderview.com/svn/public/misc/zfs/zfs_zinactive_deadlock.diff
>>
>> This simply bails out of the inactive processing if the object lock
>> is already held. I'm not sure if this is 100% correct or not as it
>> cannot verify there are references to the vnode. I also tried
>> executing the zfs_zinactive() logic in a taskqueue to avoid the
>> deadlock, but that caused other deadlocks to occur.
>
> Sorry to reply to my own mail, but I came up with a better solution
> that I think is correct. I just vref() the vnode and then vrele()
> it again from a taskqueue to restart the zfs_zinactive() processing
> if it's still applicable.

This sounds a little bit related to the issues we discussed in the
unlimited arc cache growth thread. Maybe the high value for the arc
cache was a red herring and this is the real problem for the panics /
watchdog triggers I experience on the system in question.

I'm preparing a kernel with this patch and your zfs-prio patch, but I
don't think I can fully test it this week. If I'm lucky I can install
the new kernel, but I don't think I can put load on the system this
week.

Bye,
Alexander.

-- 
The length of a marriage is inversely proportional to the amount spent
on the wedding.

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137
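
For readers following the thread, the "take a reference, then drop it
again from a taskqueue" idea quoted above can be sketched in user
space. This is only an illustration of the pattern, not the code from
the linked diff and not kernel code: Node, obj_lock, deferred,
release(), inactive() and run_deferred() are all hypothetical names,
a plain Python lock stands in for the ZFS_OBJ_HOLD_ENTER() object
lock, and a deque stands in for the taskqueue. The sketch is
single-threaded, so the reference count is left unprotected.

```python
import threading
from collections import deque


class Node:
    """Hypothetical stand-in for a vnode: a refcount plus an object lock."""

    def __init__(self):
        self.obj_lock = threading.Lock()  # stands in for the object hold lock
        self.refs = 1
        self.reclaimed = False


# Stands in for the taskqueue: nodes whose inactive processing was deferred.
deferred = deque()


def release(node):
    """Drop one reference; the last reference triggers inactive processing."""
    node.refs -= 1
    if node.refs == 0:
        inactive(node)


def inactive(node):
    """Inactive processing that never blocks on the object lock."""
    # The lock holder may be waiting for the very completion path that
    # called us, so blocking here could deadlock. Try the lock instead.
    if not node.obj_lock.acquire(blocking=False):
        node.refs += 1         # keep the node alive (the vref() step)
        deferred.append(node)  # retry later from a different context
        return
    try:
        if node.refs == 0:     # only proceed if it is still applicable
            node.reclaimed = True
    finally:
        node.obj_lock.release()


def run_deferred():
    """Worker context: dropping the extra reference restarts inactive()."""
    # Process only the entries present now, so a node that is deferred
    # again does not keep this loop spinning.
    for _ in range(len(deferred)):
        release(deferred.popleft())
```

If the lock is held when the last reference goes away, the node is
parked on the queue with one extra reference; once the holder lets go,
run_deferred() drops that reference and the inactive processing runs
normally.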