From: Ben Kelly
To: Artem Belevich
Cc: freebsd-current@freebsd.org
Subject: Re: [patch] zfs livelock and thread priorities
Date: Tue, 28 Apr 2009 16:52:23 -0400

On Apr 28, 2009, at 2:11 PM, Artem Belevich wrote:
> My system had eventually deadlocked overnight, though it took much
> longer than before to reach that point.
>
> In the end I've got many many processes sleeping in zio_wait with no
> disk activity whatsoever.
> I'm not sure if that's the same issue or not.
>
> Here are stack traces for all processes -- http://pastebin.com/f364e1452
> I've got the core saved, so if you want me to dig out some more info,
> let me know if/how I could help.

It looks like there is a possible deadlock between zfs_zget() and
zfs_zinactive(). They both acquire a lock via ZFS_OBJ_HOLD_ENTER().
The zfs_zinactive() path can get called indirectly from within
zio_done(). zfs_zget() can in turn block waiting for zio_done() to
complete while holding the object lock.

The following patch might help:

  http://www.wanderview.com/svn/public/misc/zfs/zfs_zinactive_deadlock.diff

This simply bails out of the inactive processing if the object lock is
already held. I'm not sure if this is 100% correct or not, as it cannot
verify whether there are references to the vnode. I also tried executing
the zfs_zinactive() logic in a taskqueue to avoid the deadlock, but that
caused other deadlocks to occur.

Hope that helps.

- Ben
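
P.S. For anyone following along, the "bail out if the object lock is already held" idea can be illustrated with a small user-space sketch. This is not the patch itself and not ZFS code: it assumes a plain pthread mutex as a stand-in for the lock taken by ZFS_OBJ_HOLD_ENTER(), and struct obj, obj_inactive() and obj_get_with_completion() are hypothetical names made up for the illustration. It also collapses the two code paths into one thread so the outcome is deterministic.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for one object protected by a per-object lock. */
struct obj {
    pthread_mutex_t lock;   /* plays the role of the object hold lock */
    int inactive_runs;      /* times inactive processing actually ran */
    int inactive_skips;     /* times it bailed out because the lock was held */
};

static void obj_init(struct obj *o)
{
    pthread_mutex_init(&o->lock, NULL);   /* default, non-recursive mutex */
    o->inactive_runs = 0;
    o->inactive_skips = 0;
}

/*
 * Inactive-style path: try the lock instead of blocking on it.
 * Returns true if the work ran, false if it was skipped.
 */
static bool obj_inactive(struct obj *o)
{
    int rc = pthread_mutex_trylock(&o->lock);

    if (rc == EBUSY) {
        /* Someone already holds the lock: bail out rather than wait. */
        o->inactive_skips++;
        return false;
    }
    assert(rc == 0);
    o->inactive_runs++;
    pthread_mutex_unlock(&o->lock);
    return true;
}

/*
 * Get-style path: holds the object lock while a completion callback
 * runs, and that callback ends up in the inactive path. With a blocking
 * lock in obj_inactive() this would never return.
 */
static bool obj_get_with_completion(struct obj *o)
{
    bool ran;

    pthread_mutex_lock(&o->lock);
    ran = obj_inactive(o);          /* re-entry while the lock is held */
    pthread_mutex_unlock(&o->lock);
    return ran;
}
```

Calling obj_inactive() on an idle object does the work; calling it by way of obj_get_with_completion() skips it. The skipped case is exactly where my doubt about correctness applies, since nothing in this sketch comes back later to finish the skipped work.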