From: Ben Kelly <ben@wanderview.com>
Date: Sat, 16 May 2009 12:40:44 -0400
To: Adam McDougall
Cc: freebsd-current@freebsd.org, Artem Belevich
Subject: Re: [patch] zfs livelock and thread priorities
Message-Id: <5D988481-068A-4AB3-952E-255BEA1D9DA7@wanderview.com>
In-Reply-To: <20090516031332.GG82547@egr.msu.edu>
References: <08D7DC2A-68BE-47B6-8D5D-5DE6B48F87E5@wanderview.com> <20090516031332.GG82547@egr.msu.edu>

On May 15, 2009, at 11:13 PM, Adam McDougall wrote:

> On Tue, Apr 28, 2009 at 04:52:23PM -0400, Ben Kelly wrote:
> On Apr 28, 2009, at 2:11 PM, Artem Belevich wrote:
>> My system had eventually deadlocked overnight, though it took much
>> longer than before to reach that point.
>>
>> In the end I've got many many processes sleeping in zio_wait with no
>> disk activity whatsoever.
>> I'm not sure if that's the same issue or not.
>>
>> Here are stack traces for all processes -- http://pastebin.com/f364e1452
>> I've got the core saved, so if you want me to dig out some more info,
>> let me know if/how I could help.
>
> It looks like there is a possible deadlock between zfs_zget() and
> zfs_zinactive().  They both acquire a lock via ZFS_OBJ_HOLD_ENTER().
> The zfs_zinactive() path can get called indirectly from within
> zio_done().  The zfs_zget() can in turn block waiting for zio_done()'s
> completion while holding the object lock.
>
> The following patch might help:
>
> http://www.wanderview.com/svn/public/misc/zfs/zfs_zinactive_deadlock.diff
>
> This simply bails out of the inactive processing if the object lock is
> already held.  I'm not sure if this is 100% correct or not as it
> cannot verify there are references to the vnode.  I also tried
> executing the zfs_zinactive() logic in a taskqueue to avoid the
> deadlock, but that caused other deadlocks to occur.
>
> Hope that helps.
>
> - Ben
>
> Its my understanding that the deadlock was fixed in -current,
> how does that affect the usefulness of the thread priorities
> patch?  Should I continue testing it or is it effectively a
> NOOP now?

As far as I know the vnode release deadlock is unrelated to the thread
prioritization patch.
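Purely for illustration, and not the actual diff linked above, the "bail
out if the object lock is already held" idea quoted above boils down to
something like the sketch below.  mutex_tryenter() and the ZFS_OBJ_*
macros are the OpenSolaris-compat primitives the port already uses; the
assumption that ZFS_OBJ_MUTEX() exposes the lock taken by
ZFS_OBJ_HOLD_ENTER() and the exact shape of the real change may differ.

/*
 * Illustrative sketch only -- see the linked diff for the real change.
 * Instead of blocking in ZFS_OBJ_HOLD_ENTER() (and possibly deadlocking
 * against a zfs_zget() caller that is waiting on zio_done()), try the
 * per-object lock and skip inactive processing if it is already held.
 */
void
zfs_zinactive(znode_t *zp)
{
        zfsvfs_t *zfsvfs = zp->z_zfsvfs;
        uint64_t obj = zp->z_id;

        /* Assumption: ZFS_OBJ_MUTEX() names the lock behind ZFS_OBJ_HOLD_ENTER(). */
        if (mutex_tryenter(ZFS_OBJ_MUTEX(zfsvfs, obj)) == 0) {
                /* Lock already held; bail out rather than deadlock. */
                return;
        }

        /* ... normal inactive processing, then drop the object lock ... */
        ZFS_OBJ_HOLD_EXIT(zfsvfs, obj);
}

As the quoted text notes, the catch is that skipping the work here cannot
verify there are no remaining references to the vnode.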
The particular problem I ran into that caused me to look at the
priorities was a livelock.  When the ARC got low on memory, the user and
txg threads would sometimes begin messaging each other in a seemingly
infinite pattern, each waiting for space to be freed.  Unfortunately,
these threads were simultaneously starving the spa_zio threads,
preventing them from actually flushing data to the disks.  This
effectively blocked all disk-related activity and would wedge the box
once the syncer process got into the mix as well.  This condition
doesn't happen on OpenSolaris because its use of explicit priorities
ensures that the spa_zio threads take precedence over the user and txg
threads.

Beyond this particular scenario, it seems possible that other
priority-related problems are lurking.  ZFS on OpenSolaris is, either
explicitly or implicitly, designed around the different threads having
certain relative priorities.  While it seems to mostly work without
these priorities, we are definitely opening ourselves up to untested
corner cases by ignoring the prioritization.

The one downside I have noticed to setting the ZFS thread priorities
explicitly is a reduction in interactivity during heavy disk load.  This
is somewhat to be expected, since the spa_zio threads run at a higher
priority than user threads.  This has been an issue on OpenSolaris as
well:

  http://bugs.opensolaris.org/view_bug.do?bug_id=6586537

The bug states that a fix is available, but I haven't had a chance to go
back and see what they ended up doing to make things more responsive.

Currently the thread priority patch for FreeBSD is a proof of concept.
If people think it's a valid approach I can try to clean it up so that
it could be committed.  The two main issues with it right now are:

1) It changes the kproc(9) API by adding a kproc_create_priority()
function that lets you set the priority of the newly created thread.
I'm not sure how people feel about this.

2) It makes the OpenSolaris thread_create() function take FreeBSD
priority values and sets the constants maxclsyspri and minclsyspri to
somewhat arbitrary values.  This means that if someone ports other
OpenSolaris code over and passes priority values to thread_create()
without using these constants, they will get unexpected behavior.  This
could be addressed by creating a mapping function from OpenSolaris
priorities to FreeBSD priorities (a rough sketch of one possible mapping
follows at the end of this message).

> Also, I've been doing some fairly intense testing of zfs in
> recent -current and I am tracking down a situation where
> performance gets worse but I think I found a workaround.
> I am gathering more data regarding the cause, workaround,
> symptoms, and originating commit and will post about it soon.

I'd be interested to hear more about this.

Thanks!

- Ben
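P.S.  Purely as an illustration of the mapping idea in (2), and not part
of the posted patch, a Solaris-to-FreeBSD priority conversion could look
roughly like the following.  The OS_MINCLSYSPRI/OS_MAXCLSYSPRI values
and the function name are placeholder assumptions; PRI_MIN_KERN and
PRI_MAX_KERN are FreeBSD's kernel priority bounds from <sys/priority.h>.

/*
 * Hypothetical sketch: translate an OpenSolaris-style priority (higher
 * number == more important, normally between minclsyspri and
 * maxclsyspri) into a FreeBSD kernel priority (lower number == more
 * important).
 */
#include <sys/param.h>
#include <sys/priority.h>

/* Assumed OpenSolaris-style bounds; the compat headers would supply the real ones. */
#define OS_MINCLSYSPRI  60
#define OS_MAXCLSYSPRI  99

static int
solaris_pri_to_freebsd(int spri)
{
        int sspan = OS_MAXCLSYSPRI - OS_MINCLSYSPRI;
        int fspan = PRI_MAX_KERN - PRI_MIN_KERN;

        /* Clamp out-of-range inputs instead of misbehaving on them. */
        if (spri < OS_MINCLSYSPRI)
                spri = OS_MINCLSYSPRI;
        else if (spri > OS_MAXCLSYSPRI)
                spri = OS_MAXCLSYSPRI;

        /* Invert the scale: the highest Solaris priority maps to PRI_MIN_KERN. */
        return (PRI_MAX_KERN - ((spri - OS_MINCLSYSPRI) * fspan) / sspan);
}

With something like that in place, thread_create() could keep accepting
Solaris-style values and convert them internally, so ported code that
uses minclsyspri/maxclsyspri would keep working unmodified.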