From: Arnaud Houdelette
To: Ben Kelly
Cc: freebsd-stable@freebsd.org
Date: Wed, 16 Dec 2009 18:09:46 +0100
Subject: Re: Possible ZFS livelock or SCHED_ULE bug ?
Message-ID: <4B29145A.4080601@tzim.net>

Ben Kelly wrote:
> On Dec 16, 2009, at 11:04 AM, Arnaud Houdelette wrote:
>
>> Hi all!
>> I have a uniprocessor AMD64 box with 512 MB of RAM and 2 ZFS pools, used as a home NAS.
>>
>> I have had some I/O issues since I moved from 7.2 to 8.0.
>> With a GENERIC kernel (or a stripped-down one), during high I/O activity (such as a make buildworld causes), I encounter random hangs or deadlocks.
>> top shows system CPU usage at 99%, the most CPU-hungry process being [zfskern] ({txg_thread_enter} if I switch to thread view).
>> The box still responds to ping. Already-running processes keep running, but I can't start new ones.
>> Sometimes I can return to normal by Ctrl-C-ing the buildworld (or other operation); sometimes I can't and have to reboot the box.
>>
>> The issue seemed to become less frequent with 8.0-STABLE than with 8.0-RELEASE, but it is still present (I get roughly a 75% chance of a hang during a buildworld).
>> I get the hang with prefetch enabled or disabled. Same with the ZIL enabled or disabled.
>>
>> I tried to enable kernel dumps, but the box hangs while saving the dump (root is on ZFS) or when starting kdbg on it.
>> I recompiled the kernel with SCHED_4BSD, and I can't seem to reproduce the hang.
>>
>> What do you think?
>> Did I misconfigure something?
>>
>
> This sounds similar to something I ran into on CURRENT last year:
>
> http://docs.freebsd.org/cgi/getmsg.cgi?fetch=832196+0+archive/2009/freebsd-current/20090322.freebsd-current
>
> The immediate problem was a priority inversion between the txg_thread_enter threads and the spa_zio threads. This should be solved (or at least mitigated) in 8.0, now that these threads have explicit priorities set. Can you check what priorities these threads have on your machine? They should be something like -8 for txg_thread_enter and -16 for spa_zio.
>
> - Ben
>

As far as I can tell, those are the priorities I see on my machine.

I'm doing another test, this time with ULE but without options SMP. I'm currently building world, and so far I have not encountered any hang (and the system seems more responsive than with 4BSD).

I'll keep testing and report back...

Arnaud
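To illustrate the kind of starvation Ben describes, here is a minimal toy model in Python. It is a sketch under stated assumptions, not how ZFS or SCHED_ULE actually work. It models a uniprocessor scheduler that always runs the runnable thread with the lowest priority value (lower value = higher priority, as in FreeBSD). The simulate() function, the polling behavior of txg_thread_enter and the "I/O units" completed by spa_zio are all hypothetical simplifications. Only the -8/-16 values come from Ben's message; the inverted case simply swaps them. Real schedulers also do time-slicing and priority adjustment, which this model ignores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thread:
    name: str
    prio: int  # lower value = higher priority, following the FreeBSD convention


def simulate(txg_prio: int, zio_prio: int, io_units: int = 3,
             max_ticks: int = 100) -> Optional[int]:
    """Toy uniprocessor model (hypothetical, for illustration only).

    The txg thread polls until the zio thread has finished `io_units`
    units of work. Returns the tick at which the txg thread observes
    completion, or None if that never happens within `max_ticks`.
    """
    txg = Thread("txg_thread_enter", txg_prio)
    zio = Thread("spa_zio", zio_prio)
    remaining = io_units
    for tick in range(max_ticks):
        runnable = [txg]
        if remaining > 0:
            runnable.append(zio)
        # Strict priority: run the thread with the lowest priority value.
        # On a tie, min() keeps the first entry (txg); no round-robin is modeled.
        current = min(runnable, key=lambda t: t.prio)
        if current is zio:
            remaining -= 1          # zio makes progress on the I/O
        elif remaining == 0:
            return tick             # txg sees the I/O done and finishes the sync
        # Otherwise txg just polls, burns the CPU tick and makes no progress.
    return None


if __name__ == "__main__":
    print("explicit 8.0 priorities:", simulate(txg_prio=-8, zio_prio=-16))
    print("inverted priorities:   ", simulate(txg_prio=-16, zio_prio=-8))
```

With spa_zio above txg_thread_enter (-16 vs -8), the sync completes once the three I/O units are done. With the priorities swapped, the polling thread takes every tick, the I/O never progresses, and simulate() returns None. This only loosely resembles the 99% system CPU in [zfskern] reported above.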