From: Artem Belevich
To: Ben Kelly
Cc: freebsd-current@freebsd.org
Date: Mon, 13 Apr 2009 16:36:22 -0700
Subject: Re: [patch] zfs livelock and thread priorities
References: <49C2CFF6.8070608@egr.msu.edu>

Tried your patch that used PRIBIO+{1,2} for priorities with -current
r191008 and the kernel died with a "spinlock held too long" panic.
Actually, there apparently were two instances of the panic on different
cores. Here's the output of "alltrace" and "ps" after the crash:

http://pastebin.com/f140f4596

I've reverted the change and the kernel booted just fine.

The box is quad-core with two ZFS pools -- one single-disk and another
one is a two-disk mirror. FreeBSD is installed on UFS partitions, ZFS is
used for user stuff only.

--Artem

On Thu, Mar 19, 2009 at 5:19 PM, Ben Kelly wrote:
> On Mar 19, 2009, at 7:06 PM, Adam McDougall wrote:
>>
>> I was really impressed with your diagnosis but didn't try your patch until
>> this afternoon.  I had not seen processes spin, but I have had zfs get stuck
>> roughly every 2 days on a somewhat busy ftp/rsync server until I turned off
>> zil again, then it was up for over 13 days when I decided to try this patch.
>>  This system boots from a ufs / and turns around to try mounting a zfs root
>> over top, but the first time it stalled for a few minutes at the root mount
>> and "gave up" with a spinlock held too long, second time same thing but I
>> didn't wait long enough for the spinlock error. Then I tried a power cycle
>> just because, and the next two tries I got a page fault kernel panic.  I'd
>> try to give more details but right now I'm trying to get the server back up
>> with a livecd because I goofed and don't have an old kernel to fall back on.
>>  Just wanted to let you know, and thanks for getting as far as you did!
>
> Ouch!  Sorry you ran into that.
>
> I haven't seen these problems, but I keep my root partition on UFS and only
> use zfs for /usr, /var, etc.  Perhaps that explains the difference in
> behavior.
>
> You could try changing the patch to use lower priorities.  To do this change
> compat/opensolaris/sys/proc.h so that it reads:
>
>  #define        minclsyspri     PRI_MAX_REALTIME
>  #define        maxclsyspri    (PRI_MAX_REALTIME - 4)
>
> This compiles and runs on my machine.  The theory here is that other kernel
> threads will be able to run as they used to, but the zfs threads will still
> be fixed relative to one another.  It's really just a stab in the dark,
> though.  I don't have any experience with the "zfs mounted on top of ufs
> root" configuration.  If this works we should try to see if we can replace
> PRI_MAX_REALTIME with PRI_MAX_KERN so that the zfs kernel threads run in the
> kernel priority range.
>
> If you could get a stack trace of the kernel panic that would be helpful.
> Also, if you have console access, can you break to debugger during the boot
> spinlock hang and get a backtrace of the blocked process?
>
> If you want to compare other aspects of your environment to mine I uploaded
> a bunch of info here:
>
>  http://www.wanderview.com/svn/public/misc/zfs_livelock
>
> Finally, I'm CC'ing the list and some other people so they are aware that
> the patch runs the risk of a panic.
>
> I hope that helps.
>
> - Ben
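
For anyone trying the proc.h change quoted above, here is a minimal
sketch of what the two definitions work out to. It relies on one fact
about the FreeBSD scheduler: a numerically lower priority value is the
more urgent one. So maxclsyspri (PRI_MAX_REALTIME - 4) ends up four
levels more urgent than minclsyspri. The value 159 for PRI_MAX_REALTIME
is only an assumption so that the snippet builds outside the kernel
tree; the real value comes from sys/priority.h and may differ on your
revision. The helper pri_more_urgent() is hypothetical and not part of
any patch in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * ASSUMPTION: stand-in value so this builds outside the kernel tree.
 * In the kernel, PRI_MAX_REALTIME is provided by <sys/priority.h>.
 */
#ifndef PRI_MAX_REALTIME
#define PRI_MAX_REALTIME 159
#endif

/* The two definitions suggested for compat/opensolaris/sys/proc.h. */
#define minclsyspri	PRI_MAX_REALTIME
#define maxclsyspri	(PRI_MAX_REALTIME - 4)

/*
 * Hypothetical helper: returns true when priority a is more urgent
 * than priority b, i.e. when a is numerically lower.
 */
static inline bool
pri_more_urgent(int a, int b)
{
	return (a < b);
}

/* Compile-time check: the "max" class priority must be the more urgent one. */
_Static_assert(maxclsyspri < minclsyspri,
    "maxclsyspri must be numerically lower than minclsyspri");
```

Whatever PRI_MAX_REALTIME is on a given revision, the two values stay
exactly 4 apart, which is what keeps the zfs threads fixed relative to
one another while moving them out of the PRIBIO range.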