Date: Tue, 5 Mar 2013 01:27:00 -0800
From: Jeremy Chadwick
To: Steven Hartland
Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?
Message-ID: <20130305092700.GA43045@icarus.home.lan>
In-Reply-To: <545CD2ABE3D146F2B91963ADF6090CDE@multiplay.co.uk>
Cc: Ben Morrow, freebsd-stable@freebsd.org

On Tue, Mar 05, 2013 at 09:12:47AM -0000, Steven Hartland wrote:
>
> ----- Original Message ----- From: "Jeremy Chadwick"
> To: "Ben Morrow"
> Cc:
> Sent: Tuesday, March 05, 2013 5:32 AM
> Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?
>
>
> >On Tue, Mar 05, 2013 at 05:05:47AM +0000, Ben Morrow wrote:
> >>Quoth Karl Denninger:
> >>> > Note that the machine is not booting from ZFS -- it is booting from and
> >>> has its swap on a UFS 2-drive mirror (handled by the disk adapter; looks
> >>> like a single "da0" drive to the OS) and that drive stalls as well when
> >>> it freezes.
> >>> It's definitely a kernel thing when it happens as the OS
> >>> would otherwise not have locked (just I/O to the user partitions) -- but
> >>> it does.
> >>
> >>Is it still the case that mixing UFS and ZFS can cause problems, or were
> >>they all fixed? I remember a while ago (before the arc usage monitoring
> >>code was added) there were a number of reports of serious problems
> >>running an rsync from UFS to ZFS.
> >
> >This problem still exists on stable/9. The behaviour manifests itself
> >as fairly bad performance (I cannot remember if stalling or if just
> >throughput rates were awful). I can only speculate as to what the root
> >cause is, but my guess is that it has something to do with the two
> >caching systems (UFS vs. ZFS ARC) fighting over large sums of memory.
>
> In our case we have no UFS, so this isn't the cause of the stalls.
> Spec here is
> * 64GB RAM
> * LSI 2008
> * 8.3-RELEASE
> * Pure ZFS
> * Trigger MySQL doing a DB import, nothing else running.
> * 4K disk alignment

1. Is compression enabled? Has it ever been enabled (on any fs) in the
   past (barring pool being destroyed + recreated)?

2. Is dedup enabled? Has it ever been enabled (on any fs) in the past
   (barring pool being destroyed + recreated)?

I can speculate day and night about what could cause this kind of issue,
honestly. The possibilities are quite literally infinite, and all of
them require folks deeply familiar with both FreeBSD's ZFS and very
key/major parts of the kernel (ranging from VM to interrupt handlers to
I/O subsystem). (This next comment isn't for you, Steve, you already
know this :-) ) The way different pieces of the kernel interact with one
another is fairly complex; the kernel is not simple.

Things I think might prove useful:

* Describing the stall symptoms; what all does it impact? Can you
  switch VTYs on console when it's happening? Network I/O (e.g. SSH'd
  into the same box and just holding down a letter) showing stalls then
  catching up?
  Things of this nature.
* How long the stall lasts (e.g. if there's some way to roughly
  calculate this using "date" in a shell script)
* Contents of /etc/sysctl.conf and /boot/loader.conf (re: "tweaking" of
  the system)
* "sysctl -a | grep zfs" before and after a stall -- do not bother with
  those "ARC summaries" scripts please, at least not for this
* "vmstat -z" before and after a stall
* "vmstat -m" before and after a stall
* "vmstat -s" before and after a stall
* "vmstat -i" before, after, AND during a stall

Basically, every person who experiences this problem needs to treat
every situation uniquely -- no "me too" -- and try to find 100% reliable
test cases for it. That's the only way bugs of this nature (i.e. of a
complex nature) get fixed.

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
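
For the "date"-based duration item in the list above, one rough way to do it might look like the sketch below. It is only an illustration, not something from the thread: the function names report_gap and watch_stalls, the message format, and the example thresholds are all made up here. The idea is that a loop which normally wakes once per interval will show a much larger wall-clock gap if the whole box stalls, and that gap approximates the stall length (to one-second resolution, since it relies on "date +%s").

```shell
#!/bin/sh
# Sketch only: report_gap and watch_stalls are hypothetical helper names.

# report_gap PREV NOW THRESHOLD
# Prints a line if the gap between two epoch timestamps (in seconds)
# is larger than THRESHOLD; prints nothing otherwise.
report_gap() {
    gap=$(( $2 - $1 ))
    if [ "$gap" -gt "$3" ]; then
        echo "stall: ~${gap}s gap ending at epoch $2"
    fi
}

# watch_stalls INTERVAL THRESHOLD [ITERATIONS]
# Sleeps INTERVAL seconds per pass and reports any pass whose wall-clock
# time exceeded THRESHOLD seconds (THRESHOLD should be larger than
# INTERVAL). ITERATIONS of 0, or omitted, means loop until interrupted.
watch_stalls() {
    interval=$1
    threshold=$2
    count=${3:-0}
    prev=$(date +%s)
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        sleep "$interval"
        now=$(date +%s)
        report_gap "$prev" "$now" "$threshold"
        prev=$now
        i=$((i + 1))
    done
}
```

Assuming the functions above are loaded into a shell, something like "watch_stalls 1 3" left running on the console would print a line each time a one-second sleep took more than three seconds to return. The script itself gets frozen along with everything else during a stall, so the report only shows up once the box recovers, which is fine for measuring duration after the fact.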