Date: Tue, 5 Mar 2013 01:27:00 -0800
From: Jeremy Chadwick <jdc@koitsu.org>
To: Steven Hartland <killing@multiplay.co.uk>
Cc: Ben Morrow <ben@morrow.me.uk>, freebsd-stable@freebsd.org
Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?
Message-ID: <20130305092700.GA43045@icarus.home.lan>
In-Reply-To: <545CD2ABE3D146F2B91963ADF6090CDE@multiplay.co.uk>
References: <513524B2.6020600@denninger.net>
 <89680320E0FA4C0A99D522EA2037CE6E@multiplay.co.uk>
 <20130305050539.GA52821@anubis.morrow.me.uk>
 <20130305053249.GA38107@icarus.home.lan>
 <545CD2ABE3D146F2B91963ADF6090CDE@multiplay.co.uk>
On Tue, Mar 05, 2013 at 09:12:47AM -0000, Steven Hartland wrote:
>
> ----- Original Message ----- From: "Jeremy Chadwick" <jdc@koitsu.org>
> To: "Ben Morrow" <ben@morrow.me.uk>
> Cc: <freebsd-stable@freebsd.org>
> Sent: Tuesday, March 05, 2013 5:32 AM
> Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?
>
>
> >On Tue, Mar 05, 2013 at 05:05:47AM +0000, Ben Morrow wrote:
> >>Quoth Karl Denninger <karl@denninger.net>:
> >>>
> >>> Note that the machine is not booting from ZFS -- it is booting from and
> >>> has its swap on a UFS 2-drive mirror (handled by the disk adapter; looks
> >>> like a single "da0" drive to the OS) and that drive stalls as well when
> >>> it freezes. It's definitely a kernel thing when it happens as the OS
> >>> would otherwise not have locked (just I/O to the user partitions) -- but
> >>> it does.
> >>
> >>Is it still the case that mixing UFS and ZFS can cause problems, or were
> >>they all fixed? I remember a while ago (before the arc usage monitoring
> >>code was added) there were a number of reports of serious problems
> >>running an rsync from UFS to ZFS.
> >
> >This problem still exists on stable/9. The behaviour manifests itself
> >as fairly bad performance (I cannot remember if stalling or if just
> >throughput rates were awful). I can only speculate as to what the root
> >cause is, but my guess is that it has something to do with the two
> >caching systems (UFS vs. ZFS ARC) fighting over large sums of memory.
>
> In our case we have no UFS, so this isn't the cause of the stalls.
> Spec here is
> * 64GB RAM
> * LSI 2008
> * 8.3-RELEASE
> * Pure ZFS
> * Trigger MySQL doing a DB import, nothing else running.
> * 4K disk alignment

1. Is compression enabled?  Has it ever been enabled (on any fs) in the
   past (barring pool being destroyed + recreated)?

2. Is dedup enabled?  Has it ever been enabled (on any fs) in the past
   (barring pool being destroyed + recreated)?
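A rough sketch of how both questions might be answered from a shell.
The pool name "tank" and the function names are hypothetical; "zfs get"
shows the current property values, and "zpool history" is one place
where an earlier "zfs set" or "zfs create -o" may still be recorded.
The filter is only a heuristic and assumes the history has not been
lost.

```shell
#!/bin/sh
# Sketch only. show_current and history_enabled are hypothetical names.

# show_current POOL: print the current compression/dedup value for every
# dataset in the pool.
show_current() {
    zfs get -r -H -o name,property,value compression,dedup "$1"
}

# history_enabled: read "zpool history" output on stdin and keep only the
# lines that set compression or dedup to something other than "off".
history_enabled() {
    grep -E '(compression|dedup)=([^o]|on)'
}
```

Usage would be along the lines of "show_current tank" followed by
"zpool history tank | history_enabled"; no output from the second
command suggests neither feature was ever turned on through a recorded
command.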
I can speculate day and night about what could cause this kind of issue,
honestly.  The possibilities are quite literally infinite, and all of
them require folks deeply familiar with both FreeBSD's ZFS and key parts
of the kernel (ranging from VM to interrupt handlers to the I/O
subsystem).  (This next comment isn't for you, Steve, you already know
this :-) )  The way different pieces of the kernel interact with one
another is fairly complex; the kernel is not simple.

Things I think might prove useful:

* Describing the stall symptoms; what all does it impact?  Can you
  switch VTYs on console when it's happening?  Does network I/O (e.g.
  SSH'd into the same box and just holding down a letter) show stalls
  and then catch up?  Things of this nature.
* How long the stall is in duration (ex. if there's some way to roughly
  calculate this using "date" in a shell script)
* Contents of /etc/sysctl.conf and /boot/loader.conf (re: "tweaking" of
  the system)
* "sysctl -a | grep zfs" before and after a stall -- do not bother with
  those "ARC summaries" scripts please, at least not for this
* "vmstat -z" before and after a stall
* "vmstat -m" before and after a stall
* "vmstat -s" before and after a stall
* "vmstat -i" before, after, AND during a stall

Basically, every person who experiences this problem needs to treat
every situation uniquely -- no "me too" -- and try to find 100% reliable
test cases for it.  That's the only way bugs of this nature (i.e. of a
complex nature) get fixed.

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
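The "date in a shell script" idea and the before/after captures listed
above could be sketched roughly as follows.  This is only an
illustration under stated assumptions: the function names, OUTDIR and
PROBE are all hypothetical, and the loop only notices a stall if the
script itself gets held up, which is why it writes a small probe file
each pass (point PROBE at the affected filesystem).

```shell
#!/bin/sh
# Sketch only. snapshot, is_stall, watch_stalls, OUTDIR and PROBE are
# hypothetical names, not part of any existing tool.

OUTDIR=${OUTDIR:-/var/tmp/stall-diag}
PROBE=${PROBE:-/var/tmp/stall-probe}

# snapshot LABEL: save the sysctl and vmstat output listed above into a
# fresh timestamped directory and print that directory's name.
snapshot() {
    dir="$OUTDIR/$1-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dir" || return 1
    sysctl -a 2>/dev/null | grep zfs > "$dir/sysctl-zfs.txt"
    for flag in z m s i; do
        vmstat "-$flag" > "$dir/vmstat-$flag.txt" 2>&1
    done
    echo "$dir"
}

# is_stall PREV NOW THRESHOLD: succeed if more than THRESHOLD seconds
# passed between the two epoch timestamps.
is_stall() {
    [ $(( $2 - $1 )) -gt "$3" ]
}

# watch_stalls [THRESHOLD]: wake up once a second; if a pass took far
# longer than expected, log the rough gap and take an "after" snapshot.
watch_stalls() {
    threshold=${1:-5}
    prev=$(date +%s)
    while :; do
        sleep 1
        date > "$PROBE"          # small write, so stalled I/O holds us up
        now=$(date +%s)
        if is_stall "$prev" "$now" "$threshold"; then
            echo "$(date): stalled for roughly $((now - prev))s"
            snapshot after
        fi
        prev=$now
    done
}
```

Running "snapshot before" once while the system is healthy and then
"watch_stalls 5" would give a "before" set plus an "after" set each
time a gap is seen.  The reported figure includes the one-second
sleep, so it is an estimate, not a measurement.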
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130305092700.GA43045>