Date: Sat, 26 Jan 2013 04:00:01 GMT
From: Jeremy Chadwick <jdc@koitsu.org>
To: freebsd-fs@FreeBSD.org
Subject: Re: kern/169480: [zfs] ZFS stalls on heavy I/O
Message-ID: <201301260400.r0Q401QP059909@freefall.freebsd.org>
The following reply was made to PR kern/169480; it has been noted by GNATS.

From: Jeremy Chadwick <jdc@koitsu.org>
To: Harry Coin <hgcoin@gmail.com>
Cc: bug-followup@FreeBSD.org, levent.serinol@mynet.com
Subject: Re: kern/169480: [zfs] ZFS stalls on heavy I/O
Date: Fri, 25 Jan 2013 19:55:26 -0800

Recommendations:

1. Instead of /dev/random, use /dev/zero. /dev/random is not blazing
fast, given that it has to harvest entropy from many sources. If
you're doing I/O speed testing, just use /dev/zero; the speed
difference is quite large.

2. For dd, use bs=64k instead of bs=512. bs=512 is far from ideal:
those are direct I/O writes of 512 bytes each, which is dog slow. I
repeat: dog slow. Linux handles this differently. (See the example
dd invocation after this list.)

3. During the dd, in another VTY or window, run "gstat -I500ms" and
watch the I/O speeds for your ada[2345] disks. They should hit peaks
between 60-150MBytes/sec under the rightmost "kBps" column (the left
kBps column is reads, the right one is writes). The large potential
speed variance has to do with how much data you already have on the
pool: MHDDs get slower as the actuator arms move inward toward the
spindle motor. That's why you might see, for example, 150MBytes/sec
when reading/writing low-numbered LBAs but slower speeds when writing
high-numbered LBAs. The speed will also be "bursty" and "sporadic"
due to how the ZFS ARC works: the interval at which things are
flushed to disk is based on the vfs.zfs.txg.timeout sysctl, which on
FreeBSD 9.1-RELEASE should default to 5 (5 seconds).

4. "zpool iostat -v {pool}" does not provide accurate speed
indications, for the same reason a bare "iostat" doesn't show the
information most people would hope for while "iostat 1" does. You
need to run it with an interval, i.e. "zpool iostat -v {pool} 1",
and let it run for a while during the I/O (see the example after
this list). But I recommend using gstat like I said, simply because
the interval can be set to 500ms (0.5s) and you get a better idea of
what your peak I/O speed is. If you find a single disk that is
**always** performing badly, then that disk is your bottleneck and I
can help you with analysis of its problem.

5. Your "zpool scrub" speed of 14MBytes/second indicates you are
nowhere close to your ideal I/O speed. It should not be that slow
unless you're doing tons of I/O at the same time as the scrub. Also,
scrubs take longer now due to the disabling of the vdev cache (and
that's not a FreeBSD thing; it's that way in Illumos too, and it's a
sensitive topic to discuss).

6. On FreeBSD 9.1-RELEASE, generally speaking, you should not have
to tune any sysctls. The situation was different in 8.x and 9.0.
Your system only has 4GB of RAM, so prefetching automatically gets
disabled, by the way, just in case you were wondering about that
(there were problems with prefetch in older releases). You can
verify both points with the sysctl checks after this list.

7. You should probably keep "top -s 1" running, and you might even
consider "top -S -s 1" to see system/kernel threads (they're in
brackets). This isn't going to tell you outright what's making
things slow, though. "vmstat -i" during heavy I/O would be useful
too, just in case you somehow have a shared interrupt that's being
pegged hard; for example, I've seen SATA controllers and USB
controllers sharing an interrupt, even with APICs, where the USB
layer was busted, churning out 1000 ints/sec and thus affecting SATA
I/O speed. (Both commands are shown after this list.)

8. If you want to compare systems, I'm happy to do so, although I
have fewer disks than you do (3 in raidz1, WD Red 1TB drives).
However, my system is not a Pentium D-class processor; it's a Core 2
Quad Q9500. The D-class stuff is fairly old.
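To make items 1-3 concrete, a test run might look like the below.
The /pool/ddtest path and the 8GB size are placeholders; adjust them
for your pool's mountpoint, and pick a size comfortably larger than
RAM so the ARC can't absorb the whole write:

    # Sequential write test: zeros, 64KB blocks.
    # count=131072 * bs=64k = 8GiB total.
    dd if=/dev/zero of=/pool/ddtest bs=64k count=131072

    # In another VTY/window while the dd runs: per-disk I/O,
    # sampled at a 500ms interval.
    gstat -I500ms

    # Clean up afterwards so the test file doesn't eat pool space.
    rm /pool/ddtest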
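Likewise for item 4, the interval form of zpool iostat ("mypool" is
just a stand-in for your actual pool name):

    # Per-vdev statistics sampled every 1 second; let it run for a
    # while during heavy I/O.
    zpool iostat -v mypool 1

    # Scrub progress and estimated rate show up on the "scan:" line.
    zpool status mypool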
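And the sysctl checks mentioned in items 3 and 6. I believe both
OIDs exist on 9.1-RELEASE, but verify on your system:

    # txg flush interval; should report 5 (seconds) by default.
    sysctl vfs.zfs.txg.timeout

    # Prefetch state; per item 6, this should report 1 (disabled)
    # on your 4GB machine.
    sysctl vfs.zfs.prefetch_disable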
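Finally, the monitoring commands from item 7, exactly as I'd run
them:

    # Refresh every second; -S shows system/kernel threads (they
    # appear in brackets).
    top -S -s 1

    # Interrupt counters; during heavy I/O, look for a device
    # sharing an IRQ with an abnormally high rate.
    vmstat -i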
I have some other theories as well, but one thing at a time.

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |