Date: Tue, 4 Oct 2011 02:43:45 -0400
From: Dave Cundiff <syshackmin@gmail.com>
To: questions@freebsd.org
Subject: ZFS Write Lockup
Message-ID: <CAKHEz2a%2BRFmcCyEMnooDmb8vERA-qg0A474LZ9mLtPvoij8Xmw@mail.gmail.com>
Hi,

I'm running 8.2-RELEASE and running into an IO lockup on ZFS that is
happening pretty regularly. The system is stock except for the
following set in loader.conf:

    vm.kmem_size="30G"
    vfs.zfs.arc_max="22G"
    kern.hz=100

I know the kmem settings aren't SUPPOSED to be necessary now, but my
ZFS boxes were crashing until I added them. The machine has 24 gigs of
RAM. The kern.hz=100 was to stretch out the l2arc bug that pops up at
28 days with it set to 1000.

[root@san2 ~]# zpool status
  pool: san
 state: ONLINE
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        san          ONLINE       0     0     0
          da1        ONLINE       0     0     0
        logs
          mirror     ONLINE       0     0     0
            ad6s1b   ONLINE       0     0     0
            ad14s1b  ONLINE       0     0     0
        cache
          ad6s1d     ONLINE       0     0     0
          ad14s1d    ONLINE       0     0     0

errors: No known data errors

Here's a zpool iostat from a machine in trouble:

san         9.08T  3.55T      0      0      0  7.92K
san         9.08T  3.55T      0    447      0  5.77M
san         9.08T  3.55T      0    309      0  2.83M
san         9.08T  3.55T      0      0      0      0
san         9.08T  3.55T     62      0  2.22M      0
san         9.08T  3.55T      0      2      0  23.5K
san         9.08T  3.55T      0      0      0      0
san         9.08T  3.55T      0      0      0      0
san         9.08T  3.55T      0    254      0  6.62M
san         9.08T  3.55T      0    249      0  3.16M
san         9.08T  3.55T      0      0      0      0
san         9.08T  3.55T     34      0   491K      0
san         9.08T  3.55T      0      6      0  62.7K
san         9.08T  3.55T      0      0      0      0
san         9.08T  3.55T      0     85      0  6.59M
san         9.08T  3.55T      0      0      0      0
san         9.08T  3.55T      0    452      0  4.88M
san         9.08T  3.55T    109      0  3.12M      0
san         9.08T  3.55T      0      0      0      0
san         9.08T  3.55T      0      0      0  7.84K
san         9.08T  3.55T      0    434      0  6.41M
san         9.08T  3.55T      0      0      0      0
san         9.08T  3.55T      0    304      0  2.90M
san         9.08T  3.55T     37      0   628K      0

It's supposed to look like:

san         9.07T  3.56T    162    167  3.75M  6.09M
san         9.07T  3.56T      5      0  47.4K      0
san         9.07T  3.56T     19      0   213K      0
san         9.07T  3.56T    120      0  3.26M      0
san         9.07T  3.56T     92      0   741K      0
san         9.07T  3.56T    114      0  2.86M      0
san         9.07T  3.56T     72      0   579K      0
san         9.07T  3.56T     14      0   118K      0
san         9.07T  3.56T     24      0   213K      0
san         9.07T  3.56T     25      0   324K      0
san         9.07T  3.56T      8      0   126K      0
san         9.07T  3.56T     28      0   505K      0
san         9.07T  3.56T     15      0   126K      0
san         9.07T  3.56T     11      0   158K      0
san         9.07T  3.56T     19      0   356K      0
san         9.07T  3.56T    198      0  3.55M      0
san         9.07T  3.56T     21      0   173K      0
san         9.07T  3.56T     18      0   150K      0
san         9.07T  3.56T     23      0   260K      0
san         9.07T  3.56T      9      0  78.3K      0
san         9.07T  3.56T     21      0   173K      0
san         9.07T  3.56T      2  4.59K  16.8K   142M
san         9.07T  3.56T     12      0   103K      0
san         9.07T  3.56T     26    454   312K  4.35M
san         9.07T  3.56T    111      0  3.34M      0
san         9.07T  3.56T     28      0   870K      0
san         9.07T  3.56T     75      0  3.88M      0
san         9.07T  3.56T     43      0  1.22M      0
san         9.07T  3.56T     26      0   270K      0

I don't know what triggers the problem, but I know how to fix it. If I
perform a couple of snapshot deletes, the IO comes back in line every
single time. Fortunately I have LOTS of snapshots to delete.

[root@san2 ~]# zfs list -r -t snapshot | wc -l
    5236
[root@san2 ~]# zfs list -r -t volume | wc -l
      17

Being fairly new to FreeBSD and ZFS, I'm pretty clueless about where
to begin tracking this down. I've been staring at gstat trying to see
if a zvol is getting a big burst of writes that may be flooding the
drive controller, but I haven't caught anything yet. top -S -H shows
zio_write_issue threads consuming massive amounts of CPU during the
lockup; normally they sit around 5-10%.

Any suggestions on where I could start to track this down would be
greatly appreciated.

Thanks,

--
Dave Cundiff
System Administrator
A2Hosting, Inc
http://www.a2hosting.com
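The snapshot-delete workaround described in the message can be scripted. A minimal sketch, assuming POSIX sh: `list_snapshots` is a hypothetical helper wrapping `zfs list`, and the dataset name in the preview line is illustrative, not from the original post.

```shell
# List snapshots under a dataset, oldest first (-s creation), in
# script-friendly form (-H: no header, tab-separated). Wrapping the
# zfs call in a function keeps the selection logic testable without
# a live pool.
list_snapshots() {
    zfs list -H -t snapshot -o name -s creation -r "$1"
}

# Print the $2 oldest snapshot names under dataset $1.
oldest_snapshots() {
    list_snapshots "$1" | head -n "$2"
}

# Preview what would be destroyed (drop the "echo" to actually run it):
# oldest_snapshots san 2 | xargs -n1 echo zfs destroy
```

Keeping the destructive step behind an `echo` preview is deliberate: `zfs destroy` on the wrong snapshot is unrecoverable.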
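For catching the lockup as it happens, the `zpool iostat` stream itself can be watched instead of eyeballing it. A hedged sketch keyed to the symptom shown above (read ops pinned at 0 for stretches while writes stall); the pool name `san` is from the post, but the 5-sample threshold and the alert text are arbitrary assumptions.

```shell
# Read "zpool iostat san 1" lines on stdin and warn when the read-ops
# column ($4) stays at 0 for 5 consecutive samples of the "san" pool.
# Intended usage: zpool iostat san 1 | detect_read_stall
detect_read_stall() {
    awk '$1 == "san" {
        if ($4 == 0) zero++; else zero = 0
        if (zero == 5) print "possible write lockup: no reads for 5 samples"
    }'
}
```

Matching on `$1 == "san"` also skips the header and separator lines that `zpool iostat` repeats, so the counter only sees real sample rows.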