From: Matt Simerson
To: freebsd-fs@freebsd.org
Date: Wed, 6 Aug 2008 10:03:08 -0700
Subject: Re: ZFS hang issue and prefetch_disable - UPDATE
In-Reply-To: <20080806112944.6793fc11@twoflower.in.publishing.hu>
References: <20253C48-38CB-4A77-9C59-B993E7E5D78A@corp.spry.com> <62D3072A-E41A-4CFC-971D-9924958F38C7@corp.spry.com> <20080806112944.6793fc11@twoflower.in.publishing.hu>

On Aug 6, 2008, at 2:29 AM, CZUCZY Gergely wrote:

> A few weeks ago, I was referring to exactly this.
> Somewhere around here:
> http://lists.freebsd.org/pipermail/freebsd-fs/2008-July/004796.html
>
> The fact that it works on pointyhat, and that it works on kris@'s box, is
> just IWFM-level evidence, not proof of any stability or reliability.
>
> FreeBSD is quite a stable OS, and the code is of relatively good quality as
> far as I've seen. For some reason the ZFS port seems to be an exception:
> it refuses to be merged properly or to have its issues solved.
>
> No matter how much someone tunes ZFS, no matter what you disable, it's not
> guaranteed, not even at the tiniest level, not to freeze your box, not
> to throw a panic, and to keep your data and everything.

You want/expect guarantees of stability with experimental features? I
think someone needs their expectations calibrated.

> Many of us have reported this, but no one looked into it.

Just because you haven't seen proof that someone looked into it doesn't
mean nobody has. You are being neither fair nor respectful of the time
that others are investing in ZFS.

> use something else, but that's not the point. The point is, I don't see
> the point of a port of this quality. I know it's quite complex and whatnot,
> but at this level, it cannot be run in a production environment. It's
> missing reliability.

If you don't see the value of ZFS, don't use it. I'm not complaining
because ZFS isn't stable. I'd like it to be, but the best way I can
help is to provide detailed information about my setup and the
conditions under which the feature has problems. By doing so, I'm
providing useful data. Denigrating the authors because ZFS doesn't meet
your expectations doesn't help anybody, so please don't do that.

Matt

> No matter how much you hack it, there's always a not-so-impossible chance
> that it will shoot you in the back when you're not watching.
>
> I hope the latest ZFS patches will solve a lot of issues, and we won't see
> problems like this anymore.
>
> On Thu, 31 Jul 2008 13:58:26 -0700
> Matt Simerson wrote:
>
>>
>> My announcement that vfs.zfs.prefetch_disable=1 resulted in a stable
>> system was premature.
>>
>> One of my backup servers (see specs below) hung. When I got onto the
>> console via KVM, it looked normal with no errors but didn't respond to
>> Control-Alt-Delete. After a power cycle, zpool status showed 8 disks
>> FAULTED and the action state was: http://www.sun.com/msg/ZFS-8000-5E
>>
>> Basically, that meant my ZFS file system and 7.5TB of data were gone.
>> Ouch.
>>
>> I'm using a pair of ARECA 1231ML RAID controllers. Previously, I had
>> them configured in JBOD with raidz2. This time around, I configured
>> both controllers with one 12 disk RAID 6 volume. Now FreeBSD just sees
>> two 10TB disks which I stripe with ZFS:
>>
>> zpool create back01 /dev/da0 /dev/da1
>>
>> I also did a bit more fiddling with /boot/loader.conf. Previously I had:
>>
>> vm.kmem_size="1536M"
>> vm.kmem_size_max="1536M"
>> vfs.zfs.prefetch_disable=1
>>
>> This resulted in ZFS using 1.1GB of RAM (as measured using the
>> technique described on the wiki) during normal use. The system in
>> question hung during the nightly processing (which backs up some other
>> systems via rsync) and my suspicion is that when I/O load picked up,
>> it exhausted the available kernel memory and hung the system. So now I
>> have these settings on one system:
>>
>> vm.kmem_size="1536M"
>> vm.kmem_size_max="1536M"
>> vfs.zfs.arc_min="16M"
>> vfs.zfs.arc_max="64M"
>> vfs.zfs.prefetch_disable=1
>>
>> and the same except vfs.zfs.arc_max="256M" on the other. The one with
>> 64M uses 256MB of RAM for ZFS and the one set at 256M uses 600MB of
>> RAM.
>> These are measured under heavy network and disk I/O load
>> generated by multiple rsync processes pulling backups from remote
>> nodes and storing them on ZFS. I am using ZFS compression.
>>
>> I get much better performance now with RAID 6 on the controller and
>> ZFS striping than using raidz2.
>>
>> Unless tuning the arc_ settings made the difference. Either way, the
>> system I just rebuilt is now quite a bit faster with RAID 6 than JBOD
>> + raidz2.
>>
>> Hopefully tuning vfs.zfs.arc_max will result in stability. If it
>> doesn't, my next choice is upgrading to -HEAD with the recent ZFS
>> patch or ditching ZFS entirely and using geom_stripe. I don't like
>> either option.
>>
>> Matt
>>
>>
>>> From: Matt Simerson
>>> Date: July 22, 2008 1:25:42 PM PDT
>>> To: freebsd-fs@freebsd.org
>>> Subject: ZFS hang issue and prefetch_disable
>>>
>>> Symptoms
>>>
>>> Deadlocks under heavy I/O load on the ZFS file system with
>>> prefetch_disable=0. Setting vfs.zfs.prefetch_disable=1 results in a
>>> stable system.
>>>
>>> Configuration
>>>
>>> Two machines. Identically built. Both exhibit identical behavior.
>>> 8 cores (2 x E5420) x 2.5GHz, 16 GB RAM, 24 x 1TB disks.
>>> FreeBSD 7.0 amd64
>>> dmesg: http://matt.simerson.net/computing/zfs/dmesg.txt
>>>
>>> Boot disk is a read only 1GB compact flash
>>> # cat /etc/fstab
>>> /dev/ad0s1a / ufs ro,noatime 2 2
>>>
>>> # df -h /
>>> Filesystem     1K-blocks   Used  Avail  Capacity  Mounted on
>>> /dev/ad0s1a         939M   555M   309M       64%  /
>>>
>>> RAM has been boosted as suggested in ZFS Tuning Guide
>>> # cat /boot/loader.conf
>>> vm.kmem_size= 1610612736
>>> vm.kmem_size_max= 1610612736
>>> vfs.zfs.prefetch_disable=1
>>>
>>> I haven't mucked much with the other memory settings as I'm using
>>> amd64 and according to the FreeBSD ZFS wiki, that isn't necessary.
>>> I've tried higher settings for kmem but that resulted in a failed
>>> boot.
>>> I have ample RAM and would love to use as much as possible for
>>> network and disk I/O buffers as that's principally all this system
>>> does.
>>>
>>> Disks & ZFS options
>>>
>>> Sun's "Best Practices" suggests limiting the number of disks in a
>>> raidz pool to no more than 6-10, IIRC. ZFS is configured as shown:
>>> http://matt.simerson.net/computing/zfs/zpool.txt
>>>
>>> I'm using all of the ZFS default properties except: atime=off,
>>> compression=on.
>>>
>>> Environment
>>>
>>> I'm using these machines as backup servers. I wrote an application
>>> that generates a list of the thousands of VPS accounts we host. For
>>> each host, it generates a rsnapshot configuration file and backs
>>> up their VPS to these systems via rsync. The application manages
>>> concurrency and will spawn additional rsync processes if system I/O
>>> load is below a defined threshold. Which is to say, I can crank up
>>> or down the amount of disk I/O the system sees.
>>>
>>> With vfs.zfs.prefetch_disable=0, I can trigger a hang within a few
>>> hours (no more than a day). If I keep the I/O load (measured via
>>> iostat) down to a low level (< 200 iops) then I still get hangs but
>>> less frequently (1-6 days). The only way I have found to prevent
>>> the hangs is by setting vfs.zfs.prefetch_disable=1.
>
>
> -- 
> Regards,
>
> Czuczy Gergely
> Harmless Digital Bt
> mailto: gergely.czuczy@harmless.hu
> Tel: +36-30-9702963
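
For anyone trying to reproduce the load pattern from the original report
(the "Environment" section quoted above), here is a rough sketch of the
throttling idea: start another backup only while measured I/O load is
under a threshold. This is not the actual application, which was never
posted. The function names, the hook callables, the max_workers cap and
the default threshold of 200 (borrowed from the "< 200 iops" figure) are
all assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List


def should_spawn(current_iops: float, iops_threshold: float,
                 running: int, max_workers: int) -> bool:
    """True when load is under the threshold and a worker slot is free."""
    return current_iops < iops_threshold and running < max_workers


def drain_queue(hosts: Iterable[str],
                sample_iops: Callable[[], float],
                start_backup: Callable[[str], None],
                count_running: Callable[[], int],
                wait: Callable[[], None],
                iops_threshold: float = 200.0,
                max_workers: int = 4) -> List[str]:
    """Start one backup per host, adding workers only while load is low.

    Every callable is a hypothetical hook supplied by the caller:
    sample_iops might parse iostat output, start_backup might launch
    rsync with subprocess.Popen, count_running might count live child
    processes, and wait might sleep for a polling interval.
    """
    pending: Deque[str] = deque(hosts)
    started: List[str] = []
    while pending:
        if should_spawn(sample_iops(), iops_threshold,
                        count_running(), max_workers):
            host = pending.popleft()
            start_backup(host)
            started.append(host)
        else:
            wait()  # back off until load drops or a worker exits
    return started
```

Starting at most one worker per load sample means the next sample
reflects the load that worker added before another is started. Raising
or lowering iops_threshold is the "crank up or down" knob described in
the report.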