From: CZUCZY Gergely
Date: Wed, 6 Aug 2008 11:29:44 +0200
To: Matt Simerson
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS hang issue and prefetch_disable - UPDATE

Hello,

A few weeks ago, I was referring to exactly this.
Somewhere around here:
http://lists.freebsd.org/pipermail/freebsd-fs/2008-July/004796.html

The fact that it works on pointyhat, and that it works on kris@'s box, is just
IWFM-level ("it works for me") evidence, not proof of any stability or
reliability.

FreeBSD is quite a stable OS, and the code is of relatively good quality as far
as I've seen. For some reason the ZFS port seems to be an exception: it seems
to resist being merged properly and having its issues solved.

No matter how much someone tunes ZFS, no matter what you disable, it's not
guaranteed, not even at the most minimal level, not to freeze your box, not
to throw a panic, and to keep your data intact.

Many of us have reported this, but no one has looked into it. I know we're free
to use something else, but that's not the point. The point is, I don't see the
purpose of a port of this quality. I know it's quite complex and whatnot, but
at this level it cannot be run in a production environment. It lacks
reliability.

No matter how much you hack it, there's always a not-so-remote chance that
it will shoot you in the back when you're not watching.

I hope the latest ZFS patches will solve a lot of issues, and we won't see
problems like this anymore.

On Thu, 31 Jul 2008 13:58:26 -0700
Matt Simerson wrote:

>
> My announcement that vfs.zfs.prefetch_disable=1 resulted in a stable
> system was premature.
>
> One of my backup servers (see specs below) hung. When I got onto the
> console via KVM, it looked normal with no errors but didn't respond to
> Control-Alt-Delete. After a power cycle, zpool status showed 8 disks
> FAULTED and the action state was: http://www.sun.com/msg/ZFS-8000-5E
>
> Basically, that meant my ZFS file system and 7.5TB of data were gone.
> Ouch.
>
> I'm using a pair of ARECA 1231ML RAID controllers. Previously, I had
> them configured in JBOD with raidz2. This time around, I configured
> both controllers with one 12 disk RAID 6 volume.
> Now FreeBSD just sees two 10TB disks, which I stripe with ZFS:
> zpool create back01 /dev/da0 /dev/da1
>
> I also did a bit more fiddling with /boot/loader.conf. Previously I had:
>
> vm.kmem_size="1536M"
> vm.kmem_size_max="1536M"
> vfs.zfs.prefetch_disable=1
>
> This resulted in ZFS using 1.1GB of RAM (as measured using the
> technique described on the wiki) during normal use. The system in
> question hung during the nightly processing (which backs up some other
> systems via rsync) and my suspicion is that when I/O load picked up,
> it exhausted the available kernel memory and hung the system. So now I
> have these settings on one system:
>
> vm.kmem_size="1536M"
> vm.kmem_size_max="1536M"
> vfs.zfs.arc_min="16M"
> vfs.zfs.arc_max="64M"
> vfs.zfs.prefetch_disable=1
>
> and the same except vfs.zfs.arc_max="256M" on the other. The one with
> 64M uses 256MB of RAM for ZFS and the one set at 256M uses 600MB of
> RAM. These are measured under heavy network and disk IO load being
> generated by multiple rsync processes pulling backups from remote
> nodes and storing them on ZFS. I am using ZFS compression.
>
> I get much better performance now with RAID 6 on the controller and
> ZFS striping than using raidz2.
>
> Unless tuning the arc_ settings made the difference. Either way, the
> system I just rebuilt is now quite a bit faster with RAID 6 than JBOD
> + raidz2.
>
> Hopefully tuning vfs.zfs.arc_max will result in stability. If it
> doesn't, my next choice is upgrading to -HEAD with the recent ZFS
> patch or ditching ZFS entirely and using geom_stripe. I don't like
> either option.
>
> Matt
>
>
> > From: Matt Simerson
> > Date: July 22, 2008 1:25:42 PM PDT
> > To: freebsd-fs@freebsd.org
> > Subject: ZFS hang issue and prefetch_disable
> >
> > Symptoms
> >
> > Deadlocks under heavy IO load on the ZFS file system with
> > prefetch_disable=0.
> > Setting vfs.zfs.prefetch_disable=1 results in a stable system.
> >
> > Configuration
> >
> > Two machines. Identically built. Both exhibit identical behavior.
> > 8 cores (2 x E5420) x 2.5GHz, 16 GB RAM, 24 x 1TB disks.
> > FreeBSD 7.0 amd64
> > dmesg: http://matt.simerson.net/computing/zfs/dmesg.txt
> >
> > Boot disk is a read only 1GB compact flash
> > # cat /etc/fstab
> > /dev/ad0s1a  /  ufs  ro,noatime  2  2
> >
> > # df -h /
> > Filesystem   1K-blocks  Used  Avail  Capacity  Mounted on
> > /dev/ad0s1a  939M       555M  309M   64%       /
> >
> > RAM has been boosted as suggested in ZFS Tuning Guide
> > # cat /boot/loader.conf
> > vm.kmem_size=1610612736
> > vm.kmem_size_max=1610612736
> > vfs.zfs.prefetch_disable=1
> >
> > I haven't mucked much with the other memory settings as I'm using
> > amd64 and according to the FreeBSD ZFS wiki, that isn't necessary.
> > I've tried higher settings for kmem but that resulted in a failed
> > boot. I have ample RAM and would love to use as much as possible for
> > network and disk I/O buffers as that's principally all this system
> > does.
> >
> > Disks & ZFS options
> >
> > Sun's "Best Practices" suggests limiting the number of disks in a
> > raidz pool to no more than 6-10, IIRC. ZFS is configured as shown:
> > http://matt.simerson.net/computing/zfs/zpool.txt
> >
> > I'm using all of the ZFS default properties except: atime=off,
> > compression=on.
> >
> > Environment
> >
> > I'm using these machines as backup servers. I wrote an application
> > that generates a list of the thousands of VPS accounts we host. For
> > each host, it generates an rsnapshot configuration file and backs up
> > their VPS to these systems via rsync. The application manages
> > concurrency and will spawn additional rsync processes if system i/o
> > load is below a defined threshold. Which is to say, I can crank up
> > or down the amount of disk IO the system sees.
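
The throttling described in the quoted paragraph above could look roughly
like the sketch below. It is only an illustration in Python, not Matt's
application: the names should_spawn and schedule_step, the injected
sample_iops callback, the max_workers cap and the "start at most one job
per step" policy are all assumptions made for the example.

```python
from collections import deque
from typing import Callable, Deque, List


def should_spawn(current_iops: float, threshold_iops: float,
                 running: int, max_workers: int) -> bool:
    """True if a slot is free and the measured load is below the threshold."""
    return running < max_workers and current_iops < threshold_iops


def schedule_step(pending: Deque[str], running: List[str],
                  sample_iops: Callable[[], float],
                  threshold_iops: float, max_workers: int) -> List[str]:
    """Start at most one queued host per step; return the hosts started."""
    started: List[str] = []
    if pending and should_spawn(sample_iops(), threshold_iops,
                                len(running), max_workers):
        host = pending.popleft()
        # Hypothetical: a real tool would launch the rsync job for this
        # host here and remove it from `running` when the job exits.
        running.append(host)
        started.append(host)
    return started
```

Calling schedule_step periodically with a sampler that reads the current
I/O rate (for example, parsed from iostat output) would let the threshold
act as the single knob for raising or lowering the disk load.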
> >
> > With vfs.zfs.prefetch_disable=0, I can trigger a hang within a few
> > hours (no more than a day). If I keep the i/o load (measured via
> > iostat) down to a low level (< 200 iops) then I still get hangs but
> > less frequently (1-6 days). The only way I have found to prevent
> > the hangs is by setting vfs.zfs.prefetch_disable=1.

-- 
Regards,

Czuczy Gergely
Harmless Digital Bt
mailto: gergely.czuczy@harmless.hu
Tel: +36-30-9702963