From: Pawel Jakub Dawidek
To: Matt Simerson
Cc: freebsd-fs@freebsd.org
Date: Wed, 23 Jul 2008 09:50:30 +0200
Subject: Re: ZFS hang issue and prefetch_disable
Message-ID: <20080723075030.GA3603@garage.freebsd.pl>
In-Reply-To: <5E8D64DE-EC9B-4B11-BCB4-17BA63650BB7@corp.spry.com>
List-Id: Filesystems (freebsd-fs@freebsd.org)

On Tue, Jul 22, 2008 at 01:57:27PM -0700, Matt Simerson wrote:
> Symptoms
>
> Deadlocks under heavy IO load on the ZFS file system with
> prefetch_disable=0. Setting vfs.zfs.prefetch_disable=1 results in a
> stable system.
>
> Configuration
>
> Two machines. Identically built. Both exhibit identical behavior.
> 8 cores (2 x E5420) x 2.5GHz, 16 GB RAM, 24 x 1TB disks.
> FreeBSD 7.0 amd64
> dmesg: http://matt.simerson.net/computing/zfs/dmesg.txt

Very nice:)

> Boot disk is a read only 1GB compact flash
> # cat /etc/fstab
> /dev/ad0s1a  /  ufs  ro,noatime  2  2
>
> # df -h /
> Filesystem    1K-blocks    Used    Avail  Capacity  Mounted on
> /dev/ad0s1a        939M    555M     309M       64%  /
>
> RAM has been boosted as suggested in ZFS Tuning Guide
> # cat /boot/loader.conf
> vm.kmem_size=1610612736
> vm.kmem_size_max=1610612736
> vfs.zfs.prefetch_disable=1
>
> I haven't mucked much with the other memory settings as I'm using
> amd64 and according to the FreeBSD ZFS wiki, that isn't necessary.
> I've tried higher settings for kmem but that resulted in a failed
> boot. I have ample RAM and would love to use as much as possible for
> network and disk I/O buffers as that's principally all this system does.
>
> Disks & ZFS options
>
> Sun's "Best Practices" suggests limiting the number of disks in a
> raidz pool to no more than 6-10, IIRC. ZFS is configured as shown:
> http://matt.simerson.net/computing/zfs/zpool.txt
>
> I'm using all of the ZFS default properties except: atime=off,
> compression=on.
>
> Environment
>
> I'm using these machines as backup servers. I wrote an application
> that generates a list of the thousands of VPS accounts we host. For
> each host, it generates a rsnapshot configuration file and backs up
> their VPS to these systems via rsync. The application manages
> concurrency and will spawn additional rsync processes if system i/o
> load is below a defined threshold. Which is to say, I can crank up
> or down the amount of network and disk IO the system sees.
>
> With vfs.zfs.prefetch_disable=1, a hang will occur within a few hours

I guess you wanted '0' here?

> (no more than a day). If I keep the i/o load (measured via iostat)
> down to a low level (< 200 iops) then I still get hangs but less
> frequently (1-6 days). The only way I have found to prevent the hangs
> is by setting vfs.zfs.prefetch_disable=1.

This is more or less a known problem. It is related to low memory/kva
conditions. Alan Cox is working on the vm.kmem_size limitation. I saw
Kris using ZFS with some very large vm.kmem_size. Not sure if all the
code is already committed, but this is something you should definitely
try on your hardware.

I also have the most recent ZFS version in perforce. It is being tested
by a few other guys and I'd like to commit it to HEAD soon (depends on
test results of course). There are plenty of improvements and some may
fix your problem too.

BTW. Do you find prefetch helpful for your workloads? I always turn it
off on my systems, because it has a negative impact on performance, but
maybe my hardware is too weak to take advantage of it.

One more thing. There was a small bug in the prefetch code, but I've no
idea if it is related to the hangs you are seeing. If that's not a
problem for you, can you try this patch:

	http://people.freebsd.org/~pjd/patches/dmu_zfetch.c.patch

If you want to play with tuning ZFS prefetch, you might find these
patches useful (taken from the perforce version):

	http://people.freebsd.org/~pjd/patches/dmu_zfetch.c.2.patch
	http://people.freebsd.org/~pjd/patches/quad.patch

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
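
For anyone following the thread, the loader.conf values quoted in the original report can be sanity-checked with a short script. This is only a sketch: parse_loader_conf is a hypothetical helper (not part of FreeBSD), and it assumes simple name=value lines rather than the full loader.conf syntax.

```python
# Sketch only: parse simple name=value lines like the ones quoted in the report.
# parse_loader_conf is a hypothetical helper, not a FreeBSD tool.

LOADER_CONF = """\
vm.kmem_size=1610612736
vm.kmem_size_max=1610612736
vfs.zfs.prefetch_disable=1
"""


def parse_loader_conf(text):
    """Return a dict of tunables from name=value lines, skipping comments."""
    tunables = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        tunables[name.strip()] = value.strip().strip('"')
    return tunables


tunables = parse_loader_conf(LOADER_CONF)

# vm.kmem_size is given in bytes; convert to GiB for readability.
kmem_gib = int(tunables["vm.kmem_size"]) / 2**30

# prefetch_disable=0 means prefetch is on; 1 means it is turned off.
prefetch_on = tunables["vfs.zfs.prefetch_disable"] == "0"

print(f"vm.kmem_size = {kmem_gib:.2f} GiB")
print(f"ZFS prefetch enabled: {prefetch_on}")
```

With the quoted settings this reports 1.50 GiB of kmem on a machine with 16 GB of RAM, and prefetch turned off.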