From: Pawel Jakub Dawidek <pjd@garage.freebsd.pl>
To: Bakul Shah
Cc: freebsd-fs@freebsd.org
Date: Thu, 3 May 2007 21:06:26 +0200
Subject: Re: ZFS vs UFS2 overhead and may be a bug?

On Tue, May 01, 2007 at 10:22:43PM -0700, Bakul Shah wrote:
> Here is a surprising result for ZFS.
>
> I ran the following script on both ZFS and UFS2 filesystems.
>
> $ dd </dev/zero bs=1m count=10240 >SPACY  # 10G zero bytes allocated
> $ truncate -s 10G HOLEY                   # no space allocated
>
> $ time dd <SPACY >/dev/null bs=1m         # A1
> $ time dd <HOLEY >/dev/null bs=1m         # A2
> $ time cat SPACY >/dev/null               # B1
> $ time cat HOLEY >/dev/null               # B2
> $ time md5 SPACY                          # C1
> $ time md5 HOLEY                          # C2
>
> I have summarized the results below.
>
>                       ZFS               UFS2
>                  Elapsed  System   Elapsed  System   Test
> dd SPACY bs=1m    110.26   22.52    340.38   19.11    A1
> dd HOLEY bs=1m     22.44   22.41     24.24   24.13    A2
>
> cat SPACY         119.64   33.04    342.77   17.30    B1
> cat HOLEY         222.85  222.08     22.91   22.41    B2
>
> md5 SPACY         210.01   77.46    337.51   25.54    C1
> md5 HOLEY         856.39  801.21     82.11   28.31    C2
>
> A1, A2:
> Numbers are more or less as expected.  When doing large
> reads, reading from "holes" takes far less time than from a
> real disk.  We also see that the UFS2 disk is about 3 times
> slower for sequential reads.
>
> B1, B2:
> UFS2 numbers are as expected, but the ZFS numbers for the
> HOLEY file are much too high.  Why should *not* going to a
> real disk cost more?  We also see that UFS2 handles holey
> files 10 times more efficiently than ZFS!
>
> C1, C2:
> Again, the UFS2 numbers and the C1 numbers for ZFS are as
> expected, but the C2 numbers for ZFS are very high.  md5 uses
> BLKSIZ (== 1k) size reads and does hardly any other system
> calls.  For ZFS each syscall takes 76.4 microseconds, while
> UFS2 syscalls are 2.7 us each!  zpool iostat shows there is no
> I/O to the real disk, so this implies that for the HOLEY case
> ZFS read calls have a significantly higher overhead, or there
> is a bug.
>
> Basically the C tests just confirm what we find in the B tests.
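A minimal sketch, not from the original thread, to make the arithmetic
above concrete: the program below (its name and arguments are
illustrative only) times read(2) calls of a caller-chosen size against
a file.  Reading the 10G HOLEY file 1kB at a time issues roughly
10*2^30 / 2^10, i.e. about 10.5 million read(2) calls, so 801.21s of
system time works out to about 76 microseconds per call, which is
where the per-syscall figures quoted above come from.

	/*
	 * Sketch: read a file in fixed-size chunks and report the
	 * average system-call cost, mimicking md5(1)'s small
	 * sequential reads.  Example: ./readcost HOLEY 1024
	 */
	#include <sys/time.h>
	#include <sys/resource.h>
	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	int
	main(int argc, char *argv[])
	{
		struct rusage ru;
		unsigned long long calls = 0;
		ssize_t n;

		if (argc != 3) {
			fprintf(stderr, "usage: %s file bufsize\n", argv[0]);
			return (1);
		}
		size_t bufsize = strtoul(argv[2], NULL, 10);
		char *buf = malloc(bufsize);
		int fd = open(argv[1], O_RDONLY);
		if (buf == NULL || fd == -1) {
			perror("setup");
			return (1);
		}
		/* Sequential reads; each iteration is one syscall. */
		while ((n = read(fd, buf, bufsize)) > 0)
			calls++;
		getrusage(RUSAGE_SELF, &ru);
		double sys = ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
		printf("%llu calls, %.2fs system, %.2f us/call\n",
		    calls, sys, calls ? sys * 1e6 / calls : 0.0);
		close(fd);
		free(buf);
		return (0);
	}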
Interesting.  There are two problems.  The first is that cat(1) uses
st_blksize to find out the best size for an I/O request, and we force
it to PAGE_SIZE, which is very, very wrong for ZFS - it should be equal
to the recordsize.  I need to find the discussion about this:

	/*
	 * According to www.opengroup.org, the meaning of st_blksize is
	 * "a filesystem-specific preferred I/O block size for this
	 * object.  In some filesystem types, this may vary from file
	 * to file"
	 * Default to PAGE_SIZE after much discussion.
	 * XXX: min(PAGE_SIZE, vp->v_bufobj.bo_bsize) may be more
	 * correct.
	 */
	sb->st_blksize = PAGE_SIZE;

For example, cp(1) just uses MAXBSIZE, which is also not really good,
but at least MAXBSIZE (64kB) is much bigger than PAGE_SIZE.  So
basically what you observed with cat(1) is the equivalent of running
dd(1) with bs=4k.

I tested it on Solaris, and this is not a FreeBSD-specific problem; the
same happens on Solaris.  Is there a chance you could send your
observations to zfs-discuss@opensolaris.org?  Just the comparison
between dd(1) with bs=128k and bs=4k (the other tests might be
confusing).

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
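A minimal sketch of the st_blksize point above, assuming a cat(1)-like
reader (this is a hand-written illustration of the pattern, not the
actual cat(1) source): the buffer is sized from fstat(2)'s st_blksize,
so whatever the kernel reports there directly sets the number of
read(2) calls.  With st_blksize forced to PAGE_SIZE (4kB), every 128kB
ZFS record is consumed in 32 reads instead of one.

	#include <sys/stat.h>
	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	int
	main(int argc, char *argv[])
	{
		struct stat sb;
		ssize_t n;

		if (argc != 2) {
			fprintf(stderr, "usage: %s file\n", argv[0]);
			return (1);
		}
		int fd = open(argv[1], O_RDONLY);
		if (fd == -1 || fstat(fd, &sb) == -1) {
			perror(argv[1]);
			return (1);
		}
		/*
		 * The I/O size comes straight from st_blksize: on FreeBSD
		 * this is the PAGE_SIZE default discussed above, while the
		 * ZFS-preferred value would be the recordsize (128kB).
		 */
		char *buf = malloc((size_t)sb.st_blksize);
		if (buf == NULL)
			return (1);
		while ((n = read(fd, buf, (size_t)sb.st_blksize)) > 0)
			(void)write(STDOUT_FILENO, buf, (size_t)n);
		free(buf);
		close(fd);
		return (0);
	}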