Date:      Thu, 3 May 2007 21:06:26 +0200
From:      Pawel Jakub Dawidek <pjd@FreeBSD.org>
To:        Bakul Shah <bakul@bitblocks.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZFS vs UFS2 overhead and may be a bug?
Message-ID:  <20070503190626.GB7177@garage.freebsd.pl>
In-Reply-To: <20070502052243.485FE5B51@mail.bitblocks.com>
References:  <20070502052243.485FE5B51@mail.bitblocks.com>

On Tue, May 01, 2007 at 10:22:43PM -0700, Bakul Shah wrote:
> Here is a surprising result for ZFS.
>
> I ran the following script on both ZFS and UFS2 filesystems.
>
> $ dd </dev/zero bs=1m count=10240 >SPACY  # 10G zero bytes allocated
> $ truncate -s 10G HOLEY                   # no space allocated
>
> $ time dd <SPACY >/dev/null bs=1m         # A1
> $ time dd <HOLEY >/dev/null bs=1m         # A2
> $ time cat SPACY >/dev/null               # B1
> $ time cat HOLEY >/dev/null               # B2
> $ time md5 SPACY                          # C1
> $ time md5 HOLEY                          # C2
>
> I have summarized the results below.
>
>                       ZFS               UFS2
>                 Elapsed   System  Elapsed   System   Test
> dd SPACY bs=1m   110.26    22.52   340.38    19.11   A1
> dd HOLEY bs=1m    22.44    22.41    24.24    24.13   A2
>
> cat SPACY        119.64    33.04   342.77    17.30   B1
> cat HOLEY        222.85   222.08    22.91    22.41   B2
>
> md5 SPACY        210.01    77.46   337.51    25.54   C1
> md5 HOLEY        856.39   801.21    82.11    28.31   C2
>
>
> A1, A2:
> Numbers are more or less as expected.  When doing large
> reads, reading from "holes" takes far less time than from a
> real disk.  We also see that UFS2 disk is about 3 times
> slower for sequential reads.
>
> B1, B2:
> UFS2 numbers are as expected but ZFS numbers for the HOLEY
> file are much too high.  Why should *not* going to a real
> disk cost more?  We also see that UFS2 handles holey files 10
> times more efficiently than ZFS!
>
> C1, C2:
> Again UFS2 numbers and C1 numbers for ZFS are as expected,
> but C2 numbers for ZFS are very high.  md5 uses BLKSIZ (==
> 1k) size reads and does hardly any other system calls.  For
> ZFS each syscall takes 76.4 microseconds while UFS2 syscalls
> are 2.7 us each!  zpool iostat shows there is no IO to the
> real disk so this implies that for the HOLEY case zfs read
> calls have a significantly higher overhead or there is a bug.
>
> Basically C tests just confirm what we find in B tests.

Interesting. There are two problems. The first is that cat(1) uses
st_blksize to find the best I/O request size, and we force it to
PAGE_SIZE, which is very, very wrong for ZFS: it should be equal to
recordsize. I need to find the discussion about this:

	/*
	 * According to www.opengroup.org, the meaning of st_blksize is
	 *   "a filesystem-specific preferred I/O block size for this
	 *    object.  In some filesystem types, this may vary from file
	 *    to file"
	 * Default to PAGE_SIZE after much discussion.
	 * XXX: min(PAGE_SIZE, vp->v_bufobj.bo_bsize) may be more
	 * correct.
	 */

	sb->st_blksize = PAGE_SIZE;

For example, cp(1) just uses MAXBSIZE, which is also not really good, but
at least MAXBSIZE (64kB) is much bigger than PAGE_SIZE.
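
To see what a given file system actually reports, something like the
following sketch can be used. It is only an illustration: the helper name
preferred_blksize() is made up here, and it simply returns the st_blksize
field filled in by stat(2), which is the value cat(1) picks up as
described above.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not from any FreeBSD source file): return the
 * preferred I/O size the kernel reports for `path`, or -1 if stat(2)
 * fails.
 */
long
preferred_blksize(const char *path)
{
	struct stat sb;

	if (stat(path, &sb) == -1)
		return (-1);
	return ((long)sb.st_blksize);
}
```

Calling it on a file that lives on ZFS and on one that lives on UFS2 shows
directly whether both report the same value.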

So basically what you observed with cat(1) is equivalent to running
dd(1) with bs=4k.
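
A rough sketch of why the request size matters: every read(2) pays a
fixed per-call cost, so what differs between the two cases is the number
of calls. The helpers below are hypothetical names, not code from cat(1)
or dd(1). reads_needed() is just the arithmetic, and count_reads() runs
the same kind of sequential read loop with a caller-chosen buffer size so
the call count can be checked on a real file.

```c
#include <sys/types.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Number of read(2) calls that return data when `size` bytes are read
 * sequentially in `bufsize`-byte requests (assumes no short reads and
 * bufsize > 0).
 */
uint64_t
reads_needed(uint64_t size, uint64_t bufsize)
{
	return ((size + bufsize - 1) / bufsize);
}

/*
 * Read `path` from start to end with `bufsize`-byte requests.  Returns
 * the number of read(2) calls that returned data, or -1 on error.
 */
long long
count_reads(const char *path, size_t bufsize)
{
	char *buf;
	long long calls = 0;
	ssize_t n;
	int fd;

	if (bufsize == 0 || (buf = malloc(bufsize)) == NULL)
		return (-1);
	if ((fd = open(path, O_RDONLY)) == -1) {
		free(buf);
		return (-1);
	}
	while ((n = read(fd, buf, bufsize)) > 0)
		calls++;
	close(fd);
	free(buf);
	return (n == -1 ? -1 : calls);
}
```

For a 10G file, reads_needed() gives 2621440 calls with a 4k buffer and
81920 calls with a 128k buffer, that is, 32 times fewer system calls for
the same amount of data.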

I tested it on Solaris and this is not a FreeBSD-specific problem; the
same happens on Solaris. Is there a chance you could send your observations
to zfs-discuss@opensolaris.org, but just the comparison between dd(1) with
bs=128k and bs=4k (the other tests might be confusing)?

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!



