From: Pawel Jakub Dawidek
To: John Baldwin
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Ivan Voras
Date: Tue, 7 Dec 2010 10:42:40 +0100
Subject: Re: svn commit: r216230 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs
Message-ID: <20101207094240.GB1700@garage.freebsd.pl>
In-Reply-To: <201012061518.49835.jhb@freebsd.org>
References: <201012061218.oB6CI3oW032770@svn.freebsd.org> <20101206195327.GD1936@garage.freebsd.pl> <201012061518.49835.jhb@freebsd.org>
List-Id: SVN commit messages for the entire src tree (except for "user" and "projects")

On Mon, Dec 06, 2010 at 03:18:49PM -0500, John Baldwin wrote:
> On Monday, December 06, 2010 2:53:27 pm Pawel Jakub Dawidek wrote:
> > On Mon, Dec 06, 2010 at 08:35:36PM +0100, Ivan Voras wrote:
> > > Please persuade me on technical grounds why ashift, a property
> > > intended for address alignment, should not be set in this way. If your
> > > answer is "I don't know but you are still wrong because I say so" I
> > > will respect it and back it out but only until I/we discuss the
> > > question with upstream ZFS developers.
> >
> > No. You persuade me why changing ashift in ZFS, which, as the comment
> > clearly states is "device's minimum transfer size" is better and not
> > hackish than presenting the disk with properly configured sector size.
> > This can not only affect disks that still use 512 bytes sectors, but
> > doesn't fix the problem at all. It just works around the problem in ZFS
> > when configured on top of raw disks.
> >
> > What about other file systems? What about other GEOM classes? GELI is
> > a great example here, as people use ZFS on top of GELI a lot. GELI
> > integrity verification works in a way that not reporting disk sector
> > size properly will have huge negative performance impact. ZFS' ashift
> > won't change that.
>
> I am mostly on your side here, but I wonder if GELI shouldn't prefer the
> stripesize anyway? For example, if you ran GELI on top of RAID-5 I imagine it
> would be far more performant for it to use stripe-size logical blocks instead
> of individual sectors for the underlying media.

Not exactly. GELI with authentication stores the checksum in the same
sector as the data. This way we have less than 512 bytes of data per sector.
To still be able to provide power-of-2 sectors and not lose too much
space, GELI has to present a larger sector to the upper layers. For
example, with 512-byte sectors on the underlying provider, GELI presents
a 4kB sector to the upper layers, but every 4kB GELI sector is built from
nine 512-byte sectors of the underlying provider.

I'm not sure if my description is readable:) If you are interested, take
a look at the top of g_eli_integrity.c. It might be better described in
there.

> The RAID-5 argument also suggests that other filesystems should probably
> prefer stripe sizes to physical sector sizes when picking block sizes, etc.

I'm not so sure. The stripe size of RAID5 tends to be too large for that.
By using a 128kB ashift we would lose way too much space when it comes to
smaller files and metadata. Stripesize is just a hint about what alignment
is optimal, but it is optional - a consumer can decide to ignore it if it
cares more about space than performance, for example. Sectorsize on the
other hand is not a hint, but really the smallest block a provider can
handle.

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
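
To make the two pieces of arithmetic in this message concrete, here is a
rough sketch in Python. It is only an illustration under stated
assumptions, not the GELI or ZFS code. The 32-byte checksum size (as for
HMAC/SHA256) is an assumption that is not stated in this message; it is
chosen because it reproduces the "nine 512-byte sectors per 4kB sector"
figure above. The second function models allocation as a plain round-up
to 2^ashift bytes and ignores compression, metadata layout and RAID
padding. Both function names are made up for the example.

```python
def underlying_sectors(geli_sector, disk_sector=512, hmac_bytes=32):
    """Number of provider sectors needed to hold one authenticated sector.

    Each provider sector carries its own checksum, so only
    (disk_sector - hmac_bytes) bytes of it are available for data.
    hmac_bytes=32 is an assumed checksum size, not taken from the message.
    """
    payload = disk_sector - hmac_bytes
    # Integer ceiling division: round up to whole provider sectors.
    return -(-geli_sector // payload)


def allocated_bytes(file_bytes, ashift):
    """Space consumed if every allocation is rounded up to 2**ashift bytes.

    Simplified model: real file systems add metadata and may compress.
    """
    block = 1 << ashift
    return -(-file_bytes // block) * block


if __name__ == "__main__":
    n = underlying_sectors(4096)
    print(n, n * 512)                 # 9 4608

    # A 1kB file under 512-byte, 4kB and 128kB minimum blocks.
    for ashift in (9, 12, 17):
        print(ashift, allocated_bytes(1024, ashift))
```

With these assumptions a 4kB authenticated sector occupies 4608 bytes on
the provider, so roughly 89% of the raw space stays usable. The same 1kB
file takes 1024, 4096 or 131072 bytes depending on the minimum block,
which is the space loss for small files and metadata that argues against
treating a RAID5 stripe size as the sector size.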