From: "Chad J. Milios" <milios@ccsys.com>
Subject: Re: followup storage question
Date: Fri, 11 Sep 2015 10:16:46 -0400
To: "William A. Mahaffey III"
Cc: FreeBSD Questions !!!!
In-Reply-To: <55F2D086.6060509@hiwaay.net>

> On Sep 11, 2015, at 8:59 AM, William A. Mahaffey III wrote:
>
> The Wiki page https://wiki.freebsd.org/RootOnZFS/GPTZFSBoot/9.0-RELEASE illustrates using gnop to enforce 4K alignment of gpt partitions for subsequent use by ZFS. However, the gpart commands also use the '-a 4k' argument, aligning partitions on 4k boundaries as I understand things. Is the gnop command also necessary? TIA & have a nice weekend.
>
> --
>
> William A. Mahaffey III

Yes, handling both facets of the same underlying issue separately is necessary. Those facets are the partition's alignment on the underlying device and the logical block size the partition's device node reports to ZFS.

The latter can effectively be done a different way: later versions of FreeBSD have a sysctl, vfs.zfs.min_auto_ashift, which you can set to 12 for 4096-byte blocks or leave at 9 for the default 512 bytes. (The ashift value is the exponent over the number 2 that gives the number of bytes in a block, so ashift=12 means 2^12 = 4096.)

The old gnop way still works just fine, so I still use that method personally. It definitely only has to be done when vdevs are created, added, or replaced* on the pool, not on every mount/import. From then on ZFS listens to the formatting metadata it stamped on the vdev rather than what the ioctls of the device node report, and so will always write larger, correctly aligned blocks.

(I'm not sure the reverse direction, not a typical use, holds true without gnop every time, and I know min_auto_ashift won't help there. That is the case where you use gnop to simulate smaller blocks to ZFS than the device node reports, say because you want to accept a certain amount of write amplification in exchange for more efficient storage of lots of small files, directories, and metadata. In that case you may need to recreate the gnop every time; I'm not sure, because I don't run any pools that might need it, but I know you can do it if you want, for that space-overhead reason. It would take some testing and actual measurement for me to confidently decide whether gnop can be skipped after vdev initialization when going in that opposite direction. Maybe someone will chime in here and let us know for sure. At any rate, gnop is by its nature about the fastest and lightest geom class under the sun, and I believe you could keep thousands of instances busily running in production and see no noticeable overhead.)
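For the normal direction (getting ashift=12 at pool creation), here's a rough sketch of both methods. The device, label, and pool names (ada0, gpt/disk0, tank) are placeholders, and the export/import dance follows my recollection of the wiki's procedure, so double-check it against your own layout:

    # Method 1: the gnop trick from the wiki page
    gpart create -s gpt ada0
    gpart add -a 4k -t freebsd-zfs -l disk0 ada0
    gnop create -S 4096 /dev/gpt/disk0     # .nop node advertises 4096-byte sectors
    zpool create tank /dev/gpt/disk0.nop   # ZFS sees 4k and stamps ashift=12
    zpool export tank
    gnop destroy /dev/gpt/disk0.nop        # optional; a reboot removes it anyway
    zpool import -d /dev/gpt tank          # pool comes back on the real node

    # Method 2: on later FreeBSD, raise the floor before creating the pool
    sysctl vfs.zfs.min_auto_ashift=12      # 2^12 = 4096-byte blocks
    zpool create tank /dev/gpt/disk0

Either way, 'zdb -C tank | grep ashift' should report ashift: 12 afterward.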
* Yes, mind the gnop or sysctl for ashift whenever replacing as well. Ashift is a vdev property that is not copied as part of the data resilvering; ZFS decides it for each vdev independently, even though a pool with mixed ashift values seems totally unintuitive. I've seen it forgotten at replace time. And once you do use gnop, it's sort of a pain to get gnop/ZFS to relinquish the vdev if you do an online replace and then want to clear off the gnop node. I'd just leave it in place: upon reboot it will disappear, ZFS will pick up the real vdev, and it will properly do what you want with it. There should be no problem with years of uptime in the meantime, and the pool will simply come up slightly differently on the next boot, bypassing gnop, with the correct ashift throughout.
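Since replace time is exactly where this gets forgotten, here's a sketch of the same two options for an online replace, again with placeholder names (gpt/olddisk, gpt/newdisk):

    # gnop route: resilver onto the .nop node, then just leave it;
    # after the next reboot the .nop is gone and ZFS picks up
    # /dev/gpt/newdisk directly with its ashift intact
    gnop create -S 4096 /dev/gpt/newdisk
    zpool replace tank gpt/olddisk gpt/newdisk.nop

    # sysctl route on later FreeBSD:
    sysctl vfs.zfs.min_auto_ashift=12
    zpool replace tank gpt/olddisk gpt/newdisk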