From owner-freebsd-stable@FreeBSD.ORG Thu Sep 11 02:26:43 2014
Subject: Re: getting to 4K disk blocks in ZFS
From: John Nielsen
Date: Wed, 10 Sep 2014 20:26:35 -0600
To: Aristedes Maniatis
Cc: freebsd-stable
In-Reply-To: <540FF3C4.6010305@ish.com.au>

> On Sep 10, 2014, at 12:46 AM, Aristedes Maniatis wrote:
>
> As we all know, it is important to ensure that modern disks are set up properly with the correct block size. Everything is good if all the disks and the pool are "ashift=9" (512-byte blocks). But as soon as one new drive requires 4K blocks, performance of the entire pool drops through the floor.
>
> In order to upgrade, there appear to be two separate things that must be done for a ZFS pool.
>
> 1. Create partitions on 4K boundaries. This is simple with the "-a 4k" option in gpart, and it isn't hard to remove disks one at a time from a pool, reformat them on the right boundaries and put them back. Hopefully you've left a few spare bytes on the disk to ensure that your partition doesn't get smaller when you reinsert it into the pool.
>
> 2. Create a brand new pool which has ashift=12 and zfs send|receive all the data over.
>
> I guess I don't understand enough about zpool to know why the pool itself has a block size, since I understood ZFS to have variable stripe widths.
>
> The problem with step 2 is that you need enough spare hard disks to create a whole new pool and throw away the old disks. Plus a disk controller with lots of spare ports. Plus the ability to take the system offline for hours or days while the migration happens.
>
> One way to reduce this slightly is to create a new pool with reduced redundancy. For example, create a RAIDZ2 with two fake disks, then offline those disks.

Lots of good info in other responses, I just wanted to address this part of your message.

It should be a given that good backups are a requirement before you start any of this. _Especially_ if you have to destroy the old pool in order to provide redundancy for the new pool.
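For what it's worth, the mechanics of the above look roughly like this. All of the device names, GPT labels, pool names and sizes below (ada1, gpt/new0..., newtank, /var/tmp/fake*) are made up, so treat it as a sketch to check against gpart(8), zpool(8) and your own layout rather than something to paste in:

  # Step 1: repartition a disk on 4K boundaries (one disk at a time).
  gpart destroy -F ada1
  gpart create -s gpt ada1
  gpart add -t freebsd-zfs -a 4k -l new0 ada1

  # Step 2: make sure the new pool comes up with ashift=12. Recent
  # FreeBSD releases have this sysctl; on older releases the usual
  # workaround was a gnop(8) provider created with -S 4096.
  sysctl vfs.zfs.min_auto_ashift=12

  # Sparse files can stand in for the two "fake" RAIDZ2 members.
  # Size them like the real partitions so real disks can replace
  # them later, then offline them and run the pool degraded.
  truncate -s 2T /var/tmp/fake0 /var/tmp/fake1
  zpool create newtank raidz2 gpt/new0 gpt/new1 gpt/new2 gpt/new3 \
      /var/tmp/fake0 /var/tmp/fake1
  zpool offline newtank /var/tmp/fake0
  zpool offline newtank /var/tmp/fake1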
I have done this ashift conversion and it was a bit of a nail-biting experience, as you've anticipated. The one suggestion I have for improving on the above is to use snapshots to minimize the downtime. Get an initial clone of the pool during off-peak hours (if any), then you only need to take the system down to send a "final" differential snapshot.

JN
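In case it helps, the snapshot-based copy looks roughly like this. Pool and snapshot names (tank, newtank, @migrate-*) are made up, and the exact send/receive flags are worth confirming against zfs(8) on your release:

  # Initial full copy, taken while the system is still in service.
  zfs snapshot -r tank@migrate-1
  zfs send -R tank@migrate-1 | zfs receive -duF newtank

  # Later, during the short downtime window: stop anything writing
  # to the old pool, then send only what changed since migrate-1.
  zfs snapshot -r tank@migrate-2
  zfs send -R -i tank@migrate-1 tank@migrate-2 | zfs receive -duF newtank

The -u flag keeps the received datasets unmounted, so the two pools' mountpoints don't collide until you are ready to switch over.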