From owner-freebsd-stable@FreeBSD.ORG Thu Sep 11 02:26:43 2014
Subject: Re: getting to 4K disk blocks in ZFS
From: John Nielsen
Date: Wed, 10 Sep 2014 20:26:35 -0600
To: Aristedes Maniatis
Cc: freebsd-stable
In-Reply-To: <540FF3C4.6010305@ish.com.au>

> On Sep 10, 2014, at 12:46 AM, Aristedes Maniatis wrote:
>
> As we all know, it is important to ensure that modern disks are set up properly with the correct block size. Everything is good if all the disks and the pool are "ashift=9" (512-byte blocks). But as soon as one new drive requires 4K blocks, performance of the entire pool drops through the floor.
>
> In order to upgrade, there appear to be two separate things that must be done for a ZFS pool.
>
> 1. Create partitions on 4K boundaries. This is simple with the "-a 4k" option in gpart, and it isn't hard to remove disks one at a time from a pool, reformat them on the right boundaries and put them back. Hopefully you've left a few spare bytes on the disk to ensure that your partition doesn't get smaller when you reinsert it into the pool.
>
> 2. Create a brand new pool which has ashift=12 and zfs send|receive all the data over.
>
> I guess I don't understand enough about zpool to know why the pool itself has a block size, since I understood ZFS to have variable stripe widths.
>
> The problem with step 2 is that you need enough spare hard disks to create a whole new pool and throw away the old disks. Plus a disk controller with lots of spare ports. Plus the ability to take the system offline for hours or days while the migration happens.
>
> One way to reduce this slightly is to create a new pool with reduced redundancy. For example, create a RAIDZ2 with two fake disks, then offline those disks.

Lots of good info in other responses, I just wanted to address this part of your message.

It should be a given that good backups are a requirement before you start any of this. _Especially_ if you have to destroy the old pool in order to provide redundancy for the new pool.
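For what it's worth, the mechanics of the above look roughly like this. All of the device names, GPT labels, pool names and sizes below (ada1, gpt/new0..., newtank, /var/tmp/fake*) are made up, so treat it as a sketch to check against gpart(8), zpool(8) and your own layout rather than something to paste in:

  # Step 1: repartition a disk on 4K boundaries (one disk at a time).
  gpart destroy -F ada1
  gpart create -s gpt ada1
  gpart add -t freebsd-zfs -a 4k -l new0 ada1

  # Step 2: make sure the new pool comes up with ashift=12. Recent
  # FreeBSD releases have this sysctl; on older releases the usual
  # workaround was a gnop(8) provider created with -S 4096.
  sysctl vfs.zfs.min_auto_ashift=12

  # Sparse files can stand in for the two "fake" RAIDZ2 members.
  # Size them like the real partitions so real disks can replace
  # them later, then offline them and run the pool degraded.
  truncate -s 2T /var/tmp/fake0 /var/tmp/fake1
  zpool create newtank raidz2 gpt/new0 gpt/new1 gpt/new2 gpt/new3 \
      /var/tmp/fake0 /var/tmp/fake1
  zpool offline newtank /var/tmp/fake0
  zpool offline newtank /var/tmp/fake1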
I have done this ashift conversion and it was a bit of a nail-biting experience, as you've anticipated. The one suggestion I have for improving on the above is to use snapshots to minimize the downtime. Get an initial clone of the pool during off-peak hours (if any), then you only need to take the system down to send a "final" differential snapshot.

JN
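In case it helps, the snapshot-based copy looks roughly like this. Pool and snapshot names (tank, newtank, @migrate-*) are made up, and the exact send/receive flags are worth confirming against zfs(8) on your release:

  # Initial full copy, taken while the system is still in service.
  zfs snapshot -r tank@migrate-1
  zfs send -R tank@migrate-1 | zfs receive -duF newtank

  # Later, during the short downtime window: stop anything writing
  # to the old pool, then send only what changed since migrate-1.
  zfs snapshot -r tank@migrate-2
  zfs send -R -i tank@migrate-1 tank@migrate-2 | zfs receive -duF newtank

The -u flag keeps the received datasets unmounted, so the two pools' mountpoints don't collide until you are ready to switch over.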