From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 18 13:19:40 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E3444106566B
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 13:19:39 +0000 (UTC)
	(envelope-from c.kworr@gmail.com)
Received: from mail-ee0-f54.google.com (mail-ee0-f54.google.com [74.125.83.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 6791A8FC08
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 13:19:39 +0000 (UTC)
Received: by eeke52 with SMTP id e52so4166540eek.13
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 06:19:38 -0700 (PDT)
Received: by 10.14.4.198 with SMTP id 46mr215206eej.11.1347974378028;
	Tue, 18 Sep 2012 06:19:38 -0700 (PDT)
Received: from green.local (90-224-132-95.pool.ukrtel.net. [95.132.224.90])
	by mx.google.com with ESMTPS id r45sm36290476eem.6.2012.09.18.06.19.35
	(version=SSLv3 cipher=OTHER); Tue, 18 Sep 2012 06:19:36 -0700 (PDT)
Message-ID: <505874E6.2050109@gmail.com>
Date: Tue, 18 Sep 2012 16:19:34 +0300
From: Volodymyr Kostyrko <c.kworr@gmail.com>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:15.0) Gecko/20120911 Thunderbird/15.0.1
MIME-Version: 1.0
To: Daniel Kalchev <daniel@digsys.bg>
References: <001a01cd900d$bcfcc870$36f65950$@goelli.de>
	<504F282D.8030808@gmail.com>
	<000a01cd90aa$0a277310$1e765930$@goelli.de>
	<5050461A.9050608@gmail.com>
	<000001cd9239$ed734c80$c859e580$@goelli.de>
	<5052EC5D.4060403@gmail.com>
	<000a01cd9274$0aa0bba0$1fe232e0$@goelli.de>
	<505322C9.70200@gmail.com>
	<000001cd9377$e9e9b010$bdbd1030$@goelli.de>
	<50559CD8.1070700@gmail.com>
	<000001cd94f1$a4157030$ec405090$@goelli.de>
	<50581033.4040102@gmail.com> <50584CC1.3030300@digsys.bg>
In-Reply-To: <50584CC1.3030300@digsys.bg>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org
Subject: Re: AW: AW: AW: AW: AW: ZFS: Corrupted pool metadata after adding
 vdev to a pool - no opportunity to rescue data from healthy vdevs? Remove
 a vdev? Rewrite metadata?

18.09.2012 13:28, Daniel Kalchev wrote:
>> From my point of view, all the hype about moving to 4k sectors is
>> largely irrelevant to ZFS and to current products on the market.
>>
>> 1. ZFS tends to use a big recordsize for storing any data. This means
>> most files on your drives are already stored in 128k blocks. Storing
>> small tails in 512b or 4k sectors shouldn't make a big difference.
>
> Truth is, ZFS will write blocks of size from your media sector size up
> to 128K.
>
> The problem is that ZFS writes these records (even 128K) aligned to the
> sector size. So, once you write some data that is under 4k, your pool
> will become misaligned.

Not exactly. https://blogs.oracle.com/bonwick/entry/space_maps

1. ZFS divides the space on each virtual device into a few hundred
metaslabs.
2. Metaslabs are quite big, so it is logical to align them to a large
power of two, i.e. far coarser than the ashift value (I can't find
documentation on whether this is true, but at least they should be
divisible by 128k, as this is the default recordsize).
3. Within each metaslab, all space allocation is done through space
maps. I have no documentation on this one either, but given that gang
blocks exist in the ZFS specification (so a block can be split when no
suitably aligned free space is left), every new allocation should be
aligned to 128k when allocating a 128k block, to 64k when allocating a
64k block, and so on (again, I can't find documentation on whether this
is true, but as far as I understand the Solaris way, it is more
practical to keep data aligned than to deal with misalignment later).
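Points 1 and 2 can be sketched in Python. This rests on my own assumption, not something taken from the ZFS source: the metaslab size is a power of two, chosen so that each vdev gets at most about 200 metaslabs. TARGET_METASLABS, metaslab_shift and metaslab_bounds are hypothetical names. With a power-of-two size of at least 128k, every metaslab boundary lands on a multiple of 128k. Those offsets count from the start of the vdev's allocatable space, so whether they also sit on 4k boundaries of the physical disk still depends on how the partition itself is aligned.

```python
# Sketch: split a vdev into a few hundred power-of-two-sized metaslabs.
# ASSUMPTION: the ~200 target and the "round up to a power of two" rule are
# illustrative only, not taken from the ZFS source.

TARGET_METASLABS = 200       # hypothetical target metaslab count per vdev
RECORDSIZE = 128 * 1024      # default ZFS recordsize (128k)

def metaslab_shift(vdev_size: int, target: int = TARGET_METASLABS) -> int:
    """log2 of a power-of-two metaslab size giving at most ~target metaslabs."""
    per_slab = max(vdev_size // target, RECORDSIZE)
    return (per_slab - 1).bit_length()  # smallest n with 2**n >= per_slab

def metaslab_bounds(vdev_size: int):
    """Half-open [start, end) byte ranges of each metaslab on the vdev."""
    size = 1 << metaslab_shift(vdev_size)
    count = (vdev_size + size - 1) // size
    return [(i * size, min((i + 1) * size, vdev_size)) for i in range(count)]

slabs = metaslab_bounds(2 * 10**12)  # a 2 TB vdev
print(metaslab_shift(2 * 10**12), len(slabs))  # 34 117
```

For a 2 TB vdev this model gives 16 GiB metaslabs (shift 34), 117 of them, all starting on 128k multiples.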

I'm bad at reading code, so I can't really say how allocations are
aligned within ZFS metaslabs, but the function dealing with metaslab
allocation does take an 'align' argument.
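The guess in point 3 can be illustrated with a toy first-fit allocator in Python. It works over a free list of [start, end) ranges inside one metaslab. As a hypothesis, and not as a description of the real metaslab code, it assumes the 'align' value is the largest power of two dividing the requested size (size & -size). The helper names natural_align and allocate are hypothetical. Under that rule, a 512-byte tail does not push later 128k or 4k blocks off their boundaries.

```python
from typing import List, Optional, Tuple

# A free range inside one metaslab: half-open [start, end), byte offsets.
Segment = Tuple[int, int]

def natural_align(size: int) -> int:
    """Largest power of two dividing size: 128k -> 128k, 64k -> 64k, 12k -> 4k."""
    return size & -size

def allocate(free: List[Segment], size: int) -> Optional[int]:
    """First-fit allocation of `size` bytes at an offset aligned to natural_align(size).

    ASSUMPTION: this alignment rule is a hypothesis for illustration, not the
    actual ZFS metaslab allocator. Updates `free` in place.
    """
    align = natural_align(size)
    for i, (start, end) in enumerate(free):
        offset = -(-start // align) * align  # round start up to a multiple of align
        if offset + size <= end:
            # Replace this free range with whatever is left on either side of the block.
            leftovers = [(s, e) for s, e in ((start, offset), (offset + size, end)) if e > s]
            free[i:i + 1] = leftovers
            return offset
    return None  # no aligned hole big enough; real ZFS could fall back to a gang block

free = [(0, 1 << 20)]                 # one empty 1 MiB metaslab
tail = allocate(free, 512)            # sub-4k write lands at 0
record = allocate(free, 128 * 1024)   # lands at 131072, still 128k-aligned
block = allocate(free, 4096)          # lands at 4096, still 4k-aligned
print(tail, record, block)            # 0 131072 4096
```

The small tail only wastes space up to the next boundary. It does not shift everything after it, which is the opposite of the "pool becomes misaligned" scenario.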

>> 2. For older drives, each drive should be partitioned with respect to
>> 4k sectors. This is what gpart's -a option (e.g. -a 4k) does: it
>> aligns created partitions to 4k sector boundaries. But half a year ago
>> I already found some drives that can auto-shift all disk transactions
>> to optimize read and write performance. Courtesy of Microsoft Windows,
>> an OS that does not care about anything not written in its license
>> terms (and neither do its users), using these drives is more
>> straightforward: they spare IT staff the pain of realigning
>> partitions and simply work.
>>
>
> This is only hype. There is no way any disk firmware can shift any
> transactions.

How about Seagate Smart Align? It's documented to do exactly that. I
haven't touched any Seagate drives, though, as I don't like them anyway...
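On the partitioning side, the alignment that gpart's -a option takes care of can be checked with a short Python sketch. For a drive with 4k physical sectors, a partition is aligned when its starting byte offset is a multiple of 4096. The constants and function names below are hypothetical. Two starting points show the problem. The classic MBR start at LBA 63 gives 63 * 512 = 32256 bytes, not a multiple of 4096, which is the usual misaligned case. LBA 34, the first usable LBA of a typical GPT layout on 512-byte sectors, is misaligned too.

```python
LOGICAL_SECTOR = 512     # sector size the drive reports to the OS
PHYSICAL_SECTOR = 4096   # actual sector size of an Advanced Format drive

def is_4k_aligned(start_lba: int) -> bool:
    """True if a partition starting at start_lba begins on a physical-sector boundary."""
    return (start_lba * LOGICAL_SECTOR) % PHYSICAL_SECTOR == 0

def align_up(start_lba: int) -> int:
    """Round start_lba up to the next 4k boundary, similar in spirit to `gpart add -a 4k`."""
    per_physical = PHYSICAL_SECTOR // LOGICAL_SECTOR  # 8 logical sectors per physical one
    return -(-start_lba // per_physical) * per_physical

for lba in (63, 34, 2048):
    print(lba, is_4k_aligned(lba), align_up(lba))
# 63 False 64
# 34 False 40
# 2048 True 2048
```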

-- 
Sphinx of black quartz judge my vow.