Date:      Tue, 18 Sep 2012 16:40:11 +0300
From:      Daniel Kalchev <daniel@digsys.bg>
To:        Volodymyr Kostyrko <c.kworr@gmail.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: AW: AW: AW: AW: AW: ZFS: Corrupted pool metadata after adding vdev to a pool - no opportunity to rescue data from healthy vdevs? Remove a vdev? Rewrite metadata?
Message-ID:  <505879BB.3000806@digsys.bg>
In-Reply-To: <505874E6.2050109@gmail.com>
References:  <001a01cd900d$bcfcc870$36f65950$@goelli.de> <504F282D.8030808@gmail.com> <000a01cd90aa$0a277310$1e765930$@goelli.de> <5050461A.9050608@gmail.com> <000001cd9239$ed734c80$c859e580$@goelli.de> <5052EC5D.4060403@gmail.com> <000a01cd9274$0aa0bba0$1fe232e0$@goelli.de> <505322C9.70200@gmail.com> <000001cd9377$e9e9b010$bdbd1030$@goelli.de> <50559CD8.1070700@gmail.com> <000001cd94f1$a4157030$ec405090$@goelli.de> <50581033.4040102@gmail.com> <50584CC1.3030300@digsys.bg> <505874E6.2050109@gmail.com>



On 18.09.12 16:19, Volodymyr Kostyrko wrote:
> 18.09.2012 13:28, Daniel Kalchev wrote:
>>
>> The problem is that ZFS writes these records (even 128K) aligned to the
>> sector size. So, once you write some data that is under 4k, your pool
>> will become misaligned.
>
> Not exactly. https://blogs.oracle.com/bonwick/entry/space_maps

There is no statement in that post that contradicts what I already 
said. I may not have been precise enough -- the misalignment can happen 
within a metaslab, not across the whole zpool. ZFS clearly does not 
write larger blocks than necessary; the smallest block is the sector 
size.

The sector size is represented by the ashift value: the sector size is 
2^ashift. The ashift value is per vdev and is taken from the largest 
sector size among the vdev's members. So if you create a mirror vdev of 
two drives that report 512-byte sectors to the OS, the resulting vdev 
will have ashift=9. If you create a mirror vdev from one drive that 
reports 512-byte sectors and another that reports 4096-byte sectors, 
you will get ashift=12.

You do not need all vdevs in a zpool to have the same ashift value 
(and thus the same sector size).
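
To make the calculation concrete, here is a rough sketch (my own 
illustration in Python, not the actual ZFS code) of how a vdev's ashift 
follows from the sector sizes its members report:

    # Illustration only -- not ZFS source. ashift is log2 of the largest
    # sector size reported by any member of the vdev.
    def vdev_ashift(member_sector_sizes):
        largest = max(member_sector_sizes)
        return largest.bit_length() - 1   # 512 -> 9, 4096 -> 12

    print(vdev_ashift([512, 512]))    # mirror of two 512-byte drives: 9
    print(vdev_ashift([512, 4096]))   # mixed 512-byte/4096-byte mirror: 12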

>
>>> 2. For older drives, each drive should be partitioned with respect to
>>> 4k sectors. This is what the -a option of gpart does: it aligns created
>>> partitions to 4k sector boundaries. But half a year ago I already found
>>> some drives that can auto-shift all disk transactions to optimize read
>>> and write performance. Courtesy of Microsoft Windows, an OS that does
>>> not care about anything not written in the license terms (same as its
>>> users do), so using these drives would be more straightforward and
>>> would not cause undue pain for IT staff over realigning partitions; it
>>> would just work.
>>>
>>
>> This is only hype. There is no way any disk firmware can shift any
>> transactions.
>
> How about Seagate Smart Align? It's documented to do so. I haven't 
> touched any Seagate drives as I don't like them anyway...
>

I have a lot of Seagate drives with 4k sectors in use with ZFS. Despite 
these claims, performance is far worse if writes are not aligned to 4k. 
It is also awful with UFS if you don't care to align partitions. This is 
just marketing. Their rewrite implementation might be better than 
others', but it is still better avoided.
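
To illustrate why (again, just a sketch of mine in Python, nothing 
drive-specific): a write that is not aligned to the 4k physical sector 
boundary spills into an extra physical sector, and the drive has to 
read-modify-write every partially covered sector.

    # Illustration only. Counts how many 4k physical sectors a logical
    # write touches; each partially covered sector means read-modify-write.
    PHYS = 4096

    def phys_sectors_touched(offset_bytes, length_bytes):
        first = offset_bytes // PHYS
        last = (offset_bytes + length_bytes - 1) // PHYS
        return last - first + 1

    print(phys_sectors_touched(8192, 4096))        # aligned 4k write: 1 sector
    print(phys_sectors_touched(8192 + 512, 4096))  # misaligned by 512b: 2 sectors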

Daniel


