From: Daniel Kalchev
Date: Tue, 18 Sep 2012 16:40:11 +0300
To: Volodymyr Kostyrko
Cc: freebsd-fs@freebsd.org
Message-ID: <505879BB.3000806@digsys.bg>
In-Reply-To: <505874E6.2050109@gmail.com>
Subject: Re: AW: AW: AW: AW: AW: ZFS: Corrupted pool metadata after adding vdev to a pool - no opportunity to rescue data from healthy vdevs? Remove a vdev? Rewrite metadata?

On 18.09.12 16:19, Volodymyr Kostyrko wrote:
> 18.09.2012 13:28, Daniel Kalchev wrote:
>>
>> The problem is that ZFS writes these records (even 128K) aligned to the
>> sector size. So, once you write some data that is under 4k, your pool
>> will become misaligned.
>
> Not exactly. https://blogs.oracle.com/bonwick/entry/space_maps

Nothing in that post contradicts what I wrote. I may not have been
precise enough: the misalignment might happen within the metaslab, not
across the whole zpool.

ZFS clearly does not write larger blocks than necessary, the smallest
being the sector size. The sector size is represented by the ashift
value: the sector size is 2^ashift bytes.

The ashift value is set per vdev and is derived from the largest sector
size among the vdev members. So if you create a mirror vdev from two
drives that report 512-byte sectors to the OS, the resulting vdev will
have ashift=9. If you create a mirror vdev from one drive that reports
512-byte sectors and another that reports 4096-byte sectors, you will
have ashift=12.

The vdevs in a zpool do not all need to have the same ashift value (and
thus the same sector size).

>
>>> 2. For older drives each drive should be partitioned with respect to
>>> 4k sectors. This is what -a option of gpart does: it aligns created
>>> partitions to 4k sector bounds. But half a year ago I already found
>>> some drives that can auto-shift all disk transactions to optimize read
>>> and write performance. Courtesy of Microsoft Windows, OS that does not
>>> care about anything not written in license terms, same as the users
>>> do, so using this drives would be more straightforward and would not
>>> cause decent pain to IT stuff about realigning partitions the way it
>>> would just work.
>>>
>>
>> This is only hype. There is no way any disk firmware can shift any
>> transactions.
>
> How about Seagate Smart Align? It's documented to do so. I haven't
> touched any Seagate drives as I don't like them anyway...
>

I have a lot of Seagate drives with 4k sectors in use with ZFS. Despite
these claims, performance is far worse when writes are not aligned to
4k. It is also awful with UFS if you do not take care to align the
partitions.

This is just marketing. Their rewrite implementation might be better
than others, but it is still better avoided.

Daniel
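
P.S. As a rough illustration of the ashift rule described above, here is
a small sketch in Python. It is not ZFS code: the function names
(vdev_ashift, align_up) are made up for this example, and the allocation
model is deliberately simplified to "round every write up to a whole
number of sectors". It only shows why a vdev with ashift=9 can end up
with offsets that are not multiples of 4096.

```python
def vdev_ashift(member_sector_sizes):
    """ashift for a vdev, assuming it is log2 of the largest member sector size."""
    largest = max(member_sector_sizes)
    if largest <= 0 or largest & (largest - 1):
        raise ValueError("sector size must be a positive power of two")
    return largest.bit_length() - 1


def align_up(size, ashift):
    """Round a write size up to a whole number of 2^ashift-byte sectors."""
    sector = 1 << ashift
    return (size + sector - 1) // sector * sector


# Two drives reporting 512-byte sectors, then a mixed 512/4096 mirror.
ashift_512 = vdev_ashift([512, 512])
ashift_mixed = vdev_ashift([512, 4096])

# Simplified model: a 1000-byte write followed directly by the next write.
next_offset_512 = align_up(1000, ashift_512)
next_offset_4k = align_up(1000, ashift_mixed)

print("ashift (512, 512):", ashift_512)
print("ashift (512, 4096):", ashift_mixed)
print("next offset 4k-aligned with ashift=9: ", next_offset_512 % 4096 == 0)
print("next offset 4k-aligned with ashift=12:", next_offset_4k % 4096 == 0)
```

With ashift=9 the second write in this model starts at byte 1024, which
is not on a 4k boundary; with ashift=12 it starts at byte 4096.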