From: Volodymyr Kostyrko
Date: Tue, 18 Sep 2012 16:19:34 +0300
To: Daniel Kalchev
Cc: freebsd-fs@freebsd.org
Subject: Re: AW: AW: AW: AW: AW: ZFS: Corrupted pool metadata after adding
 vdev to a pool - no opportunity to rescue data from healthy vdevs?
 Remove a vdev? Rewrite metadata?
Message-ID: <505874E6.2050109@gmail.com>
In-Reply-To: <50584CC1.3030300@digsys.bg>

18.09.2012 13:28, Daniel Kalchev wrote:
>> From my point of view all hype about moving to 4k sectors is highly
>> irrelevant to ZFS and current products on the market.
>>
>> 1. ZFS tends to use big recordsize for storing any data. This means
>> most files on your drives are already stored in 128k sectors. Storing
>> small tails in 512b or 4k sectors shouldn't give big difference.
>
> Truth is, ZFS will write blocks of size from your media sector size up
> to 128K.
>
> The problem is that ZFS writes these records (even 128K) aligned to the
> sector size.
> So, once you write some data that is under 4k, your pool
> will become misaligned.

Not exactly.

https://blogs.oracle.com/bonwick/entry/space_maps

1. ZFS divides the space on each virtual device into a few hundred
metaslabs.

2. As metaslabs are quite big, it's logical to align them to a high
ashift value (I lack documentation on whether this is true, but at
least they should be divisible by 128k, as this is the default
recordsize).

3. In each metaslab all space allocation is done through space maps. I
have no documentation on this one either, but due to the presence of
gang blocks in the ZFS specification, every new allocation should be
aligned to 128k if we are allocating a 128k block, to 64k if we are
allocating a 64k block, and so on (yet again, I lack documentation on
whether this is true, but as far as I understand the Solaris way, it's
more practical to keep data aligned than to deal with it later).

I'm bad at reading code, so I can't really say how allocations are
aligned on ZFS metaslabs, but the function dealing with metaslab
allocation takes an 'align' variable.

>> 2. For older drives each drive should be partitioned with respect to
>> 4k sectors. This is what -a option of gpart does: it aligns created
>> partitions to 4k sector bounds. But half a year ago I already found
>> some drives that can auto-shift all disk transactions to optimize read
>> and write performance. Courtesy of Microsoft Windows, OS that does not
>> care about anything not written in license terms, same as the users
>> do, so using this drives would be more straightforward and would not
>> cause decent pain to IT stuff about realigning partitions the way it
>> would just work.
>>
>
> This is only hype. There is no way any disk firmware can shift any
> transactions.

How about Seagate Smart Align? It's documented to do so. I haven't
touched any Seagate drives as I don't like them anyway...

-- 
Sphinx of black quartz judge my vow.
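
For illustration only, the alignment concern quoted above can be sketched
with a toy model. This is not how ZFS allocates space (real allocation
goes through metaslabs and space maps); the bump allocator, the function
names and the write sizes below are all assumptions made up for the
example. It only shows the arithmetic: if every block is rounded to
1 << ashift and placed right after the previous one, a sub-4k write on
an ashift=9 layout pushes later blocks off 4k boundaries, while ashift=12
keeps them on 4k boundaries.

```python
def round_up(value, align):
    """Round value up to the next multiple of align (align must be a power of two)."""
    return (value + align - 1) & ~(align - 1)


def bump_allocate(sizes, ashift):
    """Toy bump allocator (hypothetical, NOT the ZFS allocator).

    Each block starts at the next offset that is a multiple of 1 << ashift
    and occupies its size rounded up to that same unit.
    """
    sector = 1 << ashift
    cursor = 0
    offsets = []
    for size in sizes:
        start = round_up(cursor, sector)
        offsets.append(start)
        cursor = start + round_up(size, sector)
    return offsets


def misaligned(offsets, physical=4096):
    """Return the offsets that do not start on a physical-sector boundary."""
    return [off for off in offsets if off % physical != 0]


# Made-up write pattern: 128k records mixed with small tails.
writes = [128 * 1024, 512, 128 * 1024, 1536, 128 * 1024]

for ashift in (9, 12):
    offsets = bump_allocate(writes, ashift)
    print("ashift=%d offsets=%s misaligned=%s"
          % (ashift, offsets, misaligned(offsets)))
```

Whether real pools behave like the ashift=9 case depends on how the
metaslab allocator actually aligns blocks, which is exactly the open
question in the points above.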