From owner-freebsd-fs@FreeBSD.ORG Tue Jan 29 11:18:54 2013
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Olivier Smedts", "Adam Nowacki"
Cc: Matthew Ahrens, fs@freebsd.org
Subject: Re: RAID-Z wasted space - asize roundups to nparity +1
Date: Tue, 29 Jan 2013 11:19:31 -0000

----- Original Message -----
From: "Olivier Smedts"

> 2013/1/29 Adam Nowacki:
>> This brings another issue - the recordsize is capped at 128KiB. We are
>> using the pool for off-line storage of large files (from 50MB to 20GB).
>> Files are stored and read sequentially as a whole. With 12 disks in
>> RAID-Z2, 4KiB sectors, a 128KiB record size and the padding described
>> above, 9.4% of disk space goes completely unused - one whole disk.
>>
>> Increasing the recordsize cap seems trivial enough. The on-disk
>> structures and kernel code already support it - a single line of code
>> had to be changed (#define SPA_MAXBLOCKSHIFT - from 17 to 20) to
>> support 1MiB recordsizes. This of course breaks compatibility with any
>> other system without this modification. With Sun's cooperation this
>> could be handled in a safe and compatible manner via a pool version
>> upgrade. A recordsize of 128KiB would remain the default, but anyone
>> could increase it with zfs set.
>
> One MB blocksize is already implemented by Oracle with zpool version 32.

Oracle is not the upstream; since they went closed source, illumos is our
new upstream. If you want to follow the discussion, see the thread titled
"128K max blocksize in zfs" on developer@lists.illumos.org.

Regards
Steve
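The 9.4% figure above can be rederived by hand (a back-of-the-envelope
check, not quoted from the thread): a 128KiB record on 4KiB sectors is
32 data sectors. On a 12-disk RAID-Z2, each stripe row holds 10 data
plus 2 parity sectors, so 32 data sectors need ceil(32/10) = 4 rows and
8 parity sectors, 40 sectors in total. RAID-Z then rounds each
allocation up to a multiple of nparity + 1 = 3 (the roundup in the
subject line), giving 42 sectors. The ideal cost at 10-of-12 efficiency
would be 32 * 12/10 = 38.4 sectors, and 42 / 38.4 ~ 1.094, i.e. roughly
9.4% of the raw space is lost to short parity rows and padding.

The one-line change Adam describes would look roughly like the patch
hunk below. This is a sketch against the illumos-era spa.h; the header
path and the neighbouring SPA_MAXBLOCKSIZE definition are taken from
public OpenSolaris/illumos sources, not from this thread:

    /* uts/common/fs/zfs/sys/spa.h (path approximate) */
    -#define SPA_MAXBLOCKSHIFT   17   /* 1 << 17 = 128KiB max recordsize */
    +#define SPA_MAXBLOCKSHIFT   20   /* 1 << 20 = 1MiB max recordsize */
     #define SPA_MAXBLOCKSIZE    (1ULL << SPA_MAXBLOCKSHIFT)

With every system that imports the pool patched identically, a larger
record size could then be requested per dataset, e.g.
"zfs set recordsize=1M tank/archive" (the dataset name is purely
illustrative).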
From owner-freebsd-fs@FreeBSD.ORG Tue Jan 29 14:58:23 2013
Date: Tue, 29 Jan 2013 15:52:50 +0100
From: Fabian Keil <freebsd-listen@fabiankeil.de>
To: Dan Nelson
Cc: current@freebsd.org, fs@freebsd.org, Ulrich Spörlein
Subject: Re: Zpool surgery

Dan Nelson wrote:

> In the last episode (Jan 28), Fabian Keil said:
> > Ulrich Spörlein wrote:
> > > On Mon, 2013-01-28 at 07:11:40 +1100, Peter Jeremy wrote:
> > > > On 2013-Jan-27 14:31:56 -0000, Steven Hartland wrote:
> > > > > ----- Original Message -----
> > > > > From: "Ulrich Spörlein"
> > > > > > I want to transplant my old zpool tank from a 1TB drive to a new
> > > > > > 2TB drive, but *not* use dd(1) or any other cloning mechanism, as
> > > > > > the pool was very full very often and is surely severely
> > > > > > fragmented.
> > > > >
> > > > > Can't you just drop the disk in the original machine, set it as a
> > > > > mirror, then once the mirror process has completed break the
> > > > > mirror and remove the 1TB disk?
> > > >
> > > > That will replicate any fragmentation as well. "zfs send | zfs recv"
> > > > is the only (current) way to defragment a ZFS pool.
> >
> > It's not obvious to me why "zpool replace" (or doing it manually)
> > would replicate the fragmentation.
>
> "zpool replace" essentially adds your new disk as a mirror to the parent
> vdev, then deletes the original disk when the resilver is done. Since
> mirrors are block-identical copies of each other, the new disk will
> contain an exact copy of the original disk, followed by 1TB of free
> space.

Thanks for the explanation. I was under the impression that ZFS mirrors
worked at a higher level than traditional mirrors like gmirror, but there
is indeed less magic than I expected.
Fabian
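For completeness, the two migration strategies discussed in this thread,
sketched as commands. The pool and disk names (tank, newtank, da0, da1)
are placeholders, not taken from the thread:

    # Strategy 1: attach-and-detach (what "zpool replace" does
    # internally). The resilver makes a block-identical copy, so any
    # fragmentation survives the move.
    zpool attach tank da0 da1    # mirror the old disk onto the new one
    zpool status tank            # wait until the resilver has completed
    zpool detach tank da0        # break the mirror, keep the new disk
    # (equivalently, in one step: zpool replace tank da0 da1)

    # Strategy 2: send/receive into a fresh pool. Every block is
    # rewritten through the normal allocator, which is why this is the
    # only way to defragment.
    zpool create newtank da1
    zfs snapshot -r tank@move
    zfs send -R tank@move | zfs recv -F -d newtank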