Date:      Tue, 29 Jan 2013 11:19:31 -0000
From:      "Steven Hartland" <killing@multiplay.co.uk>
To:        "Olivier Smedts" <olivier@gid0.org>, "Adam Nowacki" <nowakpl@platinum.linux.pl>
Cc:        Matthew Ahrens <mahrens@delphix.com>, fs@freebsd.org
Subject:   Re: RAID-Z wasted space - asize roundups to nparity +1
Message-ID:  <32655B893F594E9BB0CBDD88C186E27E@multiplay.co.uk>
References:  <5105252D.6060502@platinum.linux.pl> <CAJjvXiEQSqnKYP75crTkgVqLKSk92q9UTikFtdyPHmF6shJFbg@mail.gmail.com> <5107A9B7.5030803@platinum.linux.pl> <CABzXLYMOK0ZDeDw95se1LZShaCowH1k7CZV7vkHdbkmwbZ9eDQ@mail.gmail.com>

----- Original Message -----
From: "Olivier Smedts" <olivier@gid0.org>


> 2013/1/29 Adam Nowacki <nowakpl@platinum.linux.pl>:
>> This brings up another issue - recordsize capped at 128KiB. We are using the
>> pool for off-line storage of large files (from 50MB to 20GB). Files are
>> stored and read sequentially as a whole. With 12 disks in RAID-Z2, 4KiB
>> sectors, 128KiB record size and the padding described above, 9.4% of disk
>> space goes completely unused - the equivalent of one whole disk.
>>
>> Increasing the recordsize cap seems trivial enough. On-disk structures and
>> kernel code support it already - a single line of code had to be changed
>> (#define SPA_MAXBLOCKSHIFT - from 17 to 20) to support 1MiB recordsizes. This
>> of course breaks compatibility with any other system without this
>> modification. With Sun's cooperation this could be handled in a safe and
>> compatible manner via a pool version upgrade. A recordsize of 128KiB would
>> remain the default, but anyone could increase it with zfs set.
>
> One MB blocksize is already implemented by Oracle with zpool version 32.

Oracle is not the upstream; since they went closed source, illumos is our new
upstream.

If you want to follow the discussion, see the thread titled "128K max blocksize
in zfs" on developer@lists.illumos.org.

    Regards
    Steve

Date:      Tue, 29 Jan 2013 15:52:50 +0100
From:      Fabian Keil <freebsd-listen@fabiankeil.de>
To:        Dan Nelson <dnelson@allantgroup.com>
Cc:        current@freebsd.org, fs@freebsd.org, Ulrich Spörlein <uqs@freebsd.org>
Subject:   Re: Zpool surgery
Message-ID:  <20130129155250.29d8f764@fabiankeil.de>
In-Reply-To: <20130128214111.GA14888@dan.emsphone.com>
References:  <20130127103612.GB38645@acme.spoerlein.net> <1F0546C4D94D4CCE9F6BB4C8FA19FFF2@multiplay.co.uk> <20130127201140.GD29105@server.rulingia.com> <20130128085820.GR35868@acme.spoerlein.net> <20130128205802.1ffab53e@fabiankeil.de> <20130128214111.GA14888@dan.emsphone.com>


Dan Nelson <dnelson@allantgroup.com> wrote:

> In the last episode (Jan 28), Fabian Keil said:
> > Ulrich Spörlein <uqs@FreeBSD.org> wrote:
> > > On Mon, 2013-01-28 at 07:11:40 +1100, Peter Jeremy wrote:
> > > > On 2013-Jan-27 14:31:56 -0000, Steven Hartland <killing@multiplay.co.uk> wrote:
> > > > >----- Original Message -----
> > > > >From: "Ulrich Spörlein" <uqs@FreeBSD.org>
> > > > >> I want to transplant my old zpool tank from a 1TB drive to a new
> > > > >> 2TB drive, but *not* use dd(1) or any other cloning mechanism, as
> > > > >> the pool was very full very often and is surely severely
> > > > >> fragmented.
> > > > >
> > > > >Can't you just drop the disk into the original machine, set it up as
> > > > >a mirror, and then once the resilver has completed, break the mirror
> > > > >and remove the 1TB disk?
> > > >
> > > > That will replicate any fragmentation as well.  "zfs send | zfs recv"
> > > > is the only (current) way to defragment a ZFS pool.
> >
> > It's not obvious to me why "zpool replace" (or doing it manually)
> > would replicate the fragmentation.
>
> "zpool replace" essentially adds your new disk as a mirror to the parent
> vdev, then deletes the original disk when the resilver is done.  Since
> mirrors are block-identical copies of each other, the new disk will contain
> an exact copy of the original disk, followed by 1TB of free space.

Thanks for the explanation.

I was under the impression that ZFS mirrors worked at a higher level
than traditional mirrors like gmirror, but it seems there is indeed
less magic involved than I expected.
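
For the archives, a rough sketch of the two approaches discussed above
("tank", "newtank", ada0 and ada1 are placeholder names):

# Replace via mirror: block-identical copy, so the layout (and any
# fragmentation) is carried over to the new disk.
zpool attach tank ada0 ada1   # resilver ada1 as a mirror of ada0
zpool detach tank ada0        # drop the old disk once resilvering completes

# send/recv: rewrites every block, so the new pool is laid out afresh.
zfs snapshot -r tank@move
zfs send -R tank@move | zfs recv -F newtank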

Fabian



