Date:      Wed, 21 Jul 2004 09:27:06 -0700
From:      Brooks Davis <brooks@one-eyed-alien.net>
To:        Daniel Lang <dl@leo.org>
Cc:        Peter Jeremy <PeterJeremy@optushome.com.au>
Subject:   Re: NEW TAR
Message-ID:  <20040721162706.GA12760@Odin.AC.HMC.Edu>
In-Reply-To: <20040721151427.GC54664@atrbg11.informatik.tu-muenchen.de>
References:  <40F963D8.6010201@freebsd.org> <20040719060730.GA87697@nagual.pp.ru> <20040720081051.GB3001@cirb503493.alcatel.com.au> <B82A97D5-DA91-11D8-B0C4-000A95C893E4@lassitu.de> <Pine.GSO.4.61.0407211440210.28037@mail.ilrt.bris.ac.uk> <20040721151427.GC54664@atrbg11.informatik.tu-muenchen.de>


On Wed, Jul 21, 2004 at 05:14:27PM +0200, Daniel Lang wrote:
> Hi,
>
> Jan Grant wrote on Wed, Jul 21, 2004 at 02:44:42PM +0100:
> [..]
> > You're correct, in that filesystem semantics don't require an archiver
> > to recreate holes. There are storage efficiency gains to be made in
> > identifying holes, that's true - particularly in the case of absolutely
> > whopping but extremely sparse files. In those cases, a simple
> > userland-view-of-the-filesystem-semantics approach to identifying areas
> > that _might_ be holes (just for archive efficiency) can still be
> > expensive and might involve the scanning of multiple gigabytes of
> > "virtual" zeroes.
> >
> > Solaris offers an fcntl to identify holes (IIRC) for just this purpose.
> > If the underlying filesystem can't be made to support it, there's an
> > efficiency loss but otherwise it's no great shakes.
>
> I don't get it.
>
> I assume, that for any consumer it is totally transparent if
> possibly existing chunks of 0-bytes are actually blocks full of
> zeroes or just non-allocated blocks, correct?
>
> Second, it is true, that there is a gain in terms of occupied disk
> space, if chunks of zeroes are not allocated at all, correct?
>
> So, from my point of view it is totally irrelevant, if a sparse file
> is archived and then extracted, if the areas, which contain zeroes
> are exactly in the same manner consisting of unallocated blocks
> or not.
>
> So, all I guess an archiver must do is:
>
>  - read the file
>  - scan the file for consecutive blocks of zeroes
>  - archive these blocks in an efficient way
>  - on extraction, create a sparse file with the previously
>    identified empty blocks, regardless if these blocks
>    have been 'sparse' blocks in the original file or not.
>
> I do not see, why it is important if the original file was sparse
> at all or maybe in different places.

Since sparse files overcommit the disk, they should only be created
deliberately.  Otherwise you can easily get into trouble if you try to
use reserved space later, since it won't actually be reserved.  Consider
the case of a file system image created with "dd if=/dev/zero ...; newfs
...".  If your archiver decides to be "smart" and restores a copy of that
file sparse, and you then use up the available blocks on your disk,
you're going to be in a world of hurt.  I wouldn't be surprised if that
resulted in a panic.
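
To make the overcommit visible, here is a rough Python sketch. It
assumes a Unix-like system where os.stat reports st_blocks in 512-byte
units and a filesystem that supports holes. The helper names are
hypothetical and only serve the illustration.

```python
import os


def apparent_and_allocated(path):
    """Return (apparent size, bytes actually allocated) for a file."""
    st = os.stat(path)
    # Assumption: st_blocks counts 512-byte units, as on most Unix systems.
    return st.st_size, st.st_blocks * 512


def make_sparse(path, size):
    """Create a file of the given size without writing any data."""
    with open(path, "wb") as f:
        # Extending by truncate leaves a hole where the filesystem allows it.
        f.truncate(size)


def make_zero_filled(path, size, chunk=1 << 20):
    """Create a file by actually writing zeroes, like dd if=/dev/zero."""
    zero = bytes(chunk)
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(zero[:n])
            remaining -= n
```

Both files report the same apparent size, but on a filesystem with hole
support the sparse one has reserved few or no blocks. Any later write
into it has to find free space at that moment, and that is where the
trouble starts once the disk is full.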

-- Brooks

-- 
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529  9BF0 5D8E 8BE9 F238 1AD4
