Date:      Tue, 31 Oct 2000 01:07:48 -0800 (PST)
From:      Matt Dillon <dillon@earth.backplane.com>
To:        Terry Lambert <tlambert@primenet.com>
Cc:        ryan@sasknow.com (Ryan Thompson), freebsd-hackers@FreeBSD.ORG
Subject:   Re: Filesystem holes
Message-ID:  <200010310907.e9V97mk17233@earth.backplane.com>
References:   <200010310847.BAA28086@usr02.primenet.com>

:
:>     Ahh.. yes, I know.  I'm a filesystem expert :-)  However, that said, I
:>     will tell you quite frankly that virtually *nobody* depends on holes
:>     for efficient storage.  There are only a few problems where it's
:>     practical.... some forms of executables, and sparse matrixes.  That's
:>     pretty much it.
:
:Your master password database.  Most sendmail maps.  Anything
:else that uses the Berkeley DB, like message catalog files,
:locales, etc.

    Actually, less than you think.  Even though DB utilizes the concept
    of a sparse file, if you check carefully you will note that your
    sendmail maps and password file databases aren't actually all
    that sparse.  With an 8K filesystem block size the default DB
    multiplier usually results in virtually no sparseness at all.  It
    takes tuning to get any sort of sparseness, and even then you don't
    get much benefit from it.  The only real benefit is in the hash table
    size factor: the hash array may be sparse, but the physical file
    underlying it probably isn't.
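
A simplified model makes the block-size point concrete. The sketch below is hypothetical (blocks_allocated is an illustrative name, not part of any DB library) and deliberately ignores metadata, indirect blocks and FFS fragments: it assumes a filesystem block is allocated as soon as any byte in it is written, and counts how many blocks end up allocated when small records are scattered through a file at a fixed stride.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Simplified model (no metadata, indirect blocks or fragments): a file of
 * file_size bytes gets an elem_size-byte record written every stride bytes.
 * A block_size block counts as allocated if any byte in it is written.
 * Returns the number of allocated blocks, or (size_t)-1 on allocation failure.
 */
static size_t blocks_allocated(uint64_t file_size, uint64_t stride,
                               uint64_t elem_size, uint64_t block_size)
{
    assert(stride > 0 && elem_size > 0 && block_size > 0);
    uint64_t nblocks = (file_size + block_size - 1) / block_size;
    bool *used = calloc(nblocks ? nblocks : 1, sizeof *used);
    size_t count = 0;

    if (used == NULL)
        return (size_t)-1;
    for (uint64_t off = 0; off < file_size; off += stride) {
        uint64_t end = off + elem_size;     /* one past the record's last byte */
        if (end > file_size)
            end = file_size;
        /* mark every block the record touches, counting each block once */
        for (uint64_t b = off / block_size; b <= (end - 1) / block_size; b++) {
            if (!used[b]) {
                used[b] = true;
                count++;
            }
        }
    }
    free(used);
    return count;
}
```

For a 1 MB file with 8-byte records every 4 KB, blocks_allocated(1 << 20, 4096, 8, 8192) reports all 128 blocks allocated, exactly as many as a dense file; only when the stride grows to 64 KB does the count drop to 16. Any layout whose occupied slots sit closer together than the filesystem block size gains nothing from holes.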

    Not even USENET news history files, which can run into the gigabytes,
    actually wind up being that sparse.
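
One way to check a particular file, such as a sendmail map or a news history file, is to compare its logical size with the space actually allocated to it. This is a minimal sketch assuming st_blocks counts 512-byte units, as it does on FreeBSD and Linux; the helper names (get_file_usage, hole_percent, make_sparse_file) are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Logical size versus disk actually allocated, for one file. */
struct file_usage {
    off_t logical;    /* st_size: what read() and lseek() see */
    off_t allocated;  /* st_blocks * 512: what the disk holds */
};

/* Assumes st_blocks counts 512-byte units (true on FreeBSD and Linux). */
static int get_file_usage(const char *path, struct file_usage *u)
{
    struct stat st;

    assert(path != NULL && u != NULL);
    if (stat(path, &st) == -1)
        return -1;
    u->logical = st.st_size;
    u->allocated = (off_t)st.st_blocks * 512;
    return 0;
}

/* Percent of the logical size left unallocated; 0 for a dense file. */
static double hole_percent(const struct file_usage *u)
{
    if (u->logical <= 0 || u->allocated >= u->logical)
        return 0.0;
    return 100.0 * (double)(u->logical - u->allocated) / (double)u->logical;
}

/* Create a demo sparse file: seek past EOF, then write a single byte. */
static int make_sparse_file(const char *path, off_t size)
{
    int fd;

    assert(size > 0);
    fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd == -1)
        return -1;
    if (lseek(fd, size - 1, SEEK_SET) == -1 || write(fd, "x", 1) != 1) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A file with no holes reports a hole_percent of 0. On a filesystem that supports holes, a 1 MB file from make_sparse_file reports most of its logical size as unallocated, since only the block holding the final byte (plus some metadata) is backed by disk.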

    Also keep in mind that Berkeley DB is very old, even with all the
    rewrites, and the original hash algorithm was chosen for expediency
    rather than for the 'best' efficiency.  Our current DB library has
    a btree access method, and for big data sets it works a whole lot better
    than the hash method.  It doesn't require tuning, for one.

:Frankly, sparse files have a huge number of uses, particularly
:when applied to persistent storage of data of the kind you'd
:see in chapter 5, section 5.4.x and chapter 6 of Knuth's The Art
:of Computer Programming, Vol. 3.
:
:Plus your FFS filesystem itself is a sparse matrix.  It'd be
:real useful to be able to "free up holes" in a file, if I
:wanted to use one to do user space work on an FS design, for
:example, a log structured FS, where I wanted to be able to
:experiment with a "cleaner" process that recovered extents.
:
:I'd actually be able to tell real quickly whether it was
:working by just setting an allocation range that I expect
:my iterative testing to stay within (if it goes over or under
:the range while I'm moving stuff around and cleaning at the
:same time, I'll know there's a bug in my daemon).
:
:Personally, I'm not rich enough to be able to burn disk space
:so easily.
:					Terry Lambert
:					terry@lambert.org
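
The hole-freeing Terry describes is what later systems expose directly: Linux has fallocate(2) with FALLOC_FL_PUNCH_HOLE, and FreeBSD 14 added fspacectl(2). The sketch below is Linux-specific and assumes glibc or musl; punch_hole, allocated_bytes and allocation_in_range are hypothetical helper names, the last one being the kind of allocation-range check Terry describes for catching a misbehaving cleaner.

```c
#define _GNU_SOURCE  /* glibc/musl: expose fallocate() and FALLOC_FL_* */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Bytes of disk currently backing the file.  fsync() first so that
 * delayed allocation has settled; st_blocks is assumed to count
 * 512-byte units.  Returns -1 on error.
 */
static off_t allocated_bytes(int fd)
{
    struct stat st;

    if (fsync(fd) == -1 || fstat(fd, &st) == -1)
        return -1;
    return (off_t)st.st_blocks * 512;
}

/* Cleaner sanity check: is the allocation within [lo, hi] bytes? */
static bool allocation_in_range(int fd, off_t lo, off_t hi)
{
    off_t used = allocated_bytes(fd);

    return used >= 0 && used >= lo && used <= hi;
}

/*
 * Linux-only: free the blocks behind [off, off + len) without changing
 * the file's logical size; the range then reads back as zeros.
 * Returns 0 on success, -1 with errno set (EOPNOTSUPP if the
 * filesystem cannot punch holes).
 */
static int punch_hole(int fd, off_t off, off_t len)
{
    assert(off >= 0 && len > 0);
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, off, len);
}
```

A test harness for a user-space cleaner could call allocation_in_range after each pass with the bounds it expects and abort on the first failure. On a filesystem without hole punching, punch_hole fails with EOPNOTSUPP while the range check keeps working.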

    I agree that sparse files should not be discarded out of hand.
    There are very few real problems that need them, though.  I can
    think of a handful, all very specialized... for example, the
    VN device uses the concept of a sparse file (actually a sparse
    backing for the filesystem layer), as do Kirk's softupdates
    snapshots (I believe).  Sparse matrixes are the big math problem
    that benefits, but only because the solution to a sparse matrix
    problem is not even close to random, so the matrix stays sparse
    all the way to the end of the solution.  But these are extremely
    specialized problems, not general datastore problems, and nearly
    all of them are tuned to (or inherently specific to) the block
    size of the underlying system.  Using a sparse file for a general
    data store just isn't all that hot an idea, because by its very
    nature a data store is, well, storing data.  Packing it is usually
    the way to go.
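
The VN-style sparse backing can be illustrated with a toy block store kept in a plain file. This is a hypothetical sketch (write_block, read_block and BLK_SIZE are illustrative names, not the vn driver's code): block n lives at byte offset n * BLK_SIZE, and blocks that were never written remain holes that read back as zeros, so the backing file only consumes disk for blocks actually used.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative block size for the emulated device (hypothetical). */
#define BLK_SIZE 512

/* Write one block at its fixed offset; unwritten blocks stay holes. */
static int write_block(int fd, unsigned long blkno, const void *data)
{
    assert(data != NULL);
    return pwrite(fd, data, BLK_SIZE, (off_t)blkno * BLK_SIZE) == BLK_SIZE ? 0 : -1;
}

/* Read block blkno into out; a hole or a block past EOF reads as zeros. */
static int read_block(int fd, unsigned long blkno, void *out)
{
    ssize_t n;

    assert(out != NULL);
    n = pread(fd, out, BLK_SIZE, (off_t)blkno * BLK_SIZE);
    if (n < 0)
        return -1;
    if (n < BLK_SIZE)  /* short read only happens at EOF: pad with zeros */
        memset((char *)out + n, 0, (size_t)(BLK_SIZE - n));
    return 0;
}
```

Writing only blocks 3 and 100000 gives a backing file with a logical size of about 51 MB that holds only a few allocated filesystem blocks. Each isolated 512-byte write still costs at least one full filesystem block, which is the block-size dependence described above.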

					    -Matt


