Date: Sun, 29 Oct 2000 19:16:44 -0800 (PST)
From: Matt Dillon <dillon@earth.backplane.com>
To: Ryan Thompson <ryan@sasknow.com>
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: Filesystem holes
Message-ID: <200010300316.e9U3GiP72404@earth.backplane.com>
References: <Pine.BSF.4.21.0010291907320.27883-100000@ren.sasknow.com>
:Hi, Matt! Thanks for the replies. I'll try and keep you interested ;-)
:
:Hmm... Perhaps you're still missing my original point? I'm talking about
:a file with 96GB in addressable bytes (well, probably a bunch of files,
:given logical filesize limitations, but let's say for simplicity's sake
:that we have a single file). Its actual "size" (in terms of allocated
:blocks) will be only a bit larger than 2.1GB. (Directly proportional to
:the used size of the dataset. Discrepancies only come into play when
:record size != block size, but that can be worked around somewhat)
Ah, ok... that's better, though I think you will find yourself
tuning it endlessly if the blocksize does not match up. Remember,
the filesystem allocates 8K per block, so if your record size is
1500 bytes and the records are randomly distributed, nearly every
record lands in its own block and you are going to wind up eating
8-14GB.
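To put a number on that, here is a quick back-of-the-envelope in C,
assuming one 8K block per record and the 2.1GB dataset figure from
above (illustrative, not a measurement):

/*
 * Worst-case allocation when each 1500-byte record lands in its
 * own 8K filesystem block.  The 2.1GB dataset size comes from the
 * discussion above; everything else is straight arithmetic.
 */
#include <stdio.h>

int
main(void)
{
	long long dataset = 2100LL * 1024 * 1024;	/* ~2.1GB of real data */
	long long recsize = 1500;			/* record size in bytes */
	long long blksize = 8192;			/* UFS block size */
	long long nrecs = dataset / recsize;
	long long worst = nrecs * blksize;		/* one block per record */

	printf("%lld records -> %lld MB allocated worst-case\n",
	    nrecs, worst / (1024 * 1024));
	return (0);
}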
:In other words, ls -ls will report the "size" as some ridiculously large
:number, but will show a much smaller block count. So, assuming four records
:are added to the file on block boundaries, the file will actually only use
:four blocks... nowhere near 96GB!
:
:In the UNIX filesystem (ya, I know.. just pick one :-), size of file !=
:space allocated for file. Thus, my original questions were centered
:around filesystem holes. I.e., non-allocated chunks in the middle of a
:file. When trying to READ from within a hole, the kernel just sends back
:a buffer of zeros... which is enough to show that the record is not
:initialized. Actually, something like an "exists" function for a record
:wouldn't touch the disk at all! When writing to a hole, the kernel simply
:allocates the necessary block(s). This is really fast, too, for creation,
:as the empty set can be written to disk with touch(1), and uses far less
:memory than virtual initialization or memory structures ;-)
:
:As an example, try
Ahh.. yes, I know. I'm a filesystem expert :-) However, that said, I
will tell you quite frankly that virtually *nobody* depends on holes
for efficient storage. There are only a few problems for which they
are practical: some forms of executables, and sparse matrices. That's
pretty much it.
:And analyze the reported filesize, as well as the reported block count of
:the file. You should see a 2GB "size", and maybe 8K in allocated blocks
:(depends on how you ran newfs ;-). This is behaviour that has been around
:since the original UFS.
It's not a good idea to use a small block size in UFS. The minimum
is pretty much a 1K frag and an 8K block. For a sparse file this means
8K is the minimum that will be allocated (frags are only allocated for
the end of a file).
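A minimal sketch in C of the behaviour being described, assuming an
8K-block UFS; the filename and the ~2GB offset are illustrative
choices, not anything from the thread. It pokes one byte far past EOF
and then compares the logical size to the allocated block count:

/*
 * Sparse-file demonstration: seek ~2GB into an empty file, write one
 * byte, and compare st_size to st_blocks.
 */
#include <sys/types.h>
#include <sys/stat.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	struct stat sb;
	int fd;

	fd = open("/tmp/bigfile", O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		err(1, "open");

	/* Writing past EOF leaves a hole; no blocks back the gap. */
	if (lseek(fd, 0x7FFFFFFELL, SEEK_SET) < 0)
		err(1, "lseek");
	if (write(fd, "x", 1) != 1)
		err(1, "write");

	if (fstat(fd, &sb) < 0)
		err(1, "fstat");

	/* st_size is ~2GB; st_blocks counts only allocated 512-byte units. */
	printf("size=%lld bytes, allocated=%lld bytes\n",
	    (long long)sb.st_size, (long long)sb.st_blocks * 512LL);

	/* Reads inside the hole return zeros without touching the disk. */
	close(fd);
	return (0);
}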
If you think the btree idea is too complex to implement quickly, then
try using a radix tree (essentially what you said in regards to
breaking the hash you calculate into a node traversal). For your
problem I would recommend 64 elements per node, which corresponds to
6 bits per level. 16 million records would fit in 6x4 = 24 bits, i.e.
4 levels. If you cache the first three levels in memory you eat
64^3 = 262144 entries x sizeof(entry). Assuming a simple 4-byte file
offset per entry, that is exactly 1MB of memory; the full fourth
level is 64MB for the file index, and your data can be placed
sequentially in the file.
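A sketch of that lookup in C, using those numbers (64-entry nodes,
4-byte offsets, a 24-bit key). The on-disk index layout is an
illustrative assumption, and the in-memory cache of the top three
levels is omitted for brevity:

/*
 * 4-level radix-tree lookup over a sparse index file.  Each node is
 * 64 x 4-byte entries; the 24-bit key is consumed 6 bits at a time.
 * Node 0 is the root; an entry holds the node number of the next
 * level, or the record's file offset at the last level.  A zero entry
 * means "hole": the key was never inserted, and on a sparse file the
 * read costs nothing if the node was never written.
 */
#include <sys/types.h>
#include <err.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define RADIX_BITS	6
#define RADIX_SIZE	(1 << RADIX_BITS)	/* 64 entries per node */
#define RADIX_MASK	(RADIX_SIZE - 1)
#define LEVELS		4			/* 24-bit key space */

static uint32_t
radix_lookup(int fd, uint32_t key)
{
	uint32_t entry = 0;			/* node number of the root */
	off_t pos;
	int level, shift;

	for (level = 0; level < LEVELS; level++) {
		shift = (LEVELS - 1 - level) * RADIX_BITS;
		pos = (off_t)entry * RADIX_SIZE * sizeof(uint32_t) +
		    ((key >> shift) & RADIX_MASK) * sizeof(uint32_t);
		if (pread(fd, &entry, sizeof(entry), pos) != sizeof(entry))
			err(1, "pread");
		if (entry == 0)
			return (0);		/* hit a hole: no such record */
	}
	return (entry);				/* offset of the data record */
}

int
main(void)
{
	int fd = open("/tmp/index", O_RDONLY);

	if (fd < 0)
		err(1, "open");
	printf("record offset: %u\n", radix_lookup(fd, 0x123456));
	close(fd);
	return (0);
}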
:- Ryan
-Matt
