Date: Sun, 29 Oct 2000 19:16:44 -0800 (PST)
From: Matt Dillon <dillon@earth.backplane.com>
To: Ryan Thompson <ryan@sasknow.com>
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: Filesystem holes
Message-ID: <200010300316.e9U3GiP72404@earth.backplane.com>
References: <Pine.BSF.4.21.0010291907320.27883-100000@ren.sasknow.com>
:Hi, Matt! Thanks for the replies. I'll try and keep you interested ;-)
:
:Hmm... Perhaps you're still missing my original point? I'm talking about
:a file with 96GB in addressable bytes (well, probably a bunch of files,
:given logical filesize limitations, but let's say for simplicity's sake
:that we have a single file). Its actual "size" (in terms of allocated
:blocks) will be only a bit larger than 2.1GB. (Directly proportional to
:the used size of the dataset. Discrepancies only come into play when
:record size != block size, but that can be worked around somewhat)
Ah, ok... that's better, though I think you will find yourself
tuning it endlessly if the blocksize does not match up. Remember,
the filesystem allocates 8K per block, so if your record size is
1500 bytes and the records are randomly distributed, nearly every
record lands in its own block and you are going to wind up eating
8-14GB.
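To put a number on that, here is a quick back-of-the-envelope in C,
assuming one 8K block per record and the 2.1GB dataset figure from
above (illustrative, not a measurement):

/*
 * Worst-case allocation when each 1500-byte record lands in its
 * own 8K filesystem block.  The 2.1GB dataset size comes from the
 * discussion above; everything else is straight arithmetic.
 */
#include <stdio.h>

int
main(void)
{
	long long dataset = 2100LL * 1024 * 1024;	/* ~2.1GB of real data */
	long long recsize = 1500;			/* record size in bytes */
	long long blksize = 8192;			/* UFS block size */
	long long nrecs = dataset / recsize;
	long long worst = nrecs * blksize;		/* one block per record */

	printf("%lld records -> %lld MB allocated worst-case\n",
	    nrecs, worst / (1024 * 1024));
	return (0);
}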
:In other words, ls -ls will report the "size" as some ridiculously large
:number, but will show a much smaller block count. So, assuming four records
:are added to the file on block boundaries, the file will actually only use
:four blocks... nowhere near 96GB!
:
:In the UNIX filesystem (ya, I know.. just pick one :-), size of file !=
:space allocated for file. Thus, my original questions were centered
:around filesystem holes. I.e., non-allocated chunks in the middle of a
:file. When trying to READ from within a hole, the kernel just sends back
:a buffer of zeros... which is enough to show that the record is not
:initialized. Actually, something like an "exists" function for a record
:wouldn't touch the disk at all! When writing to a hole, the kernel simply
:allocates the necessary block(s). This is really fast, too, for creation,
:as the empty set can be written to disk with touch(1), and uses far less
:memory than virtual initialization or memory structures ;-)
:
:As an example, try
Ahh.. yes, I know. I'm a filesystem expert :-) However, that said, I
will tell you quite frankly that virtually *nobody* depends on holes
for efficient storage. There are only a few problems for which they
are practical: some forms of executables, and sparse matrices. That's
pretty much it.
:And analyze the reported filesize, as well as the reported block count of
:the file. You should see a 2GB "size", and maybe 8K in allocated blocks
:(depends on how you ran newfs ;-). This is behaviour that has been around
:since the original UFS.
It's not a good idea to use a small block size in UFS. The minimum
is pretty much a 1K frag and an 8K block. For a sparse file this means
8K is the minimum that will be allocated (frags are only allocated for
the end of a file).
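A minimal sketch in C of the behaviour being described, assuming an
8K-block UFS; the filename and the ~2GB offset are illustrative
choices, not anything from the thread. It pokes one byte far past EOF
and then compares the logical size to the allocated block count:

/*
 * Sparse-file demonstration: seek ~2GB into an empty file, write one
 * byte, and compare st_size to st_blocks.
 */
#include <sys/types.h>
#include <sys/stat.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	struct stat sb;
	int fd;

	fd = open("/tmp/bigfile", O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		err(1, "open");

	/* Writing past EOF leaves a hole; no blocks back the gap. */
	if (lseek(fd, 0x7FFFFFFELL, SEEK_SET) < 0)
		err(1, "lseek");
	if (write(fd, "x", 1) != 1)
		err(1, "write");

	if (fstat(fd, &sb) < 0)
		err(1, "fstat");

	/* st_size is ~2GB; st_blocks counts only allocated 512-byte units. */
	printf("size=%lld bytes, allocated=%lld bytes\n",
	    (long long)sb.st_size, (long long)sb.st_blocks * 512LL);

	/* Reads inside the hole return zeros without touching the disk. */
	close(fd);
	return (0);
}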
If you think the btree idea is too complex to implement quickly, then
try using a radix tree (essentially what you said in regards to
breaking the hash you calculate into a node traversal). For your
problem I would recommend 64 elements per node, which corresponds to
6 bits per level. 16 million records would fit in 6x4 = 24 bits, i.e.
4 levels. If you cache the first three levels in memory you eat
64^3 = 262144 entries x sizeof(entry). Assuming a simple 4-byte file
offset per entry, that is exactly 1MB of memory; the full fourth
level is 64MB for the file index, and your data can be placed
sequentially in the file.
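A sketch of that lookup in C, using those numbers (64-entry nodes,
4-byte offsets, a 24-bit key). The on-disk index layout is an
illustrative assumption, and the in-memory cache of the top three
levels is omitted for brevity:

/*
 * 4-level radix-tree lookup over a sparse index file.  Each node is
 * 64 x 4-byte entries; the 24-bit key is consumed 6 bits at a time.
 * Node 0 is the root; an entry holds the node number of the next
 * level, or the record's file offset at the last level.  A zero entry
 * means "hole": the key was never inserted, and on a sparse file the
 * read costs nothing if the node was never written.
 */
#include <sys/types.h>
#include <err.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define RADIX_BITS	6
#define RADIX_SIZE	(1 << RADIX_BITS)	/* 64 entries per node */
#define RADIX_MASK	(RADIX_SIZE - 1)
#define LEVELS		4			/* 24-bit key space */

static uint32_t
radix_lookup(int fd, uint32_t key)
{
	uint32_t entry = 0;			/* node number of the root */
	off_t pos;
	int level, shift;

	for (level = 0; level < LEVELS; level++) {
		shift = (LEVELS - 1 - level) * RADIX_BITS;
		pos = (off_t)entry * RADIX_SIZE * sizeof(uint32_t) +
		    ((key >> shift) & RADIX_MASK) * sizeof(uint32_t);
		if (pread(fd, &entry, sizeof(entry), pos) != sizeof(entry))
			err(1, "pread");
		if (entry == 0)
			return (0);		/* hit a hole: no such record */
	}
	return (entry);				/* offset of the data record */
}

int
main(void)
{
	int fd = open("/tmp/index", O_RDONLY);

	if (fd < 0)
		err(1, "open");
	printf("record offset: %u\n", radix_lookup(fd, 0x123456));
	close(fd);
	return (0);
}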
:- Ryan
-Matt
