Date: Sun, 29 Oct 2000 19:16:44 -0800 (PST)
From: Matt Dillon <dillon@earth.backplane.com>
To: Ryan Thompson <ryan@sasknow.com>
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: Filesystem holes
Message-ID: <200010300316.e9U3GiP72404@earth.backplane.com>
References: <Pine.BSF.4.21.0010291907320.27883-100000@ren.sasknow.com>
:Hi, Matt! Thanks for the replies. I'll try and keep you interested ;-)
:
:Hmm... Perhaps you're still missing my original point? I'm talking about
:a file with 96GB in addressable bytes (well, probably a bunch of files,
:given logical filesize limitations, but let's say for simplicity's sake
:that we have a single file). Its actual "size" (in terms of allocated
:blocks) will be only a bit larger than 2.1GB. (Directly proportional to
:the used size of the dataset. Discrepancies only come into play when
:record size != block size, but that can be worked around somewhat.)

    Ah, ok... that's better, though I think you will find yourself
    tuning it endlessly if the block size does not match up. Remember,
    the filesystem allocates 8K per block, so if your record size is
    1500 bytes and you have a random distribution, you are going to
    wind up eating 8-14GB (on the order of 1.4 million 1500-byte
    records, each forcing allocation of its own 8K block).

:In other words, ls -ls will report the "size" as some ridiculously large
:number, but will show a much smaller block count. So, assuming four
:records are added to the file on block boundaries, the file will actually
:only use four blocks... nowhere near 96GB!
:
:In the UNIX filesystem (ya, I know.. just pick one :-), size of file !=
:space allocated for file. Thus, my original questions were centered
:around filesystem holes. I.e., non-allocated chunks in the middle of a
:file. When trying to READ from within a hole, the kernel just sends back
:a buffer of zeros... which is enough to show that the record is not
:initialized. Actually, something like an "exists" function for a record
:wouldn't touch the disk at all! When writing to a hole, the kernel simply
:allocates the necessary block(s). This is really fast, too, for creation,
:as the empty set can be written to disk with touch(1), and uses far less
:memory than virtual initialization or in-memory structures ;-)
:
:As an example, try

    Ahh.. yes, I know. I'm a filesystem expert :-) However, that said,
    I will tell you quite frankly that virtually *nobody* depends on
    holes for efficient storage. There are only a few problems where
    it's practical... some forms of executables, and sparse matrices.
    That's pretty much it.

:And analyze the reported filesize, as well as the reported block count of
:the file. You should see a 2GB "size", and maybe 8K in allocated blocks
:(depends on how you ran newfs ;-). This is behaviour that has been around
:since the original UFS.

    It's not a good idea to use a small block size in UFS. The minimum
    is pretty much a 1K frag and an 8K block. For a sparse file this
    means 8K is the minimum that will be allocated (frags are only
    allocated at the end of a file).

    If you think the btree idea is too complex to implement quickly,
    then try using a radix tree (essentially what you said in regards
    to breaking the hash you calculate into a node traversal). For your
    problem I would recommend 64 elements per node, which corresponds
    to 6 bits per level; 16 million records fit in 4 levels (6 bits x
    4 levels = 24 bits, and 2^24 is 16M). If you cache the first three
    levels in memory you eat 64^3 = 262144 entries x sizeof(entry).
    Assuming a simple 4-byte file offset per record, you eat exactly
    1MB of memory, 64MB for the file index, and your data can be
    placed sequentially in the file.

:- Ryan

                                        -Matt
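The sparse-file behaviour both of them describe is easy to demonstrate. A
minimal C sketch (the file name and the 2GB offset are arbitrary choices,
not from the thread): seek well past EOF, write a single byte, then
compare the logical size against the space actually allocated:

/*
 * Demonstrate a filesystem hole: seek ~2GB past EOF, write one byte,
 * then compare the logical size (st_size) against the space actually
 * allocated (st_blocks, counted in 512-byte units).  Expect a ~2GB
 * "size" but only a block or so of real allocation.
 */
#include <sys/stat.h>
#include <err.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	struct stat st;
	int fd;

	if ((fd = open("sparse.dat", O_CREAT | O_TRUNC | O_RDWR, 0644)) < 0)
		err(1, "open");

	/* Everything skipped over here becomes a hole: no blocks allocated. */
	if (lseek(fd, (off_t)2 * 1024 * 1024 * 1024, SEEK_SET) == -1)
		err(1, "lseek");
	if (write(fd, "x", 1) != 1)		/* allocates a single block */
		err(1, "write");

	if (fstat(fd, &st) == -1)
		err(1, "fstat");
	printf("size = %jd bytes, allocated = %jd x 512-byte blocks\n",
	    (intmax_t)st.st_size, (intmax_t)st.st_blocks);
	return (0);
}

Reading anywhere inside the hole returns zeros without any data block
having been written, which is what makes Ryan's cheap "exists" check
possible: a record slot that reads back all-zero was never stored.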
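Matt's radix-tree suggestion, sketched in C under a few assumptions not
in the mail: 24-bit keys, 4-byte (uint32_t) offsets, a zero entry meaning
"no leaf yet", and leaves placed after a small file header so offset 0 is
never a real leaf:

/*
 * Sketch of the 4-level radix index: 6 bits per level, 64 entries per
 * node, so a 24-bit key covers 16M records.  The top three levels
 * collapse into a single in-core array of 64^3 = 262144 uint32_t
 * entries (exactly 1MB); each entry holds the offset of a 64-entry
 * leaf node in the index file (64MB if fully populated), and each
 * leaf entry holds the offset of a record in the data file.  Error
 * handling is omitted for brevity.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define RADIX_BITS	6
#define RADIX_SIZE	(1 << RADIX_BITS)	/* 64 entries per node */
#define TOP_ENTRIES	(RADIX_SIZE * RADIX_SIZE * RADIX_SIZE)	/* 262144 */

struct ridx {
	FILE	 *idxf;		/* leaf nodes live in this file */
	uint32_t *top;		/* in-core levels 1-3: calloc(TOP_ENTRIES, 4) */
};

/*
 * Look up a 24-bit key (e.g. the top bits of the record hash).
 * Returns the record's offset in the data file, or 0 if absent.
 */
uint32_t
ridx_lookup(struct ridx *ix, uint32_t key)
{
	uint32_t topslot = key >> RADIX_BITS;		/* high 18 bits */
	uint32_t leafslot = key & (RADIX_SIZE - 1);	/* low 6 bits */
	uint32_t leafoff = ix->top[topslot];
	uint32_t recoff = 0;

	if (leafoff == 0)		/* no leaf allocated: record absent */
		return (0);
	fseek(ix->idxf, (long)(leafoff + leafslot * sizeof(recoff)), SEEK_SET);
	fread(&recoff, sizeof(recoff), 1, ix->idxf);
	return (recoff);
}

A store would mirror this: the first time a top slot is hit, append a
fresh zeroed 64-entry leaf to the index file and record its offset in
top[]; thereafter just write the record's offset into the right leaf
slot. Alternatively, the index file itself can be laid out sparsely,
with leaf i at the fixed offset i * 64 * sizeof(uint32_t), so untouched
leaves cost no disk blocks at all: the same hole trick the rest of the
thread is discussing.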