Date: Mon, 30 Oct 2000 01:09:41 -0600 (CST) From: Ryan Thompson <ryan@sasknow.com> To: Matt Dillon <dillon@earth.backplane.com> Cc: freebsd-hackers@FreeBSD.ORG Subject: Re: Filesystem holes Message-ID: <Pine.BSF.4.21.0010300028510.86769-100000@ren.sasknow.com> In-Reply-To: <200010300316.e9U3GiP72404@earth.backplane.com>
next in thread | previous in thread | raw e-mail | index | archive | help
Matt Dillon wrote to Ryan Thompson: > :Hi, Matt! Thanks for the replies. I'll try and keep you interested ;-) > : > :Hmm... Perhaps you're still missing my original point? I'm talking about > :a file with 96GB in addressable bytes (well, probably a bunch of files, > :given logical filesize limitations, but let's say for simplicity's sake > :that we have a single file). It's actual "size" (in terms of allocated > :blocks) will be only a bit larger than 2.1GB. (Directly proportional to > :the used size of the dataset. Discrepancies only come into play when > :record size != block size, but that can be worked around somewhat) > > Ah, ok... that's better, though I think you will find yourself > tuning it endlessly if the blocksize does not match-up. Remember, > the filesystem allocates 8K per block, so if your record size is > 1500 bytes and you have a random distribution, you are going to > wind up eating 8-14GB. True.. As they say, "Disk is cheap", though... And "tuning" isn't so bad on a simple address calculation. I agree that if the record size doesn't closely match the blocksize, there will be a waste of space, but at least that waste is proportional to the dataset... Better than many data structures that I could call by name. It wouldn't be that difficult with many problem sets to optimize them to the point where their records align neatly with the blocksize. Granted, the waste space would hang you with some problem sets. I propose, though, that for some problems, it is easy enough to tune them to reduce (or, if you're really lucky, eliminate), waste space. Yes, this is a specialized solution. > :In other words, ls -ls will report the "size" as some ridiculously large > :number, will show a much smaller block count. So, assuming four records > :are added to the file on block boundaries, the file will actually only use > :four blocks... nowhere near 96GB! > : > :In the UNIX filesystem (ya, I know.. just pick one :-), size of file != > :space allocated for file. Thus, my original questions were centered > :around filesystem holes. I.e., non-allocated chunks in the middle of a > :file. When trying to READ from within a hole, the kernel just sends back > :a buffer of zeros... which is enough to show that the record is not > :initialized. Actually, something like an "exists" function for a record > :wouldn't touch the disk at all! When writing to a hole, the kernel simply > :allocates the necessary block(s). This is really fast, too, for creation, > :as the empty set can be written to disk with touch(1), and uses far less > :memory than virtual initialization or memory structures ;-) > : > :As an example, try > > Ahh.. yes, I know. I'm a filesystem expert :-) I know, Matt, and that's why it's great to talk to you about this :-) I guess I needed an example to convey my point, though. I apologize if it came across as an insult to your intelligence; 'twas not intended that way in the least. > However, that said, I > will tell you quite frankly that virtually *nobody* depends on holes > for efficient storage. Ahh. Ok. This is the kind of response I was looking for. Case studies :-) > There are only a few problems where it's > practical.... some forms of executables, and sparse matrixes. That's > pretty much it. > :And analyze the reported filesize, as well as the reported block count of > :the file. You should see a 2GB "size", and maybe 8K in allocated blocks > :(depends how you ran newfs ;-). This is behaviour that has been around > :since the original UFS. > > It's not a good idea to use a small block size in UFS. The minimum > is pretty much 1K frag, 8K block. for a sparse file this means 8K > is the minimum that will be allocated (frags are only allocated for > the end of a file). Right, which is why this strategy _does_ only work for a specific subset of problems, just as a directed graph works well for a path structure, but would really suck for (among other things) maintaining a sorted list of account numbers. > If you think the btree idea is too complex to implement quickly, then > try using a radix tree (essentially what you said in regards to > breaking the hash you calculate into a node traversal). For your > problem I would recommend 64 elements per node, which corresponds to > 6 bits. 16 million records would fit in 6x4 = 4 levels. If you > cache the first three levels in memory you eat 64^3 = 262144 x > sizeof(record). Assuming a simple file offset for the record, you eat > exactly 1MB of memory, 64MB for the file index, and your data can > be placed sequentially in the file. Right. One more way to skin the cat, and a pretty good one at that, if you have main memory to burn (I don't :-) Assuming, though, (yes, I'm going to be a bit stubborn, here... ;-) that I WANT to use this address calculation method in conjunction with the filesystem's ability to allocate blocks as they are used... I can do so with some calculable storage increase on disk (which can probably be reduced or eliminated, depending on the problem). Assuming that increase is affordable, then implementing this data structure can be done without the use of ANY significant main memory. Not 64MB... Not 1MB... Just a few KB here and there for a temp record or two, functions and library calls, while accesses are still very fast. Recall that another 64MB of RAM probably costs almost as much right now as the difference between a 20GB HD and a 30GB HD. SO.. Assuming that I could sell enough of that rationale in my design... And I went ahead to implementation... I shall ask the original question: Once a block has been allocated in the middle of a file, is there a simple and efficient way to de-allocate that block arbitrarily? (So, if I seek to the middle of a file, write 8KB, and later decide that I want to remove that 8KB block to reclaim the space, can it be done in userland? If so, how?) Thanks for reading! :-) > -Matt > - Ryan -- Ryan Thompson <ryan@sasknow.com> Network Administrator, Accounts Phone: +1 (306) 664-1161 SaskNow Technologies http://www.sasknow.com #106-380 3120 8th St E Saskatoon, SK S7H 0W2 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-hackers" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?Pine.BSF.4.21.0010300028510.86769-100000>