Date:      Mon, 30 Oct 2000 01:09:41 -0600 (CST)
From:      Ryan Thompson <ryan@sasknow.com>
To:        Matt Dillon <dillon@earth.backplane.com>
Cc:        freebsd-hackers@FreeBSD.ORG
Subject:   Re: Filesystem holes
Message-ID:  <Pine.BSF.4.21.0010300028510.86769-100000@ren.sasknow.com>
In-Reply-To: <200010300316.e9U3GiP72404@earth.backplane.com>

Matt Dillon wrote to Ryan Thompson:

> :Hi, Matt!  Thanks for the replies.  I'll try and keep you interested ;-)
> :
> :Hmm... Perhaps you're still missing my original point?  I'm talking about
> :a file with 96GB in addressable bytes (well, probably a bunch of files,
> :given logical filesize limitations, but let's say for simplicity's sake
> :that we have a single file).  Its actual "size" (in terms of allocated
> :blocks) will be only a bit larger than 2.1GB.  (Directly proportional to
> :the used size of the dataset.  Discrepancies only come into play when
> :record size != block size, but that can be worked around somewhat)
> 
>     Ah, ok... that's better, though I think you will find yourself
>     tuning it endlessly if the blocksize does not match up.  Remember,
>     the filesystem allocates 8K per block, so if your record size is
>     1500 bytes and you have a random distribution, you are going to
>     wind up eating 8-14GB.

True... As they say, "Disk is cheap", though... and "tuning" isn't so bad
on a simple address calculation.  I agree that if the record size doesn't
closely match the blocksize, there will be wasted space, but at least that
waste is proportional to the dataset... better than many data structures I
could call by name.  With many problem sets, it wouldn't be that difficult
to optimize the records to the point where they align neatly with the
blocksize.  Granted, the wasted space would hang you with some problem
sets.  I propose, though, that for some problems it is easy enough to tune
the records to reduce (or, if you're really lucky, eliminate) the wasted
space.  Yes, this is a specialized solution.
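
To put rough numbers on that (my arithmetic, not Matt's): at one
1500-byte record per 8K block, each block wastes 8192 - 1500 = 6692
bytes, about 82%.  Scaling my ~2.1GB of live data by 8192/1500 gives
roughly 11.5GB on disk, squarely in the 8-14GB range quoted above.
Tune the record to 2048 bytes, though, and exactly four records pack
per block with no alignment waste at all.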


> :In other words, ls -ls will report the "size" as some ridiculously large
> :number, but will show a much smaller block count.  So, assuming four records
> :are added to the file on block boundaries, the file will actually only use
> :four blocks... nowhere near 96GB!
> :
> :In the UNIX filesystem (ya, I know.. just pick one :-), size of file !=
> :space allocated for file.  Thus, my original questions were centered
> :around filesystem holes.  I.e., non-allocated chunks in the middle of a
> :file.  When trying to READ from within a hole, the kernel just sends back
> :a buffer of zeros... which is enough to show that the record is not
> :initialized.  Actually, something like an "exists" function for a record
> :wouldn't touch the disk at all!  When writing to a hole, the kernel simply
> :allocates the necessary block(s).  This is really fast, too, for creation,
> :as the empty set can be written to disk with touch(1), and uses far less
> :memory than virtual initialization or memory structures ;-)
> :
> :As an example, try 
> 
>     Ahh.. yes, I know.  I'm a filesystem expert :-)  

I know, Matt, and that's why it's great to talk to you about this :-)  I
guess I needed an example to convey my point, though.  I apologize if it
came across as an insult to your intelligence; 'twas not intended that
way in the least.
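
Since the example command got trimmed in the quoting above, here is a
quick sketch of the demonstration I had in mind (the filename and the
exact sizes are just for illustration):

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int
    main(void)
    {
            struct stat sb;
            int fd = open("holes.dat", O_CREAT | O_RDWR | O_TRUNC, 0644);

            if (fd < 0) {
                    perror("open");
                    return 1;
            }

            /*
             * Seek 2GB out and write a single byte.  Everything before
             * the write is a hole and consumes no disk blocks.
             */
            if (lseek(fd, (off_t)2 * 1024 * 1024 * 1024, SEEK_SET) == -1 ||
                write(fd, "x", 1) != 1) {
                    perror("lseek/write");
                    return 1;
            }

            if (fstat(fd, &sb) == -1) {
                    perror("fstat");
                    return 1;
            }

            /*
             * st_size reports the ~2GB logical length; st_blocks counts
             * only the 512-byte blocks actually allocated.
             */
            printf("size = %jd bytes, allocated = %jd blocks\n",
                (intmax_t)sb.st_size, (intmax_t)sb.st_blocks);

            close(fd);
            return 0;
    }

Run it and then ls -ls holes.dat: the size column reads just past 2GB
while the block count stays tiny.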


>     However, that said, I
>     will tell you quite frankly that virtually *nobody* depends on holes
>     for efficient storage.  

Ahh.  Ok.  This is the kind of response I was looking for.  Case studies
:-)

>     There are only a few problems where it's
>     practical... some forms of executables, and sparse matrices.  That's
>     pretty much it.


> :And analyze the reported filesize, as well as the reported block count of
> :the file.  You should see a 2GB "size", and maybe 8K in allocated blocks
> :(depends how you ran newfs ;-).  This is behaviour that has been around
> :since the original UFS.
> 
>     It's not a good idea to use a small block size in UFS.  The minimum
>     is pretty much 1K frag, 8K block.  For a sparse file this means 8K
>     is the minimum that will be allocated (frags are only allocated for
>     the end of a file).

Right, which is why this strategy _does_ only work for a specific subset
of problems, just as a directed graph works well for a path structure, but
would really suck for (among other things) maintaining a sorted list of
account numbers.


>     If you think the btree idea is too complex to implement quickly, then
>     try using a radix tree (essentially what you said in regards to 
>     breaking the hash you calculate into a node traversal).  For your
>     problem I would recommend 64 elements per node, which corresponds to
>     6 bits.  16 million records would fit in 4 levels (6x4 = 24 bits).  If you
>     cache the first three levels in memory you eat 64^3 = 262144 x
>     sizeof(record).  Assuming a simple file offset for the record, you eat
>     exactly 1MB of memory, 64MB for the file index, and your data can
>     be placed sequentially in the file. 

Right.  One more way to skin the cat, and a pretty good one at that, if
you have main memory to burn (I don't :-)
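
For concreteness, here's my reading of that layout in a small sketch
(the arithmetic is Matt's; the names and the flat-table framing are my
own assumptions):

    #include <stdint.h>
    #include <stdio.h>

    #define BITS   6
    #define RADIX  (1 << BITS)              /* 64 entries per node */
    #define LEVELS 4                        /* 4 x 6 = 24 bits -> 16M records */

    int
    main(void)
    {
            uint32_t recno = 12345678;      /* any 24-bit record number */
            int level;

            /*
             * Walk the digits most-significant first: the first three
             * index the cached levels, and the last selects a slot in
             * one 64-entry leaf block read from the 64MB index file.
             */
            for (level = LEVELS - 1; level >= 0; level--)
                    printf("level %d digit: %u\n", LEVELS - level,
                        (recno >> (level * BITS)) & (RADIX - 1));

            /* The cached levels are ~64^3 four-byte offsets: his 1MB. */
            printf("cache: %d entries x 4 bytes = %d KB\n",
                RADIX * RADIX * RADIX, RADIX * RADIX * RADIX * 4 / 1024);
            return 0;
    }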

Assuming, though (yes, I'm going to be a bit stubborn here... ;-), that I
WANT to use this address calculation method in conjunction with the
filesystem's ability to allocate blocks as they are used...  I can do so
with some calculable storage increase on disk (which can probably be
reduced or eliminated, depending on the problem).  Assuming that increase
is affordable, this data structure can be implemented without the use of
ANY significant main memory.  Not 64MB... not 1MB... just a few KB here
and there for a temp record or two and some library calls, while accesses
stay very fast.  Recall that another 64MB of RAM probably costs almost as
much right now as the difference between a 20GB HD and a 30GB HD.
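
Concretely, the whole structure collapses to a hash and two syscalls.
A minimal sketch, assuming a placeholder hash, an 8K record sized to
the block, and 16M slots (all of it illustrative):

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define RECSIZE 8192                    /* one record per 8K block */
    #define NSLOTS  (1ULL << 24)            /* 16M addressable slots */

    /* Placeholder hash -- any map from key to slot number works. */
    static off_t
    slot_offset(uint64_t key)
    {
            return (off_t)(key % NSLOTS) * RECSIZE;
    }

    int
    main(void)
    {
            char rec[RECSIZE] = "some record data";
            char back[RECSIZE];
            ssize_t n;
            int fd = open("table.dat", O_CREAT | O_RDWR | O_TRUNC, 0644);

            if (fd < 0)
                    return 1;

            /*
             * Store: one pwrite at the computed offset.  The filesystem
             * allocates this 8K block only now; every untouched slot
             * stays a hole and costs nothing on disk.
             */
            if (pwrite(fd, rec, RECSIZE, slot_offset(42)) != RECSIZE)
                    return 1;

            /*
             * Fetch: a hole reads back as zeros, and a slot past EOF
             * reads back short.  Either way, zero fill means "never
             * written", so existence checks need no separate bitmap.
             */
            n = pread(fd, back, RECSIZE, slot_offset(7));
            if (n < 0)
                    return 1;
            memset(back + n, 0, RECSIZE - (size_t)n);
            printf("slot 7: %s\n", back[0] == '\0' ? "empty" : "present");

            close(fd);
            return 0;
    }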

SO... Assuming that I could sell enough of that rationale in my design,
and I went ahead to implementation... I shall ask the original question:

Once a block has been allocated in the middle of a file, is there a simple
and efficient way to de-allocate that block arbitrarily?  (So, if I seek
to the middle of a file, write 8KB, and later decide that I want to remove
that 8KB block to reclaim the space, can it be done in userland?  If so,
how?)
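
To be clear about exactly what I'm wishing for, in code (this is
entirely hypothetical -- I know of no such call, which is the whole
question):

    #include <errno.h>
    #include <sys/types.h>

    /*
     * Hypothetical: return the blocks in [offset, offset + length) to
     * the filesystem, leaving the file's logical size untouched so the
     * range reads back as a hole again.  This stub only marks the gap
     * in the API.
     */
    static int
    punch_hole(int fd, off_t offset, off_t length)
    {
            (void)fd; (void)offset; (void)length;
            errno = ENOSYS;
            return (-1);
    }

    /* Intended use: reclaim the 8K record at, say, slot 12345. */
    /* punch_hole(fd, (off_t)12345 * 8192, 8192); */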


Thanks for reading!  :-)

> 					-Matt
> 

- Ryan

-- 
  Ryan Thompson <ryan@sasknow.com>
  Network Administrator, Accounts
  Phone: +1 (306) 664-1161

  SaskNow Technologies     http://www.sasknow.com
  #106-380 3120 8th St E   Saskatoon, SK  S7H 0W2


