Date:      Mon, 31 Mar 1997 11:32:24 -0700 (MST)
From:      Terry Lambert <terry@lambert.org>
To:        scrappy@hub.org (The Hermit Hacker)
Cc:        hackers@freebsd.org
Subject:   Re: ftruncate("directory")...
Message-ID:  <199703311832.LAA09800@phaeton.artisoft.com>
In-Reply-To: <Pine.NEB.3.96.970331000021.199S-100000@thelab.hub.org> from "The Hermit Hacker" at Mar 31, 97 00:03:27 am

> 	Has anyone written (is it possible?) a utility that can
> 'truncate' a directory? 
> 
> 	Essentially, what I'm looking at is something that would
> open a directory, re-org the entries in it so that only the 'good'
> entries are at the head of the file, and then truncate it...
> 
> 	Not sure if this is actually possible, but am going to play
> with it here...but if its already been done, all the better... :)

Directories are operated on in terms of "directory blocks".

New entries are allocated from as early in the directory as it is
possible to allocate them.

If there are two entries with a gap between them, and there would
be enough space in the block for the new entry if the gap wasn't
there, then entries are moved.  This is called "compaction"; it
occurs only on entry creation, and only inside a block, never
spanning a block boundary.
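
As a rough illustration of intra-block compaction, here is a toy model
in Python.  It is not the ufs code: the names Entry, needed, free_space
and enter are made up for this sketch, and the 512-byte block with an
8-byte entry header and 4-byte name alignment is an assumption about
the classic on-disk layout.

```python
from dataclasses import dataclass

DIRBLKSIZ = 512   # size of one directory block (assumed)
HEADER = 8        # fixed part of an entry (assumed): inode, reclen, type, namlen

@dataclass
class Entry:
    ino: int      # 0 marks an unused slot
    reclen: int   # bytes this record occupies, including trailing slack
    name: str

def needed(name):
    """Bytes an entry really needs: header plus NUL-terminated name, 4-byte aligned."""
    return HEADER + ((len(name) + 1 + 3) & ~3)

def free_space(block):
    """Total slack in the block, whether or not it is contiguous."""
    return sum(e.reclen - (needed(e.name) if e.ino else 0) for e in block)

def enter(block, ino, name):
    """Compact the block toward its start, then append the new entry.

    Returns False if the block cannot hold the entry even after compaction.
    Nothing outside this one block is ever touched.
    """
    want = needed(name)
    if free_space(block) < want:
        return False
    live = [e for e in block if e.ino]
    for e in live:
        e.reclen = needed(e.name)          # squeeze out the gap after each entry
    used = sum(e.reclen for e in live)
    live.append(Entry(ino, DIRBLKSIZ - used, name))  # last record owns the slack
    block[:] = live
    return True
```

With two entries whose individual gaps are each too small for a
maximum-length name, enter() still succeeds, because the gaps are
merged before the new entry is placed.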


When a block is fully empty (the last entry in the block has just
been deleted), then if it is not the last block, it's left there.
If, however, it was the last block, then all contiguous empty
blocks prior to the end of the directory are truncated back.
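
The truncation rule above can be sketched as follows.  This is a toy
model only: the directory is a Python list of blocks, each block a
list of names, and remove() is a hypothetical helper, not a real
kernel routine.

```python
def remove(blocks, name):
    """Delete name from the directory model.

    An emptied interior block is left in place.  If the emptied block is
    the last one, every contiguous empty block at the tail is dropped.
    The first block is never dropped, since it holds "." and "..".
    """
    for i, blk in enumerate(blocks):
        if name in blk:
            blk.remove(name)
            if not blk and i == len(blocks) - 1:
                while len(blocks) > 1 and not blocks[-1]:
                    blocks.pop()
            return True
    return False
```

Deleting the only entry of an interior block leaves the directory the
same length; deleting the only entry of the final block shrinks it
back past any empty blocks that had accumulated before it.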


It should be possible to modify the algorithm so that intermediate
blocks which are fully empty are removed from the block list, either
resulting in a sparse directory file, or (by reorganizing the block
order) actually a smaller file by removing the placeholders.  This
is not recommended, since it would invalidate NFS cookies in such a
way as to cause SunOS NFSv3 clients to fail unrecoverably in some
situations where the clients expect traditional UFS behaviour and fail
to deal gracefully with invalidated cookies (like they are supposed
to, but do not).  This is a bug in Sun's NFSv3 client code, and it
is not likely that you will be able to get them to fix it.

If you choose the sparse directory route, the ufs_dir.c operations
must be made to recognize a zero "next" pointer for the first entry
in a block as an indicator that the block contents are invalid and
the next block should be consulted instead.
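
A minimal sketch of such a lookup, again as a Python toy rather than
a patch to ufs_dir.c.  The little-endian header layout, the helper
names pack_block and lookup, and the reading of the "next" pointer as
the record-length field are all assumptions made for illustration.  A
hole in a sparse file reads back as zeros, which is what makes a zero
first record length a natural "skip this block" marker.

```python
import struct

DIRBLKSIZ = 512
HDR = struct.Struct("<IHBB")   # inode, record length ("next"), type, name length

def pack_block(entries):
    """Build one block from (ino, name) pairs; the last record absorbs the slack.
    An empty list yields an all-zero block, i.e. what a hole reads back as."""
    buf = bytearray(DIRBLKSIZ)
    off = 0
    for i, (ino, name) in enumerate(entries):
        raw = name.encode()
        reclen = HDR.size + ((len(raw) + 1 + 3) & ~3)
        if i == len(entries) - 1:
            reclen = DIRBLKSIZ - off
        HDR.pack_into(buf, off, ino, reclen, 0, len(raw))
        buf[off + HDR.size: off + HDR.size + len(raw)] = raw
        off += reclen
    return bytes(buf)

def lookup(directory, name):
    """Return the inode for name, or None if it is not present."""
    for base in range(0, len(directory), DIRBLKSIZ):
        off = base
        while off < base + DIRBLKSIZ:
            ino, reclen, _type, namlen = HDR.unpack_from(directory, off)
            if reclen == 0:
                if off != base:
                    raise ValueError("zero record length inside a block")
                break              # invalid block: consult the next one
            start = off + HDR.size
            if ino and directory[start:start + namlen].decode() == name:
                return ino
            off += reclen
    return None
```

A zero record length anywhere other than the start of a block is
treated as corruption here; only the first entry is given the special
meaning.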


It should be noted at this point that truncation on deletion is one
of the reasons FreeBSD performs so poorly on the lmbench tests: the
truncation affects FS metadata which is synchronously written unless
you mount -async.  There is considerable controversy as to whether
or not lmbench tests on ext2fs on Linux (or any other -async mounted
FS which fails to make POSIX guarantees, as -async mounted FS's are
wont to do) can really be considered a "figure of merit".  The general
consensus among people with FS engineering backgrounds is that it
cannot.


Inter-block compaction (as opposed to intra-block compaction) must
include multiple non-atomic yet idempotent operations, since it will,
by definition, span multiple blocks.

It is highly unlikely that you will achieve any significant block
recovery by doing this, and the fact that you must implement the
idempotence in much the same way that "rename" implements it means
that this would be a very high-overhead operation: not something
you would want to do during standard directory operations.  For
the amount of fragmentation necessary to achieve utility from this
recovery, you would need to have set up a scenario terribly different
from standard utilization patterns.  Perhaps the lmbench directory
create/delete test?
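
To make the "non-atomic yet idempotent" requirement concrete, here is
a toy sketch of moving one entry from a later block to an earlier one
in two steps.  Blocks are modeled as Python dicts mapping name to
inode; step_add, step_remove, move_entry and the crash_after parameter
are hypothetical names for this illustration, and the add-then-remove
ordering is an assumption loosely patterned on how a rename adds the
new name before dropping the old one.

```python
def step_add(dst, name, ino):
    """Step 1: make the entry appear in the destination block.
    Repeating it changes nothing."""
    dst.setdefault(name, ino)

def step_remove(src, dst, name):
    """Step 2: drop the source copy, but only once the destination copy exists.
    Repeating it changes nothing."""
    if name in dst:
        src.pop(name, None)

def move_entry(src, dst, name, crash_after=None):
    """Move name from src to dst.

    crash_after=1 stops after the first step to mimic an interruption.
    Rerunning the whole function afterwards finishes the move.
    """
    ino = src.get(name, dst.get(name))
    if ino is None:
        return
    step_add(dst, name, ino)
    if crash_after == 1:
        return
    step_remove(src, dst, name)
```

If the operation is interrupted between the steps, the name is briefly
present in both blocks but never missing, and replaying the move from
the start converges on the same final state.  Each step would be a
separate synchronous metadata write in a real filesystem, which is
where the overhead comes from.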

Finally, it is unlikely that you will be able to successfully
employ a "least fit" algorithm in allocating these entries without
an O(N*(N-1)) traversal of the directory entries, noting that you
must make specific exception for "." and "..", which *must* be the
first entries in any directory.


In any case, if you do pursue this, you should establish performance
baselines for the operations which will be affected by the additional
overhead, and one or more common scenarios where your changes will
actually be beneficial instead of detrimental.


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


