Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 15 Aug 1997 09:07:29 -0700 (MST)
From:      Terry Lambert <terry@lambert.org>
To:        dg@root.com
Cc:        ponds!rivers@dg-rtp.dg.com, freebsd-hackers@FreeBSD.ORG
Subject:   Re: More info on slow "rm" times with 2.2.1+.
Message-ID:  <199708151607.JAA22981@phaeton.artisoft.com>
In-Reply-To: <199708150532.WAA12760@implode.root.com> from "David Greenman" at Aug 14, 97 10:32:14 pm

next in thread | previous in thread | raw e-mail | index | archive | help
>    The answer to this is that FreeBSD, like many (most?) other operating
> systems, doesn't handle very large directories very well. When I was looking
> into similar slowness on a different news server, I found that the
> control.cancel newsgroup directory was more than 10MB large! I think it
> contained more than 200,000 files, but I don't recall the exact number.

Just some thoughts...

The NT and OS/2 FS's both use btree's for their lookups.

While it is initially expensive to fault inodes at the same time as
directory entries, it's possible to do.

Adding the directory entry offset to the DNLC cache, and doing the
faulting asynchronously *dramatically* improves large directory
performance, as long as there is high locality in the requests,
or the cache is ~50% of the remaining entries in the directory
in size (to delete an inode, you must fault it twice to be able
to modify the reference count -- this is a problem in the way
hard links are handled -- the same reason you can't have a parent
pointer on normal files).  If you could actually *use* the DNLC
for deletes, it'd be a serious win.

Part of the DNLC problem is that the vnodes it references are
backed by an ihash.  The ihash needs to go away, preferrably
by making the vnodes per FS, and adding a per FS "vrele"; this
would shorten the search algorithm by N*log2(N).

The FFS directory structure lends itself rather easily to skiplists,
if a NULL initial record is tagged by something other than a zero
inode number (say a "deleted" flag, which is useful for "undelete"
as well).  I'm not necessarily recommending skiplists, but by
rewriting the block offset ordering for direct and indirect blocks,
blocks could be reordered within clusters without a high impact;
inserting near the head of a directory would be more expensive,
but on average, create operations are less clustered than delete
operations, and we know for sure that on average the *number* of
each is equal (or our hard disks quickly fill up).

If you were to flag the entries rather than deleting them so long
as a flag operation occured in a timer window, then purge when the
timer expired (inactivity purge), it would go a hell of a long way
toward making it faster.

Finally, VMS and NT ware not stupid when they put their globbing
in the kernel space.  Not only does it allow you to only push
back matching entries instead of all entries, for a delete
operation, it allows the entire globbed delete to occur in a
single linear traversal of the directory structure.  For news
expiration, you would need to be able to glob on any attribute,
not just name (specifically, create time).


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199708151607.JAA22981>