Date: Sun, 27 Mar 2005 21:45:06 +0100
From: David Malone <dwmalone@maths.tcd.ie>
To: Scott <scottl@samsco.org>
Cc: Robert Watson <rwatson@FreeBSD.org>
Subject: Re: UFS Subdirectory limit.
Message-ID: <200503272145.aa71162@salmon.maths.tcd.ie>
In-Reply-To: Your message of "Mon, 28 Mar 2005 02:42:55 PDT." <4247D19F.6010502@samsco.org>
> Luckily, linear reads through a directory are nearly O(1) in UFS since
> ufs_lookup() caches the offset to the last entry read so that a
> subsequent call doesn't have to start from the beginning.

(dirhash also has an equivalent optimisation, because that bit of the
ufs_lookup() code isn't called when dirhash is in use.)

> Would an application that isn't as well-written as cyrus behave as
> well? What about an application like Squid?

Random lookups should be almost O(1) with dirhash when you have many
operations to amortise the cost of building the hash over. Where you
lose out with dirhash is when you make a small number of accesses to
a large directory and all those entries live close to the beginning
of the directory (or possibly when you're thrashing against dirhash's
memory limit).

If the directory entries are actually constant (as is the case with
squid in truncate mode), then you should get ~O(1), but with a slightly
smaller constant than when the directory entries are changing.

Just to check, I'm running a benchmark at the moment to compare 150k
directories arranged either as:

	1) a flat layout: 150k subdirectories of one directory, or
	2) a two-level layout: ~387 subdirectories, each containing
	   ~387 subdirectories (387^2 is about 150k).

At the moment it looks like accessing files in either structure
performs equivalently, but it is a bit slower to build/remove the
flat structure. I'll post the results once the run is complete.

	David.
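
For illustration, a minimal userland model of the offset-cache idea.
The structure and function names below are invented for the sketch and
aren't the kernel's; ufs_lookup() itself works on on-disk directory
blocks rather than an in-memory array, but the principle is the same:
remember where the last hit was and start the next search there.

	/*
	 * Userland sketch of the "start where we left off" optimisation.
	 * If callers look names up in roughly directory order (as a
	 * linear scan does), each call touches O(1) entries on average
	 * instead of O(n).
	 */
	#include <sys/types.h>
	#include <stddef.h>
	#include <string.h>

	struct dirent_model {
		const char *name;
	};

	struct dirmodel {
		struct dirent_model *ents;	/* entries in on-disk order */
		size_t nents;
		size_t last_off;		/* offset after the last hit */
	};

	static ssize_t
	lookup_cached(struct dirmodel *dp, const char *name)
	{
		size_t i, idx;

		if (dp->nents == 0)
			return (-1);
		/* Search from the remembered offset, wrapping to the start. */
		for (i = 0; i < dp->nents; i++) {
			idx = (dp->last_off + i) % dp->nents;
			if (strcmp(dp->ents[idx].name, name) == 0) {
				dp->last_off = idx + 1;	/* remember for next time */
				return ((ssize_t)idx);
			}
		}
		return (-1);
	}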
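
And a rough sketch of the two layouts being compared. This is my guess
at the shape of the benchmark (plain mkdir(2) loops), not the actual
script; 387 * 387 is 149,769, which is where the ~387 figure comes from.

	#include <sys/stat.h>
	#include <stdio.h>

	#define	FANOUT	387

	/* Layout 1: ~150k subdirectories directly under one directory. */
	static void
	build_flat(const char *base)
	{
		char path[256];
		int i;

		for (i = 0; i < FANOUT * FANOUT; i++) {
			snprintf(path, sizeof(path), "%s/d%06d", base, i);
			mkdir(path, 0755);
		}
	}

	/* Layout 2: ~387 subdirectories, each holding ~387 more. */
	static void
	build_two_level(const char *base)
	{
		char path[256];
		int i, j;

		for (i = 0; i < FANOUT; i++) {
			snprintf(path, sizeof(path), "%s/d%03d", base, i);
			mkdir(path, 0755);
			for (j = 0; j < FANOUT; j++) {
				snprintf(path, sizeof(path), "%s/d%03d/d%03d",
				    base, i, j);
				mkdir(path, 0755);
			}
		}
	}

	int
	main(void)
	{
		mkdir("flat", 0755);
		build_flat("flat");
		mkdir("tree", 0755);
		build_two_level("tree");
		return (0);
	}

The flat build hits the directory-growth cost on a single huge
directory, which would account for it being slower to build and
remove, while dirhash keeps lookups in both layouts close to O(1).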