From: David Malone
To: Scott
cc: freebsd-fs@FreeBSD.org
cc: Robert Watson
Subject: Re: UFS Subdirectory limit.
Date: Sun, 27 Mar 2005 21:45:06 +0100
In-reply-to: Your message of "Mon, 28 Mar 2005 02:42:55 PDT." <4247D19F.6010502@samsco.org>
Message-ID: <200503272145.aa71162@salmon.maths.tcd.ie>

> Luckily, linear reads through a directory are nearly O(1) in UFS since
> ufs_lookup() caches the offset to the last entry read so that a
> subsequent call doesn't have to start from the beginning.

(dirhash also has an equivalent optimisation, because that bit of the ufs_lookup code isn't called when dirhash is in use.)

> Would
> an application that isn't as well-written as cyrus behave as well? What
> about an application like Squid?

Random lookups should be almost O(1) with dirhash when you have many operations to amortise the cost of building the hash over. You lose out with dirhash when you make a small number of accesses to a large directory and all those entries live close to the beginning of the directory (or possibly when you're thrashing against dirhash's memory limit).
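To make the amortisation argument concrete, here is a rough toy cost model in Python. It is not the UFS or dirhash code: it simply assumes that a plain lookup examines every entry from the start of the directory up to the target, and that a hashed directory pays one pass over all entries to build the hash and then one probe per lookup. The function names and the access patterns are made up for illustration.

```python
# Toy cost model (an assumption for illustration, NOT the UFS/dirhash code).
# "Cost" here means the number of directory entries examined.

def linear_cost(positions):
    """Each lookup scans from the start of the directory to the target entry."""
    return sum(p + 1 for p in positions)


def hashed_cost(n_entries, positions):
    """One pass over every entry to build the hash, then one probe per lookup."""
    return n_entries + len(positions)


n = 150_000  # directory size, matching the 150k figure discussed in this thread

# A small number of accesses, all near the beginning of the directory.
few_near_start = [0, 3, 7]

# Many accesses spread over the whole directory (deterministic, not random).
many_spread = [(i * 7919) % n for i in range(10_000)]

if __name__ == "__main__":
    for name, pos in [("few, near start", few_near_start),
                      ("many, spread out", many_spread)]:
        print(f"{name}: linear={linear_cost(pos)} hashed={hashed_cost(n, pos)}")
```

Under these assumptions the hash only pays for itself in the second pattern, where the build cost is spread over many lookups. The model ignores the memory limit entirely.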
If the directory entries are actually constant (as is the case with squid in truncate mode), then you should get ~O(1) but with a slightly smaller constant than when the directory entries are changing.

Just to check, I'm running a benchmark at the moment to compare 150k directories arranged as either:

	1) a flat 150k subdirectories of one directory, or
	2) two levels with ~387 subdirectories.

At the moment it looks like accessing files in either structure performs equivalently, but it is a bit slower to build/remove the flat structure. I'll post the results once the run is complete.

	David.
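For anyone who wants to try a comparison of this kind themselves, the following is a minimal Python sketch. It is not the benchmark referred to above; the function names, the directory naming scheme, and the scaled-down default size are all assumptions. It also stats the directories themselves rather than files inside them, so treat the timings as indicative only.

```python
import math
import os
import shutil
import tempfile
import time


def flat_paths(root, n):
    """n subdirectories directly under one directory."""
    return [os.path.join(root, f"d{i:06d}") for i in range(n)]


def two_level_paths(root, n):
    """n directories spread over two levels with about sqrt(n) entries each."""
    fanout = max(1, math.isqrt(n))  # 387 for n = 150000
    return [os.path.join(root, f"p{i // fanout:04d}", f"d{i % fanout:04d}")
            for i in range(n)]


def run(layout, n):
    """Time build, access (stat) and remove for one layout; returns seconds."""
    root = tempfile.mkdtemp(prefix="dirbench-")  # hypothetical scratch location
    paths = layout(root, n)
    try:
        t0 = time.perf_counter()
        for p in paths:
            os.makedirs(p, exist_ok=True)
        t1 = time.perf_counter()
        for p in paths:
            os.stat(p)
        t2 = time.perf_counter()
        shutil.rmtree(root)
        t3 = time.perf_counter()
    finally:
        # Clean up even if a step above failed part-way through.
        shutil.rmtree(root, ignore_errors=True)
    return {"build": t1 - t0, "access": t2 - t1, "remove": t3 - t2}


if __name__ == "__main__":
    n = 2_500  # scaled down; raise towards 150000 on a scratch filesystem
    for name, layout in [("flat", flat_paths), ("two-level", two_level_paths)]:
        print(name, run(layout, n))
```

Run it on a scratch filesystem, and repeat each layout a few times, since caching effects can easily dominate a single run.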