Date: Sat, 2 Jun 2001 10:47:56 -0700 (PDT)
From: Matt Dillon
Message-Id: <200106021747.f52Hluf03989@earth.backplane.com>
To: Ian Dowse
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: UFS large directory performance
References: <200106021207.aa78407@salmon.maths.tcd.ie>

:
:Thanks for the comments :-)  Yes, malloc pool fragmentation is a
:problem.  I think that it can be addressed to some extent by using
:a 2-level system (an array of pointers to fixed-size arrays) instead
:of a single large array, but I'm open to any better suggestions.

    I do precisely this for the swap meta-support structures associated
    with VM objects.  It works very well, and it means you can use the
    zone allocator for the fixed-size structures.

:If the second-level array size was fixed at around 4k, that would
:keep the variable-length first-level array small enough not to
:cause too many fragmentation issues.  The per-DIRBLKSIZ free space
:summary array is probably relatively okay as it is now.

    4K = (IA32) 1024 directory entries / 2 (approximate hash slop) = 512
    directory entries?  That seems quite reasonable, but I would use a
    smaller block size: 512 bytes (remember, with zalloc there is no
    overhead for allocating smaller structures).  Read on!

    I would further recommend a (dynamic) array of pointers at the first
    level as part of the summary structure.  Any given array entry would
    either point to the second-level array (one of the 512-byte
    allocations), be NULL (no second-level array was necessary), or be
    (void *)-1, which would indicate that the second-level array was
    reclaimed for other uses.

    During a lookup your hash algorithm would operate as per normal, but
    when it skips to the next top-level array index it would test for
    NULL (search ends, entry not found) and (void *)-1.  (void *)-1
    would indicate 'search ends, but the result is indeterminate; you
    have to rescan the directory'.

    By using a smaller block size for the second-level array you create
    more slots in the first-level array, which gives the system a better
    chance of reusing a second-level array block for other purposes
    without seriously compromising performance for file creates,
    deletions, and lookups on the directory, e.g. a lower chance of a
    lookup hitting the (void *)-1 reclaim mark in the first-level array.

:The other main issue, that of discarding old hashes when the memory
:limit is reached, may be quite tricky to resolve.  Any approach
:based on doing this from ufsdirhash_build() is likely to become a
:locking nightmare.  My original idea was to have ufsdirhash_build()
:walk a list of other inodes with hashes attached and free them if
:possible, but that would involve locking the other inode, so things
:could get messy.
:
:Ian

    If the zone allocator is used for the second-level block allocations
    it shouldn't be a problem.  You can (and had better be able to!) put
    a mutex around zone frees in -current.
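    As a rough sketch of the two-level layout and sentinel handling
    described above (illustrative only: struct dirhash, dh_blk and the
    DH_* names are made up, plain malloc()/free() stand in for the zone
    allocator, and locking is omitted; this is not the actual dirhash
    code):

/*
 * Illustrative sketch only: invented names, malloc()/free() in place
 * of the zone allocator, no locking.  Not the real UFS dirhash code.
 */
#include <stdint.h>
#include <stdlib.h>

#define DH_BLKSIZE    512                             /* second-level block size in bytes */
#define DH_NPERBLK    (DH_BLKSIZE / sizeof(uint32_t)) /* hash slots per block */
#define DH_RECLAIMED  ((uint32_t *)-1)                /* block taken back for other uses */
#define DH_EMPTY      ((uint32_t)-1)                  /* unused hash slot */
#define DH_NOTFOUND   ((uint32_t)-2)                  /* chain ended, name not present */
#define DH_RESCAN     ((uint32_t)-3)                  /* indeterminate, rescan the directory */

struct dirhash {
    uint32_t **dh_blk;      /* first level: array of pointers to blocks */
    int        dh_nblk;     /* number of first-level entries */
};

/*
 * Look up a name hash.  A slot normally holds the directory offset of
 * a candidate entry; the caller verifies the name by reading that
 * entry (collision handling is omitted for brevity).
 */
uint32_t
dh_lookup(const struct dirhash *dh, uint32_t hash)
{
    uint32_t nslots = dh->dh_nblk * DH_NPERBLK;
    uint32_t slot = hash % nslots;
    uint32_t i;

    for (i = 0; i < nslots; i++, slot = (slot + 1) % nslots) {
        uint32_t *blk = dh->dh_blk[slot / DH_NPERBLK];

        if (blk == NULL)
            return (DH_NOTFOUND);        /* search ends, entry not found */
        if (blk == DH_RECLAIMED)
            return (DH_RESCAN);          /* search ends, result indeterminate */
        if (blk[slot % DH_NPERBLK] == DH_EMPTY)
            return (DH_NOTFOUND);        /* empty slot terminates the probe chain */
        return (blk[slot % DH_NPERBLK]); /* candidate hit */
    }
    return (DH_NOTFOUND);
}

/*
 * Give one second-level block back when the memory limit is reached.
 * The first-level slot is marked before the block is freed so that
 * later lookups see "indeterminate" rather than a stale pointer.
 */
void
dh_reclaim_block(struct dirhash *dh, int idx)
{
    uint32_t *blk = dh->dh_blk[idx];

    if (blk == NULL || blk == DH_RECLAIMED)
        return;                          /* nothing to reclaim */
    dh->dh_blk[idx] = DH_RECLAIMED;
    free(blk);
}

    The point of marking the first-level slot before freeing the block
    is that a reclaimed block only penalizes the lookups that happen to
    probe into it; the rest of the hash keeps working.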
						-Matt