From owner-freebsd-hackers Sat Aug 16 12:28:43 1997
Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id MAA06917 for hackers-outgoing; Sat, 16 Aug 1997 12:28:43 -0700 (PDT)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.50]) by hub.freebsd.org (8.8.5/8.8.5) with SMTP id MAA06912 for ; Sat, 16 Aug 1997 12:28:40 -0700 (PDT)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id MAA04387; Sat, 16 Aug 1997 12:20:48 -0700
From: Terry Lambert
Message-Id: <199708161920.MAA04387@phaeton.artisoft.com>
Subject: Re: More info on slow "rm" times with 2.2.1+.
To: karpen@ocean.campus.luth.se (Mikael Karpberg)
Date: Sat, 16 Aug 1997 12:20:48 -0700 (MST)
Cc: dg@root.com, hackers@FreeBSD.ORG
In-Reply-To: <199708161229.OAA01231@ocean.campus.luth.se> from "Mikael Karpberg" at Aug 16, 97 02:29:10 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

> > How many files are in the directory isn't important. What is important
> > is the size of the directory. You can have a 20MB directory and yet have
> > only a 100 files in it. There is code to free up unused space in
> > directories, but it only works if the free space is at the end. If the
> > directory is large, then it will take a large amount of time to search
> > through it.
>
> And this is why it's slow? Or? Isn't there a command (which could be run
> in daily, or weekly, or something) that goes through a directory (or many)
> and optimizes the space they take?
>
> If there isn't... why? And would it be hard to write?

This particular optimization is not possible "in-band" because you
can't reorder the directory entries while someone has the directory
open without damaging the validity of their offset for their next
"getdents()".
The closest you can get with this approach is a sparse directory file
(don't get me wrong; this is a not insignificant win). But even so, you
will probably not be in the area of the previous version's performance,
unless you are right on the cusp of directory entry pages being LRU'ed
out from under you. And if you were, you could speed it up much more
easily by adding RAM (best) or swap (good) to extend the LRU period so
that the directory entry traversal did not force pages out.

Of course, doing that, you aren't going to get a real win: you are just
putting off the problem for a future recurrence, when your number of
entries goes up yet again. Throwing hardware at a problem is a
piss-poor way to optimize.

One *very* nice possibility would be to separate, completely, the
directory and file entry operations (the VFS abstraction fails to do
this in a number of circumstances right now, and namei() and the
directory cache being per-FS instead of in the common VFS layer are in
the middle of where the blame should fall). If you did this, you could
provide a directory entry "iterator" operation. If you had one of
these, you could add a system call to invoke the iterator with a delete
function (yes, the function would have globbing in the kernel) and
delete everything matching the criteria in a single, linear pass of the
directory, without kernel/user transitions.

Yet another VFS layering issue, I'm afraid. 8-(.


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.