From owner-freebsd-current Thu Sep 27 23:23:44 2001 Delivered-To: freebsd-current@freebsd.org Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163]) by hub.freebsd.org (Postfix) with ESMTP id 3F95A37B401 for ; Thu, 27 Sep 2001 23:23:39 -0700 (PDT) Received: from critter (localhost [127.0.0.1]) by critter.freebsd.dk (8.11.4/8.11.4) with ESMTP id f8S6NTv69647; Fri, 28 Sep 2001 08:23:29 +0200 (CEST) (envelope-from phk@critter.freebsd.dk) To: Julian Elischer Cc: current@FreeBSD.ORG Subject: Re: RFC: mod for 'du' In-Reply-To: Your message of "Thu, 27 Sep 2001 14:00:26 PDT." <3BB3936A.D1E1F6B3@vicor-nb.com> Date: Fri, 28 Sep 2001 08:23:29 +0200 Message-ID: <69645.1001658209@critter> From: Poul-Henning Kamp Sender: owner-freebsd-current@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG In message <3BB3936A.D1E1F6B3@vicor-nb.com>, Julian Elischer writes: >This is a multi-part message in MIME format. >--------------5937F02517145C712E28C299 >Content-Type: text/plain; charset=us-ascii >Content-Transfer-Encoding: 7bit > >'du' keeps an array of files it has encountered that have > 1 link. >Whenever it encounters another, it checks to see if it's one it has >already seen >and thus can avoid counting its space twice.. > >This is ok for small filesystems, however VICOR maintains >500GB filesystems on which much of the data has several links. > >The following patch to replace the linear array (which it realocs if too >small) >(which it scans linearly) with a hash-table can makle a DRASTIC change >to how DU perfomrs for us in this environment. > >In a small test, we made a linked copy of /usr/src > >the run times were: >old: 0.410u 2.221s 1:55.41 2.2% 12+1355k 6325+0io 2pf+0w >new: 8.610u 2.665s 2:09.23 8.7% 10+718k 6367+0io 2pf+0 Have you examined if the hask key is evenly distributed ? >Does anyone have objections to me committing this (or a variant of it)? I would certainly have objections of you show us one patch and commit another.. > * The Regents of the University of California. All rights reserved. > * >+ * This version of du has been modified by Andre de Bruin and Scott Macy for Vicor. Too long line ? >+ * The change is related to the handling of file links, in the official >+ * release, du keeps a simple linked list of all visited i-nodes with >+ * multiple links. This list is now implemented as a "hash table" (actually >+ * a fixed size array) with link list nodes providing a >+ * performance of up to 30% better than the original version. >+ * The only changes are in file create.c and marked with "AdB". We put such stuff in the CVS commit message, not in the source file. >+typedef struct _ID { >+ dev_t dev; >+ ino_t inode; >+ struct _ID *next; >+} ID; Why not use db(3) ? It has a known good hash, it autosizes to different task-sets and it could be setup with an option to put the hash-table in a diskfile ("du -t" ?) for known bad cases of many hardlinks -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-current" in the body of the message