From owner-freebsd-bugs@FreeBSD.ORG Sat Apr 19 00:50:11 2003 Return-Path: Delivered-To: freebsd-bugs@hub.freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B973137B401 for ; Sat, 19 Apr 2003 00:50:11 -0700 (PDT) Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21]) by mx1.FreeBSD.org (Postfix) with ESMTP id A5BE943FBF for ; Sat, 19 Apr 2003 00:50:10 -0700 (PDT) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.12.9/8.12.9) with ESMTP id h3J7oAUp042723 for ; Sat, 19 Apr 2003 00:50:10 -0700 (PDT) (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.12.9/8.12.9/Submit) id h3J7oAHO042722; Sat, 19 Apr 2003 00:50:10 -0700 (PDT) Resent-Date: Sat, 19 Apr 2003 00:50:10 -0700 (PDT) Resent-Message-Id: <200304190750.h3J7oAHO042722@freefall.freebsd.org> Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer) Resent-To: freebsd-bugs@FreeBSD.org Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Peter van Dijk Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B3E0C37B401 for ; Sat, 19 Apr 2003 00:47:24 -0700 (PDT) Received: from useful.dataloss.nl (useful.dataloss.nl [81.17.40.64]) by mx1.FreeBSD.org (Postfix) with SMTP id 9F2F843FA3 for ; Sat, 19 Apr 2003 00:47:23 -0700 (PDT) (envelope-from peter@dataloss.nl) Received: (qmail 29699 invoked by uid 1001); 19 Apr 2003 07:47:21 -0000 Message-Id: <20030419074721.29698.qmail@useful.dataloss.nl> Date: 19 Apr 2003 07:47:21 -0000 From: Peter van Dijk To: FreeBSD-gnats-submit@FreeBSD.org X-Send-Pr-Version: 3.113 Subject: bin/51151: du hardlinkmatching is slow - fix included X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: Peter van Dijk List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Apr 2003 07:50:12 -0000 >Number: 51151 >Category: bin >Synopsis: du hardlinkmatching is slow - fix included >Confidential: no >Severity: non-critical >Priority: low >Responsible: freebsd-bugs >State: open >Quarter: >Keywords: >Date-Required: >Class: change-request >Submitter-Id: current-users >Arrival-Date: Sat Apr 19 00:50:10 PDT 2003 >Closed-Date: >Last-Modified: >Originator: Peter van Dijk >Release: FreeBSD 4.7-RELEASE-p6 i386 >Organization: - >Environment: System: FreeBSD useful.dataloss.nl 4.7-RELEASE-p6 FreeBSD 4.7-RELEASE-p6 #3: Fri Feb 28 21:41:55 CET 2003 root@useful.home.dataloss.nl:/home2/usr2/obj/home2/usr2/src/sys/USEFUL i386 >Description: Running du -hs on a set of directories with lots of hardlinks between them is horribly slow. The reason is that the inode-numbers are stored in a flat array which is searched in a linear way. >How-To-Repeat: Create lots of hardlinks and run du on your directory. >Fix: This patch modifies du to use the balanced tree methods provided by libisc. On my testset it reduced du runtime from around 4 hours to just under 2 minutes. I'm sorry it depends on libisc, but I could not find any suitable datastructures in the libc. I am open to suggestions. --- /usr/src/usr.bin/du/du.c Wed Sep 19 21:19:48 2001 +++ du.c Thu Apr 17 15:16:17 2003 @@ -63,6 +63,7 @@ #include #include #include +#include #define KILO_SZ(n) (n) #define MEGA_SZ(n) ((n) * (n)) @@ -104,6 +105,8 @@ void ignoreclean __P((void)); int ignorep __P((FTSENT *)); +tree *linktree; + int main(argc, argv) int argc; @@ -232,6 +235,8 @@ blocksize /= 512; rval = 0; + + tree_init(&linktree); if ((fts = fts_open(argv, ftsoptions, NULL)) == NULL) err(1, "fts_open"); @@ -308,36 +313,33 @@ exit(rval); } - -typedef struct _ID { - dev_t dev; - ino_t inode; -} ID; - - int linkchk(p) FTSENT *p; { - static ID *files; - static int maxfiles, nfiles; - ID *fp, *start; ino_t ino; dev_t dev; + char *s; + + if((s=malloc(32))==0) + err(1, "malloc"); ino = p->fts_statp->st_ino; dev = p->fts_statp->st_dev; - if ((start = files) != NULL) - for (fp = start + nfiles - 1; fp >= start; --fp) - if (ino == fp->inode && dev == fp->dev) - return (1); - - if (nfiles == maxfiles && (files = realloc((char *)files, - (u_int)(sizeof(ID) * (maxfiles += 128)))) == NULL) - errx(1, "can't allocate memory"); - files[nfiles].inode = ino; - files[nfiles].dev = dev; - ++nfiles; + + snprintf(s, 32, "%u:%u", dev, ino); + + if(tree_srch(&linktree, strcmp, s)==0) + { + tree_add(&linktree, strcmp, s, 0); + return(0); + } + else + { + free(s); + return(1); + } + return (0); } >Release-Note: >Audit-Trail: >Unformatted: