Date: Thu, 29 Aug 2002 09:19:44 -0700 (PDT)
From: Mike Harding <mvh@ix.netcom.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: misc/42167: du uses linear search for duplicate inodes - very slow!
Message-ID: <200208291619.g7TGJikm008160@www.freebsd.org>
>Number:         42167
>Category:       misc
>Synopsis:       du uses linear search for duplicate inodes - very slow!
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Thu Aug 29 09:20:01 PDT 2002
>Closed-Date:
>Last-Modified:
>Originator:     Mike Harding
>Release:        4.6-STABLE
>Organization:   Welkyn Technologies, LLC
>Environment:
FreeBSD netcom1.netcom.com 4.6-STABLE FreeBSD 4.6-STABLE #0: Wed Aug 28 09:57:56 PDT 2002 mvh@netcom1.netcom.com:/usr/obj/usr/src/sys/MIKEIPF i386
>Description:
'du' uses an unordered linear search to detect duplicate inodes (hard links). This makes du run extremely slowly on trees containing many links: running du on a local news spool created by 'leafnode+' takes hours at 100% CPU. A linear search-and-insert over N files is O(N^2) overall and very time-inefficient.
>How-To-Repeat:
Run du on a directory tree with a very large number of files (my leafnode spool has about 1 million files).
>Fix:
See the function linkchk() in du.c - it performs a linear search and insert over an unsorted list. It could be enhanced to use a hash table, a btree, an ordered list, or an ordered list of fixed size backed by a sorted overflow list. Another possibility is a set of hash buckets over the same underlying algorithm - I will try to code up this approach in the next few days.
>Release-Note:
>Audit-Trail:
>Unformatted:
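As a rough sketch of the hash-bucket idea proposed under >Fix: above - the function name linkchk_hash(), the bucket count, and the hash function are illustrative assumptions, not code from du.c - the lookup could replace the linear scan like this:

```c
#include <stdlib.h>
#include <sys/types.h>

/* One remembered (device, inode) pair; chained within a bucket. */
struct link_entry {
	dev_t dev;
	ino_t ino;
	struct link_entry *next;
};

#define NBUCKETS 65536		/* power of two so masking can replace modulo */
static struct link_entry *buckets[NBUCKETS];

/*
 * Return 1 if this (dev, ino) pair has been seen before; otherwise
 * remember it and return 0.  Expected O(1) per call rather than the
 * O(n) scan of the existing linkchk(), so a whole tree walk is
 * roughly O(N) instead of O(N^2).
 */
static int
linkchk_hash(dev_t dev, ino_t ino)
{
	unsigned h = ((unsigned)ino ^ ((unsigned)dev << 16)) & (NBUCKETS - 1);
	struct link_entry *e;

	for (e = buckets[h]; e != NULL; e = e->next)
		if (e->ino == ino && e->dev == dev)
			return (1);	/* duplicate hard link */

	e = malloc(sizeof(*e));
	if (e == NULL)
		return (0);	/* out of memory: worst case, count it again */
	e->dev = dev;
	e->ino = ino;
	e->next = buckets[h];
	buckets[h] = e;
	return (0);
}
```

Keying on both st_dev and st_ino matters because inode numbers are only unique within a single filesystem.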