From owner-freebsd-bugs Thu Aug 29 9:20: 7 2002 Delivered-To: freebsd-bugs@hub.freebsd.org Received: from mx1.FreeBSD.org (mx1.FreeBSD.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A203537B400 for ; Thu, 29 Aug 2002 09:20:03 -0700 (PDT) Received: from freefall.freebsd.org (freefall.FreeBSD.org [216.136.204.21]) by mx1.FreeBSD.org (Postfix) with ESMTP id EBA2C43E6E for ; Thu, 29 Aug 2002 09:20:02 -0700 (PDT) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.12.4/8.12.4) with ESMTP id g7TGK2JU003378 for ; Thu, 29 Aug 2002 09:20:02 -0700 (PDT) (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.12.4/8.12.4/Submit) id g7TGK2ud003377; Thu, 29 Aug 2002 09:20:02 -0700 (PDT) Received: from mx1.FreeBSD.org (mx1.FreeBSD.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BD71E37B400 for ; Thu, 29 Aug 2002 09:19:44 -0700 (PDT) Received: from www.freebsd.org (www.FreeBSD.org [216.136.204.117]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8054743E42 for ; Thu, 29 Aug 2002 09:19:44 -0700 (PDT) (envelope-from nobody@FreeBSD.org) Received: from www.freebsd.org (localhost [127.0.0.1]) by www.freebsd.org (8.12.4/8.12.4) with ESMTP id g7TGJiOT008161 for ; Thu, 29 Aug 2002 09:19:44 -0700 (PDT) (envelope-from nobody@www.freebsd.org) Received: (from nobody@localhost) by www.freebsd.org (8.12.4/8.12.4/Submit) id g7TGJikm008160; Thu, 29 Aug 2002 09:19:44 -0700 (PDT) Message-Id: <200208291619.g7TGJikm008160@www.freebsd.org> Date: Thu, 29 Aug 2002 09:19:44 -0700 (PDT) From: Mike Harding To: freebsd-gnats-submit@FreeBSD.org X-Send-Pr-Version: www-1.0 Subject: misc/42167: du uses linear search for duplicate inodes - very slow! Sender: owner-freebsd-bugs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org >Number: 42167 >Category: misc >Synopsis: du uses linear search for duplicate inodes - very slow! >Confidential: no >Severity: non-critical >Priority: low >Responsible: freebsd-bugs >State: open >Quarter: >Keywords: >Date-Required: >Class: change-request >Submitter-Id: current-users >Arrival-Date: Thu Aug 29 09:20:01 PDT 2002 >Closed-Date: >Last-Modified: >Originator: Mike Harding >Release: 4.6-STABLE >Organization: Welkyn Technologies, LLC >Environment: FreeBSD netcom1.netcom.com 4.6-STABLE FreeBSD 4.6-STABLE #0: Wed Aug 28 09:57:56 PDT 2002 mvh@netcom1.netcom.com:/usr/obj/usr/src/sys/MIKEIPF i386 >Description: 'du' uses an unordered linear search for duplicate inodes. This causes du to run _extremely_ slowly if you have a lot of links. I have a local news spool created by 'leafnode+' and it takes hours at 100% cpu to run du on it. This sort of approach is order N^2 or N^3 and is very time inefficient. >How-To-Repeat: run du on a directory with a _lot_ of files (my leafnode directory has about 1 million files). >Fix: see the function linkchk() in du.c - this function does a linear search and insert. It could be enhanced to use either a hash, a btree, an ordered list, or an ordered list of a fixed size and a sorted overflow list. Another possibility is to use a set of hash buckets and the same underlying algorithm - I will be trying to code up this approach in the next few days. >Release-Note: >Audit-Trail: >Unformatted: To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-bugs" in the body of the message