From owner-freebsd-current@FreeBSD.ORG Fri Jan 14 00:48:52 2005 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8564016A4CE; Fri, 14 Jan 2005 00:48:52 +0000 (GMT) Received: from kientzle.com (h-66-166-149-50.snvacaid.covad.net [66.166.149.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id 16F6C43D49; Fri, 14 Jan 2005 00:48:52 +0000 (GMT) (envelope-from kientzle@freebsd.org) Received: from freebsd.org (p54.kientzle.com [66.166.149.54]) by kientzle.com (8.12.9/8.12.9) with ESMTP id j0E0mpOZ026648; Thu, 13 Jan 2005 16:48:51 -0800 (PST) (envelope-from kientzle@freebsd.org) Message-ID: <41E716F3.20504@freebsd.org> Date: Thu, 13 Jan 2005 16:48:51 -0800 From: Tim Kientzle User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.4) Gecko/20031006 X-Accept-Language: en-us, en MIME-Version: 1.0 To: David Schultz References: <200501120735.j0C7ZABq048856@repoman.freebsd.org> <41E5ED66.4070902@freebsd.org> <20050113072153.GA28485@VARK.MIT.EDU> In-Reply-To: <20050113072153.GA28485@VARK.MIT.EDU> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: "'freebsd-current@freebsd.org'" cc: Pawel Jakub Dawidek Subject: Re: fts improvements, alternatives X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 Jan 2005 00:48:52 -0000 [Moved to -current for further discussion.] David Schultz wrote: >Tim Kientzle wrote: >>Pawel Jakub Dawidek wrote: >> >>> Introduce new field 'fts_bignum' ... >>> This work is part of the BigDisk project: >>> http://www.FreeBSD.org/projects/bigdisk/ >> >>Any plans to deal with other fts limits ... ? > > Removing FTS_LOGICAL's path length limitation is nontrivial, but > given your experience with bsdtar, I'd say you're an ideal person > to do it. ;-) As it happens, I started tinkering with these ideas a while ago but haven't found time to finish it all. Here's a snapshot of the current WIP: http://people.freebsd.org/~kientzle/libarchive/src/tree.tgz This includes a simple main() test function that just does the rough equivalent of "find .". > .. fts() effectively uses chdir("..") to > climb up one level in the tree in chdir mode. If it chdir'd > across a symlink earlier, this would put it in the wrong place. > Perhaps you have a better solution, but my best idea is to keep > the parent directory open when chdiring ... The tree package does exactly this. It just keeps a flag with each ancestor node indicating the type of traversal that's needed. > A more uniform approach would be to ... always keep the > parent open when descending a level. Unfortunately, this limits > the traversal depth to the number of available file descriptors. The tree package does not do this for exactly this reason. I have some ideas for handling the case where the number of symlinks exceeds the available file descriptors, but that doesn't seem particularly pressing at the moment. > (On the other hand, I would argue that anyone with a directory > tree a few thousand levels deep is asking for trouble.) I thought you *wanted* to support big disks! ;-) I started this work partly because I wanted to really stress the long pathname support in bsdtar. I've archived directory trees with megabyte pathnames (several thousand directory levels crossing several hundred symlinks) in testing. Of course, I can't yet *dearchive* such deep trees. That seems to be a harder problem. The tree package also keeps a *lot* less data in memory than fts. It has no trouble with million-entry directories, for example. In comparison, the current ls crashes on such large directories in part because of the memory required for fts. The tree package is quite a bit different in many respects: * The traversal is in a very different order, for instance. * It has a completely different API than fts. It's fully opaque (so should be easier to change in the future, unlike fts). * It takes a very different approach to determine when to visit a child. In particular, instead of the client specifying a mode and optionally inhibiting the descent through a "prune" request, the tree package has the client specifically request descent. If you request descent for *every* item, you'll end up with a logical traversal; if you request descent for every dir, you'll end up with a physical traversal. (The tree package is smart enough to ignore any descent request that isn't for a dir or a link to a dir.) Feedback, suggestions, etc. appreciated. Tim