Date: Thu, 13 Jan 2005 16:48:51 -0800
From: Tim Kientzle <kientzle@freebsd.org>
To: David Schultz <das@freebsd.org>
Cc: Pawel Jakub Dawidek <pjd@freebsd.org>
Subject: Re: fts improvements, alternatives
Message-ID: <41E716F3.20504@freebsd.org>
In-Reply-To: <20050113072153.GA28485@VARK.MIT.EDU>
References: <200501120735.j0C7ZABq048856@repoman.freebsd.org> <41E5ED66.4070902@freebsd.org> <20050113072153.GA28485@VARK.MIT.EDU>

[Moved to -current for further discussion.]

David Schultz wrote:
>Tim Kientzle wrote:
>>Pawel Jakub Dawidek wrote:
>>
>>> Introduce new field 'fts_bignum' ...
>>> This work is part of the BigDisk project:
>>> http://www.FreeBSD.org/projects/bigdisk/
>>
>>Any plans to deal with other fts limits ... ?
>
> Removing FTS_LOGICAL's path length limitation is nontrivial, but
> given your experience with bsdtar, I'd say you're an ideal person
> to do it.  ;-)

As it happens, I started tinkering with these ideas
a while ago but haven't found time to finish it all.
Here's a snapshot of the current WIP:

http://people.freebsd.org/~kientzle/libarchive/src/tree.tgz

This includes a simple main() test function that
just does the rough equivalent of "find .".

> .. fts() effectively uses chdir("..") to
> climb up one level in the tree in chdir mode.  If it chdir'd
> across a symlink earlier, this would put it in the wrong place.
> Perhaps you have a better solution, but my best idea is to keep
> the parent directory open when chdiring ...

The tree package does exactly this.  It just keeps a flag
with each ancestor node indicating the type of traversal
that's needed.

> A more uniform approach would be to ... always keep the
> parent open when descending a level.  Unfortunately, this limits
> the traversal depth to the number of available file descriptors.

The tree package does not do this for exactly this reason.
I have some ideas for handling the case where the number
of symlinks exceeds the available file descriptors, but that
doesn't seem particularly pressing at the moment.  <grin>

> (On the other hand, I would argue that anyone with a directory
> tree a few thousand levels deep is asking for trouble.)

I thought you *wanted* to support big disks!  ;-)

I started this work partly because I wanted to really stress
the long pathname support in bsdtar.  I've archived directory
trees with megabyte pathnames (several thousand directory levels
crossing several hundred symlinks) in testing.
Of course, I can't yet *dearchive* such deep trees.
That seems to be a harder problem.

The tree package also keeps a *lot* less data in memory than fts.
It has no trouble with million-entry directories, for example.
In comparison, the current ls crashes on such large directories
in part because of the memory required for fts.

The tree package is quite a bit different in many respects:
 * The traversal is in a very different order, for instance.
 * It has a completely different API than fts.  It's fully
   opaque (so should be easier to change in the future, unlike fts).
 * It takes a very different approach to determine when to
   visit a child.  In particular, instead of the client specifying
   a mode and optionally inhibiting the descent through a "prune"
   request, the tree package has the client specifically request
   descent.  If you request descent for *every* item, you'll end up
   with a logical traversal; if you request descent for every dir,
   you'll end up with a physical traversal.  (The tree package is
   smart enough to ignore any descent request that isn't for a dir
   or a link to a dir.)

Feedback, suggestions, etc. appreciated.

Tim
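
For readers following the chdir("..") discussion above, here is a
minimal sketch of the "flag per ancestor" idea: keep the parent
directory open only when the descent crossed a symlink, and climb
back with fchdir() in that case, chdir("..") otherwise.  This is not
code from tree.tgz.  The names (struct ancestor, descend, ascend,
MAX_DEPTH) and the fixed-size stack are assumptions made purely for
illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical names throughout; not the tree package's actual API. */
#define MAX_DEPTH 64

struct ancestor {
	int via_symlink;	/* flag: how to climb back out of this level */
	int parent_fd;		/* open fd on the parent; -1 unless via_symlink */
};

static struct ancestor stack[MAX_DEPTH];
static int depth = 0;

/*
 * Descend into `name` (relative to the current directory).
 * A file descriptor is spent only when `name` is a symlink.
 * Returns 0 on success, -1 on failure.
 */
static int
descend(const char *name)
{
	struct stat st;
	struct ancestor a = { 0, -1 };

	if (depth >= MAX_DEPTH || lstat(name, &st) != 0)
		return (-1);
	if (S_ISLNK(st.st_mode)) {
		a.via_symlink = 1;
		a.parent_fd = open(".", O_RDONLY);
		if (a.parent_fd < 0)
			return (-1);
	}
	if (chdir(name) != 0) {
		if (a.parent_fd >= 0)
			close(a.parent_fd);
		return (-1);
	}
	stack[depth++] = a;
	return (0);
}

/*
 * Climb back up one level.  chdir("..") is only safe when the
 * matching descent did not cross a symlink; otherwise return to
 * the saved parent directory with fchdir().
 */
static int
ascend(void)
{
	struct ancestor a;
	int r;

	assert(depth > 0);
	a = stack[--depth];
	if (!a.via_symlink)
		return (chdir(".."));
	r = fchdir(a.parent_fd);
	close(a.parent_fd);
	return (r);
}
```

With this shape, the number of descriptors held open at any moment
equals the number of symlinks on the current path, not the depth of
the path, which is the trade-off described above.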