From: Tim Kientzle <tim@kientzle.com>
Date: Thu, 22 Apr 2004 18:27:28 -0700 (PDT)
To: Eric Anderson
Cc: freebsd-current@freebsd.org
Subject: Re: Directories with 2million files

Eric Anderson wrote:

> First, let me say that I am impressed (but not shocked) - FreeBSD
> quietly handled my building of a directory with 2055476 files in it.
>
> However, several tools seem to choke on that many files ...
>
> $ ls -al | wc -l
> ls: fts_read: Cannot allocate memory
> 0
>
> Watching memory usage, it goes up to about 515Mb, and runs out of memory
> (can't swap it), and then dies.  (I only have 768Mb in this machine).

Not "can't swap", but "doesn't need to swap."  Your 'top' output shows
you've got plenty of free swap, so that's not the issue.  I suspect you
have a per-process data limit of 512MB, so the kernel is killing the
process when it gets too big.  Up that limit, and it should succeed.
What does "limit -d" say?  What is 'datasize' set to in /etc/login.conf?
What are you using for DFLDSIZ in your kernel config file?  (See
/usr/src/sys/conf/NOTES for more information on DFLDSIZ, which I believe
defaults to 512MB.)

If you're using directories with over 2 million files, you probably have
other processes that could use more memory as well, so upping this limit
is advisable.

The Real Fix

Of course, 'ls' should probably not be using this much memory.  Having
some recent experience with fts, I think I can explain why it uses so
much memory and what would be required to fix it, in case some
enterprising soul would like to tackle it.  (I might have a few details
wrong, of course.  Caveat hacker.)

ls, du, and find all use the fts library (src/lib/libc/gen/fts.c) to
walk the directory hierarchy.  fts is pretty nifty, but it can be
memory-hungry.  To understand why, I'll first explain how I once tried
to walk a directory tree before I started using fts myself: opendir()
each directory, read and return entries one at a time (recursively
handling each directory as you encounter it), then closedir().  This
approach is simple and obvious, but it breaks on very deeply nested
directory trees because it keeps an open file handle for each nested
directory.
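In rough outline, that naive approach looks something like the sketch
below (written from memory for illustration, not taken from any of the
actual tools); note that every level of recursion holds an open DIR
handle, and therefore a file descriptor:

#include <sys/types.h>
#include <sys/stat.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Naive recursive walk: one open directory handle per nesting level. */
static void
walk(const char *path)
{
        DIR *d = opendir(path); /* fd stays open across the recursion */
        struct dirent *dp;
        struct stat sb;
        char child[PATH_MAX];

        if (d == NULL)
                return;
        while ((dp = readdir(d)) != NULL) {
                if (strcmp(dp->d_name, ".") == 0 ||
                    strcmp(dp->d_name, "..") == 0)
                        continue;
                snprintf(child, sizeof(child), "%s/%s", path, dp->d_name);
                printf("%s\n", child);
                if (lstat(child, &sb) == 0 && S_ISDIR(sb.st_mode))
                        walk(child);    /* recursion => another open fd */
        }
        closedir(d);
}

int
main(int argc, char *argv[])
{
        walk(argc > 1 ? argv[1] : ".");
        return (0);
}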
Instead, the BSD fts library reads each directory entirely into memory
when it first sees it, so it can immediately close that directory.  In
your case, the fts library is keeping an in-memory list of every file in
the directory.  ls is then requesting all of that information from fts,
formatting all of the output lines, and then holding them in memory so
that it can compute the optimal column widths.  This last bit surprised
me, too, when I stumbled across it; I'm not at all certain that 'ls -lf'
should do that.

As a result, 'ls' ends up with two copies of the directory information:
one copy in the form of 'stat' structures within the fts library, and
another copy in the form of partially-formatted lines for the final
output.  Yes, that's a lot of memory.

Here are some ideas for improving 'ls'; maybe someone will take a stab
at implementing one or more of these:

* Easy: Change ls -lf so that it does not store the lines and adjust the
  column widths, but just outputs each line immediately, using pre-set
  minimum column widths.  Less pretty, but it certainly agrees with the
  expected behavior of -f.

* Moderate: Have 'ls' not use fts when listing a single directory.
  Instead, use opendir/readdir/closedir directly.  Combined with the
  above, ls -lf would be exactly as fast as you would expect.

* Moderate-Difficult: Make fts smarter/more flexible:

  = Maybe fts should not always read a full directory as soon as it sees
    it (for instance, it might only slurp the remainder of the directory
    into memory if it encounters a subdirectory, if the client requires
    sorted output, or if fts exceeds some small limit on open file
    descriptors).

  = Adding more flags to fts might help (if fts knows the client will
    never re-visit files, for example, then fts can be more aggressive
    about releasing memory; similarly, a number of clients, including
    'ls', need _directories_ to be visited in sorted order, but not
    regular files).  ls should arguably be doing its own sorting, since
    it has to keep the lines in memory anyway.

  = fts might read only the first few entries of a directory initially,
    reading the remainder on demand.

Some combination of the above would allow fts to use a lot less memory
in a number of common situations.

Overhauling fts could make a very good student project for someone at
the senior/masters level.  It's a surprisingly tricky exercise in
algorithm and library API design; there are a number of trade-offs in
fts that would require some real insight to fully overcome.

> du does the exact same thing.

This puzzles me.  du shouldn't use nearly as much memory as ls does,
because 'du' doesn't store any per-file information outside of fts.
That's a real head-scratcher.  If I have time, I'll look at this one.

> find, however, works fine (and is very fast!):
> $ time find . | wc -l
>  2055476

'find' also uses fts, so it should be using approximately as much memory
as 'du'.  I find it a bit peculiar that it behaves so differently.

Tim Kientzle
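P.S.  For anyone who wants to experiment, the basic loop that ls, du,
and find all build around fts looks roughly like the sketch below.  It
is a bare-bones illustration of the fts_open/fts_read/fts_close
interface, not code lifted from any of those tools; the FTS_PHYSICAL
flag and the absent sort callback are just one plausible configuration.

#include <sys/types.h>
#include <sys/stat.h>
#include <fts.h>
#include <stdio.h>

int
main(int argc, char *argv[])
{
        char *paths[2] = { argc > 1 ? argv[1] : ".", NULL };
        FTS *ftsp;
        FTSENT *p;

        /* FTS_PHYSICAL: don't follow symlinks; NULL: no sort callback. */
        ftsp = fts_open(paths, FTS_PHYSICAL, NULL);
        if (ftsp == NULL)
                return (1);
        /*
         * fts_read() hands back one entry at a time, but fts has
         * already read each directory's contents into memory by the
         * time its entries are returned.
         */
        while ((p = fts_read(ftsp)) != NULL) {
                if (p->fts_info == FTS_DP)  /* skip post-order dir visits */
                        continue;
                printf("%s\n", p->fts_path);
        }
        fts_close(ftsp);
        return (0);
}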