Date:      Sat, 04 May 2002 19:58:13 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Bakul Shah <bakul@bitblocks.com>
Cc:        Scott Hess <scott@avantgo.com>, "Vladimir B. Grebenschikov" <vova@sw.ru>, fs@FreeBSD.ORG
Subject:   Re: Filesystem
Message-ID:  <3CD49FC5.D1B17CB7@mindspring.com>
References:  <200205042039.QAA16949@glatton.cnchost.com>

Bakul Shah wrote:
> If anything, I am advocating the zero, one and infinity rule.
> Don't place arbitrary limits and justify it as a good thing.

And, if anything, I'm saying "don't try to pretend that arbitrary
rules don't exist on 140+ UNIX systems, just because they don't
exist on 2-3 UNIX systems when configured with particular
non-default options."  8-).


> A directory provides a namespace and using it for that
> purpose for any number of files is a perfectly sane thing to
> do.

Time to whip out the ol' "reductio ad absurdum"...

Actually, it's not.

It turns out, the size of hard disks is finite, and will tend
to continue to be finite for the foreseeable future.  So no matter
how you look at it, there's an "arbitrary limit" imposed by the
amount of storage you have, divided by the number of bytes it
takes to store an average entry.
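
As a rough sketch of that arithmetic (the disk size and the
per-entry byte count below are made-up figures for illustration,
not measurements of any real filesystem):

```python
def max_entries(storage_bytes: int, avg_entry_bytes: int) -> int:
    """Upper bound on entries: storage divided by average entry size."""
    if avg_entry_bytes <= 0:
        raise ValueError("avg_entry_bytes must be positive")
    return storage_bytes // avg_entry_bytes


# Hypothetical figures: an 80 GB disk, 32 bytes per average entry.
DISK_BYTES = 80 * 10**9
ENTRY_BYTES = 32

print(max_entries(DISK_BYTES, ENTRY_BYTES))  # 2500000000
```

The bound is large, but it is still a bound, and it ignores the
space taken by the files the entries point at.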


> If it worked well enough, you wouldn't see abominations
> like ftp.netcom.com:users/ba/bakul (where they had to use a two
> level scheme to improve access time).

Actually, this is a pretty reasonable thing to do; it gives a
nice limit on the number of entries when the thing scanning it
isn't running software, but is running wetware, instead.

No matter how you slice it, putting 10,000 files in a directory,
or even 1,000, generally results in significant barriers to
processing the thing by a human being.
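
A minimal sketch of the two-level layout in the quoted path,
assuming the bucket is simply the first two characters of the
name (the function names and the prefix length are illustrative
assumptions, not a description of how that site built it):

```python
import posixpath
from collections import Counter


def two_level_path(root: str, name: str, prefix_len: int = 2) -> str:
    """Place name under root/<first prefix_len chars of name>/name."""
    if not name:
        raise ValueError("name must not be empty")
    return posixpath.join(root, name[:prefix_len], name)


def bucket_sizes(names, prefix_len: int = 2) -> Counter:
    """Count how many names land in each prefix bucket."""
    return Counter(n[:prefix_len] for n in names)


print(two_level_path("users", "bakul"))  # users/ba/bakul
```

bucket_sizes() is there to check how evenly a given set of names
spreads out; a plain prefix can be lopsided for real user names.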


> BTW, I too have come across a couple of cases where tens of
> thousands of files were stored in one dir.  In one case it
> was a "throw away" program that ended up being a critical
> tool and the prototype got used and expended instead of being
> simply rewritten.  By the time I got involved, it also
> depended on Sybase but they couldn't figure out what to do
> with those zillion existing files!

Bite the bullet, and actually solve the problem, if it doesn't
work for you.  If it works for you, then file the observation
and move on.

In other words, perhaps the problem wasn't "what to do with those
zillion existing files", but "why do you think it's necessary to
*do* something with those zillion existing files?".  8-).


> > This case is an externalized interface used by programs, which
> > have a choice in their implementation, and choose badly.
> 
> Can you please translate that in simple english?

Yeah.  You have to interface to the underlying system to get
to the files, and you don't necessarily have control of which
underlying system you end up running on, so it behooves you to
not make implementation assumptions that depend on a particular
underlying system, or a particular implementation technology or
algorithm being present in the underlying system -- especially
if such technology is not implemented in the vast majority of
systems.

Or in even plainer English...

"Write portable code."


> > I agree that there is a lot of room for extending the basic OS
> > capabilities; I would also argue that there is generally a lot
> > of research in that area, as well, and that research isn't being
> > put into practice, for the most part, for a reason other than
> > "not invented here" or "it's too hard: let's go shopping" (though
> > that is sometimes the reason).  The main reason I think applies
> > is that legacy interoperability has more value than the other
> > benefits that the change brings.
> 
> XFS from SGI also has btree directories so at least some
> vendors are doing this.

Yet I don't see the adoption of this technology happening in
Linux, even though it's available for Linux, and the only places
I *do* see it being adopted are where it's integrated into the
default filesystem type for the particular OS platform.

This is why I've thought implementing XFS in FreeBSD was a losing
proposition: it's not a case of "if you build it, they will come",
it's a case of "if it comes with the OS, then, yeah, we'll leave
it turned on".


> I suspect in the free software community the reason is likely
> to be a) it is not hip enough,

See, most people have a general misunderstanding of what drives
Open Source projects.  They think they can just declare a project,
and, by doing so, hordes of programmers will descend upon it and
write the code for you.  Like army ants eating everything in their
path.  Or killer bees attacking someone trying to dig a post hole
through their nest.

The fact is that the only thing that motivates volunteerism on an
Open Source project is preexisting working code.

There is nothing so populous as a declared Open Source project
which goes nowhere for lack of something to tinker with.


> b) people who care & have expertise don't have time and/or
> inclination,

Certainly, my motivation to work on XFS (or anything else where it
can't be compiled into and distributed on a CDROM as the default
because of license conflicts) is pretty much zilch.

I think that people generally don't give engineers credit for
intelligence.  They treat them as if they were autistic savants,
unable to really understand the ramifications of their work.  RMS
certainly does this, when he assumes that programmers will program
for the love of it, with no reward asked or given, other than the
task itself.  I guess if you get enough grants and other funding
unrelated to your work product, you might develop that idea.

> c) NIH -- let
> us do our own cool thing even if it is just a tiny variation.
> BTW, I don't look down on any of these reasons.  It is just the
> way things are.

I look down on NIH.  It's blatant stupidity.  Generally, what people
call NIH comes down to other factors, though.  For example, a lot of
people think any Open Source license is equivalent, and so they are
incapable of understanding someone who writes new code to fulfill a
function, merely to get out from under a license.


> I don't recall the goals of UFS2 being published except
> extending some limits.

Read the DARPA document that initiated the work.  It goes into
some more detail.


> But to me this is a perfect example
> of NIH (I admit I don't know the details but seems that way).
> Why not just clone XFS?  SGI can already achieve amazing data
> rates with it, its design is proven by fire and it has a lot
> of good features.

The License.  The impossibility of distributing a CDROM that
installs a precompiled binary image with the FS on it due to
license issues making the resulting binary illegal, according
to most corporate IP lawyers (engineering opinions do not matter
in a legal risk analysis equation).


> Note that a number of companies are currently doing a lot of
> research in the FS area but I am afraid most of it will die
> with the companies.

Yes.  That's a matter for tort reform of IP law, requiring source
escrow to obtain any protection whatsoever.


> > Fast dir search won't be picked up until important applications
> > start to rely on it.  And important applications won't rely on
> > it until it's generally available.  So the only real way to break
> > the log-jam is to come up with a killer app, which relies on some
> > feature you want to proselytize.
> 
> No app has to critically rely on fast dir search for it to be
> useful.  It can be done (almost) transparently and most all
> lookups will speed up.  Store a "dir type code" somewhere in
> the dir. inode.  Use that to select the appropriate function
> table from namei().  Add a way to convert between dir types.

No app that you know of so far.  Maybe there is a "killer app"
that needs it.  It's doubtful, though.

If you're right, it means that fast and large directory performance
is irrelevant to the big picture.  I'd actually agree with that
assessment.
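
For reference, the "dir type code" idea quoted above could be
sketched roughly as below.  This is a user-space toy, not kernel
code: DirInode, DIR_OPS, dir_lookup() and convert_dir() are
hypothetical names, and a sorted list searched by bisection stands
in for whatever btree or hash a real filesystem would use.

```python
from bisect import bisect_left
from dataclasses import dataclass

DIR_LINEAR, DIR_SORTED = 0, 1  # hypothetical type codes


def linear_build(names):
    return list(names)


def linear_lookup(entries, name):
    return name in entries  # front-to-back scan


def sorted_build(names):
    return sorted(names)


def sorted_lookup(entries, name):
    i = bisect_left(entries, name)  # binary search
    return i < len(entries) and entries[i] == name


# One function table per directory type.
DIR_OPS = {
    DIR_LINEAR: {"build": linear_build, "lookup": linear_lookup},
    DIR_SORTED: {"build": sorted_build, "lookup": sorted_lookup},
}


@dataclass
class DirInode:
    dir_type: int  # the type code stored in the inode
    entries: list


def make_dir(dir_type, names):
    return DirInode(dir_type, DIR_OPS[dir_type]["build"](names))


def dir_lookup(inode, name):
    """Pick the function table from the inode's type code."""
    return DIR_OPS[inode.dir_type]["lookup"](inode.entries, name)


def convert_dir(inode, new_type):
    """Rebuild the entries in the layout of new_type."""
    inode.entries = DIR_OPS[new_type]["build"](inode.entries)
    inode.dir_type = new_type
```

Callers only ever use dir_lookup(), so converting a directory
from one type to the other changes nothing they can see except
the speed.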

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message
