Date: Sat, 04 May 2002 19:58:13 -0700
From: Terry Lambert <tlambert2@mindspring.com>
To: Bakul Shah <bakul@bitblocks.com>
Cc: Scott Hess <scott@avantgo.com>, "Vladimir B. Grebenschikov" <vova@sw.ru>, fs@FreeBSD.ORG
Subject: Re: Filesystem
Message-ID: <3CD49FC5.D1B17CB7@mindspring.com>
References: <200205042039.QAA16949@glatton.cnchost.com>
Bakul Shah wrote:
> If anything, I am advocating the zero, one and infinity rule.
> Don't place arbitrary limits and justify it as a good thing.

And, if anything, I'm saying "don't try to pretend that arbitrary
rules don't exist on 140+ UNIX systems, just because they don't
exist on 2-3 UNIX systems when configured with particular
non-default options".  8-).

> A directory provides a namespace and using it for that
> purpose for any number of files is a perfectly sane thing to
> do.

Time to whip out the ol' "reductio ad absurdum"...  Actually,
it's not.  It turns out the size of hard disks is finite, and
will tend to continue to be finite for the foreseeable future.
So no matter how you look at it, there's an "arbitrary limit"
imposed by the amount of storage you have, divided by the number
of bytes it takes to store an average entry.
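(To put made-up but not unreasonable numbers on that: a 100 GB
disk, at something like 512 bytes of directory entry, inode, and
allocation overhead per file, tops out around 100,000,000,000 / 512,
i.e. on the order of 200 million entries -- and fsck and dump
times put the practical ceiling a lot lower than that.)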
> If it worked well enough, you wouldn't see abominations
> like ftp.netcom.com:users/ba/bakul (where they had to use a
> two-level scheme to improve access time).

Actually, this is a pretty reasonable thing to do; it gives a
nice limit on the number of entries when the thing scanning it
isn't running software, but is running wetware, instead.  No
matter how you slice it, putting 10,000 files in a directory,
or even 1,000, generally results in significant barriers to
processing the thing by a human being.
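For what it's worth, that two-level split is also cheap to roll
yourself; a minimal sketch (the layout and names here are invented
for illustration, not how netcom actually did it) is just:

#include <stdio.h>

/*
 * Map a name onto a two-level path, e.g. "bakul" -> "users/ba/bakul",
 * so the top-level directory only ever holds the small set of
 * two-character buckets instead of every user.
 */
static void
bucket_path(const char *name, char *buf, size_t len)
{
	char a = name[0] ? name[0] : '_';		/* '_' pads names */
	char b = (name[0] && name[1]) ? name[1] : '_';	/* shorter than 2 chars */

	snprintf(buf, len, "users/%c%c/%s", a, b, name);
}

int
main(void)
{
	char path[1024];

	bucket_path("bakul", path, sizeof(path));
	printf("%s\n", path);		/* prints "users/ba/bakul" */
	return (0);
}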
> BTW, I too have come across a couple of cases where tens of
> thousands of files were stored in one dir.  In one case it
> was a "throw away" program that ended up being a critical
> tool, and the prototype got used and extended instead of being
> simply rewritten.  By the time I got involved, it also
> depended on Sybase, but they couldn't figure out what to do
> with those zillion existing files!

Bite the bullet, and actually solve the problem, if it doesn't
work for you.  If it works for you, then file the observation
and move on.

In other words, perhaps the problem wasn't "what to do with
those zillion existing files", but "why do you think it's
necessary to *do* something with those zillion existing
files?".  8-).

> > This case is an externalized interface used by programs, which
> > have a choice in their implementation, and choose badly.
>
> Can you please translate that into simple English?

Yeah.  You have to interface to the underlying system to get to
the files, and you don't necessarily have control of which
underlying system you end up running on, so it behooves you to
not make implementation assumptions that depend on a particular
underlying system, or a particular implementation technology or
algorithm being present in the underlying system -- especially
if such technology is not implemented in the vast majority of
systems.

Or in even plainer English... "Write portable code."

> > I agree that there is a lot of room for extending the basic OS
> > capabilities; I would also argue that there is generally a lot
> > of research in that area, as well, and that research isn't being
> > put into practice, for the most part, for a reason other than
> > "not invented here" or "it's too hard: let's go shopping" (though
> > that is sometimes the reason).  The main reason I think applies
> > is that legacy interoperability has more value than the other
> > benefits that the change brings.
>
> XFS from SGI also has btree directories so at least some
> vendors are doing this.

Yet I don't see the adoption of this technology happening in
Linux, even though it's available for Linux, and the only places
I *do* see it being adopted are where it's integrated into the
default filesystem type for the particular OS platform.

This is why I've thought implementing XFS in FreeBSD was a
losing proposition: it's not a case of "if you build it, they
will come", it's a case of "if it comes with the OS, then, yeah,
we'll leave it turned on".

> I suspect in the free software community the reason is likely
> to be a) it is not hip enough,

See, most people have a general misunderstanding of what drives
Open Source projects.  They think they can just declare a
project, and, by doing so, hordes of programmers will descend
upon it and write the code for you.  Like army ants eating
everything in their path.  Or killer bees attacking someone
trying to dig a post hole through their nest.

The fact is that the only thing that motivates volunteerism on
an Open Source project is preexisting working code.  There is
nothing so common as a declared Open Source project which goes
nowhere for lack of something to tinker with.

> b) people who care & have expertise don't have time and/or
> inclination,

Certainly, my motivation to work on XFS (or anything else that
can't be compiled into and distributed on a CDROM as the default,
because of license conflicts) is pretty much zilch.

I think that people generally don't give engineers credit for
intelligence.  They treat them as if they were autistic savants,
unable to really understand the ramifications of their work.
RMS certainly does this, when he assumes that programmers will
program for the love of it, with no reward asked or given, other
than the task itself.  I guess if you get enough grants and
other funding unrelated to your work product, you might develop
that idea.

> c) NIH -- let
> us do our own cool thing even if it is just a tiny variation.
> BTW, I don't look down on any of these reasons.  It is just the
> way things are.

I look down on NIH.  It's blatant stupidity.

Generally, what people call NIH comes down to other factors,
though.  For example, a lot of people think any Open Source
license is equivalent, and so they are incapable of understanding
someone who writes new code to fulfill a function merely to get
out from under a license.

> I don't recall the goals of UFS2 being published except
> extending some limits.

Read the DARPA document that initiated the work.  It goes into
some more detail.

> But to me this is a perfect example
> of NIH (I admit I don't know the details but seems that way).
> Why not just clone XFS?  SGI can already achieve amazing data
> rates with it, its design is proven by fire and it has a lot
> of good features.

The license.  It makes it impossible to distribute a CDROM that
installs a precompiled binary image with the FS on it, because
the license issues make the resulting binary illegal, according
to most corporate IP lawyers (engineering opinions do not matter
in a legal risk analysis equation).

> Note that a number of companies are currently doing a lot of
> research in the FS area but I am afraid most of it will die
> with the companies.

Yes.  That's a matter for tort reform of IP law, requiring
source escrow to obtain any protection whatsoever.

> > Fast dir search won't be picked up until important applications
> > start to rely on it.  And important applications won't rely on
> > it until it's generally available.  So the only real way to break
> > the log-jam is to come up with a killer app, which relies on some
> > feature you want to proselytize.
>
> No app has to critically rely on fast dir search for it to be
> useful.  It can be done (almost) transparently, and almost all
> lookups will speed up.  Store a "dir type code" somewhere in
> the directory inode.  Use that to select the appropriate function
> table from namei().  Add a way to convert between dir types.

No app that you know of so far.  Maybe there is a "killer app"
that needs it.  It's doubtful, though.

If you're right, it means that fast and large directory
performance is irrelevant to the big picture.  I'd actually
agree with that assessment.
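For reference, the mechanism you're describing is roughly this
much plumbing -- every name below is invented for illustration;
it is not what UFS2 or XFS actually does:

#include <stdio.h>

#define DT_LINEAR	0	/* classic linear-scan directory */
#define DT_BTREE	1	/* hashed/btree directory */

struct dirops {
	const char *name;
	int (*lookup)(void *dirdata, const char *component);
};

/* Dummy lookup bodies, just to make the dispatch visible. */
static int
linear_lookup(void *dirdata, const char *component)
{
	printf("linear scan for \"%s\"\n", component);
	return (0);
}

static int
btree_lookup(void *dirdata, const char *component)
{
	printf("btree lookup for \"%s\"\n", component);
	return (0);
}

/* One function table per on-disk directory format. */
static const struct dirops dirops_tab[] = {
	[DT_LINEAR] = { "linear", linear_lookup },
	[DT_BTREE]  = { "btree",  btree_lookup },
};

/*
 * The per-directory step inside namei() then reduces to reading
 * the type code out of the directory's inode and dispatching;
 * "convert between dir types" is just rewriting the directory
 * contents and flipping the code.
 */
static int
dir_lookup(int di_dirtype, void *dirdata, const char *component)
{
	return (dirops_tab[di_dirtype].lookup(dirdata, component));
}

int
main(void)
{
	dir_lookup(DT_LINEAR, NULL, "passwd");
	dir_lookup(DT_BTREE, NULL, "passwd");
	return (0);
}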
-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message