Date: Tue, 31 Oct 2000 01:07:48 -0800 (PST)
From: Matt Dillon <dillon@earth.backplane.com>
To: Terry Lambert <tlambert@primenet.com>
Cc: ryan@sasknow.com (Ryan Thompson), freebsd-hackers@FreeBSD.ORG
Subject: Re: Filesystem holes
Message-ID: <200010310907.e9V97mk17233@earth.backplane.com>
References: <200010310847.BAA28086@usr02.primenet.com>
:> Ahh.. yes, I know.  I'm a filesystem expert :-)  However, that said, I
:> will tell you quite frankly that virtually *nobody* depends on holes
:> for efficient storage.  There are only a few problems where it's
:> practical.... some forms of executables, and sparse matrices.  That's
:> pretty much it.
:
:Your master password database.  Most sendmail maps.  Anything
:else that uses the Berkeley DB, like message catalog files,
:locales, etc..

    Actually less than you think.  Even though DB utilizes the concept
    of a sparse file, if you check carefully you will note that your
    sendmail maps and password file databases aren't actually all that
    sparse.  With an 8K filesystem block size, the default DB multiplier
    usually results in virtually no sparseness at all.  It takes tuning
    to get any sort of sparseness, and even then you don't get much
    benefit from it.  The only real benefit is in the hash table size
    factor ... the hash array may be sparse, but the physical file
    underlying it probably isn't.  Not even USENET news history files,
    which can run into the gigabytes, actually wind up being that
    sparse.

    Also keep in mind that Berkeley DB is very old, even with all the
    rewrites, and the original hash algorithm was chosen for expediency
    rather than for the 'best' efficiency.  Our current DB library has
    a btree access method, and for big data sets it works a whole lot
    better than the hash method.  It doesn't require tuning, for one.

:Frankly, sparse files have a huge number of uses, particularly
:when applied to persistent storage of data of the kind you'd
:see in chapter 5, section 5.4.x and chapter 6 in Knuth.
:
:Plus your FFS filesystem itself is a sparse matrix.  It'd be
:real useful to be able to "free up holes" in a file, if I
:wanted to use one to do user space work on an FS design, for
:example, a log structured FS, where I wanted to be able to
:experiment with a "cleaner" process that recovered extents.
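[Editor's note: the "check carefully" Matt suggests above can be done by comparing a file's logical size (`st_size`) against the blocks actually allocated (`st_blocks`, counted in 512-byte units on POSIX systems).  A minimal sketch, not from the original mail; the filename `sparse.demo` is made up:]

```python
import os

path = "sparse.demo"
with open(path, "wb") as f:
    f.seek(1024 * 1024)   # skip 1 MB without writing; on filesystems
                          # that support holes this range stays unallocated
    f.write(b"x")         # one real byte at the end of the file

st = os.stat(path)
logical = st.st_size              # 1048577 bytes
allocated = st.st_blocks * 512    # bytes actually allocated on disk
print(f"logical size: {logical} bytes")
print(f"allocated:    {allocated} bytes")
os.remove(path)
```

On a hole-supporting filesystem `allocated` comes out far below `logical`; run the same check against a sendmail map or `/etc/spwd.db` and, as Matt notes, the two numbers are usually close.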
:
:I'd actually be able to tell real quickly whether it was
:working by just setting an allocation range that I expect
:my iterative testing to stay within (if it goes over or under
:the range while I'm moving stuff around and cleaning at the
:same time, I'll know there's a bug in my daemon).
:
:Personally, I'm not rich enough to be able to burn disk space
:so easily.
:
: Terry Lambert
: terry@lambert.org

    I agree that sparse files should not be discarded out of hand,
    but there are very few real problems that need them.  I can think
    of a handful, all very specialized.... for example, the VN device
    uses the concept of a sparse file (actually a sparse backing for
    the filesystem layer), as do Kirk's softupdates snapshots (I
    believe).

    Sparse matrices are the big math problem that benefits, but only
    because the solution to a sparse matrix problem is not even close
    to random, so the matrix winds up still being sparse all the way
    to the end of the solution.  But these are extremely specialized
    problems, not general datastore problems, and nearly all of them
    are tuned (or inherently) specific to the block size of the
    underlying system.

    Using a sparse file for general data store just isn't all that
    hot an idea, because by its very nature a data store is, well,
    storing data.  Packing it is usually the way to go.

						-Matt
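[Editor's note: the sparse-matrix point above is the one place both posters agree holes pay off.  An illustrative dict-of-keys sketch (not from the mail) shows why: storage tracks the number of nonzero entries rather than rows * cols, the same property a hole-backed file gives a sparse on-disk array.]

```python
class SparseMatrix:
    """Dict-of-keys sparse matrix: only nonzero entries are stored."""

    def __init__(self, rows, cols):
        self.shape = (rows, cols)
        self.data = {}                     # (row, col) -> value

    def __setitem__(self, key, value):
        if value == 0:
            self.data.pop(key, None)       # writing zero keeps it sparse
        else:
            self.data[key] = value

    def __getitem__(self, key):
        return self.data.get(key, 0)       # absent entries read as zero

    def nnz(self):
        return len(self.data)              # entries actually stored

m = SparseMatrix(1_000_000, 1_000_000)     # 10^12 logical cells
m[3, 7] = 2.5
m[999_999, 0] = -1.0
print(m[3, 7], m[0, 0], m.nnz())           # 2.5 0 2
```

As long as the solver's access pattern keeps most cells zero (Matt's "not even close to random" point), the structure stays small to the end of the computation.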