From owner-freebsd-hackers  Sun Oct 29  8:27:25 2000
Delivered-To: freebsd-hackers@freebsd.org
Received: from earth.backplane.com (placeholder-dcat-1076843399.broadbandoffice.net [64.47.83.135])
	by hub.freebsd.org (Postfix) with ESMTP id DD1C637B479
	for <freebsd-hackers@FreeBSD.ORG>; Sun, 29 Oct 2000 08:27:22 -0800 (PST)
Received: (from dillon@localhost)
	by earth.backplane.com (8.11.1/8.9.3) id e9TGRMP70398;
	Sun, 29 Oct 2000 08:27:22 -0800 (PST)
	(envelope-from dillon)
Date: Sun, 29 Oct 2000 08:27:22 -0800 (PST)
From: Matt Dillon <dillon@earth.backplane.com>
Message-Id: <200010291627.e9TGRMP70398@earth.backplane.com>
To: Ryan Thompson <ryan@sasknow.com>, freebsd-hackers@FreeBSD.ORG
Subject: Re: Filesystem holes
References:  <Pine.GSO.4.21.0010282230110.10039-100000@fondue.ICSI.Berkeley.EDU>
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:> Hi all...
:> 
:> One the tasks that I have undertaken lately is to improve the efficiency
:> of a couple of storage facilities we use internally, here.  Basically,
:> they are moderate-size tables currently implemented in SQL, which is OK in
:> terms of performance, but the hash function is breaking down and the
:> storage is rather inefficient for our table of about 2,850,000 members
:> (~2.1 GB total storage).  There are 64M possible hash values in our
:> current implementation, and our record size is variable, but could be
:> safely fixed at about 1.5KB... So, total storage if all values were used
:> would be about 96GB.  (See where I'm going with this?)
:> 
:> One of the options I am considering is actually using address calculation,
:> and taking advantage of filesystem holes, to keep storage down to what is
:> actually being used, while providing instant lookups.
:> 
:> The single file would be about 96G addressable bytes... But the actual
:> block count would be much lower.  I suppose I will have to create a series
:> of these files and divide the problem into < 4GB chunks, but one
:> lookup/store will still be done with one calculation and one disk seek
:> (just more filehandles open).
:> 
:> Deletes seem problematic.  My question is, does the operating system
:> provide any way to free blocks used in the middle of a file?
:> 
:> Must I periodically re-create these files (long and slow process, but not
:> so bad if done infrequently) to reclaim space, or is there a way to free
:> arbitrary blocks in a file in userland (WITHOUT trashing the filesystem?
:> :-)
:> 
:> - Ryan
:> 
:> -- 
:>   Ryan Thompson <ryan@sasknow.com>
:>   Network Administrator, Accounts
:>   Phone: +1 (306) 664-1161
:> 
:>   SaskNow Technologies     http://www.sasknow.com
:>   #106-380 3120 8th St E   Saskatoon, SK  S7H 0W2

    I would strongly recommend using a B+Tree instead of a hash table.  With
    a hash table slightly different lookups will seek all over the place.
    With a B+Tree lookups will stay more localized.

    For example, if you insert a (nearly) sorted dictionary of words into an
    SQL table with a B+Tree, the memory working set required to insert
    efficiently stays constant whether you are inserting a thousand, a million,
    or a billion records.  That is, the memory requirement is effecitvely
    O(LogN) for a disk requirement of O(N).   With a hash table, the memory
    working set required to insert efficiently is approximately O(N) for a disk
    requirement of O(N)... much much worse.

    A B+Tree will also scale with the size of the dataset being managed,
    so you do not have to preallocate or prereserve file space.

    We are using an in-house SQL database for our product (which I can't
    really talk about right now) and, using B+Tree's, I can consistently
    insert 300 records/sec on a cheap desktop PC (which I use for testing)
    with 64MB of ram (remember, insert requires an internal select to check
    for key conflicts), even when the test database grows to several
    gigabytes.

    In anycase, a B+Tree is the way to go.  Hash tables are useful only in
    very, very, very specialized circumstances.  In all other circumstances
    they are no better then a pain in the rear.

						-Matt


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message