From owner-freebsd-fs@FreeBSD.ORG  Sat Sep  8 10:07:54 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8CA0116A420
	for <freebsd-fs@freebsd.org>; Sat,  8 Sep 2007 10:07:54 +0000 (UTC)
	(envelope-from qpadla@gmail.com)
Received: from nf-out-0910.google.com (nf-out-0910.google.com [64.233.182.191])
	by mx1.freebsd.org (Postfix) with ESMTP id 15BBA13C467
	for <freebsd-fs@freebsd.org>; Sat,  8 Sep 2007 10:07:53 +0000 (UTC)
	(envelope-from qpadla@gmail.com)
Received: by nf-out-0910.google.com with SMTP id k4so534322nfd
	for <freebsd-fs@freebsd.org>; Sat, 08 Sep 2007 03:07:52 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta;
	h=domainkey-signature:received:received:from:reply-to:to:subject:date:user-agent:mime-version:content-type:content-transfer-encoding:message-id;
	bh=hS0I9p4MEpL3wbjmDBRIS+4MSKEkdNNgihcSS06DKRM=;
	b=WJvsPsdxa+oeJ0mEVDQpZSfS5JDSddtLb+XzRX8B7WfVJpwJH0/X+Jo5qBcdaBtf919AotIitsheBu+vGmgPmXj7d5ZG89myAOLEaPnog/aYixYg8CIFVHhtfGO0aRZ/IcKTnhGz+kp2nglce9PoxFvroK2l0YTXaMQuwOmdmOM=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta;
	h=received:from:reply-to:to:subject:date:user-agent:mime-version:content-type:content-transfer-encoding:message-id;
	b=DF5qcyhPpz+bGaGGXMb2n08FKzwTAyw9ghRgB/ZKtBJMMDCM6pB5iVtG4k+SIWbJ4ee4kY7zmVWXojzYLl8qyvISlfSWXw6/5N3q4JSIOwC0QgYwI0mvOn3fQMZ4UfXLaEpR6JcFXkj9jOAfR2RoA7R9tGCDhH2USbOe2PwgU14=
Received: by 10.78.168.1 with SMTP id q1mr1043649hue.1189244615224;
	Sat, 08 Sep 2007 02:43:35 -0700 (PDT)
Received: from orion ( [77.109.37.125])
	by mx.google.com with ESMTPS id 32sm1184525hui.2007.09.08.02.43.29
	(version=TLSv1/SSLv3 cipher=OTHER);
	Sat, 08 Sep 2007 02:43:34 -0700 (PDT)
From: Nikolay Pavlov <qpadla@gmail.com>
To: freebsd-fs@freebsd.org,
 freebsd-current@freebsd.org
Date: Sat, 8 Sep 2007 12:43:44 +0300
User-Agent: KMail/1.9.7
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="nextPart3566643.daDD0r832G";
	protocol="application/pgp-signature"; micalg=pgp-sha1
Content-Transfer-Encoding: 7bit
Message-Id: <200709081243.48890.qpadla@gmail.com>
Cc: 
Subject: On-disk indexing for "Project Ideas" page
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: qpadla@gmail.com
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 08 Sep 2007 10:07:54 -0000

--nextPart3566643.daDD0r832G
Content-Type: text/plain;
  charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

Recently while reading "Design and Implementation of FreeBSD operation=20
system" by Marshall Kirk McKusick and gnn i have found a very intresting=20
paragraph regarding UFS2 implementation, indexing and B-trees. According=20
to it on-disk indexes was not implemented, but some structures was=20
reserved for future development. May be some SOC students could implement=20
this in future. How about to adding this into Project Ideas page?=20
Let me quote from the paragraph "8.3 Naming":

=46inding of Names in Directories

A common request to the filesystem is to lookup a specific name in a=20
directory. The kernel usually does the lookup by starting at the beginning=
=20
of the directory and going through, comparing each entry in turn. First,=20
the length of the sought-after name is compared with the length of the=20
name being checked. If the lengths are identical, a string comparison of=20
the name being sought and the directory entry is made. If they match, the=20
search is complete; if they fail, either in the length or in the string=20
comparison, the search continues with the next entry. Whenever a name is=20
found, its name and containing directory are entered into the systemwide=20
name cache described in Section 6.6. Whenever a search is unsuccessful, an=
=20
entry is made in the cache showing that the name does not exist in the=20
particular directory. Before starting a directory scan, the kernel looks=20
for the name in the cache. If either a positive or negative entry is=20
found, the directory scan can be avoided.

Another common operation is to lookup all the entries in a directory. For=20
example, many programs do a stat system call on each name in a directory=20
in the order that the names appear in the directory. To improve=20
performance for these programs, the kernel maintains the directory offset=20
of the last successful lookup for each directory. Each time that a lookup=20
is done in that directory, the search is started from the offset at which=20
the previous name was found (instead of from the beginning of the=20
directory). For programs that step sequentially through a directory with n=
=20
files, search time decreases from Order(n2) to Order(n).

One quick benchmark that demonstrates the maximum effectiveness of the=20
cache is running the ls -l command on a directory containing 600 files. On=
=20
a system that retains the most recent directory offset, the amount of=20
system time for this test is reduced by 85 percent. Unfortunately, the=20
maximum effectiveness is much greater than the average effectiveness.=20
Although the cache is 90 percent effective when hit, it is applicable to=20
only about 25 percent of the names being looked up. Despite the amount of=20
time spent in the lookup routine itself decreasing substantially, the=20
improvement is diminished because more time is spent in the routines that=20
that routine calls. Each cache miss causes a directory to be accessed=20
twice=E2=80=94once to search from the middle to the end and once to search =
from=20
the beginning to the middle.

These caches provide good directory lookup performance but are ineffective=
=20
for large directories that have a high rate of entry creation and=20
deletion. Each time a new directory entry is created, the kernel must scan=
=20
the entire directory to ensure that the entry does not already exist. When=
=20
an existing entry is deleted, the kernel must scan the directory to find=20
the entry to be removed. For directories with many entries these linear=20
scans are time-consuming.

The approach to solving this problem in FreeBSD 5.2 is to introduce dynamic=
=20
directory hashing that retrofits a directory indexing system to UFS [Dowse=
=20
& Malone, 2002]. To avoid repeated linear searches of large directories,=20
the dynamic directory hashing builds a hash table of directory entries on=20
the fly when the directory is first accessed. This table avoids directory=20
scans on later lookups, creates, and deletes. Unlike filesystems=20
originally designed with large directories in mind, these indices are not=20
saved on disk and so the system is backward compatible. The drawback is=20
that the indices need to be built the first time that a large directory is=
=20
encountered after each system reboot. The effect of the dynamic directory=20
hashing is that large directories in UFS cause minimal performance=20
problems.

When we built UFS2, we contemplated solving the large directory update=20
problem by changing to a more complex directory structure such as one that=
=20
uses B-trees. This technique is used in many modern filesystems such as=20
XFS [Sweeney et al., 1996], JFS [Best & Kleikamp, 2003], ReiserFS [Reiser,=
=20
2001], and in later versions of Ext2 [Phillips, 2001]. We decided not to=20
make the change at the time that UFS2 was first implemented for several=20
reasons. First, we had limited time and resources, and we wanted to get=20
something working and stable that could be used in the time frame of=20
=46reeBSD 5.2. By keeping the same directory format, we were able to reuse=
=20
all the directory code from UFS1, did not have to change numerous=20
filesystem utilities to understand and maintain a new directory format,=20
and were able to produce a stable and reliable filesystem in the time=20
frame available to us. The other reason that we felt that we could retain=20
the existing directory structure is because of the dynamic directory=20
hashing that was added to FreeBSD.

Borrowing the technique used by the Ext2 filesystem a flag was also added=20
to show that an on-disk indexing structure is supported for directories=20
[Phillips, 2001]. This flag is unconditionally turned off by the existing=20
implementation of UFS. In the future, if an implementation of an on-disk=20
directory-indexing structure is added, the implementations that support it=
=20
will not turn the flag off. Index-supporting kernels will maintain the=20
indices and leave the flag on. If an old non-index-supporting kernel is=20
run, it will turn off the flag so that when the filesystem is once again=20
run under a new kernel, the new kernel will discover that the indexing=20
flag has been turned off and will know that the indices may be out date=20
and have to be rebuilt before being used. The only constraint on an=20
implementation of the indices is that they have to be an auxiliary data=20
structure that references the old linear directory format.


=2D-=20
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =20
=2D Best regards, Nikolay Pavlov. <<<-----------------------------------   =
=20
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =20


--nextPart3566643.daDD0r832G
Content-Type: application/pgp-signature; name=signature.asc 
Content-Description: This is a digitally signed message part.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQBG4m7U/2R6KvEYGaIRAkA9AJ9v24dpn2ptSVReq1qxZmLeHy3MmACfcnYm
XukTYaTHTXrzVmbG0BAGZxs=
=Q+9o
-----END PGP SIGNATURE-----

--nextPart3566643.daDD0r832G--