Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 8 Sep 2007 12:43:44 +0300
From:      Nikolay Pavlov <qpadla@gmail.com>
To:        freebsd-fs@freebsd.org, freebsd-current@freebsd.org
Subject:   On-disk indexing for "Project Ideas" page
Message-ID:  <200709081243.48890.qpadla@gmail.com>

next in thread | raw e-mail | index | archive | help
--nextPart3566643.daDD0r832G
Content-Type: text/plain;
  charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

Recently while reading "Design and Implementation of FreeBSD operation=20
system" by Marshall Kirk McKusick and gnn i have found a very intresting=20
paragraph regarding UFS2 implementation, indexing and B-trees. According=20
to it on-disk indexes was not implemented, but some structures was=20
reserved for future development. May be some SOC students could implement=20
this in future. How about to adding this into Project Ideas page?=20
Let me quote from the paragraph "8.3 Naming":

=46inding of Names in Directories

A common request to the filesystem is to lookup a specific name in a=20
directory. The kernel usually does the lookup by starting at the beginning=
=20
of the directory and going through, comparing each entry in turn. First,=20
the length of the sought-after name is compared with the length of the=20
name being checked. If the lengths are identical, a string comparison of=20
the name being sought and the directory entry is made. If they match, the=20
search is complete; if they fail, either in the length or in the string=20
comparison, the search continues with the next entry. Whenever a name is=20
found, its name and containing directory are entered into the systemwide=20
name cache described in Section 6.6. Whenever a search is unsuccessful, an=
=20
entry is made in the cache showing that the name does not exist in the=20
particular directory. Before starting a directory scan, the kernel looks=20
for the name in the cache. If either a positive or negative entry is=20
found, the directory scan can be avoided.

Another common operation is to lookup all the entries in a directory. For=20
example, many programs do a stat system call on each name in a directory=20
in the order that the names appear in the directory. To improve=20
performance for these programs, the kernel maintains the directory offset=20
of the last successful lookup for each directory. Each time that a lookup=20
is done in that directory, the search is started from the offset at which=20
the previous name was found (instead of from the beginning of the=20
directory). For programs that step sequentially through a directory with n=
=20
files, search time decreases from Order(n2) to Order(n).

One quick benchmark that demonstrates the maximum effectiveness of the=20
cache is running the ls -l command on a directory containing 600 files. On=
=20
a system that retains the most recent directory offset, the amount of=20
system time for this test is reduced by 85 percent. Unfortunately, the=20
maximum effectiveness is much greater than the average effectiveness.=20
Although the cache is 90 percent effective when hit, it is applicable to=20
only about 25 percent of the names being looked up. Despite the amount of=20
time spent in the lookup routine itself decreasing substantially, the=20
improvement is diminished because more time is spent in the routines that=20
that routine calls. Each cache miss causes a directory to be accessed=20
twice=E2=80=94once to search from the middle to the end and once to search =
from=20
the beginning to the middle.

These caches provide good directory lookup performance but are ineffective=
=20
for large directories that have a high rate of entry creation and=20
deletion. Each time a new directory entry is created, the kernel must scan=
=20
the entire directory to ensure that the entry does not already exist. When=
=20
an existing entry is deleted, the kernel must scan the directory to find=20
the entry to be removed. For directories with many entries these linear=20
scans are time-consuming.

The approach to solving this problem in FreeBSD 5.2 is to introduce dynamic=
=20
directory hashing that retrofits a directory indexing system to UFS [Dowse=
=20
& Malone, 2002]. To avoid repeated linear searches of large directories,=20
the dynamic directory hashing builds a hash table of directory entries on=20
the fly when the directory is first accessed. This table avoids directory=20
scans on later lookups, creates, and deletes. Unlike filesystems=20
originally designed with large directories in mind, these indices are not=20
saved on disk and so the system is backward compatible. The drawback is=20
that the indices need to be built the first time that a large directory is=
=20
encountered after each system reboot. The effect of the dynamic directory=20
hashing is that large directories in UFS cause minimal performance=20
problems.

When we built UFS2, we contemplated solving the large directory update=20
problem by changing to a more complex directory structure such as one that=
=20
uses B-trees. This technique is used in many modern filesystems such as=20
XFS [Sweeney et al., 1996], JFS [Best & Kleikamp, 2003], ReiserFS [Reiser,=
=20
2001], and in later versions of Ext2 [Phillips, 2001]. We decided not to=20
make the change at the time that UFS2 was first implemented for several=20
reasons. First, we had limited time and resources, and we wanted to get=20
something working and stable that could be used in the time frame of=20
=46reeBSD 5.2. By keeping the same directory format, we were able to reuse=
=20
all the directory code from UFS1, did not have to change numerous=20
filesystem utilities to understand and maintain a new directory format,=20
and were able to produce a stable and reliable filesystem in the time=20
frame available to us. The other reason that we felt that we could retain=20
the existing directory structure is because of the dynamic directory=20
hashing that was added to FreeBSD.

Borrowing the technique used by the Ext2 filesystem a flag was also added=20
to show that an on-disk indexing structure is supported for directories=20
[Phillips, 2001]. This flag is unconditionally turned off by the existing=20
implementation of UFS. In the future, if an implementation of an on-disk=20
directory-indexing structure is added, the implementations that support it=
=20
will not turn the flag off. Index-supporting kernels will maintain the=20
indices and leave the flag on. If an old non-index-supporting kernel is=20
run, it will turn off the flag so that when the filesystem is once again=20
run under a new kernel, the new kernel will discover that the indexing=20
flag has been turned off and will know that the indices may be out date=20
and have to be rebuilt before being used. The only constraint on an=20
implementation of the indices is that they have to be an auxiliary data=20
structure that references the old linear directory format.


=2D-=20
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =20
=2D Best regards, Nikolay Pavlov. <<<-----------------------------------   =
=20
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =20


--nextPart3566643.daDD0r832G
Content-Type: application/pgp-signature; name=signature.asc 
Content-Description: This is a digitally signed message part.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQBG4m7U/2R6KvEYGaIRAkA9AJ9v24dpn2ptSVReq1qxZmLeHy3MmACfcnYm
XukTYaTHTXrzVmbG0BAGZxs=
=Q+9o
-----END PGP SIGNATURE-----

--nextPart3566643.daDD0r832G--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200709081243.48890.qpadla>