Date: Sat, 8 Sep 2007 12:43:44 +0300
From: Nikolay Pavlov <qpadla@gmail.com>
To: freebsd-fs@freebsd.org, freebsd-current@freebsd.org
Subject: On-disk indexing for "Project Ideas" page
Message-ID: <200709081243.48890.qpadla@gmail.com>
Recently, while reading "The Design and Implementation of the FreeBSD Operating System" by Marshall Kirk McKusick and gnn, I found a very interesting passage about the UFS2 implementation, indexing, and B-trees. According to it, on-disk indexes were not implemented, but some structures were reserved for future development. Maybe some SoC students could implement this in the future. How about adding it to the "Project Ideas" page?

Let me quote from Section 8.3, "Naming":

Finding of Names in Directories

A common request to the filesystem is to look up a specific name in a directory. The kernel usually does the lookup by starting at the beginning of the directory and going through it, comparing each entry in turn. First, the length of the sought-after name is compared with the length of the name being checked. If the lengths are identical, a string comparison of the name being sought and the directory entry is made. If they match, the search is complete; if either the length or the string comparison fails, the search continues with the next entry. Whenever a name is found, its name and containing directory are entered into the systemwide name cache described in Section 6.6. Whenever a search is unsuccessful, an entry is made in the cache showing that the name does not exist in the particular directory. Before starting a directory scan, the kernel looks for the name in the cache. If either a positive or negative entry is found, the directory scan can be avoided.

Another common operation is to look up all the entries in a directory. For example, many programs do a stat system call on each name in a directory in the order that the names appear in the directory. To improve performance for these programs, the kernel maintains the directory offset of the last successful lookup for each directory. Each time that a lookup is done in that directory, the search is started from the offset at which the previous name was found (instead of from the beginning of the directory). For programs that step sequentially through a directory with n files, the search time decreases from O(n^2) to O(n).

One quick benchmark that demonstrates the maximum effectiveness of the cache is running the ls -l command on a directory containing 600 files. On a system that retains the most recent directory offset, the amount of system time for this test is reduced by 85 percent. Unfortunately, the maximum effectiveness is much greater than the average effectiveness. Although the cache is 90 percent effective when hit, it is applicable to only about 25 percent of the names being looked up. Although the time spent in the lookup routine itself decreases substantially, the improvement is diminished because more time is spent in the routines that that routine calls. Each cache miss causes a directory to be accessed twice: once to search from the middle to the end and once to search from the beginning to the middle.
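To make the mechanism above concrete, here is a minimal userspace sketch of that linear scan, starting at the offset of the last successful lookup and wrapping around, with the cheap length comparison done before the string comparison. The entry layout and the names (struct dentry, struct directory, dir_lookup, last_hit) are simplified for illustration and are not the actual on-disk UFS directory format or kernel code:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    struct dentry {                 /* one directory entry (simplified) */
        uint64_t ino;               /* 0 means the slot is unused */
        uint8_t  namelen;
        char     name[255];
    };

    struct directory {
        struct dentry *entries;
        size_t         nentries;
        size_t         last_hit;    /* index of the last successful lookup */
    };

    /* Returns the inode number for `name`, or 0 if it is not present.
     * The scan starts where the previous lookup succeeded, so a program
     * that stats every name in directory order does O(n) total work
     * instead of O(n^2).  A miss costs two partial scans: from last_hit
     * to the end, then from the beginning back around to last_hit. */
    uint64_t dir_lookup(struct directory *dp, const char *name)
    {
        size_t len = strlen(name);
        size_t i = dp->last_hit;
        size_t scanned;

        for (scanned = 0; scanned < dp->nentries; scanned++) {
            struct dentry *e = &dp->entries[i];

            /* Cheap length comparison first; the string comparison is
             * only done when the lengths already match. */
            if (e->ino != 0 && e->namelen == len &&
                memcmp(e->name, name, len) == 0) {
                dp->last_hit = i;
                return e->ino;
            }
            if (++i == dp->nentries)
                i = 0;              /* wrap around to the beginning */
        }
        return 0;                   /* not found */
    }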
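The systemwide name cache with positive and negative entries can be pictured in the same spirit. The structures and names below (struct nc_entry, namecache_enter, namecache_lookup) are invented for the sketch and are not the kernel's real vfs name-cache interface:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define NC_BUCKETS 256

    struct nc_entry {
        struct nc_entry *next;
        uint64_t         dir_ino;   /* inode of the containing directory */
        uint64_t         file_ino;  /* 0 marks a negative entry */
        char             name[256];
    };

    static struct nc_entry *nc_hash[NC_BUCKETS];

    static unsigned nc_bucket(uint64_t dir_ino, const char *name)
    {
        unsigned h = (unsigned)dir_ino;

        while (*name)
            h = h * 33 + (unsigned char)*name++;
        return h % NC_BUCKETS;
    }

    /* Record a lookup result; file_ino == 0 records a negative entry,
     * i.e. "this name is known not to exist in this directory". */
    void namecache_enter(uint64_t dir_ino, const char *name, uint64_t file_ino)
    {
        unsigned b = nc_bucket(dir_ino, name);
        struct nc_entry *e = calloc(1, sizeof(*e));

        e->dir_ino = dir_ino;
        e->file_ino = file_ino;
        snprintf(e->name, sizeof(e->name), "%s", name);
        e->next = nc_hash[b];
        nc_hash[b] = e;
    }

    /* Returns 1 and fills *ino on a positive hit, -1 on a negative hit,
     * and 0 on a miss, in which case the directory has to be scanned. */
    int namecache_lookup(uint64_t dir_ino, const char *name, uint64_t *ino)
    {
        const struct nc_entry *e;

        for (e = nc_hash[nc_bucket(dir_ino, name)]; e != NULL; e = e->next) {
            if (e->dir_ino == dir_ino && strcmp(e->name, name) == 0) {
                if (e->file_ino == 0)
                    return -1;      /* negative entry: name known absent */
                *ino = e->file_ino;
                return 1;           /* positive entry */
            }
        }
        return 0;                   /* miss: fall back to the directory scan */
    }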
These caches provide good directory lookup performance but are ineffective for large directories that have a high rate of entry creation and deletion. Each time a new directory entry is created, the kernel must scan the entire directory to ensure that the entry does not already exist. When an existing entry is deleted, the kernel must scan the directory to find the entry to be removed. For directories with many entries, these linear scans are time-consuming.

The approach to solving this problem in FreeBSD 5.2 is to introduce dynamic directory hashing that retrofits a directory indexing system to UFS [Dowse & Malone, 2002]. To avoid repeated linear searches of large directories, the dynamic directory hashing builds a hash table of directory entries on the fly when the directory is first accessed. This table avoids directory scans on later lookups, creates, and deletes. Unlike filesystems originally designed with large directories in mind, these indices are not saved on disk, and so the system is backward compatible. The drawback is that the indices need to be built the first time that a large directory is encountered after each system reboot. The effect of the dynamic directory hashing is that large directories in UFS cause minimal performance problems.

When we built UFS2, we contemplated solving the large-directory update problem by changing to a more complex directory structure such as one that uses B-trees. This technique is used in many modern filesystems such as XFS [Sweeney et al., 1996], JFS [Best & Kleikamp, 2003], ReiserFS [Reiser, 2001], and in later versions of Ext2 [Phillips, 2001]. We decided not to make the change at the time that UFS2 was first implemented for several reasons. First, we had limited time and resources, and we wanted to get something working and stable that could be used in the time frame of FreeBSD 5.2. By keeping the same directory format, we were able to reuse all the directory code from UFS1, did not have to change numerous filesystem utilities to understand and maintain a new directory format, and were able to produce a stable and reliable filesystem in the time frame available to us. The other reason that we felt that we could retain the existing directory structure is the dynamic directory hashing that was added to FreeBSD.

Borrowing the technique used by the Ext2 filesystem, a flag was also added to show that an on-disk indexing structure is supported for directories [Phillips, 2001]. This flag is unconditionally turned off by the existing implementation of UFS. In the future, if an implementation of an on-disk directory-indexing structure is added, the implementations that support it will not turn the flag off. Index-supporting kernels will maintain the indices and leave the flag on. If an old non-index-supporting kernel is run, it will turn off the flag so that when the filesystem is once again run under a new kernel, the new kernel will discover that the indexing flag has been turned off and will know that the indices may be out of date and have to be rebuilt before being used. The only constraint on an implementation of the indices is that they have to be an auxiliary data structure that references the old linear directory format.
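As a rough illustration of the dynamic directory hashing described above, the sketch below builds an in-memory hash table mapping names to directory offsets the first time a directory is touched, so later lookups, creates, and deletes can consult the table instead of rescanning the directory. The names (struct dirhash, dh_insert, dh_lookup) and the layout are made up for the example and do not match the real ufs_dirhash code:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    #define DH_BUCKETS 1024

    struct dh_slot {
        struct dh_slot *next;
        char           *name;
        uint64_t        offset;     /* where the entry lives in the directory */
    };

    struct dirhash {
        struct dh_slot *buckets[DH_BUCKETS];
    };

    static unsigned dh_hash(const char *name)
    {
        unsigned h = 5381;

        while (*name)
            h = h * 33 + (unsigned char)*name++;
        return h % DH_BUCKETS;
    }

    /* Populated by a single linear walk of the directory the first time
     * it is accessed after boot; afterwards lookups, creates, and deletes
     * consult the table instead of rescanning the whole directory. */
    void dh_insert(struct dirhash *dh, const char *name, uint64_t offset)
    {
        unsigned b = dh_hash(name);
        struct dh_slot *s = malloc(sizeof(*s));

        s->name = strdup(name);
        s->offset = offset;
        s->next = dh->buckets[b];
        dh->buckets[b] = s;
    }

    /* Returns the directory offset of `name`, or (uint64_t)-1 if absent.
     * The "absent" answer is what lets a create skip the full scan that
     * would otherwise be needed to confirm the name does not exist. */
    uint64_t dh_lookup(const struct dirhash *dh, const char *name)
    {
        const struct dh_slot *s;

        for (s = dh->buckets[dh_hash(name)]; s != NULL; s = s->next)
            if (strcmp(s->name, name) == 0)
                return s->offset;
        return (uint64_t)-1;
    }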
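The backward-compatibility rule for the reserved indexing flag amounts to mount-time logic roughly like the following. The flag name FS_DIRINDEX, the superblock layout, and rebuild_dir_indices() are invented for this sketch; they are not the actual UFS definitions:

    #include <stdint.h>

    #define FS_DIRINDEX 0x01        /* "on-disk directory indices are valid" */

    struct superblock {
        uint32_t flags;
    };

    /* A kernel that knows nothing about on-disk indices simply clears the
     * flag, because it will modify directories without maintaining the
     * auxiliary index structures. */
    void mount_old_kernel(struct superblock *sb)
    {
        sb->flags &= ~FS_DIRINDEX;
    }

    /* An index-aware kernel checks the flag at mount time.  If it was
     * cleared, the filesystem was last written by an old kernel, so the
     * indices may be out of date and must be rebuilt from the unchanged
     * linear directory format before they can be trusted again. */
    void mount_index_aware_kernel(struct superblock *sb)
    {
        if ((sb->flags & FS_DIRINDEX) == 0) {
            /* rebuild_dir_indices(sb);  -- hypothetical rebuild pass */
            sb->flags |= FS_DIRINDEX;
        }
        /* From here on, the kernel maintains the indices and leaves
         * the flag on. */
    }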
-- 
- Best regards, Nikolay Pavlov.