From owner-freebsd-fs@FreeBSD.ORG Sat Sep 8 10:07:54 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8CA0116A420 for ; Sat, 8 Sep 2007 10:07:54 +0000 (UTC) (envelope-from qpadla@gmail.com) Received: from nf-out-0910.google.com (nf-out-0910.google.com [64.233.182.191]) by mx1.freebsd.org (Postfix) with ESMTP id 15BBA13C467 for ; Sat, 8 Sep 2007 10:07:53 +0000 (UTC) (envelope-from qpadla@gmail.com) Received: by nf-out-0910.google.com with SMTP id k4so534322nfd for ; Sat, 08 Sep 2007 03:07:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:from:reply-to:to:subject:date:user-agent:mime-version:content-type:content-transfer-encoding:message-id; bh=hS0I9p4MEpL3wbjmDBRIS+4MSKEkdNNgihcSS06DKRM=; b=WJvsPsdxa+oeJ0mEVDQpZSfS5JDSddtLb+XzRX8B7WfVJpwJH0/X+Jo5qBcdaBtf919AotIitsheBu+vGmgPmXj7d5ZG89myAOLEaPnog/aYixYg8CIFVHhtfGO0aRZ/IcKTnhGz+kp2nglce9PoxFvroK2l0YTXaMQuwOmdmOM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:from:reply-to:to:subject:date:user-agent:mime-version:content-type:content-transfer-encoding:message-id; b=DF5qcyhPpz+bGaGGXMb2n08FKzwTAyw9ghRgB/ZKtBJMMDCM6pB5iVtG4k+SIWbJ4ee4kY7zmVWXojzYLl8qyvISlfSWXw6/5N3q4JSIOwC0QgYwI0mvOn3fQMZ4UfXLaEpR6JcFXkj9jOAfR2RoA7R9tGCDhH2USbOe2PwgU14= Received: by 10.78.168.1 with SMTP id q1mr1043649hue.1189244615224; Sat, 08 Sep 2007 02:43:35 -0700 (PDT) Received: from orion ( [77.109.37.125]) by mx.google.com with ESMTPS id 32sm1184525hui.2007.09.08.02.43.29 (version=TLSv1/SSLv3 cipher=OTHER); Sat, 08 Sep 2007 02:43:34 -0700 (PDT) From: Nikolay Pavlov To: freebsd-fs@freebsd.org, freebsd-current@freebsd.org Date: Sat, 8 Sep 2007 12:43:44 +0300 User-Agent: KMail/1.9.7 MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart3566643.daDD0r832G"; protocol="application/pgp-signature"; micalg=pgp-sha1 Content-Transfer-Encoding: 7bit Message-Id: <200709081243.48890.qpadla@gmail.com> Cc: Subject: On-disk indexing for "Project Ideas" page X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: qpadla@gmail.com List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 08 Sep 2007 10:07:54 -0000 --nextPart3566643.daDD0r832G Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Recently while reading "Design and Implementation of FreeBSD operation=20 system" by Marshall Kirk McKusick and gnn i have found a very intresting=20 paragraph regarding UFS2 implementation, indexing and B-trees. According=20 to it on-disk indexes was not implemented, but some structures was=20 reserved for future development. May be some SOC students could implement=20 this in future. How about to adding this into Project Ideas page?=20 Let me quote from the paragraph "8.3 Naming": =46inding of Names in Directories A common request to the filesystem is to lookup a specific name in a=20 directory. The kernel usually does the lookup by starting at the beginning= =20 of the directory and going through, comparing each entry in turn. First,=20 the length of the sought-after name is compared with the length of the=20 name being checked. If the lengths are identical, a string comparison of=20 the name being sought and the directory entry is made. If they match, the=20 search is complete; if they fail, either in the length or in the string=20 comparison, the search continues with the next entry. Whenever a name is=20 found, its name and containing directory are entered into the systemwide=20 name cache described in Section 6.6. Whenever a search is unsuccessful, an= =20 entry is made in the cache showing that the name does not exist in the=20 particular directory. Before starting a directory scan, the kernel looks=20 for the name in the cache. If either a positive or negative entry is=20 found, the directory scan can be avoided. Another common operation is to lookup all the entries in a directory. For=20 example, many programs do a stat system call on each name in a directory=20 in the order that the names appear in the directory. To improve=20 performance for these programs, the kernel maintains the directory offset=20 of the last successful lookup for each directory. Each time that a lookup=20 is done in that directory, the search is started from the offset at which=20 the previous name was found (instead of from the beginning of the=20 directory). For programs that step sequentially through a directory with n= =20 files, search time decreases from Order(n2) to Order(n). One quick benchmark that demonstrates the maximum effectiveness of the=20 cache is running the ls -l command on a directory containing 600 files. On= =20 a system that retains the most recent directory offset, the amount of=20 system time for this test is reduced by 85 percent. Unfortunately, the=20 maximum effectiveness is much greater than the average effectiveness.=20 Although the cache is 90 percent effective when hit, it is applicable to=20 only about 25 percent of the names being looked up. Despite the amount of=20 time spent in the lookup routine itself decreasing substantially, the=20 improvement is diminished because more time is spent in the routines that=20 that routine calls. Each cache miss causes a directory to be accessed=20 twice=E2=80=94once to search from the middle to the end and once to search = from=20 the beginning to the middle. These caches provide good directory lookup performance but are ineffective= =20 for large directories that have a high rate of entry creation and=20 deletion. Each time a new directory entry is created, the kernel must scan= =20 the entire directory to ensure that the entry does not already exist. When= =20 an existing entry is deleted, the kernel must scan the directory to find=20 the entry to be removed. For directories with many entries these linear=20 scans are time-consuming. The approach to solving this problem in FreeBSD 5.2 is to introduce dynamic= =20 directory hashing that retrofits a directory indexing system to UFS [Dowse= =20 & Malone, 2002]. To avoid repeated linear searches of large directories,=20 the dynamic directory hashing builds a hash table of directory entries on=20 the fly when the directory is first accessed. This table avoids directory=20 scans on later lookups, creates, and deletes. Unlike filesystems=20 originally designed with large directories in mind, these indices are not=20 saved on disk and so the system is backward compatible. The drawback is=20 that the indices need to be built the first time that a large directory is= =20 encountered after each system reboot. The effect of the dynamic directory=20 hashing is that large directories in UFS cause minimal performance=20 problems. When we built UFS2, we contemplated solving the large directory update=20 problem by changing to a more complex directory structure such as one that= =20 uses B-trees. This technique is used in many modern filesystems such as=20 XFS [Sweeney et al., 1996], JFS [Best & Kleikamp, 2003], ReiserFS [Reiser,= =20 2001], and in later versions of Ext2 [Phillips, 2001]. We decided not to=20 make the change at the time that UFS2 was first implemented for several=20 reasons. First, we had limited time and resources, and we wanted to get=20 something working and stable that could be used in the time frame of=20 =46reeBSD 5.2. By keeping the same directory format, we were able to reuse= =20 all the directory code from UFS1, did not have to change numerous=20 filesystem utilities to understand and maintain a new directory format,=20 and were able to produce a stable and reliable filesystem in the time=20 frame available to us. The other reason that we felt that we could retain=20 the existing directory structure is because of the dynamic directory=20 hashing that was added to FreeBSD. Borrowing the technique used by the Ext2 filesystem a flag was also added=20 to show that an on-disk indexing structure is supported for directories=20 [Phillips, 2001]. This flag is unconditionally turned off by the existing=20 implementation of UFS. In the future, if an implementation of an on-disk=20 directory-indexing structure is added, the implementations that support it= =20 will not turn the flag off. Index-supporting kernels will maintain the=20 indices and leave the flag on. If an old non-index-supporting kernel is=20 run, it will turn off the flag so that when the filesystem is once again=20 run under a new kernel, the new kernel will discover that the indexing=20 flag has been turned off and will know that the indices may be out date=20 and have to be rebuilt before being used. The only constraint on an=20 implementation of the indices is that they have to be an auxiliary data=20 structure that references the old linear directory format. =2D-=20 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =20 =2D Best regards, Nikolay Pavlov. <<<----------------------------------- = =20 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =20 --nextPart3566643.daDD0r832G Content-Type: application/pgp-signature; name=signature.asc Content-Description: This is a digitally signed message part. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQBG4m7U/2R6KvEYGaIRAkA9AJ9v24dpn2ptSVReq1qxZmLeHy3MmACfcnYm XukTYaTHTXrzVmbG0BAGZxs= =Q+9o -----END PGP SIGNATURE----- --nextPart3566643.daDD0r832G--