Date: Wed, 5 Sep 2012 12:18:54 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: fs@freebsd.org
Cc: pho@freebsd.org
Subject: Nullfs shared lookup
Message-ID: <20120905091854.GD33100@deviant.kiev.zoral.com.ua>

I, together with Peter Holm, developed a patch to enable shared lookups on nullfs mounts when the lower filesystem allows shared lookups. The lack of shared lookup support in nullfs is quite visible on any VFS-intensive workload which relies on path translation. In particular, it was a complaint at $dayjob which started me thinking about this issue.

There are two problems which prevent direct translation of the shared lookup bit into the nullfs upper mount bit:

1. When vfs_lookup() calls VOP_LOOKUP() for nullfs, which passes the lookup operation to the lower fs, the resulting vnode is often only shared-locked. Then null_nodeget() cannot instantiate a covering vnode for the lower vnode, since insmntque1() and null_hashins() require an exclusive lock on the lower vnode. The solution is straightforward: if the null hash fails to find a pre-existing nullfs vnode for the lower vnode, the lower vnode lock is upgraded.

2. (More serious.) Nullfs reclaims its vnodes on deactivation. The cause is nullfs's inability to detect reclamation of the lower vnode. Reclaiming a nullfs vnode at deactivation time prevents its reference to the lower vnode from becoming stale. Unfortunately, this means that every lookup on nullfs needs an exclusive lock to instantiate the upper vnode, which is never cached.

The solution we propose is to add a VFS notification to the upper filesystem about reclamation of a vnode in the lower filesystem. Now, vgone() calls the new VFS op vfs_reclaim_lowervp() with an argument lowervp, the vnode being reclaimed. It is possible to register several reclamation event listeners, to correctly handle the case of several nullfs mounts over the same directory. For a filesystem that has no nullfs mounts over it, the added overhead is a single mount interlock lock/unlock in the vnode reclamation path.

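The listener idea above can be illustrated with a small user-space model. This is only a sketch and is not taken from the patch: every name in it (mount_model, upper_model, register_upper, notify_lowervp_reclaim, demo_two_uppers) is hypothetical, a pthread mutex stands in for the mount interlock, and calling the listeners after dropping the lock is a choice of this sketch, not a statement about how the kernel code does it.

```c
/* User-space model only; not FreeBSD kernel code. All names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_UPPERS 8

struct vnode_model { int id; };

/* One upper (nullfs-like) mount that wants reclamation notices. */
struct upper_model {
    void (*reclaim_lowervp)(struct upper_model *, struct vnode_model *);
    int notified;               /* number of notices received */
};

/* The lower mount: keeps the list of registered upper mounts. */
struct mount_model {
    pthread_mutex_t interlock;  /* stands in for the mount interlock */
    struct upper_model *uppers[MAX_UPPERS];
    size_t nuppers;
};

void mount_model_init(struct mount_model *mp)
{
    pthread_mutex_init(&mp->interlock, NULL);
    mp->nuppers = 0;
}

/* Register one more listener; several uppers may cover the same lower mount. */
int register_upper(struct mount_model *mp, struct upper_model *up)
{
    int error = 0;

    pthread_mutex_lock(&mp->interlock);
    if (mp->nuppers < MAX_UPPERS)
        mp->uppers[mp->nuppers++] = up;
    else
        error = -1;
    pthread_mutex_unlock(&mp->interlock);
    return error;
}

/*
 * Called from the (modelled) reclamation path of the lower vnode.
 * With no uppers registered, the cost is one lock/unlock pair.
 */
void notify_lowervp_reclaim(struct mount_model *mp, struct vnode_model *lowervp)
{
    struct upper_model *snap[MAX_UPPERS];
    size_t i, n;

    assert(lowervp != NULL);
    pthread_mutex_lock(&mp->interlock);
    n = mp->nuppers;
    for (i = 0; i < n; i++)
        snap[i] = mp->uppers[i];
    pthread_mutex_unlock(&mp->interlock);

    /* Listeners run without the interlock held (sketch-level choice). */
    for (i = 0; i < n; i++)
        snap[i]->reclaim_lowervp(snap[i], lowervp);
}

/* Example listener: just counts notices. */
static void count_notice(struct upper_model *up, struct vnode_model *lowervp)
{
    (void)lowervp;
    up->notified++;
}

/* Two uppers over one lower mount; returns the total number of notices. */
int demo_two_uppers(void)
{
    struct mount_model mp;
    struct upper_model a = { count_notice, 0 };
    struct upper_model b = { count_notice, 0 };
    struct vnode_model lowervp = { 1 };

    mount_model_init(&mp);
    register_upper(&mp, &a);
    register_upper(&mp, &b);
    notify_lowervp_reclaim(&mp, &lowervp);
    return a.notified + b.notified;
}
```

In this model, reclaiming one lower vnode under two registered uppers delivers one notice to each of them, while a mount with no uppers only pays for the lock/unlock pair.
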
Benchmarks consisting of up to 1K threads doing parallel stat(2) on the same file demonstrate almost constant execution time, independent of the number of running threads. Without the patch, the execution time of a single-threaded run and of a run with 1024 threads performing the same total count of stat(2) calls differs by a factor of 6.

A somewhat problematic detail, IMO, is that the nullfs reclamation procedure calls vput() on the lowervp vnode, temporarily unlocking the vnode being reclaimed. This seems to be fine for MPSAFE filesystems, but non-MPSAFE code often puts a partially initialized vnode on some globally visible list, and can later decide that the half-constructed vnode is not needed. If a nullfs mount is created above such a filesystem, then other threads might catch such an improperly initialized vnode. Instead of trying to overcome this case, e.g. by recursing the lower vnode lock in null_reclaim_lowervp(), I decided to rely on the upcoming removal of support for non-MPSAFE filesystems.

I think that unionfs can also benefit from this mechanism, but I have not even looked at unionfs.

The patch is available at
http://people.freebsd.org/~kib/misc/nullfs_shared_lookup.1.patch

It survived stress2 torturing.

Comments?