From: Robert Watson
Date: Tue, 25 Apr 2006 13:56:02 +0100 (BST)
To: Daichi GOTO
Cc: ozawa@ongs.co.jp, freebsd-hackers@freebsd.org, freebsd-fs@freebsd.org,
    freebsd-current@freebsd.org, Alexander@Leidinger.net
Subject: Re: [ANN] unionfs patchset-11 release
Message-ID: <20060425133412.V51337@fledge.watson.org>
In-Reply-To: <444E13BA.8050902@freebsd.org>

On Tue, 25 Apr 2006, Daichi GOTO wrote:

> Changes in unionfs-p11.diff
> - Changed a few implementations around the lock/unlock
>   mechanism. Because of this, you can use both the unionfs
>   and the nullfs together without LK_CANRECURSE.
> - Fixed a bug that sometimes does not unlock if it cannot
>   create shadow file.

First off, thanks again for working on this!

> If someone knows the details of vnode's lock status via
> VOP_GETWRITEMOUNT, Please teach us (daichi, ozawa). We
> want to know the details.

Basically, file systems supporting full file system snapshots (UFS) provide
a mechanism to "lock out" writers before they enter VFS so that they don't
end up holding write locks for long periods, leading to deadlock.
vn_start_write() is called to notify the file system that a thread is about
to enter the file system for a write, and vn_finished_write() is called to
notify the file system that it is done.  In effect, it's a giant
reader-writer lock, which allows multiple readers and multiple writers,
except during snapshot generation, when it blocks new writers until the
snapshot is generated.

In general, you'll notice two sorts of logic around calls to
vn_start_write().  In the first, vn_start_write() is called once a vnode
reference has been acquired, and then things continue as normal, with a
final vn_finished_write() call at the end.  In this situation, vnode locks
are acquired after the vn_start_write() call, but vnode references are held
before (since vn_start_write() takes a vnode so that it can find the file
system).

The other circumstance is where vnode locks may already be held, in which
case a non-sleeping acquire is performed, since in effect this is a
violation of lock order.  If it fails, the vnode lock is released, the
reference is acquired, and then the whole operation is restarted so that we
can try again to acquire the vnode lock under circumstances where the file
system snapshot lock can be safely acquired.  So basically, it has deadlock
detection and recovery logic.  The V_XSLEEP flag basically says "Sleep until
the snapshot lock would be available, then return", which loops back so we
can re-try the acquires.
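
To make the two patterns above concrete, here is a rough userspace sketch in
C.  It is a single-threaded toy model of the control flow only, not FreeBSD
kernel code: struct gate, struct node, the GATE_* flags, and every function
name are hypothetical stand-ins, and "sleeping" is modelled by a callback
that simply pretends the snapshot finished while we waited.

```c
/* Toy model of the snapshot write gate; NOT kernel code, all names hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-ins for the V_WAIT, V_NOWAIT and V_XSLEEP flags. */
enum { GATE_WAIT = 1, GATE_NOWAIT = 2, GATE_XSLEEP = 4 };

struct gate {
    bool suspended;  /* snapshot in progress: new writers are held out */
    int  writers;    /* writers currently inside the file system */
    void (*sleep_until_resumed)(struct gate *); /* models a blocking sleep */
};

struct node {        /* stand-in for a vnode */
    struct gate *gate;
    int  refs;
    bool locked;
};

/* Simplest possible "sleep": assume the snapshot completes while we wait. */
static void snapshot_completes(struct gate *g) { g->suspended = false; }

static int gate_start_write(struct gate *g, int flags)
{
    if (g->suspended) {
        if (flags & GATE_NOWAIT)
            return EWOULDBLOCK;     /* caller must not sleep here */
        g->sleep_until_resumed(g);
        assert(!g->suspended);
    }
    if (flags & GATE_XSLEEP)
        return 0;                   /* only waited; caller restarts */
    g->writers++;
    return 0;
}

static void gate_finished_write(struct gate *g)
{
    assert(g->writers > 0);
    g->writers--;
}

/* Pattern 1: reference first, then the gate, then the node lock. */
static int write_unlocked(struct node *n)
{
    int error;

    n->refs++;                      /* needed to find the gate at all */
    error = gate_start_write(n->gate, GATE_WAIT);
    if (error == 0) {
        n->locked = true;           /* node lock comes after the gate */
        /* ... modify the node here ... */
        n->locked = false;
        gate_finished_write(n->gate);
    }
    n->refs--;
    return error;
}

/* Pattern 2: node lock already held, so only try the gate without sleeping.
 * On failure, back out, wait, and restart.  Returns the restart count. */
static int write_locked(struct node *n)
{
    int restarts = 0;

    assert(n->locked);
    while (gate_start_write(n->gate, GATE_NOWAIT) != 0) {
        n->locked = false;          /* drop the lock taken out of order */
        (void)gate_start_write(n->gate, GATE_XSLEEP);
        n->locked = true;           /* relock and retry from the top */
        restarts++;
    }
    /* ... modify the node here ... */
    gate_finished_write(n->gate);
    return restarts;
}
```

In this model, calling write_locked() on a locked node while the gate is
suspended takes the recovery path once and then succeeds.  A real
implementation has to cope with other threads changing the state in between,
which this sketch deliberately leaves out.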

So according to the above, the file system snapshot lock is *before* the
vnode locks in the lock order, although in practice we acquire in any order
as long as it won't lead to deadlock (in which case we recover).  The logic
here is a little shaky in practice -- among other things, it looks like
potentially the mount point could go away during the call to
vn_start_write() once the vnode is released in the deadlock detection code,
but in practice this probably never happens.

Notice that the above is all couched in terms of a single file system, not
stacking.  This is probably because it was all written with UFS and not
stacking in mind.

Robert N M Watson