From: Kirk McKusick
To: Gleb Kurtsou
Cc: freebsd-fs@freebsd.org, Jeff Roberson, Ivan Voras
Date: Wed, 08 Sep 2010 22:42:13 -0700
Subject: Re: kern/150143: [patch][tmpfs] Source directory vnode can disappear before locking it in tmpfs_rename
Message-Id: <201009090542.o895gD7f096872@chez.mckusick.com>
In-reply-to: <20100907230433.GA3938@tops>

> Date: Wed, 8 Sep 2010 02:04:33 +0300
> From: Gleb Kurtsou
> To: Kirk McKusick
> Cc: Ivan Voras, freebsd-fs@freebsd.org
> Subject: Re: kern/150143: [patch][tmpfs] Source directory vnode can
>          disappear before locking it in tmpfs_rename
>
> Hello Kirk,
>
> I was working on improving the namecache during this summer, and I have
> to admit rename was the biggest problem of all, and it still remains.

Rename is very complex and hard to get right. I have made at least
five attempts to implement it, and I am not convinced that it is yet
right.

> There are several common approaches taken by filesystems.
>
> UFS locks all vnodes involved in rename, unlocking, retrying the locks
> and checking for races as it goes; tmpfs does something similar
> (although its vnode locking is incorrect, I'm going to fix it a bit
> later).
>
> Some others (like ext2fs and msdosfs, if I'm not mistaken) keep locking
> to a minimum. It seems to work, but honestly I don't see why it can't
> race.

Ext2fs (and most others) have a filesystem-wide lock that is held
whenever one is doing a rename, which means that only one rename at a
time can take place. That greatly reduces the set of possible races
that one has to deal with. While this will obviously limit
rename-intensive applications, I don't know of any practical examples
where this serialization matters. I am about ready to throw in the
towel and use this approach for UFS rename.

> ZFS is somewhat unique in this respect. It uses name locking: it keeps
> a per-directory table of locked file names, i.e. names that can't change
> while in the table. So the destination file won't be added during
> rename, the source file can't disappear, etc.
>
> What do you think about the name locking approach taken by ZFS? Are
> there any drawbacks you are aware of?

A parallel rename can still move one of your parents in such a way
that you can end up lopping off a branch of the tree:

        a
       / \
      b   d
     /     \
    c       e

rename(b, a/d/e) & rename(d, a/b/c) could end up with b->c->d->e->b in
a loop divorced from the tree rooted by a. The current UFS locking
will catch this, but one merely tracking names may not. Note that
serializing these two renames ensures that this cannot happen: the
second rename will recognize that it is about to do something bad,
because the first one will have finished before it starts. I have not
studied the ZFS solution, so they may in fact catch this possible race.

> I was thinking of trying to unify rename locking: either make the UFS
> approach standard, i.e. lock all vnodes outside of rename, or use name
> locking similar to ZFS. The UFS way may not fit well into the existing
> VOP API (extra vnode lookups to check for races); besides, vnode
> locking order remains an important issue. ZFS-style locks may be
> interesting in that they would allow reducing the scope of vnode
> locks, especially considering merging with the ongoing work on
> rangelocks (just a guess).
>
> Thanks,
> Gleb.

I do think that coming up with a common rename solution would be
good. If the ZFS code catches the known races, then that would be a
good one on which to standardize. Further scrutiny of the current UFS
code may show that we have indeed found all the races. But I fear that
the only implementable solution is to single-thread rename per
filesystem. I have copied Jeff Roberson on this email as he is likely
to have some insight into an optimal solution.

	Kirk McKusick