From owner-freebsd-bugs@FreeBSD.ORG Thu May 15 13:06:32 2008 Return-Path: Delivered-To: freebsd-bugs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 960A0106566B for ; Thu, 15 May 2008 13:06:32 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail17.syd.optusnet.com.au (mail17.syd.optusnet.com.au [211.29.132.198]) by mx1.freebsd.org (Postfix) with ESMTP id 18F898FC28 for ; Thu, 15 May 2008 13:06:31 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c220-239-252-11.carlnfd3.nsw.optusnet.com.au (c220-239-252-11.carlnfd3.nsw.optusnet.com.au [220.239.252.11]) by mail17.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id m4FD6R3I004934 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 15 May 2008 23:06:29 +1000 Date: Thu, 15 May 2008 23:06:27 +1000 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Jaakko Heinonen In-Reply-To: <200805151040.m4FAe4tP057923@freefall.freebsd.org> Message-ID: <20080515204521.H40842@delplex.bde.org> References: <200805151040.m4FAe4tP057923@freefall.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-bugs@freebsd.org Subject: Re: kern/122833: [snapshots] [patch] mountd fails on nmount() after UFS snapshot creation with mount X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 May 2008 13:06:32 -0000 On Thu, 15 May 2008, Jaakko Heinonen wrote: > On 2008-05-14, Craig Rodrigues wrote: > > On Wed, May 14, 2008 at 06:56:40PM +0000, Craig Rodrigues wrote: > > > That fix is wrong. The better fix would be to come up > > Yes, I probably should have stated more clearly that it was a quick > hack. Most things in nmount() (starting with its existence) are wrong. > > Actually we have such a function. We need to do to the "snapshot" option > > what we do to the "export" option in sys/kern/vfs_export.c. > > > --- ffs_vfsops.c 26 Mar 2008 20:48:07 -0000 1.340 > > +++ ffs_vfsops.c 14 May 2008 21:00:23 -0000 > > There is also code in vfs_mount.c that sets the MNT_SNAPSHOT flag: > (in vfs_donmount()) > else if (strcmp(opt->name, "snapshot") == 0) > fsflags |= MNT_SNAPSHOT; > > I wonder if we should add the option removal code also there or remove > the "snapshot" handling completely from vfs_donmount(). For me it seems > confusing that MNT_SNAPSHOT is first set in vfs_donmount() and then > again in file system specific code but the snapshot option string is > removed only in file system specific code. Snapshots are ffs-only, so the generic code shouldn't know anything about them. However, MNT_SNAPSHOT is an old option bit, so the generic code can handle it at no extra cost by ORing it with generic bits that need similar handling. In particular, it is non-persistent like MNT_UPDATE, and could be kept out of mnt_flag and mnt_opts using the same method that MNT_UPDATE should be kept out of mnt_flag. (I think this method works for mnt_flag but not for mnt_opts now -- it involves MNT_UPDATEMASK and a subset of MNT_CMDFLAGS (should be whole set?), but is not applied to string options.) I used to think that setting MNT_SNAPSHOT in ffs had no effect, just like for almost all the other setting of flags in the same section in ffs has no effect, since generic code near the above has already set almost all the flags better. However, MNT_SNAPSHOT is special since it should update-only. The generic code sets it correctly (I think) in mnt_flag, and then setting it in ffs clobbers this: Here is the code in vfs_domount() that sets it correctly in mnt_flag (after the above in vfs_donmount() sets it correctly in fsflags): % if (fsflags & MNT_UPDATE) { % ... % mp->mnt_flag |= fsflags & % (MNT_RELOAD | MNT_FORCE | MNT_UPDATE | MNT_SNAPSHOT | MNT_ROOTFS); fsflags here is purely from userland, so it is not affected by garbage in mnt_opt or mnt_flag. % ... % mp->mnt_optnew = fsdata; % vfs_mergeopts(mp->mnt_optnew, mp->mnt_opt); mnt_opt shouldn't have any garbage, but in fact it does for "snapshot" and maybe "update". This wouldn't be a problem if we ignored the garbage everywhere. Here we merge it into mnt_optnew, giving garbage there, and then the buggy code in ffs acts on the garbage and clobbers our correct setting of MNT_SNAPSHOT in mnt_flag. The main bug here is not honoring MNT_UPDATEMASK when merging string options. % } % ... % mp->mnt_flag = (mp->mnt_flag & ~MNT_UPDATEMASK) | % (fsflags & (MNT_UPDATEMASK | MNT_FORCE | MNT_ROOTFS | % MNT_RDONLY)); This setting of mnt_flag applies irrespective of MNT_UPDATE (we are still preparing to call VFS_MOUNT()). Here we discard from mnt_flag all flags that are subject to update, and add new flags from fsflags. This corresponds to merging string options but is missing the bugs. In particular, a garbage MNT_SNAPSHOT cannot be obtained as a result of this merge, since although MNT_SNAPSHOT is not in MNT_UPDATEMASK, it cannot be in mnt_flag unless we set it above, since non-persistent pseudo-options for mount-update are removed from mnt_flag after the mount-update. So I think your hack is much better than the "better" fix -- it avoids further bogotification of the above. Most options are more real and less fragile than MNT_SNAPSHOT, so the nearby options code in ffs mostly has no effect: % if (vfs_getopt(mp->mnt_optnew, "acls", NULL, NULL) == 0) % mntorflags |= MNT_ACLS; % MNT_ACLS is ffs-only and is not bogusly handled generically, so this is correct except for its style bug (extra blank line). Support for noacls seems to be necessary and is missing (see next paragraph about negative options). % if (vfs_getopt(mp->mnt_optnew, "async", NULL, NULL) == 0) % mntorflags |= MNT_ASYNC; % MNT_ASYNC has already been set almost correctly generically. The above doesn't understand noasync, while the generic code low-quality handling of negative options. Old userland mount code shows how to handle negative options better. Not to mention all options better (use a table instead of this horrible open coding). % if (vfs_getopt(mp->mnt_optnew, "force", NULL, NULL) == 0) % mntorflags |= MNT_FORCE; % MNT_FORCE is a pseudo-option like MNT_SNAPSHOT, except it is generic and it is also used for unmounting. I don't know if it has the same bugs as update and snapshot. % if (vfs_getopt(mp->mnt_optnew, "multilabel", NULL, NULL) == 0) % mntorflags |= MNT_MULTILABEL; % MNT_MULTILABEL is ffs-specific, so it should be here, but the above is no good since it is missing negative logic. But the generic code bogusly supports this option and its negative. % if (vfs_getopt(mp->mnt_optnew, "noasync", NULL, NULL) == 0) % mntandnotflags |= MNT_ASYNC; % Oops, we do support some negative logic here. Otherwise, see the paragraph about MNT_ASYNC ("async" and "noasync" are generic and have already been handled by generic code, so the code for them here has no effect). % if (vfs_getopt(mp->mnt_optnew, "noatime", NULL, NULL) == 0) % mntorflags |= MNT_NOATIME; % % if (vfs_getopt(mp->mnt_optnew, "noclusterr", NULL, NULL) == 0) % mntorflags |= MNT_NOCLUSTERR; % % if (vfs_getopt(mp->mnt_optnew, "noclusterw", NULL, NULL) == 0) % mntorflags |= MNT_NOCLUSTERW; Like MNT_ASYNC, except these are negative logic and there is no support for their negation here. % % if (vfs_getopt(mp->mnt_optnew, "snapshot", NULL, NULL) == 0) % mntorflags |= MNT_SNAPSHOT; Summary of options processing in ffs: - MNT_SNAPSHOT: correctly placed, but its existence breaks things - MNT_FORCE: incorrectly placed, and its existence might break things. - MNT_ACLS: correctly placed, but incomplete - others above: incorrectly placed, but have no effect. Mount in ffs rarely makes write accesses to mnt_flag except to commit the null changes made by the above. For updating of some flags (especially MNT_RDONLY), the change should not be committed to mnt_flag until mount-update has almost completed the update, so setting mnt_flag in generic code before calling VFS_MOUNT() is not so good. MNT_RDONLY is already handled carefully and MNT_ASYNC is already handled not so carefully. Most or all other file systems don't have bogus options processing like this -- they just process their non-generic options and only have complications and bugs for /""/ro/rw/noro/norw/nonoro/... Bruce