Date: Tue, 15 Apr 2014 16:31:33 -0700 From: Doug Ambrisko <ambrisko@ambrisko.com> To: Jilles Tjoelker <jilles@stack.nl> Cc: FreeBSD Hackers <freebsd-hackers@FreeBSD.org>, kib@FreeBSD.org, Bryan Drewery <bdrewery@FreeBSD.org> Subject: Re: Fix MNAMELEN or reimplement struct statfs Message-ID: <20140415233133.GA14686@ambrisko.com> In-Reply-To: <20140415204348.GA89088@stack.nl> References: <5348952A.3080304@FreeBSD.org> <20140415190302.GA21076@ambrisko.com> <20140415204348.GA89088@stack.nl>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, Apr 15, 2014 at 10:43:48PM +0200, Jilles Tjoelker wrote: | On Tue, Apr 15, 2014 at 12:03:02PM -0700, Doug Ambrisko wrote: | > On Fri, Apr 11, 2014 at 08:21:46PM -0500, Bryan Drewery wrote: | > | On Tue, Nov 19 21:53:38 UTC 2013 Jilles Tjoelker wrote: | > | > On Mon, Nov 18, 2013 at 11:01:42AM -0800, Doug Ambrisko wrote: | > | >> On Sat, Nov 16, 2013 at 08:31:29PM +0200, Konstantin Belousov wrote: | > | >> | I think that struct mount should have a const char * field where the | > | >> | non-trimmed path is stored and used for match at unmount. f_mntonname | > | >> | truncation would be only unfortunate user interface glitch. | | > | >> Note that we are not storing the path in mount structure so no structures | > | >> have changed which is nice since then we haven't introduced any real | > | >> ABI breakage. So we could MFC this. The match isn't critical since | > | >> umount will fall back to fsid and work. One thing that might be good to | > | >> do is change umount to try to umount via fsid first and then do the | > | >> match if the fsid failed versus the other way round that it does now. | | > | >> The problem I see is if someone tries to do things based on the parsed | > | >> output of mount/df then that will fail since the output is truncated. | | > | > As noted in comments in sbin/umount/umount.c, the statfs() call is | > | > deliberately after the mount list checks because it may block forever | > | > for unresponsive NFS servers. It would be unfortunate if hung NFS | > | > filesystems would have to be forcibly unmounted by copy/pasting the fsid | > | > from 'mount -v'. | | > | From a user perspective, I'd love to see this get committed and MFC'd. | > | It's very odd to have ENAMETOOLONG errors while traversing .zfs/snapshot. | | > I have a new patch at: | > http://people.freebsd.org/~ambrisko/mount_bigger_2.patch | > that I tested against head. This should be pretty close to commiting | > unless people find some issues with it. | | In sys/kern/vfs_mount.c: | + mp->mnt_path = malloc(strlen(fspath), M_MOUNT, M_WAITOK); | + strlcpy((char *)mp->mnt_path, fspath, strlen(fspath)); | | This always strips the last byte off the fspath. | | I like that this only touches the kernel, so it does not break anything | regarding mount/umount of filesystems with short paths, including | (NFS) filesystems that do not respond. | | The patch does not enlarge f_mntfromname which may be a problem for | nullfs. It is certainly a step forwards for poudriere but [ENAMETOOLONG] | errors could still occur in more extreme situations. Good point on nullfs. I'll look at fixing that. To do that I'm changing mnt_path to mnt_topath so then I can have a mnt_frompath. I'll add nullfs to my test cases. I'll need to run through the uses of f_mntfromname. It was pretty easy with f_mntonname since it was only allocated in one place just used a bunch of other place. I assume that mount root would be short. | > | However, this would make the situation worse for poudriere, which is | > | what this particular thread was started on. It does exactly what you | > | worry about, it parses mount(1) output and umounts all descendants for a | > | given path. We do the same thing at my work for our base build system, | > | and I believe FreeNAS is doing something like this. So it's not uncommon. | | > | Or did the situation improve with the latest patch to show the full | > | mount path in mount(1)? | | > Not yet but could be address in a subsequent enhancement. The framework | > in the kernel to handle longer paths and track the full path name is | > there now. Changes will need to be done in the structure passed to mount. | > This can be done if we implement a statvfs structure that can pass this | > data back. Since we don't have statvfs really defined we can implement | > it to deal with longer paths. Then mount can be updated to show the full | > path. | | > | We could implement umount -r, but I'm not convinced that is adequate due | > | to unknown use of mount(1) output. We really need the real path exported | > | to userland somehow, and preferably to mount(1) by default to not break | > | scripts. | | > | This may not be a clean solution, but couldn't we add another syscall, | > | say getfsmntpath(2), and have mount(1) use both statfs(2) and | > | getfsmntpath(2) to show a proper output? | | > I think we can do this with statvfs to give full path names. | | I agree that a new syscall or similar is needed in a later patch but | disagree that it should be statvfs(). To call statvfs(), the | filesystem's path must already be known and the filesystem (and all | parent filesystems) must respond. In fact, struct statvfs does not | include any pathnames at all. | | If it is acceptable that only root can query the pathnames, a simple | sysctl can map f_fsid to mnt_path (and the extended f_mntfromname). This | can then be used together with statfs(2), fstatfs(2), getfsstat(2) or | getmntinfo(3) (but why would one use the latter?). | | It is possible to extend getfsstat(2) (but not statfs(2)/fstatfs(2)) to | use an f_spare field to point to long pathnames stored elsewhere in the | buffer (using the flags argument to enable the new behaviour). | | Simply increasing MNAMELEN would make struct statfs over 2 kilobytes | large, which would lead to a rather large result from getfsstat(2) on a | machine with many filesystems mounted. We don't want to change statfs since that causes other ripple effects due to binary layout changes. From what I remember when I last looked (which was quite a while ago) is that our statvfs is a stub and not a full structure like NetBSD has. NetBSD switched from statfs to statvfs and made things bigger. Atleast that is what I remember when I looked. It's good to start thinking about for phase 2. Thanks, Doug A.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140415233133.GA14686>