From owner-freebsd-hackers@FreeBSD.ORG Tue Apr 15 23:31:34 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 89EC318F; Tue, 15 Apr 2014 23:31:34 +0000 (UTC) Received: from mail.ambrisko.com (mail.ambrisko.com [70.91.206.90]) by mx1.freebsd.org (Postfix) with ESMTP id 5F8C41A5E; Tue, 15 Apr 2014 23:31:34 +0000 (UTC) X-Ambrisko-Me: Yes Received: from server2.ambrisko.com (HELO internal.ambrisko.com) ([192.168.1.2]) by ironport.ambrisko.com with ESMTP; 15 Apr 2014 16:36:30 -0700 Received: from ambrisko.com (localhost [127.0.0.1]) by internal.ambrisko.com (8.14.4/8.14.4) with ESMTP id s3FNVXlB020951; Tue, 15 Apr 2014 16:31:33 -0700 (PDT) (envelope-from ambrisko@ambrisko.com) Received: (from ambrisko@localhost) by ambrisko.com (8.14.4/8.14.4/Submit) id s3FNVXtd020950; Tue, 15 Apr 2014 16:31:33 -0700 (PDT) (envelope-from ambrisko) Date: Tue, 15 Apr 2014 16:31:33 -0700 From: Doug Ambrisko To: Jilles Tjoelker Subject: Re: Fix MNAMELEN or reimplement struct statfs Message-ID: <20140415233133.GA14686@ambrisko.com> References: <5348952A.3080304@FreeBSD.org> <20140415190302.GA21076@ambrisko.com> <20140415204348.GA89088@stack.nl> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140415204348.GA89088@stack.nl> User-Agent: Mutt/1.4.2.3i Cc: FreeBSD Hackers , kib@FreeBSD.org, Bryan Drewery X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Apr 2014 23:31:34 -0000 On Tue, Apr 15, 2014 at 10:43:48PM +0200, Jilles Tjoelker wrote: | On Tue, Apr 15, 2014 at 12:03:02PM -0700, Doug Ambrisko wrote: | > On Fri, Apr 11, 2014 at 08:21:46PM -0500, Bryan Drewery wrote: | > | On Tue, Nov 19 21:53:38 UTC 2013 Jilles Tjoelker wrote: | > | > On Mon, Nov 18, 2013 at 11:01:42AM -0800, Doug Ambrisko wrote: | > | >> On Sat, Nov 16, 2013 at 08:31:29PM +0200, Konstantin Belousov wrote: | > | >> | I think that struct mount should have a const char * field where the | > | >> | non-trimmed path is stored and used for match at unmount. f_mntonname | > | >> | truncation would be only unfortunate user interface glitch. | | > | >> Note that we are not storing the path in mount structure so no structures | > | >> have changed which is nice since then we haven't introduced any real | > | >> ABI breakage. So we could MFC this. The match isn't critical since | > | >> umount will fall back to fsid and work. One thing that might be good to | > | >> do is change umount to try to umount via fsid first and then do the | > | >> match if the fsid failed versus the other way round that it does now. | | > | >> The problem I see is if someone tries to do things based on the parsed | > | >> output of mount/df then that will fail since the output is truncated. | | > | > As noted in comments in sbin/umount/umount.c, the statfs() call is | > | > deliberately after the mount list checks because it may block forever | > | > for unresponsive NFS servers. It would be unfortunate if hung NFS | > | > filesystems would have to be forcibly unmounted by copy/pasting the fsid | > | > from 'mount -v'. | | > | From a user perspective, I'd love to see this get committed and MFC'd. | > | It's very odd to have ENAMETOOLONG errors while traversing .zfs/snapshot. | | > I have a new patch at: | > http://people.freebsd.org/~ambrisko/mount_bigger_2.patch | > that I tested against head. This should be pretty close to commiting | > unless people find some issues with it. | | In sys/kern/vfs_mount.c: | + mp->mnt_path = malloc(strlen(fspath), M_MOUNT, M_WAITOK); | + strlcpy((char *)mp->mnt_path, fspath, strlen(fspath)); | | This always strips the last byte off the fspath. | | I like that this only touches the kernel, so it does not break anything | regarding mount/umount of filesystems with short paths, including | (NFS) filesystems that do not respond. | | The patch does not enlarge f_mntfromname which may be a problem for | nullfs. It is certainly a step forwards for poudriere but [ENAMETOOLONG] | errors could still occur in more extreme situations. Good point on nullfs. I'll look at fixing that. To do that I'm changing mnt_path to mnt_topath so then I can have a mnt_frompath. I'll add nullfs to my test cases. I'll need to run through the uses of f_mntfromname. It was pretty easy with f_mntonname since it was only allocated in one place just used a bunch of other place. I assume that mount root would be short. | > | However, this would make the situation worse for poudriere, which is | > | what this particular thread was started on. It does exactly what you | > | worry about, it parses mount(1) output and umounts all descendants for a | > | given path. We do the same thing at my work for our base build system, | > | and I believe FreeNAS is doing something like this. So it's not uncommon. | | > | Or did the situation improve with the latest patch to show the full | > | mount path in mount(1)? | | > Not yet but could be address in a subsequent enhancement. The framework | > in the kernel to handle longer paths and track the full path name is | > there now. Changes will need to be done in the structure passed to mount. | > This can be done if we implement a statvfs structure that can pass this | > data back. Since we don't have statvfs really defined we can implement | > it to deal with longer paths. Then mount can be updated to show the full | > path. | | > | We could implement umount -r, but I'm not convinced that is adequate due | > | to unknown use of mount(1) output. We really need the real path exported | > | to userland somehow, and preferably to mount(1) by default to not break | > | scripts. | | > | This may not be a clean solution, but couldn't we add another syscall, | > | say getfsmntpath(2), and have mount(1) use both statfs(2) and | > | getfsmntpath(2) to show a proper output? | | > I think we can do this with statvfs to give full path names. | | I agree that a new syscall or similar is needed in a later patch but | disagree that it should be statvfs(). To call statvfs(), the | filesystem's path must already be known and the filesystem (and all | parent filesystems) must respond. In fact, struct statvfs does not | include any pathnames at all. | | If it is acceptable that only root can query the pathnames, a simple | sysctl can map f_fsid to mnt_path (and the extended f_mntfromname). This | can then be used together with statfs(2), fstatfs(2), getfsstat(2) or | getmntinfo(3) (but why would one use the latter?). | | It is possible to extend getfsstat(2) (but not statfs(2)/fstatfs(2)) to | use an f_spare field to point to long pathnames stored elsewhere in the | buffer (using the flags argument to enable the new behaviour). | | Simply increasing MNAMELEN would make struct statfs over 2 kilobytes | large, which would lead to a rather large result from getfsstat(2) on a | machine with many filesystems mounted. We don't want to change statfs since that causes other ripple effects due to binary layout changes. From what I remember when I last looked (which was quite a while ago) is that our statvfs is a stub and not a full structure like NetBSD has. NetBSD switched from statfs to statvfs and made things bigger. Atleast that is what I remember when I looked. It's good to start thinking about for phase 2. Thanks, Doug A.