From owner-freebsd-fs@FreeBSD.ORG Sun Jan 30 03:47:09 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DF09E106564A for ; Sun, 30 Jan 2011 03:47:09 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail02.syd.optusnet.com.au (mail02.syd.optusnet.com.au [211.29.132.183]) by mx1.freebsd.org (Postfix) with ESMTP id 471288FC0C for ; Sun, 30 Jan 2011 03:47:08 +0000 (UTC) Received: from c122-106-165-206.carlnfd1.nsw.optusnet.com.au (c122-106-165-206.carlnfd1.nsw.optusnet.com.au [122.106.165.206]) by mail02.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p0U3l25B030478 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 30 Jan 2011 14:47:04 +1100 Date: Sun, 30 Jan 2011 14:47:02 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Rick Macklem In-Reply-To: <10004544.1071580.1296332968025.JavaMail.root@erie.cs.uoguelph.ca> Message-ID: <20110130131048.Q988@besplex.bde.org> References: <10004544.1071580.1296332968025.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: FreeBSD FS Subject: Re: runtime nfs mount options for existing mounts X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 30 Jan 2011 03:47:09 -0000 On Sat, 29 Jan 2011, Rick Macklem wrote: >>> I can say that, if someone else came up with the syscall/VFS >>> changes, I >>> could easily implement a function in the NFS client that generates >>> the >>> name/value pairs like nmount() uses. (There is currently a function >>> that >>> basically does that for the old mount() and I think a slightly >>> modified >>> version of that would do it. However, I haven't actually tried >>> it.:-) >> >> Old mount(2) doesn't do this. All mount(8) just use statfs(2). >> statfs() >> just returns the old mount flags and a couple of other broken things >> (async and sync i/o counts) that are used mainly by mount -v. >> > All I was trying to say here was that the NFS client(s) already > have a function that turns the flags, etc in the old mount args > into the name/value pairs used by nmount() so that the old > mount() still works. If it the clients did that, then their parsing would be perfectly horrible, but they do less than that -- essentially the opposite. If they actually did that, then for old mount(2) they would start with the args in convenient binary form (packed into struct nfs_args); they would unpack this into 100 or so string options to match the inconvenient nmount(2) form. The options would then arrive back in string form, as if from nmount(2), and have to be reparsed back into binary form and packed into a convenient struct, which might be named nfs_args but would now be kernel-only and need not be restricted by the mount(2) ABI, so it could hold the binary form of string options not supported by old ABI(s). What the clients actually do is just pass struct nfs_args as a binary blob attached to the string option "nfs_args". (IIRC, mount_nfs(8) used the same method until relatively recently when it was converted to use the full awe fullness of nmount(). The simpler userland parsing required by nmount() results in the current mount_nfs sources being only 50% larger than in FreeBSD-5 (well, they still have all the old parsing code, and code for new options, so as to support fallback to mount(2)).) Since struct nfs_args is still compatible with the old mount(2) ABI, it cannot hold some of the newer options like "negnametimeo". The binary value of at least this newer option is held in a global variable. So anything reporting the args back to userland would now have to gather unstructured globals and and combine these with variables in the nfs_args struct. The conversion actually done by the nfs clients is essentially the opposite: nmount() passes them options in inconvenient string form, or possibly as a convenient already-parsed binary blob. The binary blob is copied to the nfs_args struct, which is checked for errors later. String options are converted into binary form in the nfs_args struct, except in cases like "negnametimeo" where they don't fit. The options parsing is of low quality. > I believe that this function could return > the mount options (set as flag bits in the nfs variant of the > kernel mount structure, etc) as nmount() style name/value pairs > if there was a way to get them to userland. (It could be done as > yet another flag for nfssvc() if no other file system needs this > capability.) Well, the options could always have been returned in convenient binary form in the nfs_args struct, except now they don't all fit, and there were always ABI considerations which make the binary format not so good in some cases (say a 32-bit app on a 64-bit kernel wanting to know the nfs options). name=value takes more conversions which are not done even now: - Case 1: old mount(8) utility supplied a binary blob. The kernel just copied it to the kernel nfs_args struct and doesn't have many string options for it - Case 2: nmount(2) supplied string options. The kernel now has some options in string form, but only ones that weren't defaulted. Examples of low-quality options parsing: - from 4.4BSD-Lite mount_nfs.c. Hmm, this is not to bad -- it has no atoi() or sscanf(), but just strtol() with common errors: % case 'r': % num = strtol(optarg, &p, 10); % if (*p || num <= 0) % errx(1, "illegal -r value -- %s", optarg); % nfsargsp->rsize = num; % nfsargsp->flags |= NFSMNT_RSIZE; % break; This has mainly common overflow bugs: - strtol() returns long, so overflow may occur when it is blindly assigned to `num', which is int. Even if overflow is benign, it breaks the subsequent check that num <= 0. - no checking of overflows avoided by by strtol(), except when longs are longer than ints these cause overflow in the assignment here. Input consisting only of whitespace is detected bogusly as num being 0. - from FreeBSD-5.2 mount_nfs.c: it still uses strtol() in old code, but most new code uses atoi() like this: % if (altflags & ALTF_ACREGMAX) { % p = strstr(optarg, "acregmax="); % if (p) % nfsargsp->acregmax = atoi(p + 9); % } atoi() is unusable in serious parsing, since among other design errors, it gives undefined behaviour on overflow (except in FreeBSD, this behaviour is defined as being the same undefined behaviour that occurs on blind assignment of strtol() to int as done in the previous example). - from FreeBSD-current mount_nfs.c: % if (findopt(iov, iovlen, "rsize", &opt, NULL) == 0) { % if (opt == NULL) { % errx(1, "illegal rsize"); % } % ret = sscanf(opt, "%d", &args.rsize); % if (ret != 1 || args.rsize <= 0) { % errx(1, "illegal wsize: %s", opt); % } % args.flags |= NFSMNT_RSIZE; % } Now even the rsize parsing doesn't use strtol(). It uses sscanf(), whose behaviour on overflow is even more undefined than atoi()'s. Despite less error checking, the parsing is now more verbose, with the help of extra braces. It takes such care with strings that it spells "rsize" as "wsize" in the error message. From FreeBSD-current nfsclients (the new one, but I think they use identical code which is almost identical to the userland code, and triplicates all its bugs): % if (vfs_getopt(mp->mnt_optnew, "rsize", (void **)&opt, NULL) == 0) { % if (opt == NULL) { % vfs_mount_error(mp, "illegal rsize"); % error = EINVAL; % goto out; % } % ret = sscanf(opt, "%d", &args.rsize); % if (ret != 1 || args.rsize <= 0) { % vfs_mount_error(mp, "illegal wsize: %s", % opt); % error = EINVAL; % goto out; % } % args.flags |= NFSMNT_RSIZE; % } Parsing of strings in the kernel is even more laborious than in userland. sscanf() is unusable but used. I was asleep when it was committed to the kernel (it was at first used only for parsing a couple of path/device names for booting). The braces are needed now. The vfs_mount_error() line is unnecessarily split, and triplicates the misspelling of "rsize" as "wsize". % if ((argp->flags & NFSMNT_RSIZE) && argp->rsize > 0) { % nmp->nm_rsize = argp->rsize; % /* Round down to multiple of blocksize */ % nmp->nm_rsize &= ~(NFS_FABLKSIZE - 1); % if (nmp->nm_rsize <= 0) % nmp->nm_rsize = NFS_FABLKSIZE; % } % if (nmp->nm_rsize > maxio) % nmp->nm_rsize = maxio; % if (nmp->nm_rsize > MAXBSIZE) % nmp->nm_rsize = MAXBSIZE; This is old code for range checking of rsize. It is still useful and used once the value reaches the nfs_args struct. Not much wrong with it, but altogether it gives about 50 lines of option parsing and range checking code for a single simple numeric option, with bugs in the range checking for overflow cases and redundant checks for the <= 0 case. I would like to be able to write such code as nmp->nm_rsize = parsopt(stringval, minval, dfltval, maxval) in 1 place (actually table-driven). But the range should also be checked in the kernel a userland utility did the parsing. The kernel doesn't need to do any defaulting if userland does it. Bruce