From: Bruce Evans
To: Rick Macklem
Cc: Rick Macklem, fs@FreeBSD.org
Date: Tue, 15 Jan 2013 15:51:23 +1100 (EST)
Subject: Re: [PATCH] Better handle NULL utimes() in the NFS client
Message-ID: <20130115141019.H1444@besplex.bde.org>
In-Reply-To: <162405990.1985479.1358212854967.JavaMail.root@erie.cs.uoguelph.ca>

On Mon, 14 Jan 2013, Rick Macklem wrote:

> John Baldwin wrote:
>> The NFS client tries to infer when an application has passed NULL to
>> utimes() so that it can let the server set the timestamp rather than
>> using a client-supplied timestamp.
>> It does this by checking to see if the desired timestamp's second
>> matches the current second. However, this breaks applications that
>> are intentionally trying to set a specific timestamp within the
>> current second. In addition, utimes() sets a flag to indicate if NULL
>> was passed to utimes(). The patch below changes the NFS client to
>> check this flag and only use the server-supplied time in that case:

It is certainly an error not to check VA_UTIMES_NULL at all. I think
the flag (or the NULL pointer) cannot be passed to the server, so the
best we can do for the VA_UTIMES_NULL case is to read the current time
on the client and pass it to the server. Upper layers have already
read the current time, but have passed us VA_UTIMES_NULL so that we
can tell that the pointer was originally null and do the different
permissions checks for this case.

>> Index: fs/nfsclient/nfs_clport.c
>> ===================================================================
>> --- fs/nfsclient/nfs_clport.c (revision 225511)
>> +++ fs/nfsclient/nfs_clport.c (working copy)
>> @@ -762,7 +762,7 @@
>> *tl = newnfs_false;
>> }
>> if (vap->va_atime.tv_sec != VNOVAL) {
>> - if (vap->va_atime.tv_sec != curtime.tv_sec) {
>> + if (!(vap->va_vaflags & VA_UTIMES_NULL)) {
>> NFSM_BUILD(tl, u_int32_t *, 3 * NFSX_UNSIGNED);
>> *tl++ = txdr_unsigned(NFSV3SATTRTIME_TOCLIENT);
>> txdr_nfsv3time(&vap->va_atime, tl);
>> @@ -775,7 +775,7 @@
>> *tl = txdr_unsigned(NFSV3SATTRTIME_DONTCHANGE);
>> ...

Something mangled the patch so that it is hard to see what it does.
It just uses the flag instead of guessing.
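The choice that the patch changes can be modeled in user space. This is a minimal sketch, not the kernel code: `sketch_vattr`, `SKETCH_VNOVAL` and `SKETCH_VA_UTIMES_NULL` are hypothetical stand-ins for the kernel's `struct vattr`, `VNOVAL` and `VA_UTIMES_NULL`, while the `time_how` values are the NFSv3 discriminants from RFC 1813. It contrasts the old "seconds match now" guess with the flag test, and shows why only the client-time case needs three 32-bit words (discriminant, seconds, nanoseconds), matching the `3 * NFSX_UNSIGNED` in the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* time_how discriminants for NFSv3 sattr3 times (RFC 1813). */
enum time_how {
	DONT_CHANGE = 0,
	SET_TO_SERVER_TIME = 1,
	SET_TO_CLIENT_TIME = 2
};

/* Hypothetical stand-ins for the kernel's VNOVAL and VA_UTIMES_NULL. */
#define SKETCH_VNOVAL		(-1)
#define SKETCH_VA_UTIMES_NULL	0x01u

/* Hypothetical, cut-down version of the kernel's struct vattr. */
struct sketch_vattr {
	struct timespec	va_atime;
	unsigned int	va_vaflags;
};

/* Pre-patch logic: guess "utimes(NULL)" whenever the seconds equal now. */
static enum time_how
choose_how_heuristic(const struct sketch_vattr *vap, time_t now_sec)
{
	if (vap->va_atime.tv_sec == SKETCH_VNOVAL)
		return (DONT_CHANGE);
	return (vap->va_atime.tv_sec == now_sec ?
	    SET_TO_SERVER_TIME : SET_TO_CLIENT_TIME);
}

/* Patched logic: trust the flag that the upper layers set for utimes(NULL). */
static enum time_how
choose_how_flag(const struct sketch_vattr *vap)
{
	if (vap->va_atime.tv_sec == SKETCH_VNOVAL)
		return (DONT_CHANGE);
	return ((vap->va_vaflags & SKETCH_VA_UTIMES_NULL) ?
	    SET_TO_SERVER_TIME : SET_TO_CLIENT_TIME);
}

/*
 * Fill in the 32-bit words for one set_*time field and return how many
 * were used.  Values stay in host byte order; a real XDR encoder would
 * convert each word to network byte order.  Only SET_TO_CLIENT_TIME
 * carries a time, so the server-time case sends the discriminant alone
 * and the server has to use its own clock.
 */
static int
encode_set_time(enum time_how how, const struct timespec *ts, uint32_t out[3])
{
	out[0] = (uint32_t)how;
	if (how != SET_TO_CLIENT_TIME)
		return (1);
	assert(ts != NULL);
	out[1] = (uint32_t)ts->tv_sec;
	out[2] = (uint32_t)ts->tv_nsec;
	return (3);
}
```

With an explicit time of 1000.25 seconds and a current second of 1000, the heuristic picks SET_TO_SERVER_TIME, which is exactly the misclassification described in the patch mail; the flag test picks SET_TO_CLIENT_TIME.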
I can't see anything that does the different permissions check for
the VA_UTIMES_NULL case, and testing shows that this case is just
broken, at least for an old version of the old nfs client -- the same
permissions are required for all cases, but write permission is
supposed to be enough for the VA_UTIMES_NULL case (since write
permission is sufficient for setting the mtime to the current time
(plus epsilon) using write(2) and truncate(2). Setting the atime to
the current time should require no more and no less than read
permission, since it can be done using read(2), but utimes(NULL)
requires write permission for that too).

> In the old days, a lot of NFS servers only stored times at a
> resolution of 1 second, which I think is why the code had the habit
> of comparing "seconds equal".

I think this is not the reason for the check here.

> If there is some app. out there that sets "current time" via
> utimes(2) with a current time argument instead of a NULL argument,
> it would seem to be broken to me.
> (It is conceivable that some app. did this to avoid clock
> skew between the client and server, but I doubt it.)

Apps have no alternative to using the NULL arg if they have write
permission to the file but don't own it.

Oops, on looking at the code I now think it _is_ possible to pass the
server a request to set the current time, since in the
NFSV3SATTRTIME_TOSERVER case we just pass this case value and not any
time value to the server, so the server has no option but to use its
current time. It is not surprising that the permissions checks for
this don't work right. I thought that the client was responsible for
most permissions checks, but can't find many here, or the relevant
one. The NFSV3SATTRTIME_TOSERVER code on the server sets
VA_UTIMES_NULL, so I would have thought that the permissions check on
the server does the right thing.
There are some large timestamping bugs nearby:
- the old nfs server code for NFSV3SATTRTIME_TOSERVER uses
  getnanotime() to read the current time. This violates the system's
  policy set by the vfs.timestamp_precision sysctl in most cases,
  since using getnanotime() is the worst supported policy and is not
  the default. The old nfs client uses the correct function to read
  the current time, vfs_timestamp(), in nfs_create(), but this is the
  only use of vfs_timestamp() in old nfs code. I think most cases use
  the server time and thus use the correct function iff the leaf
  server file system uses the correct function.
- the new nfs server code for NFSV3SATTRTIME_TOSERVER macro-izes all
  reads of the current time except 1 as NFSGETTIME(). This uses
  getmicrotime(), so it violates the system's policy in all cases,
  since using getmicrotime() is not a supported policy (using
  microtime() is supported). The 1 exception is a hard-coded
  getmicrotime() in fs/nfsclient/nfs_clport.c whose use is visible in
  the above patch. This one really didn't matter, because only the
  seconds part of curtime was used. It was just a micro-pessimization
  and style bug. The (not quite) correct way to get the seconds part
  is to use time_second, as is done in the old nfs client. (This way
  is not quite correct because there are some races and
  non-monotonicities in reading the times. In the above check,
  vap->va_atime.tv_sec might have been read by a more precise clock
  than curtime.tv_sec. Then the check might give a false positive or
  negative. But the check is only a heuristic, and is inherently racy,
  so this doesn't really matter.) With the above patch, the check
  becomes a different pessimization and style bug: the curtime
  variable becomes unused except for its incorrect initialization.
  New nfs code never uses the correct function vfs_timestamp().

Following the system policy for file timestamps causes some problems
for utimes(NULL) too. Old versions hard-coded microtime(). Current
versions use vfs_timestamp().
The latter is better, but tends to give different results than
utimes(non_NULL), since few or no applications know anything about
the system's policy. touch(1) probably should know, but doesn't. So
the simple "touch foo" gives various results, depending on which step
succeeds:
- touch(1) starts with gettimeofday(). This gives microsecond
  resolution, and usually microsecond accuracy if its result is used.
- touch then tries utimes(non_NULL) with the current time that it just
  read. This usually works, giving microsecond resolution, etc. This
  is OK, but often different from the system policy.
- if that fails, touch then tries utimes(NULL). If this works, then it
  follows the system policy.

Another problem is that not all file systems support nanosecond
resolution, so not all system policies or utimes() requests can be
honored. I would usually prefer the system's policy to be enforced as
far as possible. Thus if the system's policy is microsecond
resolution, then times with nanosecond resolution should be rounded
down to the nearest microsecond. This case is most useful since
utimes() cannot preserve times with more than microsecond resolution.
Utilities like cp(1) blindly round the times given in nanoseconds by
stat(2) to ones that can be written by utimes(2), so this often
happens in an uncontrollable way anyway (POSIX is finally getting
around to specifying permissible errors for unrepresentable
resolutions). But sometimes I want utimes() to preserve times as well
as possible.

Bruce