From owner-freebsd-fs@FreeBSD.ORG  Tue Jan 15 04:51:35 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id B0E841BD;
 Tue, 15 Jan 2013 04:51:35 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail13.syd.optusnet.com.au (mail13.syd.optusnet.com.au
 [211.29.132.194])
 by mx1.freebsd.org (Postfix) with ESMTP id 374117BE;
 Tue, 15 Jan 2013 04:51:34 +0000 (UTC)
Received: from c211-30-173-106.carlnfd1.nsw.optusnet.com.au
 (c211-30-173-106.carlnfd1.nsw.optusnet.com.au [211.30.173.106])
 by mail13.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r0F4pNaE013436
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Tue, 15 Jan 2013 15:51:25 +1100
Date: Tue, 15 Jan 2013 15:51:23 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Rick Macklem <rmacklem@uoguelph.ca>
Subject: Re: [PATCH] Better handle NULL utimes() in the NFS client
In-Reply-To: <162405990.1985479.1358212854967.JavaMail.root@erie.cs.uoguelph.ca>
Message-ID: <20130115141019.H1444@besplex.bde.org>
References: <162405990.1985479.1358212854967.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.0 cv=P/xiHV8u c=1 sm=1 a=S8Qr1IbAvFsA:10
 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=U1Z5fgpPGSMA:10
 a=9QiI2z3JOZ09_-QNc5AA:9 a=CjuIK1q_8ugA:10 a=TEtd8y5WR3g2ypngnwZWYw==:117
Cc: Rick Macklem <rmacklem@FreeBSD.org>, fs@FreeBSD.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 15 Jan 2013 04:51:35 -0000

On Mon, 14 Jan 2013, Rick Macklem wrote:

> John Baldwin wrote:
>> The NFS client tries to infer when an application has passed NULL to
>> utimes()
>> so that it can let the server set the timestamp rather than using a
>> client-
>> supplied timestamp. It does this by checking to see if the desired
>> timestamp's second matches the current second. However, this breaks
>> applications that are intentionally trying to set a specific timestamp
>> within
>> the current second. In addition, utimes() sets a flag to indicate if
>> NULL was
>> passed to utimes(). The patch below changes the NFS client to check
>> this flag
>> and only use the server-supplied time in that case:

It is certainly an error to not check VA_UTIMES_NULL at all.  I think
the flag (or the NULL pointer) cannot be passed to the server, so the
best we can do for the VA_UTIMES_NULL case is read the current time on
the client and pass it to the server.  Upper layers have already read
the current time, but have passed us VA_UTIMES_NULL so that we can tell
that the pointer was originally null so that we can do the different
permissions checks for this case.

>> Index: fs/nfsclient/nfs_clport.c
>> ===================================================================
>> --- fs/nfsclient/nfs_clport.c (revision 225511)
>> +++ fs/nfsclient/nfs_clport.c (working copy)
>> @@ -762,7 +762,7 @@
>> *tl = newnfs_false;
>> }
>> if (vap->va_atime.tv_sec != VNOVAL) {
>> - if (vap->va_atime.tv_sec != curtime.tv_sec) {
>> + if (!(vap->va_vaflags & VA_UTIMES_NULL)) {
>> NFSM_BUILD(tl, u_int32_t *, 3 * NFSX_UNSIGNED);
>> *tl++ = txdr_unsigned(NFSV3SATTRTIME_TOCLIENT);
>> txdr_nfsv3time(&vap->va_atime, tl);
>> @@ -775,7 +775,7 @@
>> *tl = txdr_unsigned(NFSV3SATTRTIME_DONTCHANGE);
>> ...

Something mangled the patch so that it is hard to see what it does.  It
just uses the flag instead of guessing.

I can't see anything that does the different permissions check for
the VA_UTIMES_NULL case, and testing shows that this case is just broken,
at least for an old version of the old nfs client -- the same permissions
are required for all cases, but write permission is supposed to be
enough for the VA_UTIMES_NULL case (since write permission is sufficient
for setting the mtime to the current time (plus epsilon) using write(2)
and truncate(2).  Setting the atime to the current time should require
no more and no less than read permission, since it can be done using
read(2), but utimes(NULL) requires write permission for that too).

> In the old days, a lot of NFS servers only stored times at a
> resolution of 1sec, which I think is why the code had the habit
> of comparing "seconds equal".

I think this is not the reason for the check here.

> If there is some app. out there
> that sets "current time" via utimes(2) with a curent time argument
> instead of a NULL argument would seem to be broken to me.
> (It is conceivable that some app. did this to avoid clock
> skew between the client and server, but I doubt it.)

Apps have no alternative to using the NULL arg if they have write permission
to the file but don't own it.

Oops, on looking at the code I now think it _is_ possible to pass the
request to set the current time on the server, since in the
NFSV3SATTRTIME_TOSERVER case we just pass this case value and not
any time value to the server, so the server has no option but to use
its current time.  It is not surprising that the permissions checks
for this don't work right.  I thought that the client was responsible
for most permissions checks, but can't find many or the relevant one
here. The  NFSV3SATTRTIME_TOSERVER code on the server sets
VA_UTIMES_NULL, so I would have thought that the permissions check on
the server does the right thing.

There are some large timestamping bugs nearby:

- the old nfs server code for NFSV3SATTRTIME_TOSERVER uses getnanotime()
   to read the current time.  This violates the system's policy set by
   the vfs.timestamp precision in most cases, since using getnanotime()
   is the worst supported policy and is not the defaul.

   The old nfs client uses the correct function to read the current
   time, vfs_timestamp(), in nfs_create(), but this is the only use of
   vfs_timestamp() in old nfs code.  I think most cases use the server
   time and thus use the correct function iff the leaf server file
   system uses the correct function.

- the new nfs server code for NFSV3SATTRTIME_TOSERVER macro-izes all
   reads of the current time except 1 as NFSGETTIME().  This uses
   getmicrotime(), so it violates the system's policy in all cases,
   since using getmicrotime() is not a supported policy (using
   microtime() is supported).  The 1 exception is a hard-coded
   getmicrotime() in fs/nfsclient/nfs_clport.c whose use is visible
   in the above patch.  This one really didn't matter, because only the
   seconds part of curtime was used.  It was just a micro-pessimization
   and style bug.  The (not quite) correct way to get the seconds part
   is to use time_second, as is done in the old nfs client.
      (This way is not quite correct because there are some races and
      non-monotonicities reading the times.  In the above check,
      vap->va_atime.tv_sec might have been read by a more precise clock
      than curtime.tv_sec.  Then the check might give a false positive
      or negative.  But the check is only a heuristic, and is inherently
      racy, so this doesn't rally matter.
   With the above pathcm the check becomes a different pessimization and
   style bug.  The curtime variable becomes unused except for its
   incorrect initialization.

   New nfs code never uses the correct function vfs_timestamp().

Following the system pollcy for file timestamps causes some problems
for utimes(NULL) too.  Old versions hard-coded microtime().  Current
versions use vfs_timestamp().  The latter is better, but tends to
give different results than times(non_NULL), since few or no
applications know anything about the system's policy.  touch(1)
probably should know, but doesn't.  So the simple "touch foo" gives
various results, depending:
- touch(1) starts with gettimeofday().  This gives microseconds
   resolution and usually microseconds accuracy if its result is used.
- touch then tries utimes(non_NULL) with the current time that it
   just read.  This usually works, giving microseconds resolution,
   etc.  This is OK, but often different from the system policy.
- touch then tries utimes(NULL).  If this works, then it follow the
   system policy.

Another problem is that not all file systems support nanoseconds
resolutions, so not all system policies or utimes() requests can
be honored.

I would usually prefer the system's policy to be enforced as far as
possible.  Thus if the system's policy is microseconds resolution,
then times with nanoseconds resolution should be rounded down to the
nearest microsecond.  This case is most useful since utimes() cannot
preserve times with more than microseconds resolution.  Utilities like
cp(1) blindly round the times given in nanoseconds by stat(2) to ones
that can be written by utimes(2), so this often happens in an
uncontrollable way anyway (POSIX is finally getting around to specifying
permissible errors for unrepresentable resolutions).  But sometimes I
want utimes() to preserve times as well as possible.

Bruce