Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 7 Feb 2015 21:15:30 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Jilles Tjoelker <jilles@stack.nl>
Cc:        arch@freebsd.org
Subject:   Re: Change default VFS timestamp precision?
Message-ID:  <20150207190728.W1334@besplex.bde.org>
In-Reply-To: <20150206231812.GA13408@stack.nl>
References:  <201412161348.41219.jhb@freebsd.org> <20141216233844.GA1490@stack.nl> <20141217105256.J1490@besplex.bde.org> <20150206231812.GA13408@stack.nl>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, 7 Feb 2015, Jilles Tjoelker wrote:

> On Wed, Dec 17, 2014 at 11:39:37AM +1100, Bruce Evans wrote:
>> On Wed, 17 Dec 2014, Jilles Tjoelker wrote:
>* ...
>>> A quick grep also shows various calls to utime(3) with a non-NULL timep
>>> argument. The ones in contrib/bmake and usr.bin/make definitely look
>>> incorrect.
>
>> Do you mean that they are incorrect because they use utime(3) and not
>> utimes(3), or just because they use the non-NULL timep?  The latter
>> is just a style bug in at least usr.bin/make.  Since it hasn't
>> caught up with the existence of utimes(3), it is reasonable for it
>> to also have not caught up with the existence of the non-NULL timep.
>> It just wants to set the time to the current time.  It uses its own
>> copy of the current time for this.
>
> Calling any utime/utimes/utimens function with the current time subverts
> vfs.timestamp_precision. This may cause timestamps set by different
> applications to compare differently from provable ordering. For example,
> if "make -t" finishes within one second, a file created before it was
> started might have a timestamp 1423261861.1 while files touched by "make
> -t" will have timestamp 1423261861.0.

This annoys me for backups too.  Verifying backups includes checking
their timestamps, and it is too hard to determine if a difference is
small enough to not be an error.  I mostly use old versions of FreeBSD
which don't have *timens*().  So utilities cannot restores timestamps
with nanoseconds resolution.  Backup utilities are also further than
most utilities from being able to back them up (some backup formats
barely have seconds resolution).  So times with nanoseconds (or even
milliseconds) resolution shouldn't be created.  vfs.timestampe_precision
gives this.  Except touch(1) subverts it.

> Also, with NFS it is better to use the server's clock for timestamps, so
> timestamps in the same filesystem are consistent.

I think this happens in most cases.  nfs has remarkably little code
for setting timestamps.  Hmm, now I wonder how its caching timeouts
work when the server times differ significantly.  It compares file
mtimes and ctimes with timeouts, but the mtimes and ctimes are usually
in server real time (or worse when an application steps the mtime
backwards), but timeouts should be in client monotonic time.

> ...
> Calling utime() with a NULL timep is not inherently wrong, although
> futimes() or futimens() is preferable if the file is open.

I think it is as correct as possible for the current time.  It doesn't
subvert vfs.timestamp_precision.

> ...
>> This reminds me of related bugs in touch(1):
>> - touch also tries first with a non-NULL timep.  So it uses its own idea
>>    of the current time.  This is normally the one returned a little in
>>    the past by gettimeofday().  It has microseconds resolution.  This
>>    normally succeeds, giving a result that is inconsistent with the
>>    one that would result from using a NULL timep, since the kernel now
>>    honors vfs.timestamp_precision and that is not usually 2.
>
> Yes, this is wrong the same way as in "make -t".
>
>> - sometimes the first try fails.  Then touch retries with a NULL timep,
>>    if the semantics allow this.  Sometimes this succeeds.
>
> Another bug: it retries with a NULL timep even if -a or -m is given,
> requesting only one of the timestamps be updated.

Perhaps that is the best that it can do, or it can be treated as an error
since the application/user shouldn't try to set only one of the timestamps
on a file system that doesn't support both.  However, it might support
just the one asked for, and asking for both when only 1 is supported is
not treated as an error.

> The new functionality in futimens/utimensat allows doing "touch -a" and
> "touch -m" with the current time properly (setting one timestamp to
> vfs_timestamp and leaving the other timestamp unchanged). As with the
> old methods, only the owner of the file and root can do this.

When this gets down to individual file systems, the timestamp to be left
unchanged becomes VNOVAL and is always (?) ignored.  However, to make
the usual case when changes to both timestamps are requested work,
requests to change nonexistent attributes are usually ignored too.  I
don't see any better way to control this than a mount flag for every
nonexistent attribute.  msdosfs currently has hard-coded policies that
differ for each nonexistent attribute.  They are mostly too strict except
for atimes.

> The below patch fixes these problems.

Looks good.

> ...
>> The non-NULL timep in make gives the following bug since make doesn't
>> retry with a NULL timep: make -t fails if the files ar writeable but
>> not owned.
>
> This is best fixed by always using a NULL timep.

I think this is OK for msdosfs.  The problems with setting timestamps
in msdosfs are more in cp -p and mv.  There the NULL timep cannot be
used.  msdosfs is very strict in this case.  cp and mv are also strict,
except mv was broken recently to hide the corresponding problems for
unsettable flags (e.g., zfs sets the archive flag a lot, but nfs doesn't
support preserving it and hard-codes strict failures for attempts to set
it).  The strictness is mostly just to fail to copy uncopyable attributes
and just warn about this.  Even the warnings are too much when they occur
for every file.  Even mv is mostly not required by POSIX to duplicate
attributes.  But it is required to warn.  So nfs's strictness for
chflags() is the correct default and msdosfs's non-strictness for
atimes is an incorrect default, but more practical.

> Index: usr.bin/touch/touch.c
> ===================================================================
> --- usr.bin/touch/touch.c	(revision 277645)
> +++ usr.bin/touch/touch.c	(working copy)
> ...
> @@ -239,7 +222,7 @@
> 	int yearset;
> 	char *p;
> 					/* Start with the current time. */
> -	now = tvp[0].tv_sec;
> +	now = time(NULL);
> 	if ((t = localtime(&now)) == NULL)
> 		err(1, "localtime");
> 					/* [[CC]YY]MMDDhhmm[.SS] */

This seems to be broken now, since time() is broken now.  I think tvp
was obtained using clock_gettime(CLOCK_REALTIME, ... ) (previously
gettimeofday).  These give the correct time.  time() used to use
gettimeofday() to get the same correct time.  But someone broke time().
It now uses clock_gettime(TIME_SECOND, ... ).  TIME_SECOND is getbintime()
rounded down.  getbintime() lags the current time by an average of
tc_tick * 1/hz / 2 seconds.  After rounding down, TIME_SECOND lags the
current time by a full seconds for a duration of an  average of
tc_tick * 1/hz / 2 seconds every second.

The bug is compatible with the one given by vfs.timestamp_precision = 0
and other low settings of the sysctl.  The latter also uses the lagging
clock.  Previously, the incoherent clocks showed up in file times tests
like:

 	time(&t1);
 	write to a file
 	stat(file, &sb);
 	assert(t1 <= sb.st_mtime);  /* can fail due to incoherent clocks */

Add delays as necessary to compensate for the speed of light but not to
compensate for buggy clocks.

time() now uses the same buggy clock when vfs.timestamp_precision = 0, so
the clocks are coherent because they are the same (unless the real time is
stepped).  OTOH, if CLOCK_GETTIME is used to read t1, then the bug is
detected as before.

When vfs.timestamp_precision is higher so that the time is up to date,
then the incoherencies go in the other direction:

 	write to a file
 	stat(file, &sb);
 	time(&t1);
 	assert(t1 >= sb.st_mtime);  /* can fail due to incoherent clocks */

This looks like a file times test, but it actually finds the buggy time().

> @@ -301,7 +284,7 @@
> 	time_t now;
> 	struct tm *t;
> 					/* Start with the current time. */
> -	now = tvp[0].tv_sec;
> +	now = time(NULL);
> 	if ((t = localtime(&now)) == NULL)
> 		err(1, "localtime");

Similarly.

THe comment should be attached to the or just omitted now that the code
obiously fetches the (buggy) current time.

For a quick fix, keep using clock_gettime(CLOCK_REALTIME, ...).  I
don't see a good way to be bug for bug compatible with the file system.
A bad way is to write to a file on the same file system and stat it,
as in the above tests.  This gives a time slightly in the past according
to the file system's clock.  nfs gives interesting complications.  The
server's clock may be much more incoherent with CLOCK_REALTIME than
the local clock.

Using CLOCK_SECOND for time() is a bogus optimization since most of
the overhead of clock_gettime() is in the syscall unless the timecounter
hardware is slow.  On freefall now, it is not even optimal or wrong,
since on x86 clock_gettime() is not a syscall and the library is missing
all the bogus optimizations; it uses binuptime() for everything.  This
results in CLOCK_REALTIME and CLOCK_SECOND both taking 34 nsec on
freefall.  It might be possible to reduce this to about 1 nanosecond
for CLOCK_SECOND by duplicating more of the kernel.  This is not clear,
since the library uses excessive locking where the kernel is missing
locking.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20150207190728.W1334>