Date:      Wed, 17 Dec 2014 10:43:55 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        John Baldwin <jhb@freebsd.org>
Cc:        arch@freebsd.org
Subject:   Re: Change default VFS timestamp precision?
Message-ID:  <20141217085846.G1087@besplex.bde.org>
In-Reply-To: <201412161348.41219.jhb@freebsd.org>
References:  <201412161348.41219.jhb@freebsd.org>

On Tue, 16 Dec 2014, John Baldwin wrote:

> We still ship with vfs.timestamp_precision=0 by default meaning that VFS
> timestamps have a granularity of one second.  It is not unusual on modern
> systems for multiple updates to a file or directory to occur within a single
> second (and thus share the same effective timestamp).  This can break things
> that depend on timestamps to know when something has changed or is stale (such
> as make(1) or NFS clients).  On hardware that has a cheap timecounter, I think we
> should use the most-precise timestamps (vfs.timestamp_precision=3).  However,

vfs.timestamp_precision=3 is too precise.  It is still impossible to preserve
nanoseconds resolution in backups and file copies, since no syscalls support
writing it and few backup formats support saving it (the dump/restore(8) format
supports saving it, but restore(8) then silently truncates to microseconds).
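
To make the loss concrete, here is a minimal userland sketch of what happens
to a nanoseconds timestamp when a copy has to go through a struct timeval
interface such as utimes().  The helper names are made up for illustration;
they are not kernel or libc functions.

```c
#include <assert.h>
#include <sys/time.h>
#include <time.h>

/*
 * Convert a timespec to a timeval the way a utimes()-based copy has to:
 * the sub-microsecond digits are dropped.
 */
static struct timeval
timespec_to_timeval_trunc(struct timespec ts)
{
	struct timeval tv;

	assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
	tv.tv_sec = ts.tv_sec;
	tv.tv_usec = ts.tv_nsec / 1000;	/* low 3 decimal digits are lost */
	return (tv);
}

/* Widen the timeval back to a timespec, as a later stat() would report it. */
static struct timespec
timeval_to_timespec(struct timeval tv)
{
	struct timespec ts;

	ts.tv_sec = tv.tv_sec;
	ts.tv_nsec = (long)tv.tv_usec * 1000;
	return (ts);
}
```

A source file stamped with tv_nsec = 123456789 comes back from such a round
trip with tv_nsec = 123456000, so the copy compares as older than the
original.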

In the other direction, POSIX is still far from specifying rounding of
unrepresentable times.  In the 2007 version, for utimes() it just says
that:

(1) "[for times in the timeval struct] rounding towards the nearest
     second may occur"
(2) "[for the null timeval pointer case] times of the file shall be set
     to the current time"

(1) allows all sorts of rounding provided it is to nearest.  Rounding to
     nearest has some advantage but is inconsistent with normal practice
     for times.
(2) is impossible to satisfy if the current time is not representable as
     a file time.

FreeBSD now uses vfs_timestamp() to fetch the current time in case (2).
(2) seems to forbid this (unless vfs_timestamp() gives the current time).
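
The difference between the two rounding policies in (1) can be sketched as
follows.  This is only an illustration with hypothetical helper names, not
anything POSIX or FreeBSD defines.

```c
#include <assert.h>
#include <time.h>

/* Truncate to the earlier second: the usual practice for times. */
static time_t
sec_truncate(struct timespec ts)
{
	assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
	return (ts.tv_sec);
}

/* Round to the nearest second, as the quoted POSIX wording permits. */
static time_t
sec_round_nearest(struct timespec ts)
{
	assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
	return (ts.tv_sec + (ts.tv_nsec >= 500000000L ? 1 : 0));
}
```

For a time of 100.6 seconds, truncation stores 100 but rounding to nearest
stores 101, i.e., a file time later than the time that was requested.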

The implementation has the following style bugs:
- vfs.timestamp_precision is not documented in any user man page (it is
   documented in vfs_timestamp(9))
- the kernel documentation is cloned ad nauseam into 2 other places,
   except for different wording errors: from vfs_subr.c:

X /*
X  * Knob to control the precision of file timestamps:
X  *
X  *   0 = seconds only; nanoseconds zeroed.
X  *   1 = seconds and nanoseconds, accurate within 1/HZ.
X  *   2 = seconds and nanoseconds, truncated to microseconds.
X  * >=3 = seconds and nanoseconds, maximum precision.
X  */

This was once the only documentation.  It is not very useful in the
kernel, since it does less than echo the code.

X enum { TSP_SEC, TSP_HZ, TSP_USEC, TSP_NSEC };

The code also uses this enum obfuscation.  This is just an obfuscation,
since the enum values are private and have to be translated into 0, 1,
2 and 3 in documentation and in the user API.  So the user places that
would benefit from names instead of magic numbers can see only the magic
numbers, while the kernel places that get negative benefits from the
names use both.

Bugs in the above comment include:
- case 1 is not accurate to 1/HZ, but to tc_tick/HZ.  tc_tick is just
   usually 1.
- no accuracy is guaranteed in case 1.  It is a certain precision that
   is guaranteed, matching the name of the sysctl.
- real documentation would also describe the details of the rounding in
   case 1.  It is not simple truncation to a multiple of tc_tick/HZ,
   although that behaviour would be most useful.  The actual behaviour
   is to leave near-noise in the low digits of tv_nsec.  The low 3
   digits are almost never 0, so they cannot be copied or backed up,
   giving the same problems as in case 3 with no benefits except faster
   operation.
- the behaviour for (garbage) negative values is not described, although
   the behaviour for (garbage) values >3 is described.
- the differences between accuracy, precision and resolution are especially
   confusing in case 3 where the precision bumps into the resolution.
   Clearly the resolution for all cases is that of timespecs, i.e.,
   nanoseconds.  But nanoseconds precision is not possible with any
   supported hardware -- even with a 4GHz TSC as the timecounter, the
   precision of the timecounter read is a bit fuzzier than 1/4 nanoseconds.
   The ACPI-fast clock has a frequency of ~14 MHz IIRC, so its precision
   is at most 1/14 microseconds ~= 70 nanoseconds.  A real man page would
   give a hint about this.
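
For reference, here is a userland sketch of what the four settings would do
to a timestamp if every case were a plain truncation.  This is an idealized
model, not the kernel code: as noted above, the real case 1 does not
truncate to a multiple of the tick.  The function name and the tick_ns
parameter are assumptions made up for the example.

```c
#include <assert.h>
#include <time.h>

/* Same numbering as the sysctl: 0 = sec, 1 = ticks, 2 = usec, 3+ = nsec. */
enum { TSP_SEC, TSP_HZ, TSP_USEC, TSP_NSEC };

/*
 * Reduce *ts to the given precision.  tick_ns is the assumed length of
 * one tick in nanoseconds (hypothetical parameter; e.g. 1000000 for
 * HZ = 1000 with tc_tick = 1).
 */
static void
reduce_precision(struct timespec *ts, int precision, long tick_ns)
{
	assert(tick_ns > 0);
	switch (precision) {
	case TSP_SEC:
		ts->tv_nsec = 0;
		break;
	case TSP_HZ:
		/* Idealized: plain truncation to a multiple of the tick. */
		ts->tv_nsec -= ts->tv_nsec % tick_ns;
		break;
	case TSP_USEC:
		ts->tv_nsec -= ts->tv_nsec % 1000;
		break;
	default:
		/*
		 * TSP_NSEC and all other values keep full resolution.
		 * In this sketch, negative values also land here.
		 */
		break;
	}
}
```

With tv_nsec = 123456789 and a 1 ms tick, the settings 0, 1, 2 and 3 leave
0, 123000000, 123456000 and 123456789 respectively.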

The documentation in vfs_timestamp(9) consists mainly of cloning the
above comment.

X static int timestamp_precision = TSP_SEC;
X SYSCTL_INT(_vfs, OID_AUTO, timestamp_precision, CTLFLAG_RW,
X     &timestamp_precision, 0, "File timestamp precision (0: seconds, "
X     "1: sec + ns accurate to 1/HZ, 2: sec + ns truncated to ms, "
X     "3+: sec + ns (max. precision))");

Sysctl descriptions are not the place to write man pages.  This is
another bad copy of what should be in a user man page.  It is not quite
as verbose as the comment, but has an additional error from abbreviations:
"microseconds" is misabbreviated to "ms" ("milliseconds").

Quick fix (also fix other style bugs):

SYSCTL_INT(_vfs, OID_AUTO, timestamp_precision, CTLFLAG_RW,
     &timestamp_precision, 0,
     "File timestamp precision (0: secs; 1: ticks; 2: usecs; 3: nsecs)");

> I'm less sure of what to do for other cases such as i386/amd64 when not using
> TSC, or on other platforms.  OTOH, perhaps you aren't doing lots of heavy I/O
> access on a system with a slow timecounter (or if you are doing heavy I/O,
> slow timecounter access won't be your bottleneck)?

File times are rarely updated since the updates are normally delayed (except
in devfs, where accurate times are least needed so caching would be most
beneficial).  But the caching means that increasing the timestamp precision
won't help much to fix the problem with make.  Consider build activity
where there are a lot of writes which produce cached mtimes.  Some time
later, make runs and causes these times to be updated by stat()ing the
files.  It then sees times that are about the same, and likely to be out
of order with respect to the actual writes.  Extra precision actually
makes the problem worse: with seconds precision, make usually sees the
same time for all files, but with nanoseconds precision it sees its own
stat() order.  The extra precision is only helpful when there is a
certain serialization of the stat()s relative to the build operations.

Make must have some magic to handle equal file times.  When the file time
of a source is equal to the file time of the target, make cannot know if
the target file is up to date.  Make succeeds unsafely in this case -- it
considers the target to be up to date.  Without this, with seconds resolution
almost all builds would produce mostly out of date targets, since most steps
run in much less than 1 second.
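
The equal-times rule can be modelled roughly as follows.  This is a
simplified sketch of the comparison described above, not code taken from
make(1); both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Return <0, 0 or >0 as a is older than, equal to or newer than b. */
static int
timespec_cmp(const struct timespec *a, const struct timespec *b)
{
	assert(a != NULL && b != NULL);
	if (a->tv_sec != b->tv_sec)
		return (a->tv_sec < b->tv_sec ? -1 : 1);
	if (a->tv_nsec != b->tv_nsec)
		return (a->tv_nsec < b->tv_nsec ? -1 : 1);
	return (0);
}

/*
 * Simplified model: the target needs rebuilding only if the source is
 * strictly newer.  Equal times count as up to date, which is the unsafe
 * case.
 */
static bool
target_up_to_date(const struct timespec *target, const struct timespec *source)
{
	return (timespec_cmp(target, source) >= 0);
}
```

With seconds precision, a source written 0.5 seconds after its target gets
the same stamp as the target, so the target is wrongly taken as up to date.
With finer precision the same pair compares as out of date, provided that
the stamps reflect the real order of the writes.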

Bruce


