From owner-freebsd-fs@FreeBSD.ORG  Fri Feb 10 21:14:56 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id AB3521065672
	for <fs@FreeBSD.org>; Fri, 10 Feb 2012 21:14:56 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from fallbackmx10.syd.optusnet.com.au
	(fallbackmx10.syd.optusnet.com.au [211.29.132.251])
	by mx1.freebsd.org (Postfix) with ESMTP id C47448FC16
	for <fs@FreeBSD.org>; Fri, 10 Feb 2012 21:14:55 +0000 (UTC)
Received: from mail02.syd.optusnet.com.au (mail02.syd.optusnet.com.au
	[211.29.132.183])
	by fallbackmx10.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q1AIPcjK005127 for <fs@FreeBSD.org>; Sat, 11 Feb 2012 05:25:38 +1100
Received: from c211-30-171-136.carlnfd1.nsw.optusnet.com.au
	(c211-30-171-136.carlnfd1.nsw.optusnet.com.au [211.30.171.136])
	by mail02.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q1AIPSPZ007395
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sat, 11 Feb 2012 05:25:29 +1100
Date: Sat, 11 Feb 2012 05:25:28 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Sergey Kandaurov <pluknet@gmail.com>
In-Reply-To: <CAE-mSOK2fo=PsvyQWW1Nz4XPqcr7fKDNCvVjHsUvR2uYmuqFMw@mail.gmail.com>
Message-ID: <20120211042121.B3653@besplex.bde.org>
References: <20120210135527.GR1860@hoeg.nl>
	<CAE-mSOK2fo=PsvyQWW1Nz4XPqcr7fKDNCvVjHsUvR2uYmuqFMw@mail.gmail.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Ed Schouten <ed@80386.nl>, fs@FreeBSD.org
Subject: Re: Increase timestamp precision?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
X-List-Received-Date: Fri, 10 Feb 2012 21:14:56 -0000

On Fri, 10 Feb 2012, Sergey Kandaurov wrote:

> On 10 February 2012 17:55, Ed Schouten <ed@80386.nl> wrote:
>> Hi all,
>>
>> It seems the default timestamp precision sysctl
>> (vfs.timestamp_precision) is currently set to 0 by default, meaning we
>> don't do any sub-second timestamps on files. Looking at the code, it
>> seems that vfs.timestamp_precision=1 will let it use a cached value with
>> 1 / HZ precision and it looks like it should have little overhead.
>>
>> Would anyone object if I were to change the default from 0 to 1?

Sure.  The setting of 1 is too buggy to use in its current implementation.
I don't know of any fixed version, so there is little experience with
a usable version.  I also wouldn't use getnanotime() for anything.
But I like nanotime().

% void
% vfs_timestamp(struct timespec *tsp)
% {
% 	struct timeval tv;
% 
% 	switch (timestamp_precision) {
% 	case TSP_SEC:
% 		tsp->tv_sec = time_second;
% 		tsp->tv_nsec = 0;

This gives seconds precision.  It is correct.

% 		break;
% 	case TSP_HZ:
% 		getnanotime(tsp);
% 		break;

I must have been asleep when I reviewed this for jdp in 1999.  This
doesn't give 1/HZ precision.  It gives nanoseconds precision with
garbage in the low bits, and about 1/HZ accuracy.

To fix it, round down to 1/HZ precision, or at least to microseconds
precision.

The garbage in the low bits matters mainly because there is no way to
preserve it.  utimes(2) only supports microseconds precision.

% 	case TSP_USEC:
% 		microtime(&tv);
% 		TIMEVAL_TO_TIMESPEC(&tv, tsp);
% 		break;

This gives microseconds precision, but in a silly way.  It should call
nanotime() and then round down to microseconds precision.

% 	case TSP_NSEC:
% 	default:

The default should be an error.

% 		nanotime(tsp);
% 		break;
% 	}
% }

I mostly use TSP_SEC, but there are some buggy file systems and/or
utilities (cvsup?, or perhaps just scp from a system using a different
timestamp precision) that produce sub-second timestamps.  I notice this
when I verify timestamps in backups.  The backup formats support
microseconds precision at best, so any extra precision in the files
in active file systems gives a verification failure.

> [Yep, sorry I didn't read this mail before replying to your another mail.]
>
> I am for this idea. Increasing vfs.timestamp_precision will allow
> to use nanosecond precision for all those *stat() and *times()
> syscalls which operate on struct timespec.
>
> FWIW, NetBSD uses only nanotime() inside vfs_timestamp() since its
> initial appearance in 2006.

Does NetBSD's nanotime() have full nsec precision and hardware slowness?
There is no hardware yet that can deliver anywhere near nsec accuracy,
so the precision might as well be limited to usec.  i8254 timecounters
take/took 5-50 usec just to read.  ACPI-fast is relatively worse on
today's faster CPUs (1-2 usec).  The TSC used to take only ~10
instructions to read on Athlons, but that read is non-serializing, and
the TSC there was never P-state-invariant.  P-state-invariant versions
take much longer (seemingly about 50 cycles in hardware and another 50
in software for core2), and TSC-low intentionally wastes about 7 low
bits, so its precision is about 64 nsec, which is about the same time
as nanotime() takes to read it.

I use TSP_NSEC only for POSIX conformance tests, to stop the tests from
finding the bug that even TSP_SEC is broken.  It is broken because
time_second is incoherent with time(3).  time_second and the
time reported by all the get*time() functions lag the time reported
by the (non-get)*time() functions, and the lag is random relative to
seconds (and other) boundaries, so rounding to a second (or other)
boundary gives different results.  These differences are visible to
applications doing tests like:

 	touch(file);
 	stat(file);
 	sleep(1);
 	touch(file);
 	stat(file);
 	assert(file_mtime_increased_by_at_least_1_second);

This should also show the leap seconds bug in POSIX times (the file time
shouldn't change across a leap second).

Some of the tests do lots of file timestamp changing (I also have to
turn off my usual optimization of mounting with noatime to get them
to pass).  They run fast enough even with TSP_NSEC, at least if the
timecounter is a fast TSC.  File time updates just don't happen often
enough for their speed to matter much, provided they are cached enough.
ffs uses the mark-for-update caching strategy, which works well.  It avoids
not only writing to disk, but even reading the timer a lot.  Some other
filesystems like devfs are not careful about this, so the slowness of
silly operations like dd with a block size of 1 from /dev/zero to /dev/null
becomes even more extreme if TSP_NSEC or TSP_USEC is used and the
timecounter is slow.


Bruce