From owner-freebsd-arch@FreeBSD.ORG  Wed Dec 17 14:58:53 2014
From: John Baldwin <jhb@freebsd.org>
To: Jilles Tjoelker <jilles@stack.nl>
Subject: Re: Change default VFS timestamp precision?
Date: Wed, 17 Dec 2014 09:40:01 -0500
Message-ID: <2034186.iLaW9EGnEt@ralph.baldwin.cx>
User-Agent: KMail/4.14.2 (FreeBSD/10.1-STABLE; KDE/4.14.2; amd64; ; )
In-Reply-To: <20141216233844.GA1490@stack.nl>
References: <201412161348.41219.jhb@freebsd.org>
 <20141216233844.GA1490@stack.nl>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Wed, 17 Dec 2014 09:58:52 -0500 (EST)
Cc: arch@freebsd.org

On Wednesday, December 17, 2014 12:38:44 AM Jilles Tjoelker wrote:
> On Tue, Dec 16, 2014 at 01:48:41PM -0500, John Baldwin wrote:
> > We still ship with vfs.timestamp_precision=0 by default meaning that
> > VFS timestamps have a granularity of one second.  It is not unusual on
> > modern systems for multiple updates to a file or directory to occur
> > within a single second (and thus share the same effective timestamp).
> > This can break things that depend on timestamps to know when something
> > has changed or is stale (such as make(1) or NFS clients).  On hardware
> > that has a cheap timecounter, I think we should use the most-precise
> > timestamps (vfs.timestamp_precision=3).  However, I'm less sure of
> > what to do for other cases such as i386/amd64 when not using TSC, or
> > on other platforms.  OTOH, perhaps you aren't doing lots of heavy I/O
> > access on a system with a slow timecounter (or if you are doing heavy
> > I/O, slow timecounter access won't be your bottleneck)?
> > 
> > I can think of a few options:
> >  1) Change vfs.timestamp_precision default to 3 for all systems.
> > 
> >  2) Only change vfs.timestamp_precision default to 3 for amd64/i386 using
> >     an #ifdef.
> > 
> >  3) Something else?
> 
> Although some breakage may be caused, increasing precision sounds fine
> to me, but only to the level of microseconds, since there is no way to
> set a timestamp to the nanosecond (this would be futimens/utimensat). It
> is easy to be surprised when cp -p creates a file that appears older
> than the original.

Note that vfs_timestamp() always returns a timespec, but a setting of 2 fills
it in with microsecond precision.  The important difference for settings >= 2
is that vfs_timestamp() queries the timecounter on each call rather than using
a cached global value that only changes once a second (setting 0) or once per
clock tick, i.e. once a millisecond or so (setting 1).
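
As a rough illustration of the four settings, here is a userland Python model
(not the kernel code).  The function name vfs_timestamp_model, the hz=1000
tick rate, and the sample times are assumptions made up for this sketch, and
setting 1 is approximated by truncating to the tick boundary rather than by
reading a cached value.

```python
NSEC_PER_SEC = 1_000_000_000


def vfs_timestamp_model(now_ns, precision, hz=1000):
    """Return (tv_sec, tv_nsec) roughly as a given precision setting records it.

    Hypothetical model: 0 = seconds, 1 = 1/hz tick, 2 = microseconds,
    3 = nanoseconds.  hz=1000 is an assumed tick rate.
    """
    sec, nsec = divmod(now_ns, NSEC_PER_SEC)
    if precision == 0:
        return sec, 0
    if precision == 1:
        tick_ns = NSEC_PER_SEC // hz
        return sec, nsec - nsec % tick_ns
    if precision == 2:
        return sec, nsec - nsec % 1000
    return sec, nsec


# Two updates 250 microseconds apart, inside the same second and same tick.
first = 1_418_828_333 * NSEC_PER_SEC + 123_456_789
second = first + 250_000

for precision in range(4):
    a = vfs_timestamp_model(first, precision)
    b = vfs_timestamp_model(second, precision)
    print(precision, "distinct" if a != b else "identical")
```

In this model the two updates get identical timestamps at settings 0 and 1
and distinct ones at settings 2 and 3.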

> To avoid cross-arch surprises with applications that use
> second-resolution APIs, either all or no architectures should generate
> timestamps more accurate than seconds.

Actually, it will improve our interoperability with other OSes that already
use sub-second timestamps, for example when sharing filesystems over NFS.

> There is no benefit for the particular case of make(1), since it only
> uses timestamps in seconds.

My bad for assuming make would be affected without checking further.
The use case I _am_ familiar with is NFS servers and NFS v3 clients that
depend on the mtime of a directory to know when the lookup cache for that
directory needs to be invalidated.  Our NFS client now defaults to trusting
cached lookups for only 60 seconds to work around races caused by the
seconds granularity of timestamps from some NFS servers, at the cost of
making the lookup cache a fair amount less effective.  Note that Isilon
already defaults vfs.timestamp_precision to 3 on their appliances, and I
recently convinced the folks at TrueNAS to do the same.  Changing our
default would also make stock FreeBSD NFS servers more reliable for NFS v3.
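
To make the race concrete, here is a deliberately simplified Python model of
a client that revalidates its lookup cache by comparing directory mtimes.
The DirCache class and stale_after_quick_create helper are hypothetical names
for this sketch.  The real NFS client has timeouts and more state than this.

```python
class DirCache:
    """Hypothetical client-side lookup cache keyed on a directory's mtime."""

    def __init__(self):
        self.cached_mtime = None
        self.names = set()

    def lookup(self, server_mtime, server_names, name):
        # Refetch the directory only when its mtime differs from the cached one.
        if self.cached_mtime != server_mtime:
            self.cached_mtime = server_mtime
            self.names = set(server_names)
        return name in self.names


def stale_after_quick_create(granularity_ns):
    """Return True if a file created 1 ms after the cache fill is missed."""

    def stamp(t_ns):
        # Server-side mtime as recorded at the given granularity.
        return t_ns - t_ns % granularity_ns

    t0 = 5 * 10**9 + 200_000_000   # directory last changed, client caches it
    t1 = t0 + 1_000_000            # "b" is created 1 ms later
    cache = DirCache()
    cache.lookup(stamp(t0), {"a"}, "a")
    found = cache.lookup(stamp(t1), {"a", "b"}, "b")
    return not found


print(stale_after_quick_create(10**9))  # seconds granularity: True (stale)
print(stale_after_quick_create(1))      # nanosecond granularity: False
```

With one-second granularity both directory states carry the same mtime, so
the client keeps its old cache and misses the new entry.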

-- 
John Baldwin