Date: Sat, 11 Feb 2012 05:25:28 +1100 (EST)
From: Bruce Evans
To: Sergey Kandaurov
Cc: Ed Schouten, fs@FreeBSD.org
Subject: Re: Increase timestamp precision?
Message-ID: <20120211042121.B3653@besplex.bde.org>
References: <20120210135527.GR1860@hoeg.nl>

On Fri, 10 Feb 2012, Sergey Kandaurov wrote:

> On 10 February 2012 17:55, Ed Schouten wrote:
>> Hi all,
>>
>> It seems the default timestamp precision sysctl
>> (vfs.timestamp_precision) is currently set to 0 by default, meaning we
>> don't do any sub-second timestamps on files.
>> Looking at the code, it seems that vfs.timestamp_precision=1 will let
>> it use a cached value with 1 / HZ precision and it looks like it should
>> have little overhead.
>>
>> Would anyone object if I were to change the default from 0 to 1?

Sure. The setting of 1 is too buggy to use in its current implementation.
I don't know of any fixed version, so there is little experience with a
usable version. I also wouldn't use getnanotime() for anything. But I
like nanotime().

% void
% vfs_timestamp(struct timespec *tsp)
% {
% 	struct timeval tv;
%
% 	switch (timestamp_precision) {
% 	case TSP_SEC:
% 		tsp->tv_sec = time_second;
% 		tsp->tv_nsec = 0;

This gives seconds precision. It is correct.

% 		break;
% 	case TSP_HZ:
% 		getnanotime(tsp);
% 		break;

I must have been asleep when I reviewed this for jdp in 1999. This
doesn't give 1/HZ precision. It gives nanoseconds precision with garbage
in the low bits, and about 1/HZ accuracy. To fix it, round down to 1/HZ
precision, or at least to microseconds precision. The garbage in the low
bits matters mainly because there is no way to preserve it. utimes(2)
only supports microseconds precision.

% 	case TSP_USEC:
% 		microtime(&tv);
% 		TIMEVAL_TO_TIMESPEC(&tv, tsp);
% 		break;

This gives microseconds precision, but in a silly way. It should call
nanotime() and then round down to microseconds precision.

% 	case TSP_NSEC:
% 	default:

The default should be an error.

% 		nanotime(tsp);
% 		break;
% 	}
% }

I mostly use TSP_SEC, but there are some buggy file systems and/or
utilities (cvsup?, or perhaps just scp from a system using a different
timestamp precision) that produce sub-seconds precision. I notice this
when I verify timestamps in backups. The backup formats support
microseconds precision at best, so any extra precision in the files in
active file systems gives a verification failure.

> [Yep, sorry I didn't read this mail before replying to your another mail.]
>
> I am for this idea.
> Increasing vfs.timestamp_precision will allow
> to use nanosecond precision for all those *stat() and *times()
> syscalls which operate on struct timespec.
>
> FWIW, NetBSD uses only nanotime() inside vfs_timestamp() since its
> initial appearance in 2006.

Does NetBSD's nanotime() have full nsec precision and hardware slowness?
There is no hardware yet that can deliver anywhere near nsec accuracy, so
the precision might as well be limited to usec. i8254 timecounters
take/took 5-50 usec just to read. ACPI-fast is relatively worse on
today's faster CPUs (1-2 usec). The non-serializing TSC used to take only
~10 instructions on Athlons, but it is non-serializing and was always
non-P-state-invariant. P-state-invariant versions take much longer (seems
to be about 50 cycles in hardware and another 50 in software for core2),
and TSC-low intentionally wastes about 7 low bits, so its precision is
about 64 nsec, which is about the same time as nanotime() takes to read it.

I use TSP_NSEC only for POSIX conformance tests, to stop the tests from
finding the bug that even TSP_SEC is broken. It is broken because
time_second is incoherent with time(3). time_second and the time reported
by all the get*time() functions lag the time reported by the (non-get)
*time() functions, and the lag is random relative to seconds (and other)
boundaries, so rounding to a seconds (or other) boundary gives different
results. These differences are visible to applications doing tests like:

    touch(file);
    stat(file);
    sleep(1);
    touch(file);
    stat(file);
    assert(file_mtime_increased_by_at_least_1_second);

This should also show the leap seconds bug in POSIX times (the file time
shouldn't change across a leap second).

Some of the tests do lots of file timestamp changing (I also have to turn
off my usual optimization of mounting with noatime to get them to pass).
They run fast enough even with TSP_NSEC, at least if the timecounter is a
fast TSC.
File time updates just don't happen enough for their speed to matter
much, provided they are cached enough. ffs uses the mark-for-update
caching strategy, which works well. It avoids not only writing to disk,
but even reading the timer a lot. Some other filesystems like devfs are
not careful about this, so the slowness of silly operations like dd with a
block size of 1 on /dev/zero to /dev/null becomes even more extreme if
TSP_NSEC or TSP_USEC is used and the timecounter is slow.

Bruce
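
For illustration only, the rounding suggested above for the TSP_HZ and
TSP_USEC cases, plus an error return for unknown settings, might be
sketched in userland C roughly as follows. This is not kernel code: the
helper names timespec_round_down() and timestamp_apply_precision(), the
local TSP_* enum, and the hz parameter are all assumptions made for the
sketch, and the caller is assumed to have already read a full-precision
time (as nanotime() would supply in the kernel).

```c
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Local stand-ins for the kernel's precision settings (assumed values). */
enum { TSP_SEC, TSP_HZ, TSP_USEC, TSP_NSEC };

/*
 * Hypothetical helper: truncate tv_nsec down to a multiple of res_nsec.
 * res_nsec would be 1000 for microseconds or NSEC_PER_SEC / hz for 1/HZ.
 */
static void
timespec_round_down(struct timespec *tsp, long res_nsec)
{
	assert(res_nsec > 0 && res_nsec <= NSEC_PER_SEC);
	tsp->tv_nsec -= tsp->tv_nsec % res_nsec;
}

/*
 * Hypothetical sketch: reduce an already-read full-precision time to the
 * requested precision.  Returns 0 on success, -1 for an unknown setting
 * or an unusable hz.  If hz does not divide one second evenly, the 1/HZ
 * step is only approximate because of the integer division.
 */
static int
timestamp_apply_precision(struct timespec *tsp, int precision, long hz)
{
	switch (precision) {
	case TSP_SEC:
		tsp->tv_nsec = 0;
		break;
	case TSP_HZ:
		if (hz <= 0 || hz > NSEC_PER_SEC)
			return (-1);
		timespec_round_down(tsp, NSEC_PER_SEC / hz);
		break;
	case TSP_USEC:
		timespec_round_down(tsp, 1000);
		break;
	case TSP_NSEC:
		break;
	default:
		/* Unknown setting: report an error instead of guessing. */
		return (-1);
	}
	return (0);
}
```

With this shape, every sub-second setting would start from the same
full-precision reading and differ only in how many low bits are dropped,
so a timestamp stored at microseconds precision could be reproduced
through a microseconds-only interface such as utimes(2).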