Date: Wed, 17 Dec 2014 10:43:55 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: John Baldwin <jhb@freebsd.org>
Cc: arch@freebsd.org
Subject: Re: Change default VFS timestamp precision?
Message-ID: <20141217085846.G1087@besplex.bde.org>
In-Reply-To: <201412161348.41219.jhb@freebsd.org>
References: <201412161348.41219.jhb@freebsd.org>

On Tue, 16 Dec 2014, John Baldwin wrote:

> We still ship with vfs.timestamp_precision=0 by default meaning that VFS
> timestamps have a granularity of one second.  It is not unusual on modern
> systems for multiple updates to a file or directory to occur within a single
> second (and thus share the same effective timestamp).  This can break things
> that depend on timestamps to know when something has changed or is stale (such
> as make(1) or NFS clients).  On hardware that has a cheap timecounter, I think
> we should use the most-precise timestamps (vfs.timestamp_precision=3).

However, vfs.timestamp_precision=3 is too precise.  It is still impossible
to preserve nanoseconds resolution in backups and file copies, since no
syscalls support writing it and few backup formats support saving it
(dump/restore(8) format supports saving it, but then restore(8) silently
truncates to microseconds).

In the other direction, POSIX is still far from specifying rounding of
unrepresentable times.  In the 2007 version, for utimes() it just says
that:
(1) "[for times in the timeval struct] rounding towards the nearest second
    may occur"
(2) "[for the null timeval pointer case] times of the file shall be set to
    the current time"
(1) allows all sorts of rounding provided it is to nearest.  Rounding to
nearest has some advantage but is inconsistent with normal practice for
times.  (2) is impossible to satisfy if the current time is not
representable as a file time.  FreeBSD now uses vfs_timestamp() to fetch
the current time in case (2).  (2) seems to forbid this (unless
vfs_timestamp() gives the current time).

The implementation has the following style bugs:
- vfs.timestamp_precision is not documented in any user man page (it is
  documented in vfs_timestamp(9))
- the kernel documentation is cloned ad nauseam into 2 other places, except
  for different wording errors:

from vfs_subr.c:
X /*
X  * Knob to control the precision of file timestamps:
X  *
X  *   0 = seconds only; nanoseconds zeroed.
X  *   1 = seconds and nanoseconds, accurate within 1/HZ.
X  *   2 = seconds and nanoseconds, truncated to microseconds.
X  * >=3 = seconds and nanoseconds, maximum precision.
X  */

This was once the only documentation.  It is not very useful in the kernel,
since it does less than echo the code.

X enum { TSP_SEC, TSP_HZ, TSP_USEC, TSP_NSEC };

The code also uses this enum obfuscation.  This is just an obfuscation,
since the enum values are private and have to be translated into 0, 1, 2
and 3 in documentation and in the user API.  So the user places that would
benefit from names instead of magic numbers can see only the magic numbers,
while the kernel places that get negative benefits from the names use both.

Bugs in the above comment include:
- case 1 is not accurate to 1/HZ, but to tc_tick/HZ.  tc_tick is just
  usually 1.
- no accuracy is guaranteed in case 1.  It is a certain precision that is
  guaranteed, matching the name of the sysctl.
- real documentation would also describe the details of the rounding in
  case 1.  It is not simple truncation to a multiple of tc_tick/HZ,
  although that behaviour would be most useful.  The actual behaviour is
  to leave near-noise in the low digits of tv_nsec.  The low 3 digits are
  almost always 0, so they cannot be copied or backed up, giving the
  problems as in case 3 with no benefits except faster operation.
- the behaviour for (garbage) negative values is not described, although
  the behaviour for (garbage) values >3 is described.
- the differences between accuracy, precision and resolution are especially
  confusing in case 3 where the precision bumps into the resolution.
  Clearly the resolution for all cases is that of timespecs, i.e.,
  nanoseconds.  But nanoseconds precision is not possible with any
  supported hardware -- even with a 4GHz TSC as the timecounter, the
  precision of the timecounter read is a bit fuzzier than 1/4 nanoseconds.
  The ACPI-fast clock has a frequency of ~14 MHz IIRC, so its precision is
  at most 1/14 microseconds = 70 nanoseconds.
  A real man page would give a hint about this.

The documentation in vfs_timestamp(9) consists mainly of cloning the above
comment.

X static int timestamp_precision = TSP_SEC;
X SYSCTL_INT(_vfs, OID_AUTO, timestamp_precision, CTLFLAG_RW,
X     &timestamp_precision, 0, "File timestamp precision (0: seconds, "
X     "1: sec + ns accurate to 1/HZ, 2: sec + ns truncated to ms, "
X     "3+: sec + ns (max. precision))");

Sysctl descriptions are not the place to write man pages.  This is another
bad copy of what should be in a user man page.  It is not quite as verbose
as the comment, but has an additional error from abbreviations:
"microseconds" is misabbreviated to "ms" ("milliseconds").

Quick fix (also fix other style bugs):

    SYSCTL_INT(_vfs, OID_AUTO, timestamp_precision, CTLFLAG_RW,
        &timestamp_precision, 0,
        "File timestamp precision (0: secs; 1: ticks; 2: usecs; 3: nsecs)");

> I'm less sure of what to do for other cases such as i386/amd64 when not using
> TSC, or on other platforms.  OTOH, perhaps you aren't doing lots of heavy I/O
> access on a system with a slow timecounter (or if you are doing heavy I/O,
> slow timecounter access won't be your bottleneck)?

File times are rarely updated since the updates are normally delayed
(except in devfs, where accurate times are least needed so caching would be
most beneficial).  But the caching means that increasing the timestamp
precision won't help much to fix the problem with make.  Consider build
activity where there are a lot of writes which produce cached mtimes.  Some
time later, make runs and causes these times to be updated by stat()ing the
files.  It then sees times that are about the same, and likely to be out of
order with respect to the actual writes.  Extra precision actually makes
the problem worse: with seconds precision, make usually sees the same time
for all files, but with nanoseconds precision it sees its own stat() order.
The extra precision is only helpful when there is a certain serialization
of the stat()s relative to the build operations.

Make must have some magic to handle equal file times.  When the file time
of a source is equal to the file time of the target, make cannot know if
the target file is up to date.  Make succeeds unsafely in this case -- it
considers the target to be up to date.  Without this, with seconds
resolution almost all builds would produce mostly out of date targets since
most steps run in much less than 1 second.

Bruce
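
To make the equal-times case above concrete, here is a minimal sketch.  It
is not make's actual code: target_is_stale() and truncate_to_seconds() are
hypothetical names, and it assumes a simplified model in which a target is
rebuilt only when the source's mtime is strictly newer.  It also ignores
the delayed (cached) timestamp updates discussed above, so it only shows
the comparison itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical helper: model vfs.timestamp_precision=0 by zeroing the
 * nanoseconds field, as the vfs_subr.c comment describes for case 0.
 */
static struct timespec
truncate_to_seconds(struct timespec ts)
{
	/* A valid timespec keeps tv_nsec in [0, 1e9). */
	assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
	ts.tv_nsec = 0;
	return (ts);
}

/*
 * Hypothetical helper: simplified make-style staleness check.  The target
 * is considered stale only if the source is strictly newer; equal times
 * count as up to date (the "succeeds unsafely" case).
 */
static bool
target_is_stale(struct timespec src, struct timespec tgt)
{
	if (src.tv_sec != tgt.tv_sec)
		return (src.tv_sec > tgt.tv_sec);
	return (src.tv_nsec > tgt.tv_nsec);
}
```

With a source time of 100.9 seconds and a target time of 100.1 seconds,
target_is_stale() reports the target as stale at full precision, but after
truncate_to_seconds() on both times they compare equal and the target is
treated as up to date.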