Date: Sat, 7 Feb 2015 21:15:30 +1100 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Jilles Tjoelker <jilles@stack.nl> Cc: arch@freebsd.org Subject: Re: Change default VFS timestamp precision? Message-ID: <20150207190728.W1334@besplex.bde.org> In-Reply-To: <20150206231812.GA13408@stack.nl> References: <201412161348.41219.jhb@freebsd.org> <20141216233844.GA1490@stack.nl> <20141217105256.J1490@besplex.bde.org> <20150206231812.GA13408@stack.nl>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, 7 Feb 2015, Jilles Tjoelker wrote: > On Wed, Dec 17, 2014 at 11:39:37AM +1100, Bruce Evans wrote: >> On Wed, 17 Dec 2014, Jilles Tjoelker wrote: >* ... >>> A quick grep also shows various calls to utime(3) with a non-NULL timep >>> argument. The ones in contrib/bmake and usr.bin/make definitely look >>> incorrect. > >> Do you mean that they are incorrect because they use utime(3) and not >> utimes(3), or just because they use the non-NULL timep? The latter >> is just a style bug in at least usr.bin/make. Since it hasn't >> caught up with the existence of utimes(3), it is reasonable for it >> to also have not caught up with the existence of the non-NULL timep. >> It just wants to set the time to the current time. It uses its own >> copy of the current time for this. > > Calling any utime/utimes/utimens function with the current time subverts > vfs.timestamp_precision. This may cause timestamps set by different > applications to compare differently from provable ordering. For example, > if "make -t" finishes within one second, a file created before it was > started might have a timestamp 1423261861.1 while files touched by "make > -t" will have timestamp 1423261861.0. This annoys me for backups too. Verifying backups includes checking their timestamps, and it is too hard to determine if a difference is small enough to not be an error. I mostly use old versions of FreeBSD which don't have *timens*(). So utilities cannot restores timestamps with nanoseconds resolution. Backup utilities are also further than most utilities from being able to back them up (some backup formats barely have seconds resolution). So times with nanoseconds (or even milliseconds) resolution shouldn't be created. vfs.timestampe_precision gives this. Except touch(1) subverts it. > Also, with NFS it is better to use the server's clock for timestamps, so > timestamps in the same filesystem are consistent. I think this happens in most cases. nfs has remarkably little code for setting timestamps. Hmm, now I wonder how its caching timeouts work when the server times differ significantly. It compares file mtimes and ctimes with timeouts, but the mtimes and ctimes are usually in server real time (or worse when an application steps the mtime backwards), but timeouts should be in client monotonic time. > ... > Calling utime() with a NULL timep is not inherently wrong, although > futimes() or futimens() is preferable if the file is open. I think it is as correct as possible for the current time. It doesn't subvert vfs.timestamp_precision. > ... >> This reminds me of related bugs in touch(1): >> - touch also tries first with a non-NULL timep. So it uses its own idea >> of the current time. This is normally the one returned a little in >> the past by gettimeofday(). It has microseconds resolution. This >> normally succeeds, giving a result that is inconsistent with the >> one that would result from using a NULL timep, since the kernel now >> honors vfs.timestamp_precision and that is not usually 2. > > Yes, this is wrong the same way as in "make -t". > >> - sometimes the first try fails. Then touch retries with a NULL timep, >> if the semantics allow this. Sometimes this succeeds. > > Another bug: it retries with a NULL timep even if -a or -m is given, > requesting only one of the timestamps be updated. Perhaps that is the best that it can do, or it can be treated as an error since the application/user shouldn't try to set only one of the timestamps on a file system that doesn't support both. However, it might support just the one asked for, and asking for both when only 1 is supported is not treated as an error. > The new functionality in futimens/utimensat allows doing "touch -a" and > "touch -m" with the current time properly (setting one timestamp to > vfs_timestamp and leaving the other timestamp unchanged). As with the > old methods, only the owner of the file and root can do this. When this gets down to individual file systems, the timestamp to be left unchanged becomes VNOVAL and is always (?) ignored. However, to make the usual case when changes to both timestamps are requested work, requests to change nonexistent attributes are usually ignored too. I don't see any better way to control this than a mount flag for every nonexistent attribute. msdosfs currently has hard-coded policies that differ for each nonexistent attribute. They are mostly too strict except for atimes. > The below patch fixes these problems. Looks good. > ... >> The non-NULL timep in make gives the following bug since make doesn't >> retry with a NULL timep: make -t fails if the files ar writeable but >> not owned. > > This is best fixed by always using a NULL timep. I think this is OK for msdosfs. The problems with setting timestamps in msdosfs are more in cp -p and mv. There the NULL timep cannot be used. msdosfs is very strict in this case. cp and mv are also strict, except mv was broken recently to hide the corresponding problems for unsettable flags (e.g., zfs sets the archive flag a lot, but nfs doesn't support preserving it and hard-codes strict failures for attempts to set it). The strictness is mostly just to fail to copy uncopyable attributes and just warn about this. Even the warnings are too much when they occur for every file. Even mv is mostly not required by POSIX to duplicate attributes. But it is required to warn. So nfs's strictness for chflags() is the correct default and msdosfs's non-strictness for atimes is an incorrect default, but more practical. > Index: usr.bin/touch/touch.c > =================================================================== > --- usr.bin/touch/touch.c (revision 277645) > +++ usr.bin/touch/touch.c (working copy) > ... > @@ -239,7 +222,7 @@ > int yearset; > char *p; > /* Start with the current time. */ > - now = tvp[0].tv_sec; > + now = time(NULL); > if ((t = localtime(&now)) == NULL) > err(1, "localtime"); > /* [[CC]YY]MMDDhhmm[.SS] */ This seems to be broken now, since time() is broken now. I think tvp was obtained using clock_gettime(CLOCK_REALTIME, ... ) (previously gettimeofday). These give the correct time. time() used to use gettimeofday() to get the same correct time. But someone broke time(). It now uses clock_gettime(TIME_SECOND, ... ). TIME_SECOND is getbintime() rounded down. getbintime() lags the current time by an average of tc_tick * 1/hz / 2 seconds. After rounding down, TIME_SECOND lags the current time by a full seconds for a duration of an average of tc_tick * 1/hz / 2 seconds every second. The bug is compatible with the one given by vfs.timestamp_precision = 0 and other low settings of the sysctl. The latter also uses the lagging clock. Previously, the incoherent clocks showed up in file times tests like: time(&t1); write to a file stat(file, &sb); assert(t1 <= sb.st_mtime); /* can fail due to incoherent clocks */ Add delays as necessary to compensate for the speed of light but not to compensate for buggy clocks. time() now uses the same buggy clock when vfs.timestamp_precision = 0, so the clocks are coherent because they are the same (unless the real time is stepped). OTOH, if CLOCK_GETTIME is used to read t1, then the bug is detected as before. When vfs.timestamp_precision is higher so that the time is up to date, then the incoherencies go in the other direction: write to a file stat(file, &sb); time(&t1); assert(t1 >= sb.st_mtime); /* can fail due to incoherent clocks */ This looks like a file times test, but it actually finds the buggy time(). > @@ -301,7 +284,7 @@ > time_t now; > struct tm *t; > /* Start with the current time. */ > - now = tvp[0].tv_sec; > + now = time(NULL); > if ((t = localtime(&now)) == NULL) > err(1, "localtime"); Similarly. THe comment should be attached to the or just omitted now that the code obiously fetches the (buggy) current time. For a quick fix, keep using clock_gettime(CLOCK_REALTIME, ...). I don't see a good way to be bug for bug compatible with the file system. A bad way is to write to a file on the same file system and stat it, as in the above tests. This gives a time slightly in the past according to the file system's clock. nfs gives interesting complications. The server's clock may be much more incoherent with CLOCK_REALTIME than the local clock. Using CLOCK_SECOND for time() is a bogus optimization since most of the overhead of clock_gettime() is in the syscall unless the timecounter hardware is slow. On freefall now, it is not even optimal or wrong, since on x86 clock_gettime() is not a syscall and the library is missing all the bogus optimizations; it uses binuptime() for everything. This results in CLOCK_REALTIME and CLOCK_SECOND both taking 34 nsec on freefall. It might be possible to reduce this to about 1 nanosecond for CLOCK_SECOND by duplicating more of the kernel. This is not clear, since the library uses excessive locking where the kernel is missing locking. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20150207190728.W1334>