Date: Thu, 8 May 2014 13:39:09 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Jilles Tjoelker <jilles@stack.nl>
Cc: "svn-src-head@freebsd.org" <svn-src-head@freebsd.org>, "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>, "src-committers@freebsd.org" <src-committers@freebsd.org>, Bruce Evans <brde@optusnet.com.au>, Alan Somers <asomers@freebsd.org>
Subject: Re: svn commit: r265472 - head/bin/dd
Message-ID: <20140508111443.S1000@besplex.bde.org>
In-Reply-To: <20140507202623.GA14233@stack.nl>
References: <201405062206.s46M6dxW060155@svn.freebsd.org> <20140507113345.B923@besplex.bde.org> <CAOtMX2h_%2B1G18Nv5JvDE0H7_TZ96p81JotOwhq1Jm-dOOeahPw@mail.gmail.com> <20140507202623.GA14233@stack.nl>

On Wed, 7 May 2014, Jilles Tjoelker wrote:

> On Wed, May 07, 2014 at 12:10:31PM -0600, Alan Somers wrote:
>> On Tue, May 6, 2014 at 9:47 PM, Bruce Evans <brde@optusnet.com.au> wrote:
>>> On Tue, 6 May 2014, Alan Somers wrote:
>>>> ...
>>>> The solution is to use clock_gettime(2) with CLOCK_MONOTONIC_PRECISE as the
>>>> clock_id. That clock advances steadily, regardless of changes to the system
>>>> clock.
>>>> ...
>>>> +#include <sysexits.h>
>
>>> Use of <sysexits.h> is a style bug. It is not used in BSD or KNF code
>>> like dd used to be.
>
>> sysexits.h is recommended by the err(3) man page. Is that
>> recommendation meant to apply selectively, or is it obsolete, or is
>> some sort of edit war being waged by man page authors?

Bug in the err(3) man page. Sort of an edit war. Just 2 FreeBSD
committers liked sysexits and used it in their code and added a
recommendation to use it in some man pages. But it has negative
advantages, and normal BSD programs don't use it. It has been edited
in and out of style(9).

> The recommendation for <sysexits.h> was incompletely removed, yes.

It is still in err(3), and sysexits(3) still justifies itself by
pointing to partly-removed words in style(9).

err(3) is the last place that should recommend using sysexits. err()
gives a nice way of encouraging text descriptions for all exits. With
text descriptions, there is almost no need for cryptic numeric exit
codes. Only sets of programs that communicate a little status in the
exit code should use sysexits (or perhaps their own exit codes, or
certain standard exit codes like 126 or 127 for xargs and some other
utilities). Some of the uses of the standard exit codes are even.

I don't know of any utility except possibly sendmail that documents
that it uses sysexits enough for its exit codes to be useful for more
than a binary success/fail decision. Certainly not dd after these changes.
If its use of sysexits were documented, then the documentation would
say "dd uses sysexits to report 3 errors that can't happen; otherwise,
it uses the normal 2-state exit codes" (there is a macro for them. It
expands to the concise but grammatically challenged "exits 0 on
success, and >0 if an error occurs". Here ">0" standardises the usual
sloppiness of not distinguishing codes between 1 and 127).

sysexits(3) now says:

% DESCRIPTION
%      According to style(9), it is not a good practice to call exit(3) with
%      arbitrary values to indicate a failure condition when ending a program.
%      Instead, the pre-defined exit codes from sysexits should be used, so the
%      caller of the process can get a rough estimation about the failure class
%      without looking up the source code.

but style(9) now says:

%      Exits should be 0 on success, or 1 on failure.
%
%              exit(0);        /*
%                               * Avoid obvious comments such as
%                               * "Exit 0 on success."
%                               */
%      }

The latter is not what I asked for either. In previous discussion of
this, I think we agreed to at least mention EXIT_SUCCESS and
EXIT_FAILURE, and possibly deprecate sysexits.

This is a weakened version of the 4.4BSD style rules, which say:

%      /*
%       * Exits should be 0 on success, and 1 on failure. Don't denote
%       * all the possible exit points, using the integers 1 through 300.
%       */
%      exit(0);    /* Avoid obvious comments such as "Exit 0 on success." */

The main point of this is to disallow cryptic undocumented exit
statuses. Recommending sysexits almost reverses this. It gives cryptic
undocumented error statuses that are not even easy to decrypt for
programs. Programs can look up sysexits, but without documentation
there is no guarantee that the encoding is according to sysexits.
Actually documenting use of sysexits would make it even more painful
to use.
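
A minimal sketch of the 0-or-1 convention with text descriptions, under
stated assumptions: check_readable() is a hypothetical helper made up for
illustration and is not part of dd. warn(3) supplies the description, so
the caller only needs a binary status.

```c
#include <err.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from dd): report whether path can be opened
 * for reading.  Returns 0 on success and 1 on failure.  The reason for
 * a failure goes to stderr as text via warn(3), not into the exit code.
 */
static int
check_readable(const char *path)
{
	int fd;

	fd = open(path, O_RDONLY);
	if (fd == -1) {
		/* Prints "progname: path: <strerror(errno)>". */
		warn("%s", path);
		return (1);
	}
	(void)close(fd);
	return (0);
}
```

A caller that cannot continue would presumably write something like
err(1, "clock_gettime") rather than err(EX_OSERR, "clock_gettime"); the
message carries the detail either way.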

>> [snip]
>>>> - st.start = tv.tv_sec + tv.tv_usec * 1e-6;
>>>> + if (clock_gettime(CLOCK_MONOTONIC_PRECISE, &tv))
>>>> + err(EX_OSERR, "clock_gettime");
>> [snip]
>>>> + st.start = tv.tv_sec + tv.tv_nsec * 1.0e-9;
>>>> }
>
> The floating point addition starts losing precision after 8388608
> seconds (slightly more than 97 days, a plausible uptime for a server).
> It is better to subtract the timespecs to avoid this issue.

No, it is better to use floating point for results that only need to
be approximate. Especially when the inputs are approximate and the
final approximation doesn't need to be very accurate.

Floating point is good for all timespec and timeval calculations,
except in the kernel where it is unavailable. timespecs and timevals
are mostly used for timeouts, and the kernel isn't very careful about
exact timeouts. Short timeouts have inherent large inaccuracy due to
interrupt granularity and latency. Long timeouts can be relatively
more accurate, but only if the kernel is careful about them. It is
only careful in some places.

> With microseconds, the precision of a double is sufficient for 272
> years, so that calculation is probably acceptable.

dd actually uses double, but float would be plenty. systat uses a
mixture of float and double. double throughout is better because using
the smaller type float tends to give negative optimizations. devstat
uses long double. That's really silly for statistics. On some arches,
it is no different from double (so nothing can depend on extra
precision from it). On sparc64, it is a negative optimization by a
factor of hundreds.

>> [snip]
>> Even if nanosecond resolution isn't useful, monotonicity is. Nobody
>> should be using a nonmonotonic clock just to measure durations. I
>> started an audit of all of FreeBSD to look for other programs that use
>> gettimeofday to measure durations. I haven't finished, but I've
>> already found a lot, including xz, ping, hastd, fetch, systat, powerd,
>> and others.
>> I don't have time to fix them, though. Would you be
>> interested, or do you know anyone else who would?
>
> I have a local patch for time(1).
>
> Whether the monotonic clock is right also depends on how long the
> durations typically are. For very long durations, users might refer to
> wall clocks and CLOCK_REALTIME may be more appropriate.

Yes, monotonic clocks are often best, but there are many bugs in this
area. The most relevant one is perhaps that CLOCK_MONOTONIC is only
monotonic. It is unclear if standards require it to have any relation
to actual time. In practice in FreeBSD, it gives the actual time that
the system is up and is not suspended. It is thus especially unusable
for setting alarm clocks in the morning since suspension overnight is
more likely than at other times.

Alarm clocks need to use real time anyway. nanosleep() is almost
unusable for setting alarm clocks due to this problem, its bugs, and
other reasons:
- nanosleep() is specified to sleep on real time, but in FreeBSD it
  sleeps on monotonic time. clock_nanosleep() is specified to sleep on
  a specified clock id, but is not implemented in FreeBSD.
- I don't see any way to use the broken nanosleep() for setting
  realtime alarms except to take short sleeps and check the real time
  on waking up. Kernel timer code does things like this internally, but
  not very accurately, and for nanosleep() its sleeps are not short
  enough to work and it checks the wrong clock id on waking up.
- nanosleep() takes a relative time, so even a nanosleep() that sleeps
  on the correct clock id would be hard to use with an overnight
  timeout. You would have to know about daylight savings adjustments
  and either compensate for them up front or wake up an hour or 2 early
  to check for a switch.
- there are some POSIX realtime functions that support sleeping on an
  arbitrary clock id, and also support sleeping until an absolute time.
  These are supported in FreeBSD. I haven't actually used them.
  They are sloppy in different ways than older FreeBSD timer code (and
  not as up to date with the change to sbintime_t). They seem to be
  unaware of daylight savings and not use short enough sleeps to work
  across switches.
- nanosleep() is specified to sleep in realtime. Actually more
  specifically, to use CLOCK_REALTIME for its clock id. But its
  interval is relative, so it is unclear even what this means.

Taking averages over days has similar problems. They should probably
use the monotonic system up time, not the system up time less the
system suspension time. Due to the bug of not counting suspension
time, using the real time clock is probably better. It may jump by up
to about 1 hour across daylight savings switches, but that won't take
it backwards, but the monotonic clock may fail to advance by much more
than 1 hour.

POSIX doesn't actually allow the monotonic clock to fail to advance
across suspensions or for other reasons. From an old draft:

% 6679 MON If the Monotonic Clock option is supported, all implementations shall support a clock_id of
% 6680     CLOCK_MONOTONIC defined in <time.h>. This clock represents the monotonic clock for the
% 6681     system. For this clock, the value returned by clock_gettime( ) represents the amount of time (in
% 6682     seconds and nanoseconds) since an unspecified point in the past (for example, system start-up
% 6683     time, or the Epoch). This point does not change after system start-up time. The value of the

Here "amount of time" is fuzzy, but clearly it should be in physical
time and as accurate as possible.

FreeBSD's implementation also breaks the "unspecified point in the
past" by frobbing it to implement the real time. It is only
unspecified in POSIX. In FreeBSD, you can see it using sysctl
kern.boottime and indirectly using uptime(1). uptime (that is, w), has
been changed to use CLOCK_UPTIME, and that gives some of the long-term
timing bugs mentioned above.

Suppose for example that the system booted at 1:00 am on a certain day.
The boot time is whatever it is, and shouldn't change. It serves as
the "unspecified point in the past". It is not affected by DST
switches or by micro-adjustments using adjtime() or ntpd. However,
suppose the clock drifts by 1 second and the real time is fixed up by
stepping the clock. The real time becomes correct, but the monotonic
time remains off by 1 second. This is implemented by stepping the boot
time to 1:00:01 am or 0:59:59 am. The boot time becomes wrong too.
CLOCK_UPTIME is the same as CLOCK_MONOTONIC, so it is also off by 1
second. This can be seen in uptime(1) output. The errors may
accumulate.

Of course, the monotonic clock cannot be stepped backwards. Stepping
it forward wouldn't break it much more than leaving it off by 1 second
forever. However, the only reasonably correct implementation is to
micro-adjust it until it catches up with any steps in the realtime
clock. Only do this for small adjustments. After suspension, it should
be stepped forwards by a large amount.

I think bad things happen to the boot time after suspension too. The
real time must be stepped forward by a large amount, and doing that
steps the boot time by a large amount.

Similarly for booting if the realtime is initially local. It is
stepped to make it UTC. This is confusing. It happens on my system,
and sysctl kern.boottime shows the boot time apparently-correctly. But
it is correct as a local time. The boot time is in UTC. sysctl doesn't
translate to local time, so the apparently-correct time is actually
off by the step (10 hours).

Bugs in the boot time can be fixed more easily than by micro-adjusting
the monotonic clock. Just keep the initial boot time (except adjust it
when it was initially local instead of UTC) and frob the real time
using a different variable. Export both variables so that applications
can compensate for the frobbing at the cost of some complexity.
E.g., in uptime(1):

	clock_gettime(CLOCK_UPTIME, &ts);
	/*
	 * Actually, do the compensations in the kernel for CLOCK_UPTIME.
	 * It doesn't need to be monotonic. But suppose it is the same
	 * as the unfixed CLOCK_MONOTONIC and compensate here.
	 *
	 * Also fix the bogus variable name 'tp'.
	 */
	sysctl_mumble(&boottime);
	sysctl_mumble(&frobbed_boottime);
	uptime = ts.tv_sec +- (boottime.tv_sec - frobbed_boottime.tv_sec);

Note that the compensation may go backwards, so this method doesn't
work in general for monotonic times. However, it can be used if the
compensation is non-negative or relatively small negative. dd could
use this method. It already has to fix up for zero times and still has
parts of the old method that fixes up for negative times.

Note that the compensation may be very large across a suspension. You
might start dd, SIGSTOP it, suspend the system and restart everything
a day later. The compensation would be about 1 day. The average from
this wouldn't be very useful, but it would be the same as if dd was
stopped for a day but the system was not suspended.

Bruce
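
A rough sketch of the floating point approach argued for above, as it
might look for a dd-style interval. The names ts_to_double() and
elapsed_secs() and the 1e-6 floor are assumptions for illustration, not
dd's actual code. The result is only approximate, which is all a
throughput report needs.

```c
#include <time.h>

/* Hypothetical helper: a timespec as approximate seconds in a double. */
static double
ts_to_double(const struct timespec *ts)
{

	return ((double)ts->tv_sec + (double)ts->tv_nsec * 1e-9);
}

/*
 * Hypothetical helper: approximate seconds from start to end.
 * Zero and negative intervals are replaced by a small positive floor
 * (an assumed value) so that a bytes-per-second rate can be computed.
 */
static double
elapsed_secs(const struct timespec *start, const struct timespec *end)
{
	double secs;

	secs = ts_to_double(end) - ts_to_double(start);
	if (secs < 1e-6)
		secs = 1e-6;
	return (secs);
}
```

A caller would take one clock_gettime() reading at start and one at
exit, then divide the byte count by elapsed_secs(&start, &end).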