From owner-svn-src-head@FreeBSD.ORG Thu May 8 15:37:40 2014 Return-Path: Delivered-To: svn-src-head@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 1634C7E0; Thu, 8 May 2014 15:37:40 +0000 (UTC) Received: from mail-we0-x22d.google.com (mail-we0-x22d.google.com [IPv6:2a00:1450:400c:c03::22d]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 2A84BE63; Thu, 8 May 2014 15:37:39 +0000 (UTC) Received: by mail-we0-f173.google.com with SMTP id u57so2678512wes.18 for ; Thu, 08 May 2014 08:37:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=c1VAbJ10lsFZ/1gy8p3w2gMMIK9AKR2EwZXzK75oRgA=; b=zRT24L402qaUgs0BRlBF9KnngtsK7Vyo+nfx5mmlpS33jmmqIglOk5lZ3VSoKfS1EA fB9HryZJiU3JditB1F5/Oqm3rjQNV8rRtd1fbeCd4YxtxEHh8/JCCx15HRj9xnE5lmOJ dhavdB5DMV5bONcYOAn4xtSfjajZER/wYIyYMJWp5n4WzIJwPmzdVp7dZlqBCj6zT0zi aKzZiVfBrH+x1WzjAAKmM9fAovk0dcAJe05M0YWdeWxY4wt69VfWnMyguXrfagy9Rxlm 7qc7XoH+vUpy6PtfH15QPv5zKDHAMX57Vb4pveKYX2u174H4Lfw1I5bG8RzbVbG5VDqv ihjA== MIME-Version: 1.0 X-Received: by 10.180.14.233 with SMTP id s9mr4066226wic.53.1399563457474; Thu, 08 May 2014 08:37:37 -0700 (PDT) Sender: asomers@gmail.com Received: by 10.194.168.130 with HTTP; Thu, 8 May 2014 08:37:37 -0700 (PDT) In-Reply-To: <20140508111443.S1000@besplex.bde.org> References: <201405062206.s46M6dxW060155@svn.freebsd.org> <20140507113345.B923@besplex.bde.org> <20140507202623.GA14233@stack.nl> <20140508111443.S1000@besplex.bde.org> Date: Thu, 8 May 2014 09:37:37 -0600 X-Google-Sender-Auth: 8nRL954M4MwMJZh0djxcaEywkUo Message-ID: Subject: Re: svn commit: r265472 - head/bin/dd From: Alan Somers To: Bruce Evans Content-Type: text/plain; charset=UTF-8 Cc: "svn-src-head@freebsd.org" , "svn-src-all@freebsd.org" , "src-committers@freebsd.org" , Alan Somers , Jilles Tjoelker X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 May 2014 15:37:40 -0000 On Wed, May 7, 2014 at 9:39 PM, Bruce Evans wrote: > On Wed, 7 May 2014, Jilles Tjoelker wrote: > >> On Wed, May 07, 2014 at 12:10:31PM -0600, Alan Somers wrote: >>> >>> On Tue, May 6, 2014 at 9:47 PM, Bruce Evans wrote: >>>> >>>> On Tue, 6 May 2014, Alan Somers wrote: >>>>> >>>>> ... >>>>> >>>>> The solution is to use clock_gettime(2) with CLOCK_MONOTONIC_PRECISE >>>>> as >>>>> the >>>>> clock_id. That clock advances steadily, regardless of changes to the >>>>> system >>>>> clock. >>>>> ... >>>>> +#include >> >> >>>> Use of is a style bug. It is not used in BSD or KNF code >>>> like dd used to be. >> >> >>> sysexits.h is recommended by the err(3) man page. Is that >>> recommendation meant to apply selectively, or is it obsolete, or is >>> some sort of edit war being waged by man page authors? > > > Bug in the err(3) man page. Sort of an edit war. Just 2 FreeBSD > committers liked sysexits and used it in their code and added a > recommendation to use it in some man pages. But it has negative > advantages, and normal BSD programs don't use it. It has been > edited in and out of style(9). > > >> The recommendation for was incompletely removed, yes. > > > It is still in err(3), and sysexits(3) still justifies itself by > pointing to partly-removed words in style(9). > > err(3) is the last place that should recommend using sysexits. err() > gives a nice way of encouraging text descriptions for all exits. > With text descriptions, there is almost no need for cryptic numeric > exit codes. Only sets of programs that communicate a little status > in the exit code should use sysexits (or perhaps their own exit > codes, or certain standard exit codes like 126 or 127 for xargs and > some other utilities). Some of the uses of the standard exit codes > are even. I don't know of any utility except possibly sendmail that > documents that it uses sysexits enough for its exit codes to be > useful for more than a binary success/fail decision. Certainly not > dd after these changes. If its use of sysexits were documented, > then the documentation would say "dd uses sysexits to report 3 errors > that can't happen; otherwise, it uses the normal 2-state exit codes > (there is a macro for them. It expands to the concise but > grammatically challenged "exits 0 on success, and >0 if an error > occurs". Here ">0" standardises the usual sloppiness of not > distinguishing codes between 1 and 127). > > sysexits(3) now says: > > % DESCRIPTION > % According to style(9), it is not a good practice to call exit(3) with > % arbitrary values to indicate a failure condition when ending a > program. > % Instead, the pre-defined exit codes from sysexits should be used, so > the > % caller of the process can get a rough estimation about the failure > class > % without looking up the source code. > > but style(9) now says: > > % Exits should be 0 on success, or 1 on failure. > % % exit(0); /* > % * Avoid obvious comments such as > % * "Exit 0 on success." > % */ > % } > > The latter is not what I asked for either. In previous discussion > of this, I think we agreed to at least mention EXIT_SUCCESS and > EXIT_FAILURE, and possibly deprecate sysexits. > > This is a weakened version of the 4.4BSD style rules, which say: > > % /* > % * Exits should be 0 on success, and 1 on failure. Don't denote > % * all the possible exit points, using the integers 1 through 300. > % */ > % exit(0); /* Avoid obvious comments such as "Exit 0 on success." > */ > > The main point of this is to disallow cryptic undocumented exit statuses. > Recommending sysexits almost reverses this. It gives cryptic undocumented > error statuses that are not even easy to decrypt for programs. Programs > can look up sysexits, but without documentation there is no guarantee that > the encoding is according to sysexits. Actually documenting use of > sysexits would make it even more painful to use. > > >>> [snip] >>>>> >>>>> - st.start = tv.tv_sec + tv.tv_usec * 1e-6; >>>>> + if (clock_gettime(CLOCK_MONOTONIC_PRECISE, &tv)) >>>>> + err(EX_OSERR, "clock_gettime"); >>> >>> [snip] >>>>> >>>>> + st.start = tv.tv_sec + tv.tv_nsec * 1.0e-9; >>>>> } >> >> >> The floating point addition starts losing precision after 8388608 >> seconds (slightly more than 97 days, a plausible uptime for a server). >> It is better to subtract the timespecs to avoid this issue. > > > No, it is better to use floating point for results that only need to > be approximate. Especially when the inputs are approximate and the > final approximation doesn't need to be very accurate. > > Floating point is good for all timespec and timeval calculations, > except in the kernel where it is unavailable. timespecs and timevals > are mostly used for timeouts, and the kernel isn't very careful about > exact timeouts. Short timeouts have inherent large inaccuracy due > to interrupt granularity and latency. Long timeouts can be relatively > more accurate, but only if the kernel is careful about them. It is > only careful in some places. No, Jilles is right. The problem isn't that dd uses doubles; it's that dd converts longs to doubles _before_ subtracting the values. That causes rounding if the tv_sec values are large. If the implementation of CLOCK_MONOTONIC ever changed to measure time since the Epoch, or something similar, then the rounding error would be extremely significant. Better to subtract the timespecs, then convert to double. > > >> With microseconds, the precision of a double is sufficient for 272 >> years, so that calculation is probably acceptable. > > > dd actually uses double, but float would be plenty. systat uses a > mixture of float and double. double througout is better because > using the smaller type float tends to give negative optimizations. > devstat uses long double. That's really silly for statistics. > On some arches, it is no different from double (so nothing can > depend on extra precision from it). On sparc64, it is a negative > optimization by a factor of hundreds. > > >>> [snip] >>> Even if nanosecond resolution isn't useful, monotonicity is. Nobody >>> should be using a nonmonotonic clock just to measure durations. I >>> started an audit of all of FreeBSD to look for other programs that use >>> gettimeofday to measure durations. I haven't finished, but I've >>> already found a lot, including xz, ping, hastd, fetch, systat, powerd, >>> and others. I don't have time to fix them, though. Would you be >>> interested, or do you know anyone else who would? >> >> >> I have a local patch for time(1). >> >> Whether the monotonic clock is right also depends on how long the >> durations typically are. For very long durations, users might refer to >> wall clocks and CLOCK_REALTIME may be more appropriate. > > > Yes, monotonic clocks are often best, but there are many bugs in this > area. The most relevant one is perhaps that CLOCK_MONOTONIC is only > monotonic. It is unclear if standards require it to have any relation > to actual time. In practice in FreeBSD, it gives the actual time that > the system is up and is not suspended. It is thus especially unusable > for setting alarm clocks in the morning since suspension overnight is > more likely than at other times. Alarm clocks need to use real time > anyway. nanosleep() is almost unusable for setting alarm clocks due > to this problem, its bugs, and other reasons: > - nanosleep() is specified to sleep on real time, but in FreeBSD it sleeps > on monotonic time. clock_nanosleep() is specified to sleep on a > specified clock id, but is not implemented in FreeBSD. > - I don't see any way to use the broken nanosleep() for setting realtime > alarms except to take short sleeps and check the real time on waking > up. Kernel timer code does things like this internally, but not > very accurately, and for nanosleep() its sleeps are not short enough > to work and it checks the wrong clock id on waking up. > - nanosleep() takes a relative time, so even a nanosleep() that sleeps > on the correct clock id would be hard to use with an overnight timeout. > You would have to know about daylight savings adjustments and either > compensate for them up front or wake up an hour or 2 early to check > for a switch. > - there are some POSIX realtime functions that support sleeping on an > arbitrary clock id, and also support sleeping until an absolute > time. These are supported FreeBSD. I haven't actually used them. > They are sloppy in different ways than older FreeBSD timer code (and > not as up to date with the change to sbintime_t). They seem to be > unaware of daylight savings and not use short enough sleeps to work > across switches. > - nanosleep() is specified to sleep in realtime. Actually more > specifically, to use CLOCK_REALTIME for its clock id. But its interval > is relative, so it is unclear even what this means. > > Taking averages over days has similar problems. They should probably > use the monotonic system up time, not the system up time less the > system suspension time. Due to the bug of not counting suspension > time, using the real time clock is probably better. It may jump by > up to about 1 hour across daylight savings switches, but that won't > take it backwards, but the monotonic clock may fail to advance by > much more than 1 hour. > > POSIX doesn't actually teh monotonic clock to fail to advance across > suspsensions or for other reasons. From an old draft: > > % 6679 MON If the Monotonic Clock option is supported, all > implementations shall support a clock_id of > % 6680 CLOCK_MONOTONIC defined in . This clock > represents the monotonic clock for the > % 6681 system. For this clock, the value returned by > clock_gettime( ) represents the amount of time (in > % 6682 seconds and nanoseconds) since an unspecified point in > the past (for example, system start-up > % 6683 time, or the Epoch). This point does not change after > system start-up time. The value of the > > Here "amount of time" is fuzzy, but clearly it should be in physical time > and as accurate as possible. > > FreeBSD's implementation also breaks the "unspecified point in the past" > by frobbing it to implement the real time. It is only unspecified in > POSIX. In FreeBSD, you can see it using sysctl kern.boottime and > indirectly using uptime(1). uptime (that is, w), has been changed to > use CLOCK_UPTIME, and that gives some of the long-term timing bugs > mentioned above. Suppose for example that the system booted at 1:00 am > on a certain day. The boot time is whatever it is, and shouldn't > change. It serves as the "unspecified point in the past". It is not > affected by DST switches or by micro-adjustments using adjtime() or ntpd. > However, suppose the clock drifts by 1 second and the real time is fixed > up by stepping the clock. The real time becomes correct, but the monotonic > time remains off by 1 second. This is implemented by stepping the boot > time to 1:01 am or 0:59 am. The boot time becomes wrong too. CLOCK_UPTIME > is the same as CLOCK_MONOTONIC, so it is also off by 1 second. This can > be seen in uptime(1) output. The errors may accumulate. > > Of course, the monotonic clock cannot be stepped backwards. Stepping > it foward wouldn't break it much more than leaving it off by 1 second > forever. However, the only reasonably correct implementation is to > micro-adjust it until it catches up with any steps in the realtime > clock. Only do this for small adjustments. After suspension, it > should be stepped forwards by a large amount. > > I think bad things happen to the boot time after suspension too. The > real time must be stepped forward by a large amount, and doing that > steps the boot time by a large amount. > > Similarly for booting if the realtime is initially local. It is > stepped to make it UTC. This is confusing. It happens on my > system, and sysctl kern.boottime shows the boot time > apparently-correctly. But it is correct as a local time. The boot > time is in UTC. sysctl doesn't translate to local time, so the > apparently-correct time is actually off by the step (10 hours). > > Bugs in the boot time can be fixed more easily than by micro-adjusting > the monotonic clock. Just keep the initial boot time (except adjust it > when it was initially local instead of UTC) and frob the real time > using a different variable. Export both variables so that applications > can compensate for the frobbing at the cost of some complexity. E.g., > in uptime(1): > > clock_gettime(CLOCK_UPTIME, &ts); > /* > * Actually, do the compensations in the kernel for CLOCK_UPTIME. > * It doesn't need to be monotonic. But suppose it is the same > * as the unfixed CLOCK_MONOTONIC and compensate here. > * > * Also fix the bogus variable name 'tp'. > */ > sysctl_mumble(&boottime); > sysctl_mumble(&frobbed_boottime); > uptime = ts.tv_sec +- (boottime.tv_sec - frobbed_boottime.tv_sec); > > Note that the compensation may go backwards, so this method doesn't work > in general for monotonic times. However, it can be used if the compensation > is non-negative or relatively small negative. dd could use this method. > It already has to fix up for zero times and still has parts of the old > method that fixes up for negative times. Note that the compensation may > be very large across a suspension. You might start dd, SIGSTOP it, suspend > the system and restart everything a day later. The compensation would be > about 1 day. The average from this wouldn't be very useful, but it would > be the same as if dd was stopped for a day but the system was not suspended. Wouldn't it be simpler just for the kernel to adjust CLOCK_MONOTONIC to add suspension time? -Alan > > Bruce