From owner-svn-src-head@freebsd.org Wed Mar 22 03:45:00 2017 Return-Path: Delivered-To: svn-src-head@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 297A3D17710; Wed, 22 Mar 2017 03:45:00 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail107.syd.optusnet.com.au (mail107.syd.optusnet.com.au [211.29.132.53]) by mx1.freebsd.org (Postfix) with ESMTP id CC5491DC3; Wed, 22 Mar 2017 03:44:59 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au [122.106.153.191]) by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id A591DD43AE4; Wed, 22 Mar 2017 14:44:51 +1100 (AEDT) Date: Wed, 22 Mar 2017 14:44:50 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Eric van Gyzen cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r315694 - head/lib/libc/sys In-Reply-To: <201703220050.v2M0ob0p012506@repo.freebsd.org> Message-ID: <20170322120102.P963@besplex.bde.org> References: <201703220050.v2M0ob0p012506@repo.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=VbSHBBh9 c=1 sm=1 tr=0 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17 a=kj9zAlcOel0A:10 a=_9wxYhfm8z9f399oXpcA:9 a=CjuIK1q_8ugA:10 X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Mar 2017 03:45:00 -0000 On Wed, 22 Mar 2017, Eric van Gyzen wrote: > Log: > clock_gettime.2: add some clock IDs > > Add the CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID clock_id > values to the clock_gettime(2) man page. Reformat the excessively > long paragraph (sentence!) into a tag list. Some old bugs are now clearer. > Modified: head/lib/libc/sys/clock_gettime.2 > ============================================================================== > --- head/lib/libc/sys/clock_gettime.2 Tue Mar 21 22:41:37 2017 (r315693) > +++ head/lib/libc/sys/clock_gettime.2 Wed Mar 22 00:50:36 2017 (r315694) > ... > +.Pp > +.Bl -tag -width indent -compact > +.It Dv CLOCK_REALTIME > +.It Dv CLOCK_REALTIME_PRECISE > +.It Dv CLOCK_REALTIME_FAST > +Increments as a wall clock should. How should a wall clock increment? :-) Not with leap seconds adjustments or even DST adjustments, since those are confusing for humans, and normally aren't even increments. Some DST adjustments are large decrements, and all leap seconds adjustments have been positive, which causes the wall clock to sort of decrement, but this is usually implemented by not incrementing. I haven't seen a wall clock that displays the 61st second of a minute as ":60" or even stops at ":59". Implementing this extra or sticking for a real clockwork clock would be interesting. Certainly wall clocks in POSIX time, which are what this syscall exists. POSIX time is broken as designed, especially for real times, so nothing "should" track it. I think FreeBSD currently implements leap seconds in POSIX time, so its CLOCK_REALTIME doesn't increment, but leaps back by 1 second, then increments across the second again. This is very bad or large networks, so some systems use a hack to slew the adjustment over a very long period. Also, CLOCK_REALTIME_PRECISE is an unportable alias for CLOCK_REALTIME, and CLOCK_REALTIME_FAST is a bad optimization which doesn't increment like a wall clock should. > +.It Dv CLOCK_MONOTONIC > +.It Dv CLOCK_MONOTONIC_PRECISE > +.It Dv CLOCK_MONOTONIC_FAST > +Increments in SI seconds. These don't increment in SI seconds, since they are the same as CLOCK_UPTIME_* which are documented to not increment in some cases. CLOCK_MONOTONIC_PRECISE and CLOCK_MONOTONIC_FAST are unportable, so they can do anything, but POSIX requires CLOCK_MONOTONIC to increment like a clock should, except even more fuzzily than that. POSIX requires that CLOCK_MONOTONIC returns a value "representing" the amount of time (in seconds or nanoseconds) since an unspecified time in the past which does not change after system startup. I don't know exactly what an SI second is, or how an "amount of time" differs from a time or how bad the "representation" can be (it is not required to be accurate to any amount in SI seconds, whatever they are), but clearly the value "should" be as close as the system can get to physical seconds. In another section, POSIX specifies that CLOCK_MONOTONIC shall not jump backwards and that the implementation shall have an implementation-defined limit on forward jumps. "Implementation-defined" requires documentation, but there is nothing about this near here. Of course, since the "representation" doesn't seem to have any limits or require documentation, it can conceal arbitrary large jumps. > +.It Dv CLOCK_UPTIME > +.It Dv CLOCK_UPTIME_PRECISE > +.It Dv CLOCK_UPTIME_FAST > +Starts at zero when the kernel boots and increments > +monotonically in SI seconds while the machine is running. This is almost correct, but means that CLOCK_UPTIME doesn't give the uptime. It is not stated when the machine is running, but in practice it isn't running for the purposes of this statement when it is suspended. On resume, CLOCK_REALTIME is updated (not like a wall clock should be, but with larger jumps and inaccuracies (usually acpi does it and it is off by a few seconds and ntpd struggles to fix this, but sometimes acpi doesn't do this so it is off by a few hours and ntpd gives up and it takes the user restarting ntpd or a more manual fixup to fix this). CLOCK_REALTIME is is not updated on resume, except accidentally by the next bug. CLOCK_UPTIME is also broken by stepping CLOCK_REALTIME. This is misimplemented by stepping boottime. If the step is to adjust for an initially wrong (boot) time, then this adjustment is correct -- the boot time was wrong, so it should change; the real time was wrong so it should change; the step is not to fix drift so CLOCK_MONOTONIC shouldn't change, and CLOCK_UPTIME is CLOCK_MONOTONIC so it shouldn't change. Otherwise, the step is to fix up drift and all clock should change but the boot time shouldn't change. CLOCK_UPTIME* should always CLOCK_MONOTONIC* (and shouldn't exist) once the latter is implemented correctly. The latter is required to be the time since some fixed time, and this time can be the boot time once the latter is fixed (invariant, with its current variation fixed). Adjustments on resume are an example of such a step. They are large jumps forwards and represent reality, so there is no problem in making them (except thundering heres of timeouts, but making CLOCK_MONOTONIC not advance is not the way to fix timeouts). In general, adjustments to fix drift might be backwards, but they can't be for CLOCK_MONOTONIC. They should be handled by slewing the clock backwards over a long period. This already happens for ntpd micro- adjustments. > +.It Dv CLOCK_VIRTUAL > +Increments only when > +the CPU is running in user mode on behalf of the calling process. > +.It Dv CLOCK_PROF > +Increments when the CPU is running in user or kernel mode. It's more obvious that the clock stops incrementing for the other timeouts. > +.It Dv CLOCK_SECOND > +Returns the current second without performing a full time counter > +query, using an in-kernel cached value of the current second. This doeesn't say that the current second is a for a "FAST" CLOCK_REALTIME_FAST, or as usual document any incoherencies. CLOCK_SECOND is another unportability that shouldn't exist. It is useless for its intended purpose in most cases when it is implemented in libc, and then it doesn't do what the man page says that it does -- libc always does a full timecounter query (so the other "FAST" clocks aren't efficient either, but just throw away precision). The "FAST" clocks are incoherent with the "PRECISE" clocks since they use different rounding (round down in more cases). Except when FAST libc versions are mixed with FAST syscall versions. Then user FAST is coherent with user and kernel PRECISE, but not with kernel FAST. The bug is most obvious for CLOCK_SECOND since it has largest granularity: now = user_CLOCK_SECOND; // value n later = kernel_CLOCK_SECOND; // value n - 1 assert(now <= later); // fail Ignore the complication that CLOCK_SECOND is a real time so it can go backwards, and assume that it is monotonic. The user should be aware that FAST clocks are only coherent with each other and that CLOCK_SECOND is FAST and not do assertions similar to the above but with 'now' obtained from a non-FAST clock (e.g., from CLOCK_REALTIME rounded down explicitly). But it is difficult for users to know the coherency classes of clocks when these are not only undocumented but depend on the library. E.g., versions of time(1) used CLOCK_REALTIME, so time() was coherent with CLOCK_REALTIME but not with CLOCK_SECOND. time() was changed to use CLOCK_SECOND, so it became incoherent with CLOCK_REALTIME but coherent with FAST clocks. Then CLOCK_SECOND was changed to be in libc on most arches/hardware, so time() usually because coherent with CLOCK_REALTIME again, but incoherent with kernel CLOCK_SECOND and other FAST clocks. The incoherency is most obvious for file times, since file times normally use CLOCK_SECOND or a FAST clock. These are incoherent with user times if user times are read on a clock in a different coherency class (e.g., old time(), and current time() if user CLOCK_SECOND is not in the FAST class). The general description of the FAST and PRECISE versions still give incomplete old implementation details but no new implementation details: - FAST is documented to not do a full timecounter query and thus have accuracy 1 timer tick - actually, some implementations do do a full timecounter query - FASTness might be dominated by syscall overhead and not by timecounter query time. This is why the FAST versions shouldn't exist. When they are syscalls, they are still slow, and when they aren't syscalls they do they do do a full timecounter query. - it doesn't follow that accuracy is 1 timer tick. The authors of this man page and API should know better than to conflate accuracy with precision, and didn't make that mistake for the PRECISE versions. FAST really means the opposite of PRECISE, and sometimes significantly more efficient. The accuracy is the same as that of the PRECISE version, less the granularity of the precision. - the precision isn't 1 timer tick for kernel versions. It is of course 1 second for CLOCK_SECOND, and for the others it is tc_tick ticks. About max(1 tick, 1 msec). - the precision isn't 1 timer tick for user versions. It is of course 1 second for CLOCK_SECOND, and for the others it is the same as for the PRECISE version (should be the timecounter period, but this is impossible for TSC clocks at a few GHz so we advertise 1 nsec and depend on the accuracy being much worse than 1 nsec so the wrong precision doesn't matter). - PRECISE is documented to get the most "exact" value possible at the expense of execution time. I guess "exact" can be read as "accurate && precise", but since precision is invariant and accuracy is not under control of clock_gettime(), what the PRECISE versions actually give is correctness possibly at the expense of execution time. - it isn't documented that the PRECISE are no different from the standard versions - there is a problem with clock_getres()'s name, design and implemenation. I forget the difference between resolution and precision. Anyway, clock_getres() clearly has the same resolution as timespecs (1 nsec), and was standardized just in time for that to be not fine enough. It needs to return the precision (or is it the resolution?) of the underlying clock. We conflate resolution with precision and mostly return the precision. This is only clearly wrong for the FAST clocks. We return the same resolution for them as for the PRECISE clocks. This is probably technically correct as a resolution, but it would be more useful to return the precision. E.g, for a TSC at 4GHz, the TSC's resolution is clearly 1/4 nsec, but clock_getres() can't return that so we return 1 nsec since normal rounding down would give a physically impossible resolution. But even the PRECISE version can't read the clock every nsec, so it has a practical precision/resolution of much larger than 1 nsec. Perhaps it is 10 nsec, but we don't know what it is and just return the counter's resolution (rounded) to avoid surprising callers with values not a multiple of the resolution. We do the same for the FAST versions, although now the practical precision/resolution is 1 msec or larger. It takes unportable code to use the FAST versions, and this code must know that clock_getres() is useless for determining the practical precision, and calculate it as 1/hz times the undocumented tc_tick, or just not care if the result is correct provided it is FAST. > +.Xr clock_getcpuclockid 3 , > .Xr ctime 3 , > +.Xr pthread_getcpuclockid 3 , > .Xr timed 8 > .Sh STANDARDS > The Not very standard. There seem to be 14 extensions and only 8 listed in the extensions list (mainly ones that shouldn't exist). Bruce