Date: Thu, 9 May 2024 19:31:34 -0400
From: Josef 'Jeff' Sipek
To: freebsd-hackers@freebsd.org
Subject: Precision Hardware Clocks
Hello all,

I've been playing with the idea of extending the kernel to expose various clock sources to userspace via character devices. (Yesterday's thread about the OCP TAP Time Card nudged me to send this out sooner than I planned. :) ) The code is *very* hacky and full of TODOs & FIXMEs, but I thought I'd share it now.

What I'm calling a 'precision hardware clock' (PHC for short) is conceptually some piece of hardware that can provide the consumer a sense of time passing. Roughly speaking, there are two types of precision hardware clocks: those that return the current time using some defined timescale (e.g., kvmclock) and those that are simple oscillators with counters (e.g., many e1000e devices). My aim is to support both.

My initial goal is to provide *read-only* access to PHCs, as this is sufficient to make use of them for stabilizing the system clock. That is, an application can only query them for the current time. Eventually, I think it'd make sense to allow *setting* PHCs as well.

The devices that return the current time are fairly straightforward to work with. The ioctl simply calls a device-specific method and forwards the result to the caller.

The counter-type devices are more complicated to support. In my code I took an approach very similar to the timecounter code in the kernel. My first attempt actually tried to extend timecounters, but that resulted in a lot of additional computation being done in hardclock regardless of whether or not the additional clocks were in use. That didn't feel right. [1] My current code borrows the timecounter idea (and some code) of extending the hardware counter in software. The overflow check is done via a per-device callout that's scheduled at an interval based on the oscillator's frequency and the counter's width. (For debugging, I cap it at 10s max interval.)
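To make the "extend the hardware counter in software" idea concrete, here is a minimal userspace C sketch of the arithmetic involved, assuming a hardware counter narrower than 64 bits. The struct phc_counter type and the phc_counter_init, phc_counter_update, phc_ticks_to_ns and phc_poll_interval_ms functions are hypothetical illustrations, not code from fbsd-phc.patch, and using half the wrap period as the callout interval is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical state for a software-extended hardware counter, loosely
 * modeled on timecounters (not the structure used in fbsd-phc.patch).
 */
struct phc_counter {
    uint64_t mask;     /* (1 << width) - 1 for a width-bit hardware counter */
    uint64_t freq_hz;  /* oscillator frequency, e.g. 25000000 for 25MHz */
    uint64_t last_raw; /* last raw hardware value observed */
    uint64_t ticks;    /* 64-bit software extension of the counter */
};

static void
phc_counter_init(struct phc_counter *c, unsigned width, uint64_t freq_hz,
    uint64_t raw)
{
    assert(width > 0 && width < 64);
    assert(freq_hz > 0);
    c->mask = (UINT64_C(1) << width) - 1;
    c->freq_hz = freq_hz;
    c->last_raw = raw & c->mask;
    c->ticks = 0;
}

/*
 * Fold a new raw reading into the extended count. The masked subtraction
 * tolerates a single wrap, so this must run at least once per wrap period;
 * that is the job of the periodic overflow-check callout.
 */
static uint64_t
phc_counter_update(struct phc_counter *c, uint64_t raw)
{
    uint64_t delta = (raw - c->last_raw) & c->mask;

    c->last_raw = raw & c->mask;
    c->ticks += delta;
    return c->ticks;
}

/* Convert extended ticks to nanoseconds without overflowing the multiply. */
static uint64_t
phc_ticks_to_ns(const struct phc_counter *c, uint64_t ticks)
{
    uint64_t secs = ticks / c->freq_hz;
    uint64_t rem = ticks % c->freq_hz;

    return secs * UINT64_C(1000000000) +
        rem * UINT64_C(1000000000) / c->freq_hz;
}

/*
 * Overflow-check interval in milliseconds: half the wrap period (leaving
 * margin for callout latency), capped at max_ms.
 */
static uint64_t
phc_poll_interval_ms(const struct phc_counter *c, uint64_t max_ms)
{
    uint64_t span = c->mask + 1; /* ticks per wrap; width < 64 */
    uint64_t secs = span / c->freq_hz;
    uint64_t rem = span % c->freq_hz;
    uint64_t half_ms;

    if (secs >= UINT64_MAX / 1000)
        return max_ms; /* absurdly long wrap period; just use the cap */
    half_ms = (secs * 1000 + rem * 1000 / c->freq_hz) / 2;
    return half_ms < max_ms ? half_ms : max_ms;
}
```

For example, a 24-bit counter at 25MHz wraps about every 671 ms, so this sketch would poll every 335 ms, while a 32-bit counter at the same rate wraps roughly every 172 s and the 10 s cap applies instead.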
Regardless of which type of PHC it is, the ioctl caller gets what amounts to a pair of readings: one from the PHC and one from the system clock. Ideally, the two correspond to the same instant, but there may be some error due to hardware limitations. [2] Because a lot of hardware doesn't provide a way to capture such correlated timestamps, a "capture many readings" ioctl is a useful addition. This ioctl returns a set of interleaved PHC and system clock readings, which lets the application (e.g., chrony) do the appropriate filtering to remove noise.

In addition to adding the PHC code to the core kernel, I hacked up the if_em driver to start the 25MHz timekeeping counters on 82574 devices and register them with the PHC code. Finally, I hacked up chrony's PHC refclock driver to make use of the "get timestamp pair" ioctl.

I ran this code for a while on my test box with two 82574 NICs, both registered as chrony refclocks. [3] Unsurprisingly, the 82574 oscillators are not that accurate, but they are reasonably stable. (I posted histograms and Allan deviation plots on Mastodon. [4] Since the system's oscillator is in no way special, it is a bit silly to read too much into the graphs. However, I'd argue that they still show that the 82574 refclocks were reasonably good and would likely help in real-world scenarios. [5])

You can find my patches at:

https://www.josefsipek.net/freebsd/phc-v1/

There are 3 patches:

1. chrony.patch modifies chronyd to use the PHC ioctls
2. fbsd-phc.patch adds the generic PHC code
3. fbsd-em.patch modifies if_em to register the 82574 timekeeping counter with the PHC code

In addition to cleaning up and generally improving the existing patches, I hope to implement the bit of code that wires up KVM's KVM_HC_CLOCK_PAIRING hypercall as a PHC. While the 82574 provides a counter-type PHC, this kvm PHC would be an absolute-time PHC. Support for a kvm PHC would allow FreeBSD guests to sync *very* accurately to the host's system clock.
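For the "capture many readings" ioctl described above, one common way an application can filter the interleaved readings is to keep only the PHC reading with the tightest system clock bracket and compare it against that bracket's midpoint. The sketch below assumes a hypothetical layout (sys[0], phc[0], sys[1], ..., phc[n-1], sys[n], all in nanoseconds); struct phc_sample_set and phc_best_offset are illustrative names, not the ioctl interface from the patches, and chrony's actual filtering may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical result of a "capture many readings" ioctl: n PHC readings,
 * each bracketed by system clock readings, all in nanoseconds:
 *   sys[0] phc[0] sys[1] phc[1] ... phc[n-1] sys[n]
 */
struct phc_sample_set {
    size_t n;
    const int64_t *sys; /* n + 1 entries */
    const int64_t *phc; /* n entries */
};

/*
 * Pick the PHC reading with the narrowest system clock bracket (the one
 * least disturbed by interrupts or bus latency) and report its offset from
 * the bracket midpoint. Returns 0 on success, -1 if no usable sample exists.
 */
static int
phc_best_offset(const struct phc_sample_set *s, int64_t *offset_ns,
    int64_t *delay_ns)
{
    size_t best = 0;
    int64_t best_delay = INT64_MAX;

    assert(s != NULL && offset_ns != NULL && delay_ns != NULL);
    if (s->n == 0)
        return -1;

    for (size_t i = 0; i < s->n; i++) {
        int64_t delay = s->sys[i + 1] - s->sys[i];

        /* A negative bracket means the system clock stepped; skip it. */
        if (delay >= 0 && delay < best_delay) {
            best_delay = delay;
            best = i;
        }
    }
    if (best_delay == INT64_MAX)
        return -1;

    int64_t mid = s->sys[best] + best_delay / 2;
    *offset_ns = s->phc[best] - mid;
    *delay_ns = best_delay;
    return 0;
}
```

Taking the midpoint assumes the PHC read happened halfway between the two system clock reads; the bracket width bounds the error of that assumption, which is why the narrowest bracket is preferred.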
I also have an incomplete patch that adds support for clock_gettime(2) using PHC fds as clockid_t values, but since it isn't complete I'll keep it to myself for now. :)

So, that's what I've been up to. As I said in the beginning, I wanted to get more of this done, but I think it makes sense to let others know about my code now. I plan to continue hacking away on this, but if people have opinions about any of this, I'd love to hear them.

It really pains me that there is so much duplication between the PHC and timecounter code, but the current tc_windup code runs in a rather special context (hardclock), and having it process *all* devices regardless of use would increase its runtime quite a bit. I've been thinking about moving some of the timecounter and PHC code into a generic set of helpers, or reorganizing kern_tc.c to fold the PHC logic into it sanely, but that's currently very far down the todo list.

To summarize, the goals/non-goals for this work are:

Goals:
* read-only interface to various precision hardware clocks (PHCs)
* support for both absolute-time and counter-only PHCs
* ability to use software like chrony to stabilize system clocks

Non-goals/future work:
* adjusting PHCs
* support for cross-timestamping techniques (like Intel's ART)
* support for if_em PTP packet timestamping
* external pin timestamping support

Thanks for reading this far. Let me know if you have any questions, suggestions, etc.

Jeff.

[1] I actually ran for about a week with an e1000e card in my box providing timekeeping, selected via the kern.timecounter sysctls. It worked and was quite amusing to see, but the additional complexity in tc_windup made it unworkable.

[2] At some point, Intel added the Always Running Timer (ART), which devices can use to get timestamps that are easily convertible to TSC readings. Support for this is part of future work.

[3] The chrony config was the following.
I ran chronyd with the -x flag to prevent it from trying to set the clock. The system clock was disciplined with ptp2d, which was syncing to ptp2d running on the same server that chrony used for NTP. Note that the refclocks are marked as 'pps local', meaning that they are to be used only as a frequency source. ('pps' means that the refclock isn't reporting UTC, and 'local' means that the clock isn't aligned to UTC seconds.)

    server iburst minpoll 0 maxpoll 4 xleave
    refclock PHC /dev/phc-em0 refid EM0 pps local
    refclock PHC /dev/phc-em1 refid EM1 pps local
    logdir /tmp
    log measurements statistics tracking refclocks selection rtc
    logbanner 0

[4] https://mastodon.radio/@jeffpc/112230743393202103

[5] A huge problem with NTP is that it suffers greatly from network latency jitter and asymmetric routing. Having a stable reference clock (even if the stability is only short-term) helps NTP software quite a bit.