From owner-svn-src-all@freebsd.org Wed Jun 29 07:58:18 2016 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0F181B86B1E; Wed, 29 Jun 2016 07:58:18 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail107.syd.optusnet.com.au (mail107.syd.optusnet.com.au [211.29.132.53]) by mx1.freebsd.org (Postfix) with ESMTP id B2B8429E2; Wed, 29 Jun 2016 07:58:17 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c110-21-100-149.carlnfd1.nsw.optusnet.com.au [110.21.100.149]) by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id 95BD5D4FC80; Wed, 29 Jun 2016 17:58:05 +1000 (AEST) Date: Wed, 29 Jun 2016 17:58:05 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Konstantin Belousov cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r302251 - head/sys/kern In-Reply-To: <201606281642.u5SGge18061528@repo.freebsd.org> Message-ID: <20160629154400.F968@besplex.bde.org> References: <201606281642.u5SGge18061528@repo.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=OtmysHLt c=1 sm=1 tr=0 a=XDAe9YG+7EcdVXYrgT+/UQ==:117 a=XDAe9YG+7EcdVXYrgT+/UQ==:17 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=RdlyHFHmpbaWVazvlCcA:9 a=CjuIK1q_8ugA:10 X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Jun 2016 07:58:18 -0000 On Tue, 28 Jun 2016, Konstantin Belousov wrote: > Log: > Do not use Giant to prevent parallel calls to CLOCK_SETTIME(). Use > private mtx in resettodr(), no implementation of CLOCK_SETTIME() is > allowed to sleep. Thanks. As you know, clock locking is still quite broken. The most obvious bugs near here are now: - settime() still begins with splclock(). This gave fewer bugs in clock locking when it last worked in FreeBSD-4. Now it is just a hint that the clock locking has still not been converted to use mutexes. - bitrot near the end of settime() removed the splx() that should be paired with the splclock(). - the locking should be stronger than mutexes (or splclock()) to minimise the delay between deciding to set the time and actually setting it. The delay between making the decision in userland and acquiring a lock in the kernel is hard to control, so this would be better handled by another API, and often is by using the adjtime(x) API to apply a small adjustment. The only safe way to set an absolute time with POSIX APIs seems to be sometime like: - clock_gettime(CLOCK_MONOTONIC, &start) - clock_settime(CLOCK_REALTIME, &now) - clock_gettime(CLOCK_MONOTONIC, &finish) - loop doing this while 'finish' is too much later than 'start'. My kernel code for reading (but not writing) does this, but without the loop (except it loops for an average of 0.5 seconds before starting to synchronize). It just prints the difference finish - start. Even though this is all at a low level, the difference is typically about 40 usec on a 2GHz i386 system (since reading the AT rtc registers is slow). A delay of 40 usec is too much using ntpd if you want closer to microseconds accuracy than milliseconds. However, ntpd mostly uses adjtimex(). The software part of setting the time can be much faster. This requires ordering things carefully. settime() now uses the reasonable order of tc_setclock() first. The AT rtc register only has a resolution and precision of 1 second, and sloppy programming gives it an accuracy of at most 1 second too. Uncontrolled delays in settime() don't matter compared with the larger sloppiness at lower levels. My version fixes this for some cases, so that the rtc can be used to restore the software time to within ~40 microseconds after interrupts have been stopped by ddb. Similar methods would work for restoring the software time accurately enough to avoid steps in ntp after suspend/resume. - settime() still does bogus Giant locking at the end (where it lost the splx()): X mtx_lock(&Giant); X tc_setclock(&ts); X resettodr(); X mtx_unlock(&Giant); Giant locking for tc_setlock() is even less necessary and/or sufficient than it is for the resettodr(). Locking for tc_setclock() is more important than for the rtc, but it has no locking except for this Giant locking, and none of the variables accessed by it is locked by Giant. - more below, with references to changes in this commit. > Modified: head/sys/kern/subr_rtc.c > ============================================================================== > --- head/sys/kern/subr_rtc.c Tue Jun 28 16:41:50 2016 (r302250) > +++ head/sys/kern/subr_rtc.c Tue Jun 28 16:42:40 2016 (r302251) > @@ -62,6 +62,8 @@ __FBSDID("$FreeBSD$"); > #include > #include > #include > +#include > +#include > #include > #ifdef FFCLOCK > #include > @@ -73,6 +75,8 @@ __FBSDID("$FreeBSD$"); > static device_t clock_dev = NULL; > static long clock_res; > static struct timespec clock_adj; > +static struct mtx resettodr_lock; > +MTX_SYSINIT(resettodr_init, &resettodr_lock, "tod2rl", MTX_DEF); > > /* XXX: should be kern. now, it's no longer machdep. */ > static int disable_rtc_set; > @@ -168,11 +172,14 @@ resettodr(void) > if (disable_rtc_set || clock_dev == NULL) > return; > > + mtx_lock(&resettodr_lock); This could be later (only around the hardware parts) since we are sloppy with the delays. > getnanotime(&ts); This doesn't even try to get nanoseconds accuracy. > timespecadd(&ts, &clock_adj); > ts.tv_sec -= utc_offset(); Who knows what utc_offset() is locked by? Certainly not this new locking, since it is set in another file but the new locking is only in this file. It still seems to be most under Giant :-(. The change in the other part of the diff is related to this, and doesn't seem quite right. utc_offset() uses the variables tz_minuteswest, wall_cmos_clock and adjkerntz: - tz_minuteswest is set with no locking in kern_settimeofday(), just after the settime() call. This last had a chance of being correct when the kernel was UP and not preemptible. But it is broken in a more fundamental way back to at least FreeBSD-1: resettodr() is called _before_ updating the timezone, so it uses an out-of-date tz_minuteswest. - wall_cmos_clock is still under a simple (not MPSAFE) sysctl - adjkerntz is under a PROC sysctl, and you just changed this sysctl to be MPSAFE. The sysctl calls here, so we don't want all of it giant-locked. But it now does unlocked accesses to the adjkerntz variable, and the access to the disable_rtc_set variable is unlocked because we do it before acquiring the lock in this function (this variable is still under a simple (not MPSAFE) sysctl. Most of the variables are ints, so access to them are atomic, but without locking the correct order only occurs accidentally and is hard to understand. > /* XXX: We should really set all registered RTCs */ > - if ((error = CLOCK_SETTIME(clock_dev, &ts)) != 0) > + error = CLOCK_SETTIME(clock_dev, &ts); > + mtx_unlock(&resettodr_lock); > + if (error != 0) > printf("warning: clock_settime failed (%d), time-of-day clock " > "not adjusted to system time\n", error); > } The lock needs to cover all high-level accesses to the rtc and related variables to avoid races, and a bit more to minimise delays. In particular, inittodr() should be under this lock, and there is a problem with "show rtc" in ddb. ddb commands can't be under any lock since that gives deadlock, but "show rtc" uses rtcin() which already uses RTC_LOCK. RTC_LOCK locks the rtc at a lower level, but that is not enough for either resettodr() or inittodr() or a combination. This commit fixes the problem for resettodr() only. This is not a large problem in practice because it is rare for these functions to be called concurrently, and it is only easy to arrange for concurrent calls to resettodr(), but that takes root privilege and in normal operation settime() is only called once from userland (for initialization) since ntpd mostly uses adjtimex(). It might be useful to have an MI "show todr" ddb command. This would be easier to lock. The MD "show rtc" command is still useful for showing status registers. BTW, subr_rtc.c is misnamed. rtcs are MD things, and subr_rtc mostly handles higher level things which have the even worse, but historical name "todr". Since the MD interfaces have the even worse still name CLOCK_*TIME, the only rtc names in subr_rtc.c is just one due to another namespace error -- disable_rtc_set. The only other public function in subr_rtc.c is another namespace error -- clock_register(), which might belong in subr_clock.c. The namespace error utc_offset() is in subr_clock.c. "show rtc" now deadlocks when rtc clock is single stepped through. To fix this, use the usual kdb_active hack or mtx_trylock(), and make rtcin() reentrant when it can't acquire the lock. Then in resettodr() and inittodr(), "show rtc" won't destroy the state. Just stopping resettodr() or inittodr() (or any time-related function) in ddb breaks the timing like preemption does, but usually for longer. If there is any error handling for the preempted casse then it works for the ddb case too. Bruce