Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 29 Jun 2016 17:58:05 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Konstantin Belousov <kib@freebsd.org>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org,  svn-src-head@freebsd.org
Subject:   Re: svn commit: r302251 - head/sys/kern
Message-ID:  <20160629154400.F968@besplex.bde.org>
In-Reply-To: <201606281642.u5SGge18061528@repo.freebsd.org>
References:  <201606281642.u5SGge18061528@repo.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 28 Jun 2016, Konstantin Belousov wrote:

> Log:
>  Do not use Giant to prevent parallel calls to CLOCK_SETTIME().  Use
>  private mtx in resettodr(), no implementation of CLOCK_SETTIME() is
>  allowed to sleep.

Thanks.

As you know, clock locking is still quite broken.  The most obvious
bugs near here are now:
- settime() still begins with splclock().  This gave fewer bugs in
   clock locking when it last worked in FreeBSD-4.  Now it is just
   a hint that the clock locking has still not been converted to use
   mutexes.
- bitrot near the end of settime() removed the splx() that should
   be paired with the splclock().
- the locking should be stronger than mutexes (or splclock()) to
   minimise the delay between deciding to set the time and actually
   setting it.  The delay between making the decision in userland
   and acquiring a lock in the kernel is hard to control, so this
   would be better handled by another API, and often is by using
   the adjtime(x) API to apply a small adjustment.

   The only safe way to set an absolute time with POSIX APIs seems
   to be sometime like:
   - clock_gettime(CLOCK_MONOTONIC, &start)
   - clock_settime(CLOCK_REALTIME, &now)
   - clock_gettime(CLOCK_MONOTONIC, &finish)
   - loop doing this while 'finish' is too much later than 'start'.

   My kernel code for reading (but not writing) does this, but without
   the loop (except it loops for an average of 0.5 seconds before
   starting to synchronize).  It just prints the difference finish -
   start.  Even though this is all at a low level, the difference is
   typically about 40 usec on a 2GHz i386 system (since reading the AT
   rtc registers is slow).

   A delay of 40 usec is too much using ntpd if you want closer to
   microseconds accuracy than milliseconds.  However, ntpd mostly
   uses adjtimex().

   The software part of setting the time can be much faster.  This
   requires ordering things carefully.  settime() now uses the
   reasonable order of tc_setclock() first.

   The AT rtc register only has a resolution and precision of 1 second,
   and sloppy programming gives it an accuracy of at most 1 second too.
   Uncontrolled delays in settime() don't matter compared with the
   larger sloppiness at lower levels.  My version fixes this for some
   cases, so that the rtc can be used to restore the software time to
   within ~40 microseconds after interrupts have been stopped by ddb.
   Similar methods would work for restoring the software time accurately
   enough to avoid steps in ntp after suspend/resume.

- settime() still does bogus Giant locking at the end (where it lost the
   splx()):

   X 	mtx_lock(&Giant);
   X 	tc_setclock(&ts);
   X 	resettodr();
   X 	mtx_unlock(&Giant);

   Giant locking for tc_setlock() is even less necessary and/or sufficient
   than it is for the resettodr().  Locking for tc_setclock() is more
   important than for the rtc, but it has no locking except for this
   Giant locking, and none of the variables accessed by it is locked by
   Giant.

- more below, with references to changes in this commit.

> Modified: head/sys/kern/subr_rtc.c
> ==============================================================================
> --- head/sys/kern/subr_rtc.c	Tue Jun 28 16:41:50 2016	(r302250)
> +++ head/sys/kern/subr_rtc.c	Tue Jun 28 16:42:40 2016	(r302251)
> @@ -62,6 +62,8 @@ __FBSDID("$FreeBSD$");
> #include <sys/kernel.h>
> #include <sys/bus.h>
> #include <sys/clock.h>
> +#include <sys/lock.h>
> +#include <sys/mutex.h>
> #include <sys/sysctl.h>
> #ifdef FFCLOCK
> #include <sys/timeffc.h>
> @@ -73,6 +75,8 @@ __FBSDID("$FreeBSD$");
> static device_t clock_dev = NULL;
> static long clock_res;
> static struct timespec clock_adj;
> +static struct mtx resettodr_lock;
> +MTX_SYSINIT(resettodr_init, &resettodr_lock, "tod2rl", MTX_DEF);
>
> /* XXX: should be kern. now, it's no longer machdep.  */
> static int disable_rtc_set;
> @@ -168,11 +172,14 @@ resettodr(void)
> 	if (disable_rtc_set || clock_dev == NULL)
> 		return;
>
> +	mtx_lock(&resettodr_lock);

This could be later (only around the hardware parts) since we are sloppy
with the delays.

> 	getnanotime(&ts);

This doesn't even try to get nanoseconds accuracy.

> 	timespecadd(&ts, &clock_adj);
> 	ts.tv_sec -= utc_offset();

Who knows what utc_offset() is locked by?  Certainly not this new locking,
since it is set in another file but the new locking is only in this file.
It still seems to be most under Giant :-(.

The change in the other part of the diff is related to this, and doesn't
seem quite right.  utc_offset() uses the variables tz_minuteswest,
wall_cmos_clock and adjkerntz:
- tz_minuteswest is set with no locking in kern_settimeofday(), just
   after the settime() call.  This last had a chance of being correct
   when the kernel was UP and not preemptible.  But it is broken in
   a more fundamental way back to at least FreeBSD-1: resettodr() is
   called _before_ updating the timezone, so it uses an out-of-date
   tz_minuteswest.
- wall_cmos_clock is still under a simple (not MPSAFE) sysctl
- adjkerntz is under a PROC sysctl, and you just changed this sysctl to
   be MPSAFE.  The sysctl calls here, so we don't want all of it
   giant-locked.  But it now does unlocked accesses to the adjkerntz
   variable, and the access to the disable_rtc_set variable is unlocked
   because we do it before acquiring the lock in this function (this
   variable is still under a simple (not MPSAFE) sysctl.

Most of the variables are ints, so access to them are atomic, but without
locking the correct order only occurs accidentally and is hard to
understand.

> 	/* XXX: We should really set all registered RTCs */
> -	if ((error = CLOCK_SETTIME(clock_dev, &ts)) != 0)
> +	error = CLOCK_SETTIME(clock_dev, &ts);
> +	mtx_unlock(&resettodr_lock);
> +	if (error != 0)
> 		printf("warning: clock_settime failed (%d), time-of-day clock "
> 		    "not adjusted to system time\n", error);
> }

The lock needs to cover all high-level accesses to the rtc and related
variables to avoid races, and a bit more to minimise delays.  In particular,
inittodr() should be under this lock, and there is a problem with
"show rtc" in ddb.   ddb commands can't be under any lock since that gives
deadlock, but "show rtc" uses rtcin() which already uses RTC_LOCK.
RTC_LOCK locks the rtc at a lower level, but that is not enough for either
resettodr() or inittodr() or a combination.  This commit fixes the problem
for resettodr() only.  This is not a large problem in practice because it
is rare for these functions to be called concurrently, and it is only easy
to arrange for concurrent calls to resettodr(), but that takes root
privilege and in normal operation settime() is only called once from userland
(for initialization) since ntpd mostly uses adjtimex().

It might be useful to have an MI "show todr" ddb command.  This would be
easier to lock.  The MD "show rtc" command is still useful for showing
status registers.  BTW, subr_rtc.c is misnamed.  rtcs are MD things, and
subr_rtc mostly handles higher level things which have the even worse, but
historical name "todr".  Since the MD interfaces have the even worse still
name CLOCK_*TIME, the only rtc names in subr_rtc.c is just one due to
another namespace error -- disable_rtc_set.  The only other public function
in subr_rtc.c is another namespace error -- clock_register(), which might
belong in subr_clock.c.  The namespace error utc_offset() is in subr_clock.c.

"show rtc" now deadlocks when rtc clock is single stepped through.  To
fix this, use the usual kdb_active hack or mtx_trylock(), and make rtcin()
reentrant when it can't acquire the lock.  Then in resettodr() and
inittodr(), "show rtc" won't destroy the state.  Just stopping resettodr()
or inittodr() (or any time-related function) in ddb breaks the timing like
preemption does, but usually for longer.  If there is any error handling
for the preempted casse then it works for the ddb case too.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20160629154400.F968>