From owner-svn-src-all@freebsd.org Wed Jun 29 10:08:17 2016 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F0E4CB86F8C; Wed, 29 Jun 2016 10:08:17 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id A0F532752; Wed, 29 Jun 2016 10:08:17 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c110-21-100-149.carlnfd1.nsw.optusnet.com.au [110.21.100.149]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id A08DC3D031F; Wed, 29 Jun 2016 19:39:45 +1000 (AEST) Date: Wed, 29 Jun 2016 19:39:45 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Konstantin Belousov cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r302252 - head/sys/kern In-Reply-To: <201606281643.u5SGhNsi061606@repo.freebsd.org> Message-ID: <20160629175917.O968@besplex.bde.org> References: <201606281643.u5SGhNsi061606@repo.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=EfU1O6SC c=1 sm=1 tr=0 a=XDAe9YG+7EcdVXYrgT+/UQ==:117 a=XDAe9YG+7EcdVXYrgT+/UQ==:17 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=YBTR0DOa03FoEZgt01gA:9 a=To-spq41Ae8Nz7We:21 a=wCAWHQOgn0QvNm88:21 a=CjuIK1q_8ugA:10 X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Jun 2016 10:08:18 -0000 On Tue, 28 Jun 2016, Konstantin Belousov wrote: > Log: > Currently the ntptime code and resettodr() are Giant-locked. In > particular, the Giant is supposed to protect against parallel > ntp_adjtime(2) invocations. But, for instance, sys_ntp_adjtime() does > copyout(9) under Giant and then examines time_status to return syscall > result. Since copyout(9) could sleep, the syscall result might be > inconsistent. Thanks. As you know, there are still many errors near here. This is more complicated than for settime() vs resettodr() so I won't try to remember them all. Some are: - kern_ntptime.c still says that "all routines must run priority splclock or higher". Since spls are null, this just points to missing locking. The pointers are quite weak since it only calls splclock() once internally and explicit splclock() calls have been removed from all callers. There was only this one call in kern_ntptime.c as far back as FreeBSD-3. Then the syscalls didn't have any locking, so I think there were lots of races even with UP non-preemptible kernels. > Another and more important issue is that if PPS is configured, > hardpps(9) is executed without any protection against the parallel > top-level code invocation. Potentially, this may result in the > inconsistent state of the ntptime state variables, but I cannot say > how serious such distortion is. The non-functional splclock() call in It is surprising how unserious it is. > sys_ntp_adjtime() protected against clock interrupts calling hardpps() > in the pre-SMP era. > > Modernize the locking. A mutex protects ntptime data. Due to the > hardpps() KPI legitimately serving from the interrupt filters (and > e.g. uart(4) does call it from filter), the lock cannot be sleepable > mutex if PPS_SYNC is defined. Otherwise, use normal sleepable mutex > to reduce interrupt latency. The largest locking errors that I knew about were for pps and ntp_update_second(). I still don't trust calling time code from fast interrupt handlers (now misnamed filters) and don't allow it in my version. So my hardclock(), statclock() and profclock() interrupt handlers are not fast, although this adds large overheads (nearly 1% at low hz and np profileing). I have a pps handler in the RTC interrupt handler. I didn't get around to fixing the pps locking, but since the handler is not fast the fixes using sleep locks would be easier. I have a pps handler in sio. sio normally uses fast interrupts, but I never use pps for it so I didn't need to fix this. > Modified: head/sys/kern/kern_ntptime.c > ============================================================================== > --- head/sys/kern/kern_ntptime.c Tue Jun 28 16:42:40 2016 (r302251) > +++ head/sys/kern/kern_ntptime.c Tue Jun 28 16:43:23 2016 (r302252) > @@ -162,6 +162,30 @@ static l_fp time_adj; /* tick adjust ( > > static int64_t time_adjtime; /* correction from adjtime(2) (usec) */ > > +static struct mtx ntpadj_lock; > +MTX_SYSINIT(ntpadj, &ntpadj_lock, "ntpadj", > +#ifdef PPS_SYNC > + MTX_SPIN > +#else > + MTX_DEF > +#endif > +); This needs to be a spinlock in all cases, since ntp_update_second() needs to be locked and ntp_update_second() is called from tc_windup() is usually called from a fast interrupt handler. ntp_update_second() is still unlocked. It accesses lots of variables, so it obviously needs locking. E.g., its very first access is time_maxerror += MAXFREQ / 1000. This has the usual non-atomicity for a read-modify write. This obiously races with the locked access on the 28th linke of sys_ntp_adjtime(). Only the latter is locked now. > + > +/* > + * When PPS_SYNC is defined, hardpps() function is provided which can > + * be legitimately called from interrupt filters. Due to this, use > + * spinlock for ntptime state protection, otherwise sleepable mutex is > + * adequate. > + */ > +#ifdef PPS_SYNC > +#define NTPADJ_LOCK() mtx_lock_spin(&ntpadj_lock) > +#define NTPADJ_UNLOCK() mtx_unlock_spin(&ntpadj_lock) > +#else > +#define NTPADJ_LOCK() mtx_lock(&ntpadj_lock) > +#define NTPADJ_UNLOCK() mtx_unlock(&ntpadj_lock) > +#endif > +#define NTPADJ_ASSERT_LOCKED() mtx_assert(&ntpadj_lock, MA_OWNED) *ADJ* and *adj* are not good names, since much more than ntp_adj*() needs to be locked. Probably the locking should be shared with kern_tc.c. tc_windup() still uses only fancy time-domaind/atomic-op locking. This mostly works, but is quite broken by calling tc_windup() from places other than hardclock(). The most problematic other place is from tc_setclock(). That is unlocked, except accidentally by callers with only Giant locking, so it races with the fast interrupt handler. tc_windup() doesn't need the fancy locking for efficiency -- it only needs it to be consistent with binuptime() and friends. So it may as well use the same lock as ntp. Its lock must be around the whole function to protect it from tc_setclock(). Then the same lock works for ntp_update_second() and pps. Any lock that is only acquired once per second is not worth optimizing. Hopefully the same for ntp's other locks. ntp syscalls should only be frequent under unusual/malicious loads, so they shouldn't cause much contention with the tc_windup() lock. > @@ -203,11 +227,12 @@ static long pps_errcnt; /* calibration > static void ntp_init(void); > static void hardupdate(long offset); > static void ntp_gettime1(struct ntptimeval *ntvp); > -static int ntp_is_time_error(void); > +static bool ntp_is_time_error(int tsl); > > -static int > -ntp_is_time_error(void) > +static bool > +ntp_is_time_error(int tsl) > { > + Is it worth changing this to make a single variable less volatile? bool shoudn't be used in old portable code. > ... > @@ -291,14 +315,17 @@ ntp_sysctl(SYSCTL_HANDLER_ARGS) > { > struct ntptimeval ntv; /* temporary structure */ > > + NTPADJ_LOCK(); > ntp_gettime1(&ntv); > + NTPADJ_UNLOCK(); > > return (sysctl_handle_opaque(oidp, &ntv, sizeof(ntv), req)); > } > #ifdef PPS_SYNC > SYSCTL_INT(_kern_ntp_pll, OID_AUTO, pps_shiftmax, CTLFLAG_RW, > @@ -308,10 +335,12 @@ SYSCTL_INT(_kern_ntp_pll, OID_AUTO, pps_ > SYSCTL_LONG(_kern_ntp_pll, OID_AUTO, time_monitor, CTLFLAG_RD, > &time_monitor, 0, "Last time offset scaled (ns)"); > > -SYSCTL_OPAQUE(_kern_ntp_pll, OID_AUTO, pps_freq, CTLFLAG_RD, > - &pps_freq, sizeof(pps_freq), "I", "Scaled frequency offset (ns/sec)"); > -SYSCTL_OPAQUE(_kern_ntp_pll, OID_AUTO, time_freq, CTLFLAG_RD, > - &time_freq, sizeof(time_freq), "I", "Frequency offset (ns/sec)"); > +SYSCTL_S64(_kern_ntp_pll, OID_AUTO, pps_freq, CTLFLAG_RD | CTLFLAG_MPSAFE, > + &pps_freq, 0, > + "Scaled frequency offset (ns/sec)"); > +SYSCTL_S64(_kern_ntp_pll, OID_AUTO, time_freq, CTLFLAG_RD | CTLFLAG_MPSAFE, > + &time_freq, 0, > + "Frequency offset (ns/sec)"); > #endif These are not really MPSAFE, and OPAQUE was better than S64 since it doesn't hard-code the size. None of the simple sysctl handlers actually gives atomic/coherent data. They just attempt it, and have comments ad nauseum saying this even for the int8_t case where the access is surely (?) atomic. For OPAQUE 64 bits and S64, the accesses are equally non-atomic for 32-bit arches. pps_freq and time_freq are volatile, so it is technically necessary to use a SYSCTL_PROC() to copy them to a local variable with locking as above. S64 does give better printing that OPAQUE. My version adds sysctls for some more variables (mainly time_offset and time_adj). I still use OPAQUE, and with this, at least i386 the variables are printed confusingly as 2 decimal intergers. Bruce