Date: Thu, 30 Jun 2016 21:01:06 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Bruce Evans <brde@optusnet.com.au>
Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r302252 - head/sys/kern
Message-ID: <20160630180106.GU38613@kib.kiev.ua>
In-Reply-To: <20160701005401.Q1084@besplex.bde.org>
References: <201606281643.u5SGhNsi061606@repo.freebsd.org> <20160629175917.O968@besplex.bde.org> <20160629145443.GG38613@kib.kiev.ua> <20160629153233.GI38613@kib.kiev.ua> <20160630040123.F791@besplex.bde.org> <20160629211953.GK38613@kib.kiev.ua> <20160701005401.Q1084@besplex.bde.org>
On Fri, Jul 01, 2016 at 03:30:41AM +1000, Bruce Evans wrote:
> On Thu, 30 Jun 2016, Konstantin Belousov wrote:
> 
> > On Thu, Jun 30, 2016 at 05:47:39AM +1000, Bruce Evans wrote:
> >> On Wed, 29 Jun 2016, Konstantin Belousov wrote:
> >>
> >>> On Wed, Jun 29, 2016 at 05:54:43PM +0300, Konstantin Belousov wrote:
> >>>> As an additional, but probably excessive measure, invocation of
> >>>> tc_windup() from hardclock could ask the setclock() to re-execute
> >>>> tc_windup() after its call is done.
> >>
> >> I don't quite understand this.  hardclock -> tc_windup() normally doesn't
> >> have interference from tc_setclock().
> > In what sense does it not have interference ?  tc_windup() may be executed
> > from hardclock/hardclock_cnt (low half) and concurrently from setclock()
> > (top half).  And this is exactly what we want to avoid, isn't it ?
> 
> Yes, but you seemed to be saying that hardclock should schedule or call
> tc_setclock().  That makes no sense.  So I thought that you meant
> tc_setclock() calling or scheduling tc_windup() in a different way than
> it does now (it just calls it now).

More precisely, I mean that hardclock, when not able to execute
tc_windup() due to the top half currently executing tc_windup(), may ask
the top half to re-execute tc_windup() (and not tc_setclock()) once the
current invocation finishes.

> 
> >>> ...
> >>> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> >>> index 0f015b3..a5776d2 100644
> >>> --- a/sys/kern/kern_tc.c
> >>> +++ b/sys/kern/kern_tc.c
> >>> ...
> >>> @@ -1237,6 +1238,7 @@ tc_setclock(struct timespec *ts)
> >>>  	struct timespec tbef, taft;
> >>>  	struct bintime bt, bt2;
> >>>
> >>> +	critical_enter();
> >>>  	cpu_tick_calibrate(1);
> >>>  	nanotime(&tbef);
> >>>  	timespec2bintime(ts, &bt);
> >>> @@ -1247,8 +1249,10 @@ tc_setclock(struct timespec *ts)
> >>>  	bintime2timeval(&bt, &boottime);
> >>
> >> boottime is one of the critical variables.  Before this, there are 2
> >> accesses to boottimebin.  The Giant locking that you just removed gave
> >> only some protection here.  It prevented concurrent access by
> >> malicious/stress-testing code doing settime().  It didn't prevent
> >> nanotime() and friends seeing boottime in an in-between state.  Oops,
> >> mutex locking won't fix that either.  I think the update of these
> >> variables needs to be moved into tc_windup() where it will hopefully
> >> automatically be protected by the generation count, like the timehands
> >> offsets already are.
> > I do not understand how that would help.  Do you mean to move the vars
> > into the current timehands ?
> 
> I hoped that it would just work without putting it in timehands, but
> now I don't see a better way than putting it in timehands.  The space
> wastage is not too bad, and we might need it in timehands for other
> reasons.  One is that the actual boot time is invariant, but FreeBSD
> varies it; a global is needed for the actual boot time and then it is
> natural to put the fudged boot time in timehands (it is not a boot
> time, but an offset which is sometimes close to the boot time).
> 
> Note that boottimebin is also updated in tc_windup() (for leap seconds).
> The new mutex doesn't protect it there.  This is another of the reasons
> to put it in timehands.

Yes.
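For concreteness, a rough sketch of what the read side could look like
if the fudged boot time moves into the timehands.  The th_boottime
member and the getboottimebin_sketch() helper are hypothetical names;
the loop just follows the same generation-count protocol that
binuptime() and friends already use, so this is only an illustration,
not the eventual patch:

/*
 * Sketch only: assumes struct timehands gains a th_boottime member
 * (hypothetical name) that tc_windup()/tc_setclock() fill in before
 * bumping th_generation, like the other th_* fields.
 */
static void
getboottimebin_sketch(struct bintime *bt)
{
	struct timehands *th;
	u_int gen;

	do {
		th = timehands;
		gen = atomic_load_acq_int(&th->th_generation);
		*bt = th->th_boottime;
		atomic_thread_fence_acq();
	} while (gen == 0 || gen != th->th_generation);
}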
> 
> >>> ...
> >>> +	for (;;) {
> >>> +		if (atomic_cmpset_int(&tc_windup_lock, 0, 1)) {
> >>> +			atomic_thread_fence_acq();
> >>> +			tc_windup_locked();
> >>> +			atomic_store_rel_int(&tc_windup_lock, 0);
> >>> +			break;
> >>> +		} else if (!top_call) {
> >>> +			break;
> >>> +		}
> >>> +	}
> >>> }
> >>
> >> I like to write optimized locking like that, but don't see why you
> >> want it here.
> > Because I do not want the fast clock interrupt handler to wait on a
> > lock, which necessarily becomes a spinlock.  The ntp_lock is bad enough
> > already.  What I coded above is not a real lock.  In the contention case
> > it bails out, so there is no blocking.  In particular, witness is not
> > relevant to it.
> 
> I think this is too complicated.  Spinlocks aren't very expensive, and
> hardclock() already has some.  I think it has several.  It has at least
> one silly one -- in hardclock_cpu(), a thread lock to lock a single
> flags update.  hardclock() and tc_windup() just aren't called enough
> for the locking overhead or contention to matter, even at too-large hz.

Spinlocks are quite expensive.  They delay interrupt delivery.  I do not
want the fast interrupt handler to be delayed due to the top half of the
kernel executing settimeofday(2) in a loop.

> 
> > ...
> > diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> > index 0f015b3..7bb767c 100644
> > --- a/sys/kern/kern_tc.c
> > +++ b/sys/kern/kern_tc.c
> > ...
> > +static struct mtx tc_setclock_mtx;
> > +MTX_SYSINIT(tc_setclock_init, &tc_setclock_mtx, "tcsetc", MTX_DEF);
> I thought it would be a spinlock.  The sleep lock is better if you can
> make it work, but this is complicated.

For mutual exclusion of invocations of tc_setclock(), I do not see why a
sleepable lock would not suffice.

> 
> > +
> >  /*
> >   * Step our concept of UTC.  This is done by modifying our estimate of
> >   * when we booted.
> > - * XXX: not locked.
> >   */
> 
> Now "not properly locked" -).

> >  void
> >  tc_setclock(struct timespec *ts)
> > @@ -1237,6 +1240,8 @@ tc_setclock(struct timespec *ts)
> >  	struct timespec tbef, taft;
> >  	struct bintime bt, bt2;
> >
> > +	mtx_lock(&tc_setclock_mtx);
> > +	critical_enter();
> 
> I thought at first that this critical section was redundant, since a
> spinlock would already have it.

It is not a spinlock.

> 
> >  	cpu_tick_calibrate(1);
> >  	nanotime(&tbef);
> >  	timespec2bintime(ts, &bt);
> > @@ -1247,8 +1252,10 @@ tc_setclock(struct timespec *ts)
> >  	bintime2timeval(&bt, &boottime);
> 
> The leap second update to boottimebin in tc_windup() races with the
> update to boottimebin here.  This can be fixed partially by using the
> same locking there.  Then the locking would have to be a spin mutex
> (or special).  This still doesn't fix non-atomic accesses to boottimebin
> in nanotime() and friends.  Putting it in timehands seems to be best for
> fixing that.

Yes, timehands for boottimebin should be the solution, but not in the
scope of this patch.  I will work on this right after the current
changeset lands in svn.

> boottime is easier to fix (and doesn't need to be in timehands).  It
> is only used to copy it out in a historical sysctl.  The locking for
> that is buggy.  We convert boottimebin to boottime here (now with locking)
> but in the sysctl we copy out boottime non-atomically.

Do we need boottime at all ?  Can't it be calculated from boottimebin
on the sysctl invocation ?
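Something along these lines, perhaps; a sketch only, assuming
boottimebin itself can be read consistently (for example via the
timehands member sketched above), with getboottimebin_sketch() standing
in for that accessor:

/*
 * Sketch only: compute the kern.boottime timeval from boottimebin at
 * sysctl time, so no separate boottime global has to be kept in sync.
 */
static int
sysctl_kern_boottime_sketch(SYSCTL_HANDLER_ARGS)
{
	struct bintime bt;
	struct timeval tv;

	getboottimebin_sketch(&bt);
	bintime2timeval(&bt, &tv);
	return (SYSCTL_OUT(req, &tv, sizeof(tv)));
}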
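Coming back to the earlier point about hardclock asking the top half to
re-execute tc_windup(): a rough sketch of how the bail-out path of the
non-blocking windup wrapper could record such a request.  The
tc_windup_pending flag and the function name are hypothetical, it
reuses tc_windup_locked() from the quoted patch, and it is not part of
the patch above:

static u_int	tc_windup_lock;
static u_int	tc_windup_pending;	/* hypothetical "re-run, please" flag */

static void
tc_windup_sketch(int top_call)
{
	for (;;) {
		if (atomic_cmpset_acq_int(&tc_windup_lock, 0, 1)) {
			do {
				atomic_store_rel_int(&tc_windup_pending, 0);
				tc_windup_locked();
			} while (top_call &&
			    atomic_load_acq_int(&tc_windup_pending) != 0);
			atomic_store_rel_int(&tc_windup_lock, 0);
			break;
		} else if (!top_call) {
			/*
			 * Fast interrupt path: do not spin, just ask the
			 * top half to run tc_windup() again when it is
			 * done with the current invocation.
			 */
			atomic_store_rel_int(&tc_windup_pending, 1);
			break;
		}
	}
}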