Date: Sat, 2 Jul 2016 17:51:34 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Bruce Evans <brde@optusnet.com.au>
Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r302252 - head/sys/kern
Message-ID: <20160702145134.GI38613@kib.kiev.ua>
In-Reply-To: <20160702153817.O1458@besplex.bde.org>
References: <20160629145443.GG38613@kib.kiev.ua> <20160629153233.GI38613@kib.kiev.ua>
 <20160630040123.F791@besplex.bde.org> <20160629211953.GK38613@kib.kiev.ua>
 <20160701005401.Q1084@besplex.bde.org> <20160630180106.GU38613@kib.kiev.ua>
 <20160701031549.GV38613@kib.kiev.ua> <20160701185743.Q1600@besplex.bde.org>
 <20160701142516.GW38613@kib.kiev.ua> <20160702153817.O1458@besplex.bde.org>
On Sat, Jul 02, 2016 at 06:04:43PM +1000, Bruce Evans wrote:
> On Fri, 1 Jul 2016, Konstantin Belousov wrote:
>
> > On Fri, Jul 01, 2016 at 08:39:48PM +1000, Bruce Evans wrote:
> >> It seems simple and clean enough, but is too much during a release
> >> freeze.
> >>
> >> I will only make some minor comments about style.
> > Well, it is not only about style.  If you have no more comments, I will
> > ask for testing.  The patch is about fixing bugs, although in somewhat
> > extended scope, so I think it is still fine as the things do not explode.
>
> What about FFCLOCK?
That is hard to test.  I will only ensure that it compiles.

> > I added the stats to the patch, it is not that intrusive actually.
> >
> > I still do not see why/do not want to use spinlock for the tc_windup()
> > exclusion.  Patch is at the end of the message.
>
> It subverts the mutex/witness method for no good reason.  You can use
> mtx_trylock() for the conditional locking.  Not so good reasons for
> doing this are to micro-optimize and to avoid hard-disabling interrupts
> on the current CPU.  But here optimization is not important and
> hard-disabling interrupts is a feature.  Mutexes are only slightly
> slower, except with debugging options they are much slower but give
> more features.
A reason not to use what you described is that our spinlocks lack a try
method.  I implemented a non-recursive trylock_spin.

> >>> diff --git a/sys/compat/linprocfs/linprocfs.c b/sys/compat/linprocfs/linprocfs.c
> >>> index 56b2ade..a0dce47 100644
> >>> --- a/sys/compat/linprocfs/linprocfs.c
> >>> +++ b/sys/compat/linprocfs/linprocfs.c
> >>> @@ -447,9 +447,11 @@ linprocfs_dostat(PFS_FILL_ARGS)
> >>>  	struct pcpu *pcpu;
> >>>  	long cp_time[CPUSTATES];
> >>>  	long *cp;
> >>> +	struct timeval boottime;
> >>>  	int i;
> >>>
> >>>  	read_cpu_time(cp_time);
> >>> +	getboottime(&boottime);
> >>
> >> This is used surprisingly often by too many subsystems.
> >> With the value still broken so that locking it doesn't help much, I
> >> would leave it as a global.
> > I prefer to keep the KPI consistent.
>
> Not changing the KPI means keeping boottime as a global.  getboottime() is
> a good KPI for accessing/converting a volatile implementation detail, but
> a correctly implemented boottime wouldn't be volatile and most uses of
> boottime are apparently wrong.  Most uses are apparently to convert from
> monotonic time to real time.  For that, the KPI should be a conversion
> function.
I have to stop somewhere with this patch.  In particular, I decided not
to do a sweeping pass over random subsystems fixing the bugs you describe
above.  The current patch is bug-to-bug compatible with the existing
code, and I will not expand it.

> We need to understand what such conversions are trying to do.
... some time later.

> > diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> > index 0f015b3..c9676fc 100644
> > --- a/sys/kern/kern_tc.c
> > +++ b/sys/kern/kern_tc.c
> > @@ -70,31 +70,22 @@ struct timehands {
> > ...
> >  static struct timehands th0 = {
> > -	&dummy_timecounter,
> > -	0,
> > -	(uint64_t)-1 / 1000000,
> > -	0,
> > -	{1, 0},
> > -	{0, 0},
> > -	{0, 0},
> > -	1,
> > -	&th1
> > +	.th_counter = &dummy_timecounter,
> > +	.th_scale = (uint64_t)-1 / 1000000,
> > +	.th_offset = {1, 0},
>
> Is there a syntax for avoiding the explicit 0 in a nested initializer?
> Something like th_offset.tv_sec = 1.
.th_offset = { .sec = 1 },

> > @@ -378,8 +384,18 @@ microuptime(struct timeval *tvp)
> >  void
> >  bintime(struct bintime *bt)
> >  {
> > +	struct bintime boottimebin;
> > +	struct timehands *th;
> > +	u_int gen;
> >
> > -	binuptime(bt);
> > +	do {
> > +		th = timehands;
> > +		gen = atomic_load_acq_int(&th->th_generation);
> > +		*bt = th->th_offset;
> > +		bintime_addx(bt, th->th_scale * tc_delta(th));
> > +		boottimebin = th->th_boottime;
> > +		atomic_thread_fence_acq();
> > +	} while (gen == 0 || gen != th->th_generation);
> >  	bintime_add(bt, &boottimebin);
> > }
>
> Better add th_boottime in the loop (and not use a local variable).  This
> saves copying it in the usual case where the loop is only iterated once.
Ok.

> Note that th_offset is already copied to the caller's variable and not
> to a local variable.  This is not so good for adding the boot time to
> it.  It might be better to go the other way and copy everything to
> local variables, but I fear that register pressure and memory clobbers
> will prevent generating best code then.  Best code is to copy everything
> to registers, then check the generation count, then combine the registers
> outside the loop.
Let's postpone this, the patch already does enough rototilling.

> > ...
> > diff --git a/sys/sys/time.h b/sys/sys/time.h
> > index 395e888..659f8e0 100644
> > --- a/sys/sys/time.h
> > +++ b/sys/sys/time.h
> > @@ -372,8 +372,6 @@ void resettodr(void);
> >
> >  extern volatile time_t time_second;
> >  extern volatile time_t time_uptime;
> > -extern struct bintime boottimebin;
> > -extern struct timeval boottime;
> >  extern struct bintime tc_tick_bt;
> >  extern sbintime_t tc_tick_sbt;
> >  extern struct bintime tick_bt;
>
> Can we fix more style bugs in this file?  Here the variables were unsorted
> and misindented.
And this.
diff --git a/sys/compat/linprocfs/linprocfs.c b/sys/compat/linprocfs/linprocfs.c
index 56b2ade..a0dce47 100644
--- a/sys/compat/linprocfs/linprocfs.c
+++ b/sys/compat/linprocfs/linprocfs.c
@@ -447,9 +447,11 @@ linprocfs_dostat(PFS_FILL_ARGS)
 	struct pcpu *pcpu;
 	long cp_time[CPUSTATES];
 	long *cp;
+	struct timeval boottime;
 	int i;
 
 	read_cpu_time(cp_time);
+	getboottime(&boottime);
 	sbuf_printf(sb, "cpu %ld %ld %ld %ld\n",
 	    T2J(cp_time[CP_USER]),
 	    T2J(cp_time[CP_NICE]),
@@ -624,10 +626,12 @@ static int
 linprocfs_doprocstat(PFS_FILL_ARGS)
 {
 	struct kinfo_proc kp;
+	struct timeval boottime;
 	char state;
 	static int ratelimit = 0;
 	vm_offset_t startcode, startdata;
 
+	getboottime(&boottime);
 	sx_slock(&proctree_lock);
 	PROC_LOCK(p);
 	fill_kinfo_proc(p, &kp);
diff --git a/sys/fs/devfs/devfs_vnops.c b/sys/fs/devfs/devfs_vnops.c
index 7cc0f9e..afa3da4 100644
--- a/sys/fs/devfs/devfs_vnops.c
+++ b/sys/fs/devfs/devfs_vnops.c
@@ -707,10 +707,11 @@ devfs_getattr(struct vop_getattr_args *ap)
 {
 	struct vnode *vp = ap->a_vp;
 	struct vattr *vap = ap->a_vap;
-	int error;
 	struct devfs_dirent *de;
 	struct devfs_mount *dmp;
 	struct cdev *dev;
+	struct timeval boottime;
+	int error;
 
 	error = devfs_populate_vp(vp);
 	if (error != 0)
@@ -740,6 +741,7 @@ devfs_getattr(struct vop_getattr_args *ap)
 	vap->va_blocksize = DEV_BSIZE;
 	vap->va_type = vp->v_type;
 
+	getboottime(&boottime);
#define fix(aa)							\
 	do {							\
 		if ((aa).tv_sec <= 3600) {		\
diff --git a/sys/fs/fdescfs/fdesc_vnops.c b/sys/fs/fdescfs/fdesc_vnops.c
index 4f6e1b9..65b8a54 100644
--- a/sys/fs/fdescfs/fdesc_vnops.c
+++ b/sys/fs/fdescfs/fdesc_vnops.c
@@ -394,7 +394,9 @@ fdesc_getattr(struct vop_getattr_args *ap)
 {
 	struct vnode *vp = ap->a_vp;
 	struct vattr *vap = ap->a_vap;
+	struct timeval boottime;
 
+	getboottime(&boottime);
 	vap->va_mode = S_IRUSR|S_IXUSR|S_IRGRP|S_IXGRP|S_IROTH|S_IXOTH;
 	vap->va_fileid = VTOFDESC(vp)->fd_ix;
 	vap->va_uid = 0;
diff --git a/sys/fs/nfs/nfsport.h b/sys/fs/nfs/nfsport.h
index 921df2d..6b41e2f 100644
--- a/sys/fs/nfs/nfsport.h
+++ b/sys/fs/nfs/nfsport.h
@@ -872,7 +872,7 @@ int newnfs_realign(struct mbuf **, int);
 /*
  * Set boottime.
  */
-#define	NFSSETBOOTTIME(b)	((b) = boottime)
+#define	NFSSETBOOTTIME(b)	(getboottime(&b))
 
 /*
  * The size of directory blocks in the buffer cache.
diff --git a/sys/fs/procfs/procfs_status.c b/sys/fs/procfs/procfs_status.c
index 5a00ee1..defdec3 100644
--- a/sys/fs/procfs/procfs_status.c
+++ b/sys/fs/procfs/procfs_status.c
@@ -70,6 +70,7 @@ procfs_doprocstatus(PFS_FILL_ARGS)
 	const char *wmesg;
 	char *pc;
 	char *sep;
+	struct timeval boottime;
 	int pid, ppid, pgid, sid;
 	int i;
 
@@ -129,6 +130,7 @@ procfs_doprocstatus(PFS_FILL_ARGS)
 	calcru(p, &ut, &st);
 	PROC_STATUNLOCK(p);
 	start = p->p_stats->p_start;
+	getboottime(&boottime);
 	timevaladd(&start, &boottime);
 	sbuf_printf(sb, " %jd,%ld %jd,%ld %jd,%ld",
 	    (intmax_t)start.tv_sec, start.tv_usec,
diff --git a/sys/kern/kern_acct.c b/sys/kern/kern_acct.c
index ef3fd2e..46e6d9b 100644
--- a/sys/kern/kern_acct.c
+++ b/sys/kern/kern_acct.c
@@ -389,7 +389,7 @@ acct_process(struct thread *td)
 	acct.ac_stime = encode_timeval(st);
 
 	/* (4) The elapsed time the command ran (and its starting time) */
-	tmp = boottime;
+	getboottime(&tmp);
 	timevaladd(&tmp, &p->p_stats->p_start);
 	acct.ac_btime = tmp.tv_sec;
 	microuptime(&tmp);
diff --git a/sys/kern/kern_clock.c b/sys/kern/kern_clock.c
index e7a7a99..39ffdb3 100644
--- a/sys/kern/kern_clock.c
+++ b/sys/kern/kern_clock.c
@@ -381,7 +381,9 @@ volatile int	ticks;
 int	psratio;
 
 static DPCPU_DEFINE(int, pcputicks);	/* Per-CPU version of ticks. */
-static int global_hardclock_run = 0;
+#ifdef DEVICE_POLLING
+static int devpoll_run = 0;
+#endif
 
 /*
  * Initialize clock frequencies and start both clocks running.
@@ -584,15 +586,15 @@ hardclock_cnt(int cnt, int usermode)
 #endif
 	/* We are in charge to handle this tick duty. */
 	if (newticks > 0) {
-		/* Dangerous and no need to call these things concurrently. */
-		if (atomic_cmpset_acq_int(&global_hardclock_run, 0, 1)) {
-			tc_ticktock(newticks);
+		tc_ticktock(newticks);
 #ifdef DEVICE_POLLING
+		/* Dangerous and no need to call these things concurrently. */
+		if (atomic_cmpset_acq_int(&devpoll_run, 0, 1)) {
 			/* This is very short and quick. */
 			hardclock_device_poll();
-#endif /* DEVICE_POLLING */
-			atomic_store_rel_int(&global_hardclock_run, 0);
+			atomic_store_rel_int(&devpoll_run, 0);
 		}
+#endif /* DEVICE_POLLING */
 #ifdef SW_WATCHDOG
 		if (watchdog_enabled > 0) {
 			i = atomic_fetchadd_int(&watchdog_ticks, -newticks);
diff --git a/sys/kern/kern_mutex.c b/sys/kern/kern_mutex.c
index d167205..ae236eb 100644
--- a/sys/kern/kern_mutex.c
+++ b/sys/kern/kern_mutex.c
@@ -281,6 +281,39 @@ __mtx_lock_spin_flags(volatile uintptr_t *c, int opts, const char *file,
 	WITNESS_LOCK(&m->lock_object, opts | LOP_EXCLUSIVE, file, line);
 }
 
+int
+__mtx_trylock_spin_flags(volatile uintptr_t *c, int opts, const char *file,
+    int line)
+{
+	struct mtx *m;
+
+	if (SCHEDULER_STOPPED())
+		return (1);
+
+	m = mtxlock2mtx(c);
+
+	KASSERT(m->mtx_lock != MTX_DESTROYED,
+	    ("mtx_lock_spin() of destroyed mutex @ %s:%d", file, line));
+	KASSERT(LOCK_CLASS(&m->lock_object) == &lock_class_mtx_spin,
+	    ("mtx_lock_spin() of sleep mutex %s @ %s:%d",
+	    m->lock_object.lo_name, file, line));
+	if (mtx_owned(m))
+		KASSERT((m->lock_object.lo_flags & LO_RECURSABLE) != 0 ||
+		    (opts & MTX_RECURSE) != 0,
+	    ("mtx_lock_spin: recursed on non-recursive mutex %s @ %s:%d\n",
+		    m->lock_object.lo_name, file, line));
+	opts &= ~MTX_RECURSE;
+	WITNESS_CHECKORDER(&m->lock_object, opts | LOP_NEWORDER | LOP_EXCLUSIVE,
+	    file, line, NULL);
+	if (__mtx_trylock_spin(m, curthread, opts, file, line)) {
+		LOCK_LOG_TRY("LOCK", &m->lock_object, opts, 1, file, line);
+		WITNESS_LOCK(&m->lock_object, opts | LOP_EXCLUSIVE, file, line);
+		return (1);
+	}
+	LOCK_LOG_TRY("LOCK", &m->lock_object, opts, 0, file, line);
+	return (0);
+}
+
 void
 __mtx_unlock_spin_flags(volatile uintptr_t *c, int opts, const char *file,
     int line)
diff --git a/sys/kern/kern_ntptime.c b/sys/kern/kern_ntptime.c
index d352ee7..efc3713 100644
--- a/sys/kern/kern_ntptime.c
+++ b/sys/kern/kern_ntptime.c
@@ -162,29 +162,12 @@ static l_fp time_adj;			/* tick adjust (ns/s) */
 
 static int64_t time_adjtime;		/* correction from adjtime(2) (usec) */
 
-static struct mtx ntpadj_lock;
-MTX_SYSINIT(ntpadj, &ntpadj_lock, "ntpadj",
-#ifdef PPS_SYNC
-    MTX_SPIN
-#else
-    MTX_DEF
-#endif
-);
+static struct mtx ntp_lock;
+MTX_SYSINIT(ntp, &ntp_lock, "ntp", MTX_SPIN);
 
-/*
- * When PPS_SYNC is defined, hardpps() function is provided which can
- * be legitimately called from interrupt filters.  Due to this, use
- * spinlock for ntptime state protection, otherwise sleepable mutex is
- * adequate.
- */
-#ifdef PPS_SYNC
-#define	NTPADJ_LOCK()		mtx_lock_spin(&ntpadj_lock)
-#define	NTPADJ_UNLOCK()		mtx_unlock_spin(&ntpadj_lock)
-#else
-#define	NTPADJ_LOCK()		mtx_lock(&ntpadj_lock)
-#define	NTPADJ_UNLOCK()		mtx_unlock(&ntpadj_lock)
-#endif
-#define	NTPADJ_ASSERT_LOCKED()	mtx_assert(&ntpadj_lock, MA_OWNED)
+#define	NTP_LOCK()		mtx_lock_spin(&ntp_lock)
+#define	NTP_UNLOCK()		mtx_unlock_spin(&ntp_lock)
+#define	NTP_ASSERT_LOCKED()	mtx_assert(&ntp_lock, MA_OWNED)
 
 #ifdef PPS_SYNC
 /*
@@ -271,7 +254,7 @@ ntp_gettime1(struct ntptimeval *ntvp)
 {
 	struct timespec atv;	/* nanosecond time */
 
-	NTPADJ_ASSERT_LOCKED();
+	NTP_ASSERT_LOCKED();
 
 	nanotime(&atv);
 	ntvp->time.tv_sec = atv.tv_sec;
@@ -302,9 +285,9 @@ sys_ntp_gettime(struct thread *td, struct ntp_gettime_args *uap)
 {
 	struct ntptimeval ntv;
 
-	NTPADJ_LOCK();
+	NTP_LOCK();
 	ntp_gettime1(&ntv);
-	NTPADJ_UNLOCK();
+	NTP_UNLOCK();
 
 	td->td_retval[0] = ntv.time_state;
 	return (copyout(&ntv, uap->ntvp, sizeof(ntv)));
@@ -315,9 +298,9 @@ ntp_sysctl(SYSCTL_HANDLER_ARGS)
 {
 	struct ntptimeval ntv;	/* temporary structure */
 
-	NTPADJ_LOCK();
+	NTP_LOCK();
 	ntp_gettime1(&ntv);
-	NTPADJ_UNLOCK();
+	NTP_UNLOCK();
 
 	return (sysctl_handle_opaque(oidp, &ntv, sizeof(ntv), req));
 }
@@ -382,7 +365,7 @@ sys_ntp_adjtime(struct thread *td, struct ntp_adjtime_args *uap)
 	error = priv_check(td, PRIV_NTP_ADJTIME);
 	if (error != 0)
 		return (error);
-	NTPADJ_LOCK();
+	NTP_LOCK();
 	if (modes & MOD_MAXERROR)
 		time_maxerror = ntv.maxerror;
 	if (modes & MOD_ESTERROR)
@@ -484,7 +467,7 @@ sys_ntp_adjtime(struct thread *td, struct ntp_adjtime_args *uap)
 	ntv.stbcnt = pps_stbcnt;
 #endif /* PPS_SYNC */
 	retval = ntp_is_time_error(time_status) ? TIME_ERROR : time_state;
-	NTPADJ_UNLOCK();
+	NTP_UNLOCK();
 
 	error = copyout((caddr_t)&ntv, (caddr_t)uap->tp, sizeof(ntv));
 	if (error == 0)
@@ -506,6 +489,8 @@ ntp_update_second(int64_t *adjustment, time_t *newsec)
 	int tickrate;
 	l_fp ftemp;		/* 32/64-bit temporary */
 
+	NTP_LOCK();
+
 	/*
 	 * On rollover of the second both the nanosecond and microsecond
 	 * clocks are updated and the state machine cranked as
@@ -627,6 +612,8 @@ ntp_update_second(int64_t *adjustment, time_t *newsec)
 	else
 		time_status &= ~STA_PPSSIGNAL;
 #endif /* PPS_SYNC */
+
+	NTP_UNLOCK();
 }
 
 /*
@@ -690,7 +677,7 @@ hardupdate(offset)
 	long mtemp;
 	l_fp ftemp;
 
-	NTPADJ_ASSERT_LOCKED();
+	NTP_ASSERT_LOCKED();
 
 	/*
 	 * Select how the phase is to be controlled and from which
@@ -772,7 +759,7 @@ hardpps(tsp, nsec)
 	long u_sec, u_nsec, v_nsec; /* temps */
 	l_fp ftemp;
 
-	NTPADJ_LOCK();
+	NTP_LOCK();
 
 	/*
 	 * The signal is first processed by a range gate and frequency
@@ -956,7 +943,7 @@ hardpps(tsp, nsec)
 		time_freq = pps_freq;
 
 out:
-	NTPADJ_UNLOCK();
+	NTP_UNLOCK();
 }
 #endif /* PPS_SYNC */
 
@@ -999,11 +986,11 @@ kern_adjtime(struct thread *td, struct timeval *delta, struct timeval *olddelta)
 			return (error);
 		ltw = (int64_t)delta->tv_sec * 1000000 + delta->tv_usec;
 	}
-	NTPADJ_LOCK();
+	NTP_LOCK();
 	ltr = time_adjtime;
 	if (delta != NULL)
 		time_adjtime = ltw;
-	NTPADJ_UNLOCK();
+	NTP_UNLOCK();
 
 	if (olddelta != NULL) {
 		atv.tv_sec = ltr / 1000000;
 		atv.tv_usec = ltr % 1000000;
diff --git a/sys/kern/kern_proc.c b/sys/kern/kern_proc.c
index 2f1f620..892b23a 100644
--- a/sys/kern/kern_proc.c
+++ b/sys/kern/kern_proc.c
@@ -872,6 +872,7 @@ fill_kinfo_proc_only(struct proc *p, struct kinfo_proc *kp)
 	struct session *sp;
 	struct ucred *cred;
 	struct sigacts *ps;
+	struct timeval boottime;
 
 	/* For proc_realparent. */
 	sx_assert(&proctree_lock, SX_LOCKED);
@@ -953,6 +954,7 @@ fill_kinfo_proc_only(struct proc *p, struct kinfo_proc *kp)
 	kp->ki_nice = p->p_nice;
 	kp->ki_fibnum = p->p_fibnum;
 	kp->ki_start = p->p_stats->p_start;
+	getboottime(&boottime);
 	timevaladd(&kp->ki_start, &boottime);
 	PROC_STATLOCK(p);
 	rufetch(p, &kp->ki_rusage);
diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
index 0f015b3..e917b5a 100644
--- a/sys/kern/kern_tc.c
+++ b/sys/kern/kern_tc.c
@@ -70,31 +70,22 @@ struct timehands {
 	struct bintime		th_offset;
 	struct timeval		th_microtime;
 	struct timespec		th_nanotime;
+	struct bintime		th_boottime;
 	/* Fields not to be copied in tc_windup start with th_generation. */
 	u_int			th_generation;
 	struct timehands	*th_next;
 };
 
 static struct timehands th0;
-static struct timehands th9 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th0};
-static struct timehands th8 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th9};
-static struct timehands th7 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th8};
-static struct timehands th6 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th7};
-static struct timehands th5 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th6};
-static struct timehands th4 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th5};
-static struct timehands th3 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th4};
-static struct timehands th2 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th3};
-static struct timehands th1 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th2};
+static struct timehands th1 = {
+	.th_next = &th0
+};
 static struct timehands th0 = {
-	&dummy_timecounter,
-	0,
-	(uint64_t)-1 / 1000000,
-	0,
-	{1, 0},
-	{0, 0},
-	{0, 0},
-	1,
-	&th1
+	.th_counter = &dummy_timecounter,
+	.th_scale = (uint64_t)-1 / 1000000,
+	.th_offset = { .sec = 1 },
+	.th_generation = 1,
+	.th_next = &th1
 };
 
 static struct timehands *volatile timehands = &th0;
@@ -106,8 +97,6 @@ int tc_min_ticktock_freq = 1;
 volatile time_t time_second = 1;
 volatile time_t time_uptime = 1;
 
-struct bintime boottimebin;
-struct timeval boottime;
 static int sysctl_kern_boottime(SYSCTL_HANDLER_ARGS);
 SYSCTL_PROC(_kern, KERN_BOOTTIME, boottime, CTLTYPE_STRUCT|CTLFLAG_RD,
     NULL, 0, sysctl_kern_boottime, "S,timeval", "System boottime");
@@ -135,7 +124,7 @@ SYSCTL_PROC(_kern_timecounter, OID_AUTO, alloweddeviation,
 
 static int tc_chosen;	/* Non-zero if a specific tc was chosen via sysctl. */
 
-static void tc_windup(void);
+static void tc_windup(struct bintime *new_boottimebin);
 static void cpu_tick_calibrate(int);
 
 void dtrace_getnanotime(struct timespec *tsp);
@@ -143,6 +132,10 @@ void dtrace_getnanotime(struct timespec *tsp);
 static int
 sysctl_kern_boottime(SYSCTL_HANDLER_ARGS)
 {
+	struct timeval boottime;
+
+	getboottime(&boottime);
+
 #ifndef __mips__
 #ifdef SCTL_MASK32
 	int tv[2];
@@ -150,11 +143,11 @@ sysctl_kern_boottime(SYSCTL_HANDLER_ARGS)
 	if (req->flags & SCTL_MASK32) {
 		tv[0] = boottime.tv_sec;
 		tv[1] = boottime.tv_usec;
-		return SYSCTL_OUT(req, tv, sizeof(tv));
-	} else
+		return (SYSCTL_OUT(req, tv, sizeof(tv)));
+	}
 #endif
 #endif
-		return SYSCTL_OUT(req, &boottime, sizeof(boottime));
+	return (SYSCTL_OUT(req, &boottime, sizeof(boottime)));
 }
 
 static int
@@ -164,7 +157,7 @@ sysctl_kern_timecounter_get(SYSCTL_HANDLER_ARGS)
 	struct timecounter *tc = arg1;
 
 	ncount = tc->tc_get_timecount(tc);
-	return sysctl_handle_int(oidp, &ncount, 0, req);
+	return (sysctl_handle_int(oidp, &ncount, 0, req));
 }
 
 static int
@@ -174,7 +167,7 @@ sysctl_kern_timecounter_freq(SYSCTL_HANDLER_ARGS)
 	struct timecounter *tc = arg1;
 
 	freq = tc->tc_frequency;
-	return sysctl_handle_64(oidp, &freq, 0, req);
+	return (sysctl_handle_64(oidp, &freq, 0, req));
 }
 
 /*
@@ -198,7 +191,7 @@ tc_delta(struct timehands *th)
  */
 
 #ifdef FFCLOCK
-void
+static void
 fbclock_binuptime(struct bintime *bt)
 {
 	struct timehands *th;
@@ -234,9 +227,17 @@ fbclock_microuptime(struct timeval *tvp)
 void
 fbclock_bintime(struct bintime *bt)
 {
+	struct timehands *th;
+	unsigned int gen;
 
-	fbclock_binuptime(bt);
-	bintime_add(bt, &boottimebin);
+	do {
+		th = timehands;
+		gen = atomic_load_acq_int(&th->th_generation);
+		*bt = th->th_offset;
+		bintime_addx(bt, th->th_scale * tc_delta(th));
+		bintime_add(bt, &th->th_boottime);
+		atomic_thread_fence_acq();
+	} while (gen == 0 || gen != th->th_generation);
 }
 
 void
@@ -309,9 +310,9 @@ fbclock_getbintime(struct bintime *bt)
 		th = timehands;
 		gen = atomic_load_acq_int(&th->th_generation);
 		*bt = th->th_offset;
+		bintime_add(bt, &th->th_boottime);
 		atomic_thread_fence_acq();
 	} while (gen == 0 || gen != th->th_generation);
-	bintime_add(bt, &boottimebin);
 }
 
 void
@@ -378,9 +379,17 @@ microuptime(struct timeval *tvp)
 void
 bintime(struct bintime *bt)
 {
+	struct timehands *th;
+	u_int gen;
 
-	binuptime(bt);
-	bintime_add(bt, &boottimebin);
+	do {
+		th = timehands;
+		gen = atomic_load_acq_int(&th->th_generation);
+		*bt = th->th_offset;
+		bintime_addx(bt, th->th_scale * tc_delta(th));
+		bintime_add(bt, &th->th_boottime);
+		atomic_thread_fence_acq();
+	} while (gen == 0 || gen != th->th_generation);
 }
 
 void
@@ -453,9 +462,9 @@ getbintime(struct bintime *bt)
 		th = timehands;
 		gen = atomic_load_acq_int(&th->th_generation);
 		*bt = th->th_offset;
+		bintime_add(bt, &th->th_boottime);
 		atomic_thread_fence_acq();
 	} while (gen == 0 || gen != th->th_generation);
-	bintime_add(bt, &boottimebin);
 }
 
 void
@@ -487,6 +496,29 @@ getmicrotime(struct timeval *tvp)
 }
 #endif /* FFCLOCK */
 
+void
+getboottime(struct timeval *boottime)
+{
+	struct bintime boottimebin;
+
+	getboottimebin(&boottimebin);
+	bintime2timeval(&boottimebin, boottime);
+}
+
+void
+getboottimebin(struct bintime *boottimebin)
+{
+	struct timehands *th;
+	u_int gen;
+
+	do {
+		th = timehands;
+		gen = atomic_load_acq_int(&th->th_generation);
+		*boottimebin = th->th_boottime;
+		atomic_thread_fence_acq();
+	} while (gen == 0 || gen != th->th_generation);
+}
+
 #ifdef FFCLOCK
 /*
  * Support for feed-forward synchronization algorithms. This is heavily inspired
@@ -1103,6 +1135,7 @@ int
 sysclock_snap2bintime(struct sysclock_snap *cs, struct bintime *bt,
     int whichclock, uint32_t flags)
 {
+	struct bintime boottimebin;
 #ifdef FFCLOCK
 	struct bintime bt2;
 	uint64_t period;
@@ -1116,8 +1149,10 @@ sysclock_snap2bintime(struct sysclock_snap *cs, struct bintime *bt,
 		if (cs->delta > 0)
 			bintime_addx(bt, cs->fb_info.th_scale * cs->delta);
 
-		if ((flags & FBCLOCK_UPTIME) == 0)
+		if ((flags & FBCLOCK_UPTIME) == 0) {
+			getboottimebin(&boottimebin);
 			bintime_add(bt, &boottimebin);
+		}
 		break;
 #ifdef FFCLOCK
 	case SYSCLOCK_FFWD:
@@ -1226,10 +1261,12 @@ tc_getfrequency(void)
 	return (timehands->th_counter->tc_frequency);
 }
 
+static struct mtx tc_setclock_mtx;
+MTX_SYSINIT(tc_setclock_init, &tc_setclock_mtx, "tcsetc", MTX_SPIN);
+
 /*
  * Step our concept of UTC.  This is done by modifying our estimate of
  * when we booted.
- * XXX: not locked.
  */
 void
 tc_setclock(struct timespec *ts)
@@ -1237,26 +1274,24 @@ tc_setclock(struct timespec *ts)
 	struct timespec tbef, taft;
 	struct bintime bt, bt2;
 
-	cpu_tick_calibrate(1);
-	nanotime(&tbef);
 	timespec2bintime(ts, &bt);
+	nanotime(&tbef);
+	mtx_lock_spin(&tc_setclock_mtx);
+	cpu_tick_calibrate(1);
 	binuptime(&bt2);
 	bintime_sub(&bt, &bt2);
-	bintime_add(&bt2, &boottimebin);
-	boottimebin = bt;
-	bintime2timeval(&bt, &boottime);
 
 	/* XXX fiddle all the little crinkly bits around the fiords... */
-	tc_windup();
-	nanotime(&taft);
+	tc_windup(&bt);
+	mtx_unlock_spin(&tc_setclock_mtx);
 	if (timestepwarnings) {
+		nanotime(&taft);
 		log(LOG_INFO,
 		    "Time stepped from %jd.%09ld to %jd.%09ld (%jd.%09ld)\n",
 		    (intmax_t)tbef.tv_sec, tbef.tv_nsec,
 		    (intmax_t)taft.tv_sec, taft.tv_nsec,
 		    (intmax_t)ts->tv_sec, ts->tv_nsec);
 	}
-	cpu_tick_calibrate(1);
 }
 
 /*
@@ -1265,7 +1300,7 @@ tc_setclock(struct timespec *ts)
 * timecounter and/or do seconds processing in NTP.  Slightly magic.
 */
 static void
-tc_windup(void)
+tc_windup(struct bintime *new_boottimebin)
 {
 	struct bintime bt;
 	struct timehands *th, *tho;
@@ -1289,6 +1324,8 @@ tc_windup(void)
 	th->th_generation = 0;
 	atomic_thread_fence_rel();
 	bcopy(tho, th, offsetof(struct timehands, th_generation));
+	if (new_boottimebin != NULL)
+		th->th_boottime = *new_boottimebin;
 
 	/*
 	 * Capture a timecounter delta on the current timecounter and if
@@ -1338,7 +1375,7 @@ tc_windup(void)
 	 * case we missed a leap second.
 	 */
 	bt = th->th_offset;
-	bintime_add(&bt, &boottimebin);
+	bintime_add(&bt, &th->th_boottime);
 	i = bt.sec - tho->th_microtime.tv_sec;
 	if (i > LARGE_STEP)
 		i = 2;
@@ -1346,7 +1383,7 @@ tc_windup(void)
 		t = bt.sec;
 		ntp_update_second(&th->th_adjustment, &bt.sec);
 		if (bt.sec != t)
-			boottimebin.sec += bt.sec - t;
+			th->th_boottime.sec += bt.sec - t;
 	}
 	/* Update the UTC timestamps used by the get*() functions. */
 	/* XXX shouldn't do this here.  Should force non-`get' versions. */
@@ -1769,7 +1806,7 @@ pps_event(struct pps_state *pps, int event)
 	tcount &= pps->capth->th_counter->tc_counter_mask;
 	bt = pps->capth->th_offset;
 	bintime_addx(&bt, pps->capth->th_scale * tcount);
-	bintime_add(&bt, &boottimebin);
+	bintime_add(&bt, &pps->capth->th_boottime);
 	bintime2timespec(&bt, &ts);
 
 	/* If the timecounter was wound up underneath us, bail out. */
@@ -1842,11 +1879,14 @@ tc_ticktock(int cnt)
 {
 	static int count;
 
-	count += cnt;
-	if (count < tc_tick)
-		return;
-	count = 0;
-	tc_windup();
+	if (mtx_trylock_spin(&tc_setclock_mtx)) {
+		count += cnt;
+		if (count >= tc_tick) {
+			count = 0;
+			tc_windup(NULL);
+		}
+		mtx_unlock_spin(&tc_setclock_mtx);
+	}
 }
 
 static void __inline
@@ -1921,7 +1961,9 @@ inittimecounter(void *dummy)
 	/* warm up new timecounter (again) and get rolling. */
 	(void)timecounter->tc_get_timecount(timecounter);
 	(void)timecounter->tc_get_timecount(timecounter);
-	tc_windup();
+	mtx_lock_spin(&tc_setclock_mtx);
+	tc_windup(NULL);
+	mtx_unlock_spin(&tc_setclock_mtx);
 }
 
 SYSINIT(timecounter, SI_SUB_CLOCKS, SI_ORDER_SECOND, inittimecounter, NULL);
 
@@ -2095,7 +2137,7 @@ tc_fill_vdso_timehands(struct vdso_timehands *vdso_th)
 	vdso_th->th_offset_count = th->th_offset_count;
 	vdso_th->th_counter_mask = th->th_counter->tc_counter_mask;
 	vdso_th->th_offset = th->th_offset;
-	vdso_th->th_boottime = boottimebin;
+	vdso_th->th_boottime = th->th_boottime;
 	enabled = cpu_fill_vdso_timehands(vdso_th, th->th_counter);
 	if (!vdso_th_enable)
 		enabled = 0;
@@ -2116,8 +2158,8 @@ tc_fill_vdso_timehands32(struct vdso_timehands32 *vdso_th32)
 	vdso_th32->th_counter_mask = th->th_counter->tc_counter_mask;
 	vdso_th32->th_offset.sec = th->th_offset.sec;
 	*(uint64_t *)&vdso_th32->th_offset.frac[0] = th->th_offset.frac;
-	vdso_th32->th_boottime.sec = boottimebin.sec;
-	*(uint64_t *)&vdso_th32->th_boottime.frac[0] = boottimebin.frac;
+	vdso_th32->th_boottime.sec = th->th_boottime.sec;
+	*(uint64_t *)&vdso_th32->th_boottime.frac[0] = th->th_boottime.frac;
 	enabled = cpu_fill_vdso_timehands32(vdso_th32, th->th_counter);
 	if (!vdso_th_enable)
 		enabled = 0;
diff --git a/sys/kern/kern_time.c b/sys/kern/kern_time.c
index 148da2b..82710f7 100644
--- a/sys/kern/kern_time.c
+++ b/sys/kern/kern_time.c
@@ -115,9 +115,7 @@ settime(struct thread *td, struct timeval *tv)
 	struct timeval delta, tv1, tv2;
 	static struct timeval maxtime, laststep;
 	struct timespec ts;
-	int s;
 
-	s = splclock();
 	microtime(&tv1);
 	delta = *tv;
 	timevalsub(&delta, &tv1);
@@ -147,10 +145,8 @@ settime(struct thread *td, struct timeval *tv)
 				printf("Time adjustment clamped to -1 second\n");
 			}
 		} else {
-			if (tv1.tv_sec == laststep.tv_sec) {
-				splx(s);
+			if (tv1.tv_sec == laststep.tv_sec)
 				return (EPERM);
-			}
 			if (delta.tv_sec > 1) {
 				tv->tv_sec = tv1.tv_sec + 1;
 				printf("Time adjustment clamped to +1 second\n");
@@ -161,10 +157,8 @@ settime(struct thread *td, struct timeval *tv)
 
 	ts.tv_sec = tv->tv_sec;
 	ts.tv_nsec = tv->tv_usec * 1000;
-	mtx_lock(&Giant);
 	tc_setclock(&ts);
 	resettodr();
-	mtx_unlock(&Giant);
 	return (0);
 }
diff --git a/sys/kern/subr_rtc.c b/sys/kern/subr_rtc.c
index dbad36d..4bac324 100644
--- a/sys/kern/subr_rtc.c
+++ b/sys/kern/subr_rtc.c
@@ -172,11 +172,11 @@ resettodr(void)
 	if (disable_rtc_set || clock_dev == NULL)
 		return;
 
-	mtx_lock(&resettodr_lock);
 	getnanotime(&ts);
 	timespecadd(&ts, &clock_adj);
 	ts.tv_sec -= utc_offset();
 	/* XXX: We should really set all registered RTCs */
+	mtx_lock(&resettodr_lock);
 	error = CLOCK_SETTIME(clock_dev, &ts);
 	mtx_unlock(&resettodr_lock);
 	if (error != 0)
diff --git a/sys/kern/sys_procdesc.c b/sys/kern/sys_procdesc.c
index 37139c1..f47ae7c 100644
--- a/sys/kern/sys_procdesc.c
+++ b/sys/kern/sys_procdesc.c
@@ -517,7 +517,7 @@ procdesc_stat(struct file *fp, struct stat *sb,
     struct ucred *active_cred, struct thread *td)
 {
 	struct procdesc *pd;
-	struct timeval pstart;
+	struct timeval pstart, boottime;
 
 	/*
 	 * XXXRW: Perhaps we should cache some more information from the
@@ -532,6 +532,7 @@ procdesc_stat(struct file *fp, struct stat *sb,
 
 	/* Set birth and [acm] times to process start time. */
 	pstart = pd->pd_proc->p_stats->p_start;
+	getboottime(&boottime);
 	timevaladd(&pstart, &boottime);
 	TIMEVAL_TO_TIMESPEC(&pstart, &sb->st_birthtim);
 	sb->st_atim = sb->st_birthtim;
diff --git a/sys/net/bpf.c b/sys/net/bpf.c
index 3b12cf4..4251f71 100644
--- a/sys/net/bpf.c
+++ b/sys/net/bpf.c
@@ -2328,12 +2328,13 @@ bpf_hdrlen(struct bpf_d *d)
 static void
 bpf_bintime2ts(struct bintime *bt, struct bpf_ts *ts, int tstype)
 {
-	struct bintime bt2;
+	struct bintime bt2, boottimebin;
 	struct timeval tsm;
 	struct timespec tsn;
 
 	if ((tstype & BPF_T_MONOTONIC) == 0) {
 		bt2 = *bt;
+		getboottimebin(&boottimebin);
 		bintime_add(&bt2, &boottimebin);
 		bt = &bt2;
 	}
diff --git a/sys/netpfil/ipfw/ip_fw_sockopt.c b/sys/netpfil/ipfw/ip_fw_sockopt.c
index d186ba5..7faf99f 100644
--- a/sys/netpfil/ipfw/ip_fw_sockopt.c
+++ b/sys/netpfil/ipfw/ip_fw_sockopt.c
@@ -395,6 +395,7 @@ swap_map(struct ip_fw_chain *chain, struct ip_fw **new_map, int new_len)
 static void
 export_cntr1_base(struct ip_fw *krule, struct ip_fw_bcounter *cntr)
 {
+	struct timeval boottime;
 
 	cntr->size = sizeof(*cntr);
 
@@ -403,21 +404,26 @@ export_cntr1_base(struct ip_fw *krule, struct ip_fw_bcounter *cntr)
 		cntr->bcnt = counter_u64_fetch(krule->cntr + 1);
 		cntr->timestamp = krule->timestamp;
 	}
-	if (cntr->timestamp > 0)
+	if (cntr->timestamp > 0) {
+		getboottime(&boottime);
 		cntr->timestamp += boottime.tv_sec;
+	}
 }
 
 static void
 export_cntr0_base(struct ip_fw *krule, struct ip_fw_bcounter0 *cntr)
 {
+	struct timeval boottime;
 
 	if (krule->cntr != NULL) {
 		cntr->pcnt = counter_u64_fetch(krule->cntr);
 		cntr->bcnt = counter_u64_fetch(krule->cntr + 1);
 		cntr->timestamp = krule->timestamp;
 	}
-	if (cntr->timestamp > 0)
+	if (cntr->timestamp > 0) {
+		getboottime(&boottime);
 		cntr->timestamp += boottime.tv_sec;
+	}
 }
 
 /*
@@ -2048,11 +2054,13 @@ ipfw_getrules(struct ip_fw_chain *chain, void *buf, size_t space)
 	char *ep = bp + space;
 	struct ip_fw *rule;
 	struct ip_fw_rule0 *dst;
+	struct timeval boottime;
 	int error, i, l, warnflag;
 	time_t	boot_seconds;
 
 	warnflag = 0;
 
+	getboottime(&boottime);
 	boot_seconds = boottime.tv_sec;
 	for (i = 0; i < chain->n_rules; i++) {
 		rule = chain->map[i];
diff --git a/sys/nfs/nfs_lock.c b/sys/nfs/nfs_lock.c
index 7d11672..c84413e 100644
--- a/sys/nfs/nfs_lock.c
+++ b/sys/nfs/nfs_lock.c
@@ -241,6 +241,7 @@ nfs_dolock(struct vop_advlock_args *ap)
 	struct flock *fl;
 	struct proc *p;
 	struct nfsmount *nmp;
+	struct timeval boottime;
 
 	td = curthread;
 	p = td->td_proc;
@@ -284,6 +285,7 @@ nfs_dolock(struct vop_advlock_args *ap)
 		p->p_nlminfo = malloc(sizeof(struct nlminfo),
 		    M_NLMINFO, M_WAITOK | M_ZERO);
 		p->p_nlminfo->pid_start = p->p_stats->p_start;
+		getboottime(&boottime);
 		timevaladd(&p->p_nlminfo->pid_start, &boottime);
 	}
 	msg.lm_msg_ident.pid_start = p->p_nlminfo->pid_start;
diff --git a/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c b/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c
index 1d07943..0879299 100644
--- a/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c
+++ b/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c
@@ -504,11 +504,13 @@ svc_rpc_gss_find_client(struct svc_rpc_gss_clientid *id)
 {
 	struct svc_rpc_gss_client *client;
 	struct svc_rpc_gss_client_list *list;
+	struct timeval boottime;
 	unsigned long hostid;
 
 	rpc_gss_log_debug("in svc_rpc_gss_find_client(%d)", id->ci_id);
 
 	getcredhostid(curthread->td_ucred, &hostid);
+	getboottime(&boottime);
 	if (id->ci_hostid != hostid || id->ci_boottime != boottime.tv_sec)
 		return (NULL);
 
@@ -537,6 +539,7 @@ svc_rpc_gss_create_client(void)
 {
 	struct svc_rpc_gss_client *client;
 	struct svc_rpc_gss_client_list *list;
+	struct timeval boottime;
 	unsigned long hostid;
 
 	rpc_gss_log_debug("in svc_rpc_gss_create_client()");
 
@@ -547,6 +550,7 @@ svc_rpc_gss_create_client(void)
 	sx_init(&client->cl_lock, "GSS-client");
 	getcredhostid(curthread->td_ucred, &hostid);
 	client->cl_id.ci_hostid = hostid;
+	getboottime(&boottime);
 	client->cl_id.ci_boottime = boottime.tv_sec;
 	client->cl_id.ci_id = svc_rpc_gss_next_clientid++;
 	list = &svc_rpc_gss_client_hash[client->cl_id.ci_id % CLIENT_HASH_SIZE];
diff --git a/sys/sys/mutex.h b/sys/sys/mutex.h
index 0443922..db05fc9 100644
--- a/sys/sys/mutex.h
+++ b/sys/sys/mutex.h
@@ -112,6 +112,8 @@ void	__mtx_unlock_flags(volatile uintptr_t *c, int opts, const char *file,
 	    int line);
 void	__mtx_lock_spin_flags(volatile uintptr_t *c, int opts, const char *file,
 	    int line);
+int	__mtx_trylock_spin_flags(volatile uintptr_t *c, int opts,
+	    const char *file, int line);
 void	__mtx_unlock_spin_flags(volatile uintptr_t *c, int opts,
 	    const char *file, int line);
 #if defined(INVARIANTS) || defined(INVARIANT_SUPPORT)
@@ -152,6 +154,8 @@ void	thread_lock_flags_(struct thread *, int, const char *, int);
 	__mtx_unlock_flags(&(m)->mtx_lock, o, f, l)
 #define	_mtx_lock_spin_flags(m, o, f, l)				\
 	__mtx_lock_spin_flags(&(m)->mtx_lock, o, f, l)
+#define	_mtx_trylock_spin_flags(m, o, f, l)				\
+	__mtx_trylock_spin_flags(&(m)->mtx_lock, o, f, l)
 #define	_mtx_unlock_spin_flags(m, o, f, l)				\
 	__mtx_unlock_spin_flags(&(m)->mtx_lock, o, f, l)
 #if defined(INVARIANTS) || defined(INVARIANT_SUPPORT)
@@ -212,6 +216,21 @@ void	thread_lock_flags_(struct thread *, int, const char *, int);
 	LOCKSTAT_PROFILE_OBTAIN_LOCK_SUCCESS(spin__acquire,		\
 	    mp, 0, 0, file, line);					\
 } while (0)
+#define __mtx_trylock_spin(mp, tid, opts, file, line) __extension__ ({	\
+	uintptr_t _tid = (uintptr_t)(tid);				\
+	int _ret;							\
+									\
+	spinlock_enter();						\
+	if (((mp)->mtx_lock != MTX_UNOWNED || !_mtx_obtain_lock((mp), _tid))) {\
+		spinlock_exit();					\
+		_ret = 0;						\
+	} else {							\
+		LOCKSTAT_PROFILE_OBTAIN_LOCK_SUCCESS(spin__acquire,	\
+		    mp, 0, 0, file, line);				\
+		_ret = 1;						\
+	}								\
+	_ret;								\
+})
 #else /* SMP */
 #define __mtx_lock_spin(mp, tid, opts, file, line) do {			\
 	uintptr_t _tid = (uintptr_t)(tid);				\
@@ -224,6 +243,20 @@ void	thread_lock_flags_(struct thread *, int, const char *, int);
 		(mp)->mtx_lock = _tid;					\
 	}								\
 } while (0)
+#define __mtx_trylock_spin(mp, tid, opts, file, line) __extension__ ({	\
+	uintptr_t _tid = (uintptr_t)(tid);				\
+	int _ret;							\
+									\
+	spinlock_enter();						\
+	if (((mp)->mtx_lock != 
MTX_UNOWNED) \ + spinlock_exit(); \ + _ret = 0; \ + } else { \ + (mp)->mtx_lock = _tid; \ + _ret = 1; \ + } \ + _ret; \ +}) #endif /* SMP */ /* Unlock a normal mutex. */ @@ -302,6 +335,7 @@ void thread_lock_flags_(struct thread *, int, const char *, int); #define mtx_lock(m) mtx_lock_flags((m), 0) #define mtx_lock_spin(m) mtx_lock_spin_flags((m), 0) #define mtx_trylock(m) mtx_trylock_flags((m), 0) +#define mtx_trylock_spin(m) mtx_trylock_spin_flags((m), 0) #define mtx_unlock(m) mtx_unlock_flags((m), 0) #define mtx_unlock_spin(m) mtx_unlock_spin_flags((m), 0) @@ -335,6 +369,8 @@ extern struct mtx_pool *mtxpool_sleep; _mtx_unlock_flags((m), (opts), (file), (line)) #define mtx_lock_spin_flags_(m, opts, file, line) \ _mtx_lock_spin_flags((m), (opts), (file), (line)) +#define mtx_trylock_spin_flags_(m, opts, file, line) \ + _mtx_trylock_spin_flags((m), (opts), (file), (line)) #define mtx_unlock_spin_flags_(m, opts, file, line) \ _mtx_unlock_spin_flags((m), (opts), (file), (line)) #else /* LOCK_DEBUG == 0 && !MUTEX_NOINLINE */ @@ -344,6 +380,8 @@ extern struct mtx_pool *mtxpool_sleep; __mtx_unlock((m), curthread, (opts), (file), (line)) #define mtx_lock_spin_flags_(m, opts, file, line) \ __mtx_lock_spin((m), curthread, (opts), (file), (line)) +#define mtx_trylock_spin_flags_(m, opts, file, line) \ + __mtx_trylock_spin((m), curthread, (opts), (file), (line)) #define mtx_unlock_spin_flags_(m, opts, file, line) \ __mtx_unlock_spin((m)) #endif /* LOCK_DEBUG > 0 || MUTEX_NOINLINE */ @@ -369,6 +407,8 @@ extern struct mtx_pool *mtxpool_sleep; mtx_unlock_spin_flags_((m), (opts), LOCK_FILE, LOCK_LINE) #define mtx_trylock_flags(m, opts) \ mtx_trylock_flags_((m), (opts), LOCK_FILE, LOCK_LINE) +#define mtx_trylock_spin_flags(m, opts) \ + mtx_trylock_spin_flags_((m), (opts), LOCK_FILE, LOCK_LINE) #define mtx_assert(m, what) \ mtx_assert_((m), (what), __FILE__, __LINE__) diff --git a/sys/sys/time.h b/sys/sys/time.h index 395e888..659f8e0 100644 --- a/sys/sys/time.h +++ 
b/sys/sys/time.h @@ -372,8 +372,6 @@ void resettodr(void); extern volatile time_t time_second; extern volatile time_t time_uptime; -extern struct bintime boottimebin; -extern struct timeval boottime; extern struct bintime tc_tick_bt; extern sbintime_t tc_tick_sbt; extern struct bintime tick_bt; @@ -440,6 +438,9 @@ void getbintime(struct bintime *bt); void getnanotime(struct timespec *tsp); void getmicrotime(struct timeval *tvp); +void getboottime(struct timeval *boottime); +void getboottimebin(struct bintime *boottimebin); + /* Other functions */ int itimerdecr(struct itimerval *itp, int usec); int itimerfix(struct timeval *tv);