From owner-svn-src-all@freebsd.org Fri Jul 1 14:25:31 2016 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 82232B88F31; Fri, 1 Jul 2016 14:25:31 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 02EB720A9; Fri, 1 Jul 2016 14:25:30 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u61EPHxW089967 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Fri, 1 Jul 2016 17:25:17 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u61EPHxW089967 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id u61EPGgK089963; Fri, 1 Jul 2016 17:25:16 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 1 Jul 2016 17:25:16 +0300 From: Konstantin Belousov To: Bruce Evans Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r302252 - head/sys/kern Message-ID: <20160701142516.GW38613@kib.kiev.ua> References: <201606281643.u5SGhNsi061606@repo.freebsd.org> <20160629175917.O968@besplex.bde.org> <20160629145443.GG38613@kib.kiev.ua> <20160629153233.GI38613@kib.kiev.ua> <20160630040123.F791@besplex.bde.org> <20160629211953.GK38613@kib.kiev.ua> <20160701005401.Q1084@besplex.bde.org> <20160630180106.GU38613@kib.kiev.ua> <20160701031549.GV38613@kib.kiev.ua> <20160701185743.Q1600@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160701185743.Q1600@besplex.bde.org> User-Agent: Mutt/1.6.1 (2016-04-27) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Jul 2016 14:25:31 -0000 On Fri, Jul 01, 2016 at 08:39:48PM +1000, Bruce Evans wrote: > It seems simple and clean enough, but is too much during a re freeze. > > I will only make some minor comments about style. Well, it is not only about style. If you have no more comments, I will ask for testing. The patch is about fixing bugs, although in somewhat extended scope, so I think it is still fine as the things do not explode. I added the stats to the patch, it is not that intrusive actually. I still do not see why/do not want to use spinlock for the tc_windup() exclusion. Patch is at the end of the message. > > > diff --git a/sys/compat/linprocfs/linprocfs.c b/sys/compat/linprocfs/linprocfs.c > > index 56b2ade..a0dce47 100644 > > --- a/sys/compat/linprocfs/linprocfs.c > > +++ b/sys/compat/linprocfs/linprocfs.c > > @@ -447,9 +447,11 @@ linprocfs_dostat(PFS_FILL_ARGS) > > struct pcpu *pcpu; > > long cp_time[CPUSTATES]; > > long *cp; > > + struct timeval boottime; > > int i; > > > > read_cpu_time(cp_time); > > + getboottime(&boottime); > > This is used surprisingly often by too many subsystems. With the value still > broken so that locking it doesn't help much, I would leave it as a global. I prefer to keep the KPI consistent. > > + .th_generation = 0, > > + .th_next = &th0 > > +}; > > This shouldn't spell out all the 0 initializers. That was only needed to > reach the non-0 initializer at the end of the initializer. I thought about it, but initially did not liked the implicit initialization. Changed. > > static int > > sysctl_kern_boottime(SYSCTL_HANDLER_ARGS) > > { > > + struct bintime boottimebin; > > + struct timeval boottime; > > + > > + binuptime1(NULL, &boottimebin); > > + bintime2timeval(&boottimebin, &boottime); > > Use the wrapper function getboottime() if you keep it. Yes, I wrote this before I added the helper. > So maybe use a new general function that returns boottimebin for the > non-uptime functions only. Possibly it can add the offset directly. I changed getbintime() and getboottimebin() to use the fenced magic and fetch boottime inside the loop. This removed the need for binuptime1(). > > > @@ -1116,8 +1170,10 @@ sysclock_snap2bintime(struct sysclock_snap *cs, struct bintime *bt, > > if (cs->delta > 0) > > bintime_addx(bt, cs->fb_info.th_scale * cs->delta); > > > > - if ((flags & FBCLOCK_UPTIME) == 0) > > + if ((flags & FBCLOCK_UPTIME) == 0) { > > + binuptime1(NULL, &boottimebin); > > Perhaps use a more direct way to get boottimebin(). binuptime1() is > pessimized by null pointer checks for both its args. Use the new > function getboottimebin() even if is not direct. Used it, and now it is a direct interface into timehands as well. > > > ... > > diff --git a/sys/net/bpf.c b/sys/net/bpf.c > > index 3b12cf4..4251f71 100644 > > --- a/sys/net/bpf.c > > +++ b/sys/net/bpf.c > > @@ -2328,12 +2328,13 @@ bpf_hdrlen(struct bpf_d *d) > > static void > > bpf_bintime2ts(struct bintime *bt, struct bpf_ts *ts, int tstype) > > { > > - struct bintime bt2; > > + struct bintime bt2, boottimebin; > > struct timeval tsm; > > struct timespec tsn; > > > > if ((tstype & BPF_T_MONOTONIC) == 0) { > > bt2 = *bt; > > + getboottimebin(&boottimebin); > > bintime_add(&bt2, &boottimebin); > > bt = &bt2; > > } > > This is still too chummy with the (mis)implementation. I think it is to > convert a relative CLOCK_MONOTONIC time to an absolute CLOCK_REALTIME > time. That should be done in the time subsystem. It is unclear if > we care about strict real time with leap seconds adjustments. > > BPF_T_MONOTONIC is not documented in any man page or used in any library > or applicatiion in /usr/src. BPF_T_BINTIME is documented bpf(4) but not > used in any library or application in /usr/src. Similarly for all of > BPF_T_*. contrib/libpcap has 47 lines matching timeval, 2 matching > timespec and 0 matching bintime. I will leave this to bpf maintainers. > > > diff --git a/sys/netpfil/ipfw/ip_fw_sockopt.c b/sys/netpfil/ipfw/ip_fw_sockopt.c > > index d186ba5..7faf99f 100644 > > --- a/sys/netpfil/ipfw/ip_fw_sockopt.c > > +++ b/sys/netpfil/ipfw/ip_fw_sockopt.c > > @@ -395,6 +395,7 @@ swap_map(struct ip_fw_chain *chain, struct ip_fw **new_map, int new_len) > > static void > > export_cntr1_base(struct ip_fw *krule, struct ip_fw_bcounter *cntr) > > { > > + struct timeval boottime; > > > > cntr->size = sizeof(*cntr); > > > > @@ -403,21 +404,26 @@ export_cntr1_base(struct ip_fw *krule, struct ip_fw_bcounter *cntr) > > cntr->bcnt = counter_u64_fetch(krule->cntr + 1); > > cntr->timestamp = krule->timestamp; > > } > > - if (cntr->timestamp > 0) > > + if (cntr->timestamp > 0) { > > + getboottime(&boottime); > > cntr->timestamp += boottime.tv_sec; > > + } > > } > > Looks like another home made conversion from CLOCK_MONOTONIC to > CLOCK_REALTIME. Easier to understand since it is actually used. > ipfw seems to use only getmicrouptime() for timestamping. There are > good reasons for using monotonic time for everything except presenting > the time to the user and the conversion for that should be more careful > than the above. I leave this one to net- people as well. > > diff --git a/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c b/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c > > index 1d07943..0879299 100644 > > --- a/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c > > +++ b/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c > > @@ -504,11 +504,13 @@ svc_rpc_gss_find_client(struct svc_rpc_gss_clientid *id) > > { > > struct svc_rpc_gss_client *client; > > struct svc_rpc_gss_client_list *list; > > + struct timeval boottime; > > unsigned long hostid; > > > > rpc_gss_log_debug("in svc_rpc_gss_find_client(%d)", id->ci_id); > > > > getcredhostid(curthread->td_ucred, &hostid); > > + getboottime(&boottime); > > if (id->ci_hostid != hostid || id->ci_boottime != boottime.tv_sec) > > return (NULL); > > Here it is hopefully just a magic id, with the user being a remote system. > Any time that doesn't go backwards or forwards so far that it is in the > lieftime of an old or new boot instance works well for identifying the > boot instance. It does not work for leap seconds in the same way as is does not work after setclock(). So I just leave this conversion as is. sys/compat/linprocfs/linprocfs.c | 4 + sys/fs/devfs/devfs_vnops.c | 4 +- sys/fs/fdescfs/fdesc_vnops.c | 2 + sys/fs/nfs/nfsport.h | 2 +- sys/fs/procfs/procfs_status.c | 2 + sys/kern/kern_acct.c | 2 +- sys/kern/kern_ntptime.c | 55 +++++------- sys/kern/kern_proc.c | 2 + sys/kern/kern_tc.c | 161 +++++++++++++++++++++++++----------- sys/kern/kern_time.c | 8 +- sys/kern/subr_rtc.c | 2 +- sys/kern/sys_procdesc.c | 3 +- sys/net/bpf.c | 3 +- sys/netpfil/ipfw/ip_fw_sockopt.c | 12 ++- sys/nfs/nfs_lock.c | 2 + sys/rpc/rpcsec_gss/svc_rpcsec_gss.c | 4 + sys/sys/time.h | 5 +- 17 files changed, 174 insertions(+), 99 deletions(-) diff --git a/sys/compat/linprocfs/linprocfs.c b/sys/compat/linprocfs/linprocfs.c index 56b2ade..a0dce47 100644 --- a/sys/compat/linprocfs/linprocfs.c +++ b/sys/compat/linprocfs/linprocfs.c @@ -447,9 +447,11 @@ linprocfs_dostat(PFS_FILL_ARGS) struct pcpu *pcpu; long cp_time[CPUSTATES]; long *cp; + struct timeval boottime; int i; read_cpu_time(cp_time); + getboottime(&boottime); sbuf_printf(sb, "cpu %ld %ld %ld %ld\n", T2J(cp_time[CP_USER]), T2J(cp_time[CP_NICE]), @@ -624,10 +626,12 @@ static int linprocfs_doprocstat(PFS_FILL_ARGS) { struct kinfo_proc kp; + struct timeval boottime; char state; static int ratelimit = 0; vm_offset_t startcode, startdata; + getboottime(&boottime); sx_slock(&proctree_lock); PROC_LOCK(p); fill_kinfo_proc(p, &kp); diff --git a/sys/fs/devfs/devfs_vnops.c b/sys/fs/devfs/devfs_vnops.c index 7cc0f9e..afa3da4 100644 --- a/sys/fs/devfs/devfs_vnops.c +++ b/sys/fs/devfs/devfs_vnops.c @@ -707,10 +707,11 @@ devfs_getattr(struct vop_getattr_args *ap) { struct vnode *vp = ap->a_vp; struct vattr *vap = ap->a_vap; - int error; struct devfs_dirent *de; struct devfs_mount *dmp; struct cdev *dev; + struct timeval boottime; + int error; error = devfs_populate_vp(vp); if (error != 0) @@ -740,6 +741,7 @@ devfs_getattr(struct vop_getattr_args *ap) vap->va_blocksize = DEV_BSIZE; vap->va_type = vp->v_type; + getboottime(&boottime); #define fix(aa) \ do { \ if ((aa).tv_sec <= 3600) { \ diff --git a/sys/fs/fdescfs/fdesc_vnops.c b/sys/fs/fdescfs/fdesc_vnops.c index 4f6e1b9..65b8a54 100644 --- a/sys/fs/fdescfs/fdesc_vnops.c +++ b/sys/fs/fdescfs/fdesc_vnops.c @@ -394,7 +394,9 @@ fdesc_getattr(struct vop_getattr_args *ap) { struct vnode *vp = ap->a_vp; struct vattr *vap = ap->a_vap; + struct timeval boottime; + getboottime(&boottime); vap->va_mode = S_IRUSR|S_IXUSR|S_IRGRP|S_IXGRP|S_IROTH|S_IXOTH; vap->va_fileid = VTOFDESC(vp)->fd_ix; vap->va_uid = 0; diff --git a/sys/fs/nfs/nfsport.h b/sys/fs/nfs/nfsport.h index 921df2d..6b41e2f 100644 --- a/sys/fs/nfs/nfsport.h +++ b/sys/fs/nfs/nfsport.h @@ -872,7 +872,7 @@ int newnfs_realign(struct mbuf **, int); /* * Set boottime. */ -#define NFSSETBOOTTIME(b) ((b) = boottime) +#define NFSSETBOOTTIME(b) (getboottime(&b)) /* * The size of directory blocks in the buffer cache. diff --git a/sys/fs/procfs/procfs_status.c b/sys/fs/procfs/procfs_status.c index 5a00ee1..defdec3 100644 --- a/sys/fs/procfs/procfs_status.c +++ b/sys/fs/procfs/procfs_status.c @@ -70,6 +70,7 @@ procfs_doprocstatus(PFS_FILL_ARGS) const char *wmesg; char *pc; char *sep; + struct timeval boottime; int pid, ppid, pgid, sid; int i; @@ -129,6 +130,7 @@ procfs_doprocstatus(PFS_FILL_ARGS) calcru(p, &ut, &st); PROC_STATUNLOCK(p); start = p->p_stats->p_start; + getboottime(&boottime); timevaladd(&start, &boottime); sbuf_printf(sb, " %jd,%ld %jd,%ld %jd,%ld", (intmax_t)start.tv_sec, start.tv_usec, diff --git a/sys/kern/kern_acct.c b/sys/kern/kern_acct.c index ef3fd2e..46e6d9b 100644 --- a/sys/kern/kern_acct.c +++ b/sys/kern/kern_acct.c @@ -389,7 +389,7 @@ acct_process(struct thread *td) acct.ac_stime = encode_timeval(st); /* (4) The elapsed time the command ran (and its starting time) */ - tmp = boottime; + getboottime(&tmp); timevaladd(&tmp, &p->p_stats->p_start); acct.ac_btime = tmp.tv_sec; microuptime(&tmp); diff --git a/sys/kern/kern_ntptime.c b/sys/kern/kern_ntptime.c index d352ee7..efc3713 100644 --- a/sys/kern/kern_ntptime.c +++ b/sys/kern/kern_ntptime.c @@ -162,29 +162,12 @@ static l_fp time_adj; /* tick adjust (ns/s) */ static int64_t time_adjtime; /* correction from adjtime(2) (usec) */ -static struct mtx ntpadj_lock; -MTX_SYSINIT(ntpadj, &ntpadj_lock, "ntpadj", -#ifdef PPS_SYNC - MTX_SPIN -#else - MTX_DEF -#endif -); +static struct mtx ntp_lock; +MTX_SYSINIT(ntp, &ntp_lock, "ntp", MTX_SPIN); -/* - * When PPS_SYNC is defined, hardpps() function is provided which can - * be legitimately called from interrupt filters. Due to this, use - * spinlock for ntptime state protection, otherwise sleepable mutex is - * adequate. - */ -#ifdef PPS_SYNC -#define NTPADJ_LOCK() mtx_lock_spin(&ntpadj_lock) -#define NTPADJ_UNLOCK() mtx_unlock_spin(&ntpadj_lock) -#else -#define NTPADJ_LOCK() mtx_lock(&ntpadj_lock) -#define NTPADJ_UNLOCK() mtx_unlock(&ntpadj_lock) -#endif -#define NTPADJ_ASSERT_LOCKED() mtx_assert(&ntpadj_lock, MA_OWNED) +#define NTP_LOCK() mtx_lock_spin(&ntp_lock) +#define NTP_UNLOCK() mtx_unlock_spin(&ntp_lock) +#define NTP_ASSERT_LOCKED() mtx_assert(&ntp_lock, MA_OWNED) #ifdef PPS_SYNC /* @@ -271,7 +254,7 @@ ntp_gettime1(struct ntptimeval *ntvp) { struct timespec atv; /* nanosecond time */ - NTPADJ_ASSERT_LOCKED(); + NTP_ASSERT_LOCKED(); nanotime(&atv); ntvp->time.tv_sec = atv.tv_sec; @@ -302,9 +285,9 @@ sys_ntp_gettime(struct thread *td, struct ntp_gettime_args *uap) { struct ntptimeval ntv; - NTPADJ_LOCK(); + NTP_LOCK(); ntp_gettime1(&ntv); - NTPADJ_UNLOCK(); + NTP_UNLOCK(); td->td_retval[0] = ntv.time_state; return (copyout(&ntv, uap->ntvp, sizeof(ntv))); @@ -315,9 +298,9 @@ ntp_sysctl(SYSCTL_HANDLER_ARGS) { struct ntptimeval ntv; /* temporary structure */ - NTPADJ_LOCK(); + NTP_LOCK(); ntp_gettime1(&ntv); - NTPADJ_UNLOCK(); + NTP_UNLOCK(); return (sysctl_handle_opaque(oidp, &ntv, sizeof(ntv), req)); } @@ -382,7 +365,7 @@ sys_ntp_adjtime(struct thread *td, struct ntp_adjtime_args *uap) error = priv_check(td, PRIV_NTP_ADJTIME); if (error != 0) return (error); - NTPADJ_LOCK(); + NTP_LOCK(); if (modes & MOD_MAXERROR) time_maxerror = ntv.maxerror; if (modes & MOD_ESTERROR) @@ -484,7 +467,7 @@ sys_ntp_adjtime(struct thread *td, struct ntp_adjtime_args *uap) ntv.stbcnt = pps_stbcnt; #endif /* PPS_SYNC */ retval = ntp_is_time_error(time_status) ? TIME_ERROR : time_state; - NTPADJ_UNLOCK(); + NTP_UNLOCK(); error = copyout((caddr_t)&ntv, (caddr_t)uap->tp, sizeof(ntv)); if (error == 0) @@ -506,6 +489,8 @@ ntp_update_second(int64_t *adjustment, time_t *newsec) int tickrate; l_fp ftemp; /* 32/64-bit temporary */ + NTP_LOCK(); + /* * On rollover of the second both the nanosecond and microsecond * clocks are updated and the state machine cranked as @@ -627,6 +612,8 @@ ntp_update_second(int64_t *adjustment, time_t *newsec) else time_status &= ~STA_PPSSIGNAL; #endif /* PPS_SYNC */ + + NTP_UNLOCK(); } /* @@ -690,7 +677,7 @@ hardupdate(offset) long mtemp; l_fp ftemp; - NTPADJ_ASSERT_LOCKED(); + NTP_ASSERT_LOCKED(); /* * Select how the phase is to be controlled and from which @@ -772,7 +759,7 @@ hardpps(tsp, nsec) long u_sec, u_nsec, v_nsec; /* temps */ l_fp ftemp; - NTPADJ_LOCK(); + NTP_LOCK(); /* * The signal is first processed by a range gate and frequency @@ -956,7 +943,7 @@ hardpps(tsp, nsec) time_freq = pps_freq; out: - NTPADJ_UNLOCK(); + NTP_UNLOCK(); } #endif /* PPS_SYNC */ @@ -999,11 +986,11 @@ kern_adjtime(struct thread *td, struct timeval *delta, struct timeval *olddelta) return (error); ltw = (int64_t)delta->tv_sec * 1000000 + delta->tv_usec; } - NTPADJ_LOCK(); + NTP_LOCK(); ltr = time_adjtime; if (delta != NULL) time_adjtime = ltw; - NTPADJ_UNLOCK(); + NTP_UNLOCK(); if (olddelta != NULL) { atv.tv_sec = ltr / 1000000; atv.tv_usec = ltr % 1000000; diff --git a/sys/kern/kern_proc.c b/sys/kern/kern_proc.c index 2f1f620..892b23a 100644 --- a/sys/kern/kern_proc.c +++ b/sys/kern/kern_proc.c @@ -872,6 +872,7 @@ fill_kinfo_proc_only(struct proc *p, struct kinfo_proc *kp) struct session *sp; struct ucred *cred; struct sigacts *ps; + struct timeval boottime; /* For proc_realparent. */ sx_assert(&proctree_lock, SX_LOCKED); @@ -953,6 +954,7 @@ fill_kinfo_proc_only(struct proc *p, struct kinfo_proc *kp) kp->ki_nice = p->p_nice; kp->ki_fibnum = p->p_fibnum; kp->ki_start = p->p_stats->p_start; + getboottime(&boottime); timevaladd(&kp->ki_start, &boottime); PROC_STATLOCK(p); rufetch(p, &kp->ki_rusage); diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c index 0f015b3..c9676fc 100644 --- a/sys/kern/kern_tc.c +++ b/sys/kern/kern_tc.c @@ -70,31 +70,22 @@ struct timehands { struct bintime th_offset; struct timeval th_microtime; struct timespec th_nanotime; + struct bintime th_boottime; /* Fields not to be copied in tc_windup start with th_generation. */ u_int th_generation; struct timehands *th_next; }; static struct timehands th0; -static struct timehands th9 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th0}; -static struct timehands th8 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th9}; -static struct timehands th7 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th8}; -static struct timehands th6 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th7}; -static struct timehands th5 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th6}; -static struct timehands th4 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th5}; -static struct timehands th3 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th4}; -static struct timehands th2 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th3}; -static struct timehands th1 = { NULL, 0, 0, 0, {0, 0}, {0, 0}, {0, 0}, 0, &th2}; +static struct timehands th1 = { + .th_next = &th0 +}; static struct timehands th0 = { - &dummy_timecounter, - 0, - (uint64_t)-1 / 1000000, - 0, - {1, 0}, - {0, 0}, - {0, 0}, - 1, - &th1 + .th_counter = &dummy_timecounter, + .th_scale = (uint64_t)-1 / 1000000, + .th_offset = {1, 0}, + .th_generation = 1, + .th_next = &th1 }; static struct timehands *volatile timehands = &th0; @@ -106,8 +97,6 @@ int tc_min_ticktock_freq = 1; volatile time_t time_second = 1; volatile time_t time_uptime = 1; -struct bintime boottimebin; -struct timeval boottime; static int sysctl_kern_boottime(SYSCTL_HANDLER_ARGS); SYSCTL_PROC(_kern, KERN_BOOTTIME, boottime, CTLTYPE_STRUCT|CTLFLAG_RD, NULL, 0, sysctl_kern_boottime, "S,timeval", "System boottime"); @@ -135,7 +124,8 @@ SYSCTL_PROC(_kern_timecounter, OID_AUTO, alloweddeviation, static int tc_chosen; /* Non-zero if a specific tc was chosen via sysctl. */ -static void tc_windup(void); +static void tc_windup(bool top_call, struct bintime *new_boottimebin); +static void tc_windup_locked(struct bintime *new_boottimebin); static void cpu_tick_calibrate(int); void dtrace_getnanotime(struct timespec *tsp); @@ -143,6 +133,10 @@ void dtrace_getnanotime(struct timespec *tsp); static int sysctl_kern_boottime(SYSCTL_HANDLER_ARGS) { + struct timeval boottime; + + getboottime(&boottime); + #ifndef __mips__ #ifdef SCTL_MASK32 int tv[2]; @@ -150,11 +144,11 @@ sysctl_kern_boottime(SYSCTL_HANDLER_ARGS) if (req->flags & SCTL_MASK32) { tv[0] = boottime.tv_sec; tv[1] = boottime.tv_usec; - return SYSCTL_OUT(req, tv, sizeof(tv)); - } else + return (SYSCTL_OUT(req, tv, sizeof(tv))); + } #endif #endif - return SYSCTL_OUT(req, &boottime, sizeof(boottime)); + return (SYSCTL_OUT(req, &boottime, sizeof(boottime))); } static int @@ -164,7 +158,7 @@ sysctl_kern_timecounter_get(SYSCTL_HANDLER_ARGS) struct timecounter *tc = arg1; ncount = tc->tc_get_timecount(tc); - return sysctl_handle_int(oidp, &ncount, 0, req); + return (sysctl_handle_int(oidp, &ncount, 0, req)); } static int @@ -174,7 +168,7 @@ sysctl_kern_timecounter_freq(SYSCTL_HANDLER_ARGS) struct timecounter *tc = arg1; freq = tc->tc_frequency; - return sysctl_handle_64(oidp, &freq, 0, req); + return (sysctl_handle_64(oidp, &freq, 0, req)); } /* @@ -198,7 +192,7 @@ tc_delta(struct timehands *th) */ #ifdef FFCLOCK -void +static void fbclock_binuptime(struct bintime *bt) { struct timehands *th; @@ -234,8 +228,18 @@ fbclock_microuptime(struct timeval *tvp) void fbclock_bintime(struct bintime *bt) { + struct bintime boottimebin; + struct timehands *th; + unsigned int gen; - fbclock_binuptime(bt); + do { + th = timehands; + gen = atomic_load_acq_int(&th->th_generation); + *bt = th->th_offset; + bintime_addx(bt, th->th_scale * tc_delta(th)); + boottimebin = th->th_boottime; + atomic_thread_fence_acq(); + } while (gen == 0 || gen != th->th_generation); bintime_add(bt, &boottimebin); } @@ -303,12 +307,14 @@ void fbclock_getbintime(struct bintime *bt) { struct timehands *th; + struct bintime boottimebin; unsigned int gen; do { th = timehands; gen = atomic_load_acq_int(&th->th_generation); *bt = th->th_offset; + boottimebin = th->th_boottime; atomic_thread_fence_acq(); } while (gen == 0 || gen != th->th_generation); bintime_add(bt, &boottimebin); @@ -378,8 +384,18 @@ microuptime(struct timeval *tvp) void bintime(struct bintime *bt) { + struct bintime boottimebin; + struct timehands *th; + u_int gen; - binuptime(bt); + do { + th = timehands; + gen = atomic_load_acq_int(&th->th_generation); + *bt = th->th_offset; + bintime_addx(bt, th->th_scale * tc_delta(th)); + boottimebin = th->th_boottime; + atomic_thread_fence_acq(); + } while (gen == 0 || gen != th->th_generation); bintime_add(bt, &boottimebin); } @@ -447,12 +463,14 @@ void getbintime(struct bintime *bt) { struct timehands *th; + struct bintime boottimebin; u_int gen; do { th = timehands; gen = atomic_load_acq_int(&th->th_generation); *bt = th->th_offset; + boottimebin = th->th_boottime; atomic_thread_fence_acq(); } while (gen == 0 || gen != th->th_generation); bintime_add(bt, &boottimebin); @@ -487,6 +505,29 @@ getmicrotime(struct timeval *tvp) } #endif /* FFCLOCK */ +void +getboottime(struct timeval *boottime) +{ + struct bintime boottimebin; + + getboottimebin(&boottimebin); + bintime2timeval(&boottimebin, boottime); +} + +void +getboottimebin(struct bintime *boottimebin) +{ + struct timehands *th; + u_int gen; + + do { + th = timehands; + gen = atomic_load_acq_int(&th->th_generation); + *boottimebin = th->th_boottime; + atomic_thread_fence_acq(); + } while (gen == 0 || gen != th->th_generation); +} + #ifdef FFCLOCK /* * Support for feed-forward synchronization algorithms. This is heavily inspired @@ -1103,6 +1144,7 @@ int sysclock_snap2bintime(struct sysclock_snap *cs, struct bintime *bt, int whichclock, uint32_t flags) { + struct bintime boottimebin; #ifdef FFCLOCK struct bintime bt2; uint64_t period; @@ -1116,8 +1158,10 @@ sysclock_snap2bintime(struct sysclock_snap *cs, struct bintime *bt, if (cs->delta > 0) bintime_addx(bt, cs->fb_info.th_scale * cs->delta); - if ((flags & FBCLOCK_UPTIME) == 0) + if ((flags & FBCLOCK_UPTIME) == 0) { + getboottimebin(&boottimebin); bintime_add(bt, &boottimebin); + } break; #ifdef FFCLOCK case SYSCLOCK_FFWD: @@ -1226,10 +1270,12 @@ tc_getfrequency(void) return (timehands->th_counter->tc_frequency); } +static struct mtx tc_setclock_mtx; +MTX_SYSINIT(tc_setclock_init, &tc_setclock_mtx, "tcsetc", MTX_DEF); + /* * Step our concept of UTC. This is done by modifying our estimate of * when we booted. - * XXX: not locked. */ void tc_setclock(struct timespec *ts) @@ -1237,26 +1283,43 @@ tc_setclock(struct timespec *ts) struct timespec tbef, taft; struct bintime bt, bt2; - cpu_tick_calibrate(1); - nanotime(&tbef); timespec2bintime(ts, &bt); + nanotime(&tbef); + mtx_lock(&tc_setclock_mtx); + critical_enter(); + cpu_tick_calibrate(1); binuptime(&bt2); bintime_sub(&bt, &bt2); - bintime_add(&bt2, &boottimebin); - boottimebin = bt; - bintime2timeval(&bt, &boottime); /* XXX fiddle all the little crinkly bits around the fiords... */ - tc_windup(); - nanotime(&taft); + tc_windup(true, &bt); + critical_exit(); + mtx_unlock(&tc_setclock_mtx); if (timestepwarnings) { + nanotime(&taft); log(LOG_INFO, "Time stepped from %jd.%09ld to %jd.%09ld (%jd.%09ld)\n", (intmax_t)tbef.tv_sec, tbef.tv_nsec, (intmax_t)taft.tv_sec, taft.tv_nsec, (intmax_t)ts->tv_sec, ts->tv_nsec); } - cpu_tick_calibrate(1); +} + +static volatile int tc_windup_lock; +static void +tc_windup(bool top_call, struct bintime *new_boottimebin) +{ + + for (;;) { + if (atomic_cmpset_int(&tc_windup_lock, 0, 1)) { + atomic_thread_fence_acq(); + tc_windup_locked(new_boottimebin); + atomic_store_rel_int(&tc_windup_lock, 0); + break; + } else if (!top_call) { + break; + } + } } /* @@ -1265,7 +1328,7 @@ tc_setclock(struct timespec *ts) * timecounter and/or do seconds processing in NTP. Slightly magic. */ static void -tc_windup(void) +tc_windup_locked(struct bintime *new_boottimebin) { struct bintime bt; struct timehands *th, *tho; @@ -1289,6 +1352,8 @@ tc_windup(void) th->th_generation = 0; atomic_thread_fence_rel(); bcopy(tho, th, offsetof(struct timehands, th_generation)); + if (new_boottimebin != NULL) + th->th_boottime = *new_boottimebin; /* * Capture a timecounter delta on the current timecounter and if @@ -1338,7 +1403,7 @@ tc_windup(void) * case we missed a leap second. */ bt = th->th_offset; - bintime_add(&bt, &boottimebin); + bintime_add(&bt, &th->th_boottime); i = bt.sec - tho->th_microtime.tv_sec; if (i > LARGE_STEP) i = 2; @@ -1346,7 +1411,7 @@ tc_windup(void) t = bt.sec; ntp_update_second(&th->th_adjustment, &bt.sec); if (bt.sec != t) - boottimebin.sec += bt.sec - t; + th->th_boottime.sec += bt.sec - t; } /* Update the UTC timestamps used by the get*() functions. */ /* XXX shouldn't do this here. Should force non-`get' versions. */ @@ -1769,7 +1834,7 @@ pps_event(struct pps_state *pps, int event) tcount &= pps->capth->th_counter->tc_counter_mask; bt = pps->capth->th_offset; bintime_addx(&bt, pps->capth->th_scale * tcount); - bintime_add(&bt, &boottimebin); + bintime_add(&bt, &pps->capth->th_boottime); bintime2timespec(&bt, &ts); /* If the timecounter was wound up underneath us, bail out. */ @@ -1846,7 +1911,7 @@ tc_ticktock(int cnt) if (count < tc_tick) return; count = 0; - tc_windup(); + tc_windup(false, NULL); } static void __inline @@ -1921,7 +1986,7 @@ inittimecounter(void *dummy) /* warm up new timecounter (again) and get rolling. */ (void)timecounter->tc_get_timecount(timecounter); (void)timecounter->tc_get_timecount(timecounter); - tc_windup(); + tc_windup(true, NULL); } SYSINIT(timecounter, SI_SUB_CLOCKS, SI_ORDER_SECOND, inittimecounter, NULL); @@ -2095,7 +2160,7 @@ tc_fill_vdso_timehands(struct vdso_timehands *vdso_th) vdso_th->th_offset_count = th->th_offset_count; vdso_th->th_counter_mask = th->th_counter->tc_counter_mask; vdso_th->th_offset = th->th_offset; - vdso_th->th_boottime = boottimebin; + vdso_th->th_boottime = th->th_boottime; enabled = cpu_fill_vdso_timehands(vdso_th, th->th_counter); if (!vdso_th_enable) enabled = 0; @@ -2116,8 +2181,8 @@ tc_fill_vdso_timehands32(struct vdso_timehands32 *vdso_th32) vdso_th32->th_counter_mask = th->th_counter->tc_counter_mask; vdso_th32->th_offset.sec = th->th_offset.sec; *(uint64_t *)&vdso_th32->th_offset.frac[0] = th->th_offset.frac; - vdso_th32->th_boottime.sec = boottimebin.sec; - *(uint64_t *)&vdso_th32->th_boottime.frac[0] = boottimebin.frac; + vdso_th32->th_boottime.sec = th->th_boottime.sec; + *(uint64_t *)&vdso_th32->th_boottime.frac[0] = th->th_boottime.frac; enabled = cpu_fill_vdso_timehands32(vdso_th32, th->th_counter); if (!vdso_th_enable) enabled = 0; diff --git a/sys/kern/kern_time.c b/sys/kern/kern_time.c index 148da2b..82710f7 100644 --- a/sys/kern/kern_time.c +++ b/sys/kern/kern_time.c @@ -115,9 +115,7 @@ settime(struct thread *td, struct timeval *tv) struct timeval delta, tv1, tv2; static struct timeval maxtime, laststep; struct timespec ts; - int s; - s = splclock(); microtime(&tv1); delta = *tv; timevalsub(&delta, &tv1); @@ -147,10 +145,8 @@ settime(struct thread *td, struct timeval *tv) printf("Time adjustment clamped to -1 second\n"); } } else { - if (tv1.tv_sec == laststep.tv_sec) { - splx(s); + if (tv1.tv_sec == laststep.tv_sec) return (EPERM); - } if (delta.tv_sec > 1) { tv->tv_sec = tv1.tv_sec + 1; printf("Time adjustment clamped to +1 second\n"); @@ -161,10 +157,8 @@ settime(struct thread *td, struct timeval *tv) ts.tv_sec = tv->tv_sec; ts.tv_nsec = tv->tv_usec * 1000; - mtx_lock(&Giant); tc_setclock(&ts); resettodr(); - mtx_unlock(&Giant); return (0); } diff --git a/sys/kern/subr_rtc.c b/sys/kern/subr_rtc.c index dbad36d..4bac324 100644 --- a/sys/kern/subr_rtc.c +++ b/sys/kern/subr_rtc.c @@ -172,11 +172,11 @@ resettodr(void) if (disable_rtc_set || clock_dev == NULL) return; - mtx_lock(&resettodr_lock); getnanotime(&ts); timespecadd(&ts, &clock_adj); ts.tv_sec -= utc_offset(); /* XXX: We should really set all registered RTCs */ + mtx_lock(&resettodr_lock); error = CLOCK_SETTIME(clock_dev, &ts); mtx_unlock(&resettodr_lock); if (error != 0) diff --git a/sys/kern/sys_procdesc.c b/sys/kern/sys_procdesc.c index 37139c1..f47ae7c 100644 --- a/sys/kern/sys_procdesc.c +++ b/sys/kern/sys_procdesc.c @@ -517,7 +517,7 @@ procdesc_stat(struct file *fp, struct stat *sb, struct ucred *active_cred, struct thread *td) { struct procdesc *pd; - struct timeval pstart; + struct timeval pstart, boottime; /* * XXXRW: Perhaps we should cache some more information from the @@ -532,6 +532,7 @@ procdesc_stat(struct file *fp, struct stat *sb, struct ucred *active_cred, /* Set birth and [acm] times to process start time. */ pstart = pd->pd_proc->p_stats->p_start; + getboottime(&boottime); timevaladd(&pstart, &boottime); TIMEVAL_TO_TIMESPEC(&pstart, &sb->st_birthtim); sb->st_atim = sb->st_birthtim; diff --git a/sys/net/bpf.c b/sys/net/bpf.c index 3b12cf4..4251f71 100644 --- a/sys/net/bpf.c +++ b/sys/net/bpf.c @@ -2328,12 +2328,13 @@ bpf_hdrlen(struct bpf_d *d) static void bpf_bintime2ts(struct bintime *bt, struct bpf_ts *ts, int tstype) { - struct bintime bt2; + struct bintime bt2, boottimebin; struct timeval tsm; struct timespec tsn; if ((tstype & BPF_T_MONOTONIC) == 0) { bt2 = *bt; + getboottimebin(&boottimebin); bintime_add(&bt2, &boottimebin); bt = &bt2; } diff --git a/sys/netpfil/ipfw/ip_fw_sockopt.c b/sys/netpfil/ipfw/ip_fw_sockopt.c index d186ba5..7faf99f 100644 --- a/sys/netpfil/ipfw/ip_fw_sockopt.c +++ b/sys/netpfil/ipfw/ip_fw_sockopt.c @@ -395,6 +395,7 @@ swap_map(struct ip_fw_chain *chain, struct ip_fw **new_map, int new_len) static void export_cntr1_base(struct ip_fw *krule, struct ip_fw_bcounter *cntr) { + struct timeval boottime; cntr->size = sizeof(*cntr); @@ -403,21 +404,26 @@ export_cntr1_base(struct ip_fw *krule, struct ip_fw_bcounter *cntr) cntr->bcnt = counter_u64_fetch(krule->cntr + 1); cntr->timestamp = krule->timestamp; } - if (cntr->timestamp > 0) + if (cntr->timestamp > 0) { + getboottime(&boottime); cntr->timestamp += boottime.tv_sec; + } } static void export_cntr0_base(struct ip_fw *krule, struct ip_fw_bcounter0 *cntr) { + struct timeval boottime; if (krule->cntr != NULL) { cntr->pcnt = counter_u64_fetch(krule->cntr); cntr->bcnt = counter_u64_fetch(krule->cntr + 1); cntr->timestamp = krule->timestamp; } - if (cntr->timestamp > 0) + if (cntr->timestamp > 0) { + getboottime(&boottime); cntr->timestamp += boottime.tv_sec; + } } /* @@ -2048,11 +2054,13 @@ ipfw_getrules(struct ip_fw_chain *chain, void *buf, size_t space) char *ep = bp + space; struct ip_fw *rule; struct ip_fw_rule0 *dst; + struct timeval boottime; int error, i, l, warnflag; time_t boot_seconds; warnflag = 0; + getboottime(&boottime); boot_seconds = boottime.tv_sec; for (i = 0; i < chain->n_rules; i++) { rule = chain->map[i]; diff --git a/sys/nfs/nfs_lock.c b/sys/nfs/nfs_lock.c index 7d11672..c84413e 100644 --- a/sys/nfs/nfs_lock.c +++ b/sys/nfs/nfs_lock.c @@ -241,6 +241,7 @@ nfs_dolock(struct vop_advlock_args *ap) struct flock *fl; struct proc *p; struct nfsmount *nmp; + struct timeval boottime; td = curthread; p = td->td_proc; @@ -284,6 +285,7 @@ nfs_dolock(struct vop_advlock_args *ap) p->p_nlminfo = malloc(sizeof(struct nlminfo), M_NLMINFO, M_WAITOK | M_ZERO); p->p_nlminfo->pid_start = p->p_stats->p_start; + getboottime(&boottime); timevaladd(&p->p_nlminfo->pid_start, &boottime); } msg.lm_msg_ident.pid_start = p->p_nlminfo->pid_start; diff --git a/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c b/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c index 1d07943..0879299 100644 --- a/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c +++ b/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c @@ -504,11 +504,13 @@ svc_rpc_gss_find_client(struct svc_rpc_gss_clientid *id) { struct svc_rpc_gss_client *client; struct svc_rpc_gss_client_list *list; + struct timeval boottime; unsigned long hostid; rpc_gss_log_debug("in svc_rpc_gss_find_client(%d)", id->ci_id); getcredhostid(curthread->td_ucred, &hostid); + getboottime(&boottime); if (id->ci_hostid != hostid || id->ci_boottime != boottime.tv_sec) return (NULL); @@ -537,6 +539,7 @@ svc_rpc_gss_create_client(void) { struct svc_rpc_gss_client *client; struct svc_rpc_gss_client_list *list; + struct timeval boottime; unsigned long hostid; rpc_gss_log_debug("in svc_rpc_gss_create_client()"); @@ -547,6 +550,7 @@ svc_rpc_gss_create_client(void) sx_init(&client->cl_lock, "GSS-client"); getcredhostid(curthread->td_ucred, &hostid); client->cl_id.ci_hostid = hostid; + getboottime(&boottime); client->cl_id.ci_boottime = boottime.tv_sec; client->cl_id.ci_id = svc_rpc_gss_next_clientid++; list = &svc_rpc_gss_client_hash[client->cl_id.ci_id % CLIENT_HASH_SIZE]; diff --git a/sys/sys/time.h b/sys/sys/time.h index 395e888..659f8e0 100644 --- a/sys/sys/time.h +++ b/sys/sys/time.h @@ -372,8 +372,6 @@ void resettodr(void); extern volatile time_t time_second; extern volatile time_t time_uptime; -extern struct bintime boottimebin; -extern struct timeval boottime; extern struct bintime tc_tick_bt; extern sbintime_t tc_tick_sbt; extern struct bintime tick_bt; @@ -440,6 +438,9 @@ void getbintime(struct bintime *bt); void getnanotime(struct timespec *tsp); void getmicrotime(struct timeval *tvp); +void getboottime(struct timeval *boottime); +void getboottimebin(struct bintime *boottimebin); + /* Other functions */ int itimerdecr(struct itimerval *itp, int usec); int itimerfix(struct timeval *tv);