Date: Thu, 7 Jun 2012 12:12:43 +0300 From: Konstantin Belousov <kostikbel@gmail.com> To: Bruce Evans <brde@optusnet.com.au> Cc: freebsd-arch@freebsd.org Subject: Re: Fast gettimeofday(2) and clock_gettime(2) Message-ID: <20120607091243.GV85127@deviant.kiev.zoral.com.ua> In-Reply-To: <20120607130029.K1962@besplex.bde.org> References: <20120606165115.GQ85127@deviant.kiev.zoral.com.ua> <201206061423.53179.jhb@freebsd.org> <20120606205938.GS85127@deviant.kiev.zoral.com.ua> <20120607130029.K1962@besplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
--fryGc0vzirnrYIcd Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Jun 07, 2012 at 01:00:34PM +1000, Bruce Evans wrote: > On Wed, 6 Jun 2012, Konstantin Belousov wrote: >=20 > >On Wed, Jun 06, 2012 at 02:23:53PM -0400, John Baldwin wrote: > >>In general this looks good but I see a few nits / races: > >> > >>1) You don't follow the model of clearing tk_current to 0 while you > >> are updating the structure that the in-kernel timecounter code > >> uses. This also means you have to avoid using a tk_current of 0 > >> and that userland has to keep spinning as long as tk_current is 0. > >> Without this I believe userland can read a partially updated > >> structure. > >I changed the code to be much more similar to the kern_tc.c. I (re)added > >the generation field, which is set to 0 upon kernel touching timehands. >=20 > Seems necessary. >=20 > >I think this can only happen if tc_windups occurs quite close in > >succession, or usermode thread is suspended for long enough. BTW, > >even generation could loop back to the previous value if thread is > >stopped. >=20 > tc_windup()'s close in succession are bugs, since they cycle the timehands > faster than they were designed to be. We already have too many of these > bugs (where tc_setclock() calls tc_windup(). I didn't notice this > particular problem with it before). Now I will point out that version > 2 of your patch adds more of these calls, apparently to get changes to > happen sooner. But in sysctl_kern_timecounter_hardware(), such a call > was intentionaly left out since it is not needed. Note that tc_tick > prevents calls to tc_windup() more often than about once per msec if > hz > 1000. No, I did not added more tc_windup calls. I added a recalculation of the shared page content on the timecounter change, which is not the same as tc_windup() call. This is exactly to handle a disable of usermode rdtsc use when kernel timecounter hardware changes. >=20 > The generation count makes tc_windup()s close in succession harmless, > except they increase race possibilities by reducing the time-domain > locking. The generation count is 32 bits, so it can only loop back to > a previous value after 2**32 tc_windup_calls. This "can't happen". > What can happen is for the timehands to cycle after something is > preempted for 10-100 msec. Then the generation count allows detection > of the cycling. It only has an effect in this case. Otherwise, the > a thread can be preempted for 10-100 seconds and start up using a > timehands pointer that it read into a register that long ago, and > safely use the old pointer unless its generation has changed. Even > switching the timecounter works in that case. This depends on the > hardware part of the timecounter not going away and the software > keeping most state per-timehands. I reinstantiated the generation counter for rev. 3. >=20 > >There was apparently another issue with version 2. The bcopy() is not > >atomic, so potentially libc could read wrong tk_current. I redid > >the interface to write to the shared page to allow use of real atomics. >=20 > Timecounter code is supposed to be lock-free except for some time-domain > locking. I only see 1 problem with this: where tc_windup() writes the > generation count and other things without asking for these writes to > be ordered. In most cases, the time-domain locking prevents problems. In fact, on x86 the ordering is strong enough that no barriers are needed, this is why the problem goes unnoticed so far. > E.g., when the timehands pointer is read, it remains valid for 9+ > generations of cycling timehands (9+ to 90+ msec). It is only when > it sleeps for this long while holding and planning to use the old > pointer that it needs the generation count to actually work. Another > case is if writes are out of order (can't happen on x86), so: >=20 > /* > * The write to th_generation fails to protect users of th > * via 10-100 msec old pointers if it becomes visible unordered > * after any of the writes done by the bcopy(). Very rare to > * lose here, but th_generation's point is to not lose here. > */ > th->th_generation =3D 0; > bcopy(tho, th, offsetof(struct timehands, th_generation)); >=20 > // finish writing th except for th_generation > th->th_generation =3D ogen; > /* > * The previous write to th_generation fails to protect users > * of th via old pointers if becomes visible unordered before > * all of the other writes (users see the generation change > * via the old pointer, and now since it has become nonzero > * they use the incompletely written data. Again, only a problem > * after 10-100 msec. > */ >=20 > timehands =3D th; > /* > * Now users can grab th via timehands. If timehands became visible > * unordered before all of the other writes except th_generation, > * then users use the incompletely written data. Now the time > * domain locking doesn't help. > */ >=20 > >>2) You read tk->tk_boottime without the tk_current protection in your > >> non-uptime routines. This is racey as the kernel alters the > >> boottime when it skews time for large adjustments from ntp, etc. > >> To be really safe you need to read the boottime inside the loop > >> into a local variable and perhaps use a boolean parameter to decide > >> if you should add it to the computed uptime. > >I moved the bootime to timehands from timekeep, thank you for the > >clarification. >=20 > This isn't bug for bug compatible with the kernel. The kernel has a > global boottimebin which affects uses of old timehands the instance > that it is changed (even before tc_windup() is called). >=20 > >Updated patch is at > >http://people.freebsd.org/~kib/misc/moronix.3.patch >=20 > I had better not be awed by looking at it :-). I will test this with your test code when return to home. --fryGc0vzirnrYIcd Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAk/QcIsACgkQC3+MBN1Mb4jL9gCeM2BJ7raUIf4lK9/cnn7oOt9L DZ0AoLk1bHMpwPz6kSv9mSCtMu5jUbRJ =d4j3 -----END PGP SIGNATURE----- --fryGc0vzirnrYIcd--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120607091243.GV85127>