Date:      Mon, 4 Jun 2012 21:19:17 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        Gianni <gianni@freebsd.org>, Alan Cox <alc@rice.edu>, Alexander Kabaev <kan@freebsd.org>, Attilio Rao <attilio@freebsd.org>, freebsd-arch@freebsd.org
Subject:   Re: Fwd: [RFC] Kernel shared variables
Message-ID:  <20120604181917.GD85127@deviant.kiev.zoral.com.ua>
In-Reply-To: <201206041101.57486.jhb@freebsd.org>
References:  <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2%2BoYo%2BwwT4ipA@mail.gmail.com> <20120603051904.GG2358@deviant.kiev.zoral.com.ua> <20120603184315.T856@besplex.bde.org> <201206041101.57486.jhb@freebsd.org>

On Mon, Jun 04, 2012 at 11:01:57AM -0400, John Baldwin wrote:
> On Sunday, June 03, 2012 6:49:27 am Bruce Evans wrote:
> > On Sun, 3 Jun 2012, Konstantin Belousov wrote:
> >
> > > On Sun, Jun 03, 2012 at 07:28:09AM +1000, Bruce Evans wrote:
> > >> On Sat, 2 Jun 2012, Konstantin Belousov wrote:
> > >>> ...
> > >>> In fact, I think that if the whole goal is only fast clocks, then we
> > >>> do not need any additional system mechanisms, since we can easily export
> > >>> coefficients for rdtsc formula already. E.g. we can put it into elf auxv,
> > >>> which is ugly but bearable.
> > >>
> > >> How do you get the timehands offsets?  These only need to be updated
> > >> every second or so, or when used, but how can the application know
> > >> when they need to be updated if this is not done automatically in the
> > >> kernel by writing to a shared page?  I can only think of the
> > >> application arranging an alarm signal every second or so and updating
> > >> then.  No good for libraries.
> > > What are timehands offsets? Do you mean things like leap seconds?
> >
> > Yes.  binuptime() is:
> >=20
> > % void
> > % binuptime(struct bintime *bt)
> > % {
> > % 	struct timehands *th;
> > % 	u_int gen;
> > %
> > % 	do {
> > % 		th = timehands;
> > % 		gen = th->th_generation;
> > % 		*bt = th->th_offset;
> > % 		bintime_addx(bt, th->th_scale * tc_delta(th));
> > % 	} while (gen == 0 || gen != th->th_generation);
> > % }
> >
> > Without the kernel providing th->th_offset, you have to do lots of ntp
> > handling for yourself (compatibly with the kernel) just to get an
> > accuracy of 1 second.  Leap seconds don't affect CLOCK_MONOTONIC, but
> > they do affect CLOCK_REALTIME which is the clock id used by
> > gettimeofday().  For the former, you only have to advance the offset
> > for yourself occasionally (compatibly with the kernel) and manage
> > (compatibly with the kernel, especially in the long term) ntp slewing
> > and other syscall/sysctl kernel activity that micro-adjusts th->th_scale.
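To make the loop above concrete (this is my reading of kern_tc.c, stated as an assumption rather than something said in this thread): th_scale is roughly 2^64 divided by the counter frequency f, and tc_delta() is the masked count difference since th_offset_count, so the fractional part added by bintime_addx() is a 64-bit fixed-point product:

```latex
\begin{aligned}
\mathrm{th\_scale} &\approx 2^{64} / f \\
\Delta &= (\mathrm{count} - \mathrm{th\_offset\_count}) \mathbin{\&} \mathrm{tc\_counter\_mask} \\
\mathrm{bt} &= \mathrm{th\_offset} + \frac{\mathrm{th\_scale}\cdot\Delta}{2^{64}}\ \mathrm{s},
  \qquad \text{exact only while } \mathrm{th\_scale}\cdot\Delta < 2^{64},\ \text{i.e. } \Delta \lesssim f
\end{aligned}
```

Because the product is truncated to 64 bits, it stays exact only while fewer than about f ticks (roughly one second of counter time) have elapsed, which fits with the offsets needing an update every second or so.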
>
> I think duplicating this logic in userland would just be wasteful.  I have
> a private fast gettimeofday() at my current job and it works by exporting
> the current timehands structure (well, the equivalent) to userland.  The
> userland bits then fetch a copy of the details and do the same as bintime().
> (I move the math (bintime_addx() and the multiply) out of the loop, however.)
Yesterday I started an implementation that uses the shared page to export
a variant of timehands, and uses auxv to provide libc with a pointer to
the timehands when rdtsc is suitable.

I have almost finished both the 32-bit and 64-bit userspace parts, but
kernel-side work remains. Is your implementation ready, or close to ready,
for commit? In other words, should I drop my effort or continue?
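A minimal sketch of the userland side John describes, under stated assumptions: `struct shared_th`, `struct th_copy`, `th_snapshot()` and `th_compute()` are hypothetical names, not an existing FreeBSD interface, and the counter is assumed to be 32 bits wide with a full mask. The fields are copied under the generation check, and the multiply plus the bintime_addx() carry are done once, outside the retry loop.

```c
#include <stdatomic.h>
#include <stdint.h>

struct bintime {
	int64_t  sec;
	uint64_t frac;                  /* units of 2^-64 seconds */
};

/* Hypothetical timehands copy exported by the kernel in a shared page. */
struct shared_th {
	_Atomic uint32_t gen;           /* 0 while the kernel rewrites it */
	_Atomic uint64_t scale;         /* about 2^64 / counter frequency */
	_Atomic uint32_t offset_count;  /* counter value at the last windup */
	_Atomic int64_t  offset_sec;
	_Atomic uint64_t offset_frac;
};

/* Private, consistent snapshot of the shared fields. */
struct th_copy {
	uint64_t scale;
	uint32_t offset_count;
	struct bintime offset;
};

/* Retry until a copy is taken with no concurrent kernel update. */
static void
th_snapshot(struct shared_th *th, struct th_copy *c)
{
	uint32_t gen;

	do {
		gen = atomic_load_explicit(&th->gen, memory_order_acquire);
		c->scale = atomic_load_explicit(&th->scale, memory_order_relaxed);
		c->offset_count = atomic_load_explicit(&th->offset_count,
		    memory_order_relaxed);
		c->offset.sec = atomic_load_explicit(&th->offset_sec,
		    memory_order_relaxed);
		c->offset.frac = atomic_load_explicit(&th->offset_frac,
		    memory_order_relaxed);
		/* Order the field loads before re-reading the generation. */
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&th->gen, memory_order_relaxed));
}

/* The multiply and the bintime_addx() carry, done outside the loop. */
static void
th_compute(const struct th_copy *c, uint32_t now, struct bintime *bt)
{
	uint32_t delta = now - c->offset_count; /* wraps like a 32-bit counter */
	uint64_t x = c->scale * delta;          /* truncated 64-bit product */

	*bt = c->offset;
	bt->frac += x;
	if (bt->frac < x)                       /* carry into seconds */
		bt->sec++;
}
```

A caller would take the snapshot first and read the counter (for example the low 32 bits of rdtsc) afterwards, then call th_compute(); reading the counter before the snapshot could produce a count older than offset_count and make the delta wrap.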

>
> > > This is indeed problematic for auxv. For auxv it could be solved by
> > > providing the offset for the next recheck using syscalls, and making the
> > > libc code respect this offset. But I do think that a vdso in the shared
> > > page is the right solution, not auxv.
> >=20
> > timehands in a shared page is close to working.  th_generation protects
> > things in the same way as in the kernel, modulo assumptions that writes
> > are ordered.
>=20
> It would work fine.  And in fact, having multiple timehands is actually a
> bug, not a feature.  It lets you compute bogus timestamps if you get preempted
> at the wrong time and end up with time jumping around.  At Yahoo! we reduced
> the number of timehands structures down to 2 or some such, and I'm now of
> the opinion we should just have one and dispense with the entire array.
>
> For my userland case I only export a single timehands copy.
Well, I have to use two copies because of time_t ABI differences: one for
32-bit and one for 64-bit.
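For the writer side, a hypothetical sketch of updating one exported copy, following John's single-timehands variant (the structure and `th_publish()` are invented for illustration). It spells out the ordering Bruce mentions: the generation is zeroed before any field changes, and the new non-zero generation is stored with release semantics only after all fields are written. C11 fences stand in for whatever barriers the kernel would really use, and a single writer is assumed.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical single exported timehands (no array of copies). */
struct shared_th {
	_Atomic uint32_t gen;           /* 0 means "update in progress" */
	_Atomic uint64_t scale;
	_Atomic uint32_t offset_count;
	_Atomic int64_t  offset_sec;
	_Atomic uint64_t offset_frac;
};

/*
 * Publish new values.  Readers that see gen == 0, or a generation different
 * from the one they started with, retry.  Assumes a single writer.
 */
static void
th_publish(struct shared_th *th, uint64_t scale, uint32_t count,
    int64_t sec, uint64_t frac)
{
	uint32_t ngen = atomic_load_explicit(&th->gen, memory_order_relaxed) + 1;

	if (ngen == 0)                  /* 0 is reserved for "in progress" */
		ngen = 1;

	atomic_store_explicit(&th->gen, 0, memory_order_relaxed);
	/*
	 * A reader whose field loads observe any store below will, after its
	 * acquire fence, re-read gen as 0 or newer and retry.
	 */
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&th->scale, scale, memory_order_relaxed);
	atomic_store_explicit(&th->offset_count, count, memory_order_relaxed);
	atomic_store_explicit(&th->offset_sec, sec, memory_order_relaxed);
	atomic_store_explicit(&th->offset_frac, frac, memory_order_relaxed);

	/* Release: all field stores are visible before the new generation. */
	atomic_store_explicit(&th->gen, ngen, memory_order_release);
}
```

With one copy per time_t ABI, the same sequence would presumably just be run on each copy.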

>
> > > >> rdtsc is also very unportable, even on CPUs that have it.  But all other
> > > >> x86 timecounter hardware is too slow if you want gettimeofday() to be fast
> > >> and as accurate as it is now.
>
> For all the hardware where people run mysql and similar software that calls
> gettimeofday() a lot, rdtsc() works just fine.
I also try to mimic the kernel code as closely as possible, so there are
two possible TSC counters; the selection is managed by the kernel, but the
code lives in libc or possibly in a vdso. I do not see an immediate use
for a vdso just for gettimeofday(2) and clock_gettime(2), although having
a vdso provide unwinding tables for signal trampolines is _very_ desirable.
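A sketch of the split described above, where the kernel chooses the counter and libc only follows that choice. All names are hypothetical; the two TSC variants are assumed to mirror the kernel's TSC and TSC-low timecounters, the latter returning the raw TSC shifted right by a kernel-chosen number of bits.

```c
#include <stdint.h>

/* Hypothetical values the kernel would publish to say how to read the TSC. */
enum user_tc_algo {
	USER_TC_NONE = 0,       /* no usable userland counter: use the syscall */
	USER_TC_TSC = 1,        /* low 32 bits of the raw TSC */
	USER_TC_TSC_LOW = 2     /* raw TSC shifted right by 'shift' bits */
};

struct user_tc_info {
	uint32_t algo;          /* one of enum user_tc_algo, chosen by the kernel */
	uint32_t shift;         /* only meaningful for USER_TC_TSC_LOW */
};

/*
 * Turn a raw 64-bit TSC reading into the 32-bit count the kernel's current
 * timecounter would return.  Returns 0, leaving *count untouched, when the
 * kernel did not select a TSC-based counter usable from userland.
 */
static int
user_tc_count(const struct user_tc_info *tc, uint64_t raw_tsc, uint32_t *count)
{
	switch (tc->algo) {
	case USER_TC_TSC:
		*count = (uint32_t)raw_tsc;
		return (1);
	case USER_TC_TSC_LOW:
		if (tc->shift >= 64)
			return (0);
		*count = (uint32_t)(raw_tsc >> tc->shift);
		return (1);
	default:
		return (0);
	}
}
```

On x86 with GCC or Clang the raw value could come from `__rdtsc()` in <x86intrin.h>; whenever user_tc_count() returns 0, libc would take the real syscall instead.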

>
> > > !rdtsc hardware probably cannot be used at all due to the need to provide
> > > usermode access to device registers. The mere presence of rdtsc does not
> > > mean that usermode can indeed use it; that should be decided by the kernel
> > > based on the current in-kernel time source. If rdtsc is not usable, the
> > > corresponding data should not be exported, or the implementation should go
> > > directly into a syscall or whatever.
>
> Yes, the patches I have only work if the kernel uses the TSC as its main
> timecounter as well.
>
> > But then applications would:
> > - use gettimeofday() more than they should ("it works on Linux"), even
> >    more than now, since then "it works on FreeBSD-x86" too
> > - just be slow when gettimeofday() is slow
> > - kludge around gettimeofday() being slow like they do now
> > - kludge around gettimeofday() being slow not like they do now (use more
> >    complications to probe it being slow).
>
> Some applications really need fine-grained timing with as little overhead
> as possible.
>
> --
> John Baldwin
