Date:      Mon, 11 Jun 2012 12:18:11 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Robert Watson <rwatson@freebsd.org>
Cc:        arch@freebsd.org
Subject:   Re: Fast gettimeofday(2) and clock_gettime(2)
Message-ID:  <20120611091811.GA2337@deviant.kiev.zoral.com.ua>
In-Reply-To: <alpine.BSF.2.00.1206110952570.78881@fledge.watson.org>
References:  <20120606165115.GQ85127@deviant.kiev.zoral.com.ua> <alpine.BSF.2.00.1206110952570.78881@fledge.watson.org>


On Mon, Jun 11, 2012 at 09:56:54AM +0100, Robert Watson wrote:
> On Wed, 6 Jun 2012, Konstantin Belousov wrote:
>
> >The whole struct vdso_timekeep is versioned, as well as individual struct
> >vdso_timehands, which should allow future algorithms to be implemented
> >without breaking binary compatibility.  The code is structured to eventually
> >move the __vdso_* functions out of libc into a VDSO, if it ever materializes.
> >This desire explains the vdso prefix and header file names.
> >
> >I implemented and tested the userspace timecounter on amd64, both for 64
> >and 32 bit binaries; it would probably work for i386 too. Other
> >architecture maintainers are welcome to add the necessary support there. You
> >need to provide a machine/vdso.h header with definitions of the
> >VDSO_TIMEHANDS_MD fields for struct vdso_timehands, which should provide
> >information for userspace to implement a fast tc_get_timecount(). The fields
> >are filled in by the per-arch cpu_fill_vdso_timehands(9) function. If your
> >architecture supports 32bit compat, there are cpu_fill_vdso_timehands32(9)
> >and VDSO_TIMEHANDS_MD32 to code as well. After that,
> >lib/libc/<arch>/sys/__vdso_gettc.c should contain an implementation of the
> >__vdso_gettc() function, an exact analogue of tc_get_timecount().
>
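
A minimal sketch of the userspace read path that the quoted text describes:
check the layout version, then read a timehands slot under a generation
counter and fall back to the system call on any doubt. The struct layout,
the sketch_* names, the nanosecond scale factor and the retry limit are all
assumptions made for illustration; this is not the real struct vdso_timehands
or the code from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TH_VER	1u	/* hypothetical layout version */
#define SKETCH_LD(p)	atomic_load_explicit((p), memory_order_relaxed)

/* Hypothetical, simplified layout; NOT the real struct vdso_timehands. */
struct sketch_timehands {
	uint32_t ver;			/* layout version, fixed after setup */
	_Atomic uint32_t gen;		/* 0 while the kernel rewrites the slot */
	_Atomic uint32_t offset_count;	/* counter value at the last update */
	_Atomic uint32_t counter_mask;	/* valid bits of the counter */
	_Atomic uint64_t ns_per_tick;	/* simplified scale factor */
	_Atomic uint64_t offset_ns;	/* time in ns at the last update */
};

/* Plays the role of the machine-dependent __vdso_gettc(). */
typedef uint32_t (*sketch_gettc_fn)(void);

/* Demo counter that always reads 150; a real one would read hardware. */
static uint32_t
sketch_fixed_tc(void)
{
	return (150);
}

/* Returns 0 and stores the time, or -1 if the caller must use the syscall. */
static int
sketch_read_time(struct sketch_timehands *th, sketch_gettc_fn gettc,
    uint64_t *ns_out)
{
	uint32_t gen, delta;
	uint64_t ns;
	int tries;

	assert(th != NULL && gettc != NULL && ns_out != NULL);
	if (th->ver != SKETCH_TH_VER)
		return (-1);		/* unknown layout: do not guess */
	for (tries = 0; tries < 1000; tries++) {
		gen = atomic_load_explicit(&th->gen, memory_order_acquire);
		if (gen == 0)
			continue;	/* update in progress, retry */
		delta = (gettc() - SKETCH_LD(&th->offset_count)) &
		    SKETCH_LD(&th->counter_mask);
		ns = SKETCH_LD(&th->offset_ns) +
		    (uint64_t)delta * SKETCH_LD(&th->ns_per_tick);
		atomic_thread_fence(memory_order_acquire);
		/* Accept the result only if the slot did not change. */
		if (gen == SKETCH_LD(&th->gen)) {
			*ns_out = ns;
			return (0);
		}
	}
	return (-1);
}
```

The version check is what lets an old binary refuse a layout it does not
understand instead of misreading it.
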
> Hi Kostik:
>
> I'm glad to see someone is finally grappling with this issue.  I could
From the real-world tests, it seems that e.g. mysql does not notice any
difference with the faster (4-7x) gettimeofday() and clock_gettime()
implementations.
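
A whole-application test such as mysql hides the per-call cost, so a
microbenchmark is the usual way to see a speedup of this kind. The following
is a minimal sketch, assuming a POSIX system with CLOCK_MONOTONIC; the
function name is made up here, and this is not the benchmark behind the
4-7x figure.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Average cost, in nanoseconds, of one clock_gettime(CLOCK_MONOTONIC). */
static double
avg_clock_gettime_ns(long iterations)
{
	struct timespec start, end, scratch;
	int64_t elapsed;
	long i;

	assert(iterations > 0);
	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < iterations; i++)
		clock_gettime(CLOCK_MONOTONIC, &scratch);
	clock_gettime(CLOCK_MONOTONIC, &end);

	/* Total elapsed time in ns, divided by the number of timed calls. */
	elapsed = (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL +
	    (int64_t)(end.tv_nsec - start.tv_nsec);
	return ((double)elapsed / (double)iterations);
}
```

Calling avg_clock_gettime_ns(1000000) with the userspace path enabled and
then disabled would give the two numbers to compare.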

> never entirely decide how I felt about the Linux VDSO mechanism, but having
> some solution here is actually quite important.  A few thoughts that you
> might comment on:
>
> 1) It would be nice if we linked any (future) notion of VDSO to the same
>    mechanism we use for ELF branding/ABI emulation -- you conceivably want
>    to support it not just for native ABI and perhaps 32-bit compat ABIs, but
>    also the Linux ABI, alternative userspace ABIs (viz. o32 on an n64 MIPS
>    kernel), and so on.
This is solved almost automatically with any sensible VDSO
implementation, since a VDSO _must_ be per-ABI. You simply cannot use
the amd64 VDSO in a 32bit process. Please note that even the current shared
page mechanism is already per-ABI, since it is attached to a struct
sysentvec instance. It just so happens that I found it convenient and less
resource-wasteful to reuse the same page for both 32bit and 64bit on
amd64. For a VDSO, the pages definitely should be separated (which is trivial).

A VDSO for Linux is required to get the Linux/amd64 emulator running. At least
I was told so by Chagin.
>
> 2) Once the VDSO mechanism is there, you get into feature creep space, and
>    looking at how Linux handles pluggable system call mechanisms for the C
>    library is actually interesting.
Yes, there is a possibility. But really I think that we should leave i386
out to pasture. Do we really care about 0.5% (?) of the speed for 32 bit
binaries?

The main benefits of a VDSO are improved toolchain support (dwarf
unwinding for signal trampolines) and optimizations of specific operations,
like gettimeofday(), which can be made without the need to keep the
kernel ABI intact, IMO.

As an example, I saw that Linux could export the HPET registers page r/o on
machines where rdtsc is unusable. Due to the VDSO, this is transparent to
libc. I preferred not to port HPET timers into usermode for now exactly
because libc would have to become aware of them.
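
To illustrate the point about transparency: if the counter read sits behind
one entry point that the kernel side supplies, a new source such as an HPET
register page can be added without callers changing. A minimal sketch follows;
the enum values, the struct and all sketch_* names are hypothetical and do
not correspond to any FreeBSD or Linux interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical algorithm identifiers published next to the time data. */
enum sketch_algo {
	SKETCH_ALGO_SYSCALL = 0,	/* no usable userspace counter */
	SKETCH_ALGO_TSC = 1,
	SKETCH_ALGO_HPET = 2
};

struct sketch_tc_source {
	enum sketch_algo algo;
	uint32_t (*read_tsc)(void);		/* used for SKETCH_ALGO_TSC */
	const volatile uint32_t *hpet_counter;	/* read-only register mapping */
};

/* Demo TSC stand-in that always reads 42; real code would use rdtsc. */
static uint32_t
sketch_fake_tsc(void)
{
	return (42);
}

/* Single entry point: callers never learn which hardware was used. */
static int
sketch_gettc(const struct sketch_tc_source *src, uint32_t *count)
{
	assert(src != NULL && count != NULL);
	switch (src->algo) {
	case SKETCH_ALGO_TSC:
		if (src->read_tsc == NULL)
			return (-1);
		*count = src->read_tsc();
		return (0);
	case SKETCH_ALGO_HPET:
		if (src->hpet_counter == NULL)
			return (-1);
		*count = *src->hpet_counter;
		return (0);
	default:
		return (-1);	/* unknown or none: use the system call */
	}
}
```

When this dispatch lives in libc, every new case means a libc change; when
it lives in a kernel-provided VDSO, it does not.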

>
> 3) For the purposes of adaptive mutexes in userspace, it really would be
>    quite nice to know whether remote threads are running or not, in the same
>    way that cheap access to remote thread run state in the kernel makes for
>    much more efficient adaptive spinning.  I wonder if we could use this
>    mechanism for that purpose as well.  I guess for now, at least, you're
>    using a single global page, but in the future, per-process pages might be
>    quite beneficial.

The per-process page looks almost undoable. I think that what could be
made to work, although with some hacks, is a per-CPU page whose
content is updated on context switch. This could benefit trivial system calls
like getpid(), getppid() and others, but would obviously increase context
switch latency.
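
A minimal sketch of that trade-off, simulated entirely in userspace: the
"kernel" side rewrites the page on every switch, and the "libc" side answers
getpid() from it without a system call. The page layout and the sketch_*
names are hypothetical, and how such a page would be mapped per CPU is left
out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU page contents; userspace would map it read-only. */
struct sketch_percpu_page {
	_Atomic int32_t cur_pid;
	_Atomic int32_t cur_ppid;
};

/* Simulated kernel side: these stores are the added context switch cost. */
static void
sketch_switch_to(struct sketch_percpu_page *pg, int32_t pid, int32_t ppid)
{
	assert(pg != NULL);
	atomic_store_explicit(&pg->cur_pid, pid, memory_order_relaxed);
	atomic_store_explicit(&pg->cur_ppid, ppid, memory_order_relaxed);
}

/* Simulated libc side: a plain load instead of a system call. */
static int32_t
sketch_getpid(struct sketch_percpu_page *pg)
{
	assert(pg != NULL);
	return (atomic_load_explicit(&pg->cur_pid, memory_order_relaxed));
}

static int32_t
sketch_getppid(struct sketch_percpu_page *pg)
{
	assert(pg != NULL);
	return (atomic_load_explicit(&pg->cur_ppid, memory_order_relaxed));
}
```

Each field added to the page makes more calls cheap, but it also adds a
store to every context switch.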

A per-CPU page would then address the proposal of having an indicator of
other threads running. I am not sure how much we care about the potential
information leak there.
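
A minimal sketch of how an adaptive mutex could use such an indicator,
assuming each per-CPU page exposes the id of the thread currently running
there: spin only while the lock owner is on some CPU, otherwise sleep. The
slot layout and the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU slot: id of the running thread, 0 when idle. */
struct sketch_cpu_slot {
	_Atomic int32_t running_tid;
};

enum sketch_wait {
	SKETCH_WAIT_SLEEP = 0,	/* owner is off-CPU: block in the kernel */
	SKETCH_WAIT_SPIN = 1	/* owner is running: a short spin may pay off */
};

/* Decide how a waiter should wait for a lock held by owner_tid. */
static enum sketch_wait
sketch_wait_policy(struct sketch_cpu_slot *cpus, size_t ncpu,
    int32_t owner_tid)
{
	size_t i;

	assert(cpus != NULL);
	for (i = 0; i < ncpu; i++) {
		if (atomic_load_explicit(&cpus[i].running_tid,
		    memory_order_relaxed) == owner_tid)
			return (SKETCH_WAIT_SPIN);
	}
	return (SKETCH_WAIT_SLEEP);
}
```

The same scan also shows the leak: any process that can read the slots
learns which thread ids are running on which CPUs.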
