Date:      Fri, 8 Jun 2012 01:56:48 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Peter Wemm <peter@wemm.org>
Cc:        freebsd-arch@freebsd.org
Subject:   Re: Fast gettimeofday(2) and clock_gettime(2)
Message-ID:  <20120607225648.GC85127@deviant.kiev.zoral.com.ua>
In-Reply-To: <CAGE5yCrk8E5DikNNVQzEZ7bkj98nxQi%2BaWsLsi6d4jc8vLg2PA@mail.gmail.com>
References:  <20120606165115.GQ85127@deviant.kiev.zoral.com.ua> <201206061423.53179.jhb@freebsd.org> <20120606205938.GS85127@deviant.kiev.zoral.com.ua> <201206070850.55751.jhb@freebsd.org> <20120607172839.GZ85127@deviant.kiev.zoral.com.ua> <CAGE5yCrk8E5DikNNVQzEZ7bkj98nxQi%2BaWsLsi6d4jc8vLg2PA@mail.gmail.com>

On Thu, Jun 07, 2012 at 03:47:04PM -0700, Peter Wemm wrote:
> On Thu, Jun 7, 2012 at 10:28 AM, Konstantin Belousov
> <kostikbel@gmail.com> wrote:
> > On Thu, Jun 07, 2012 at 08:50:55AM -0400, John Baldwin wrote:
> >> On Wednesday, June 06, 2012 4:59:38 pm Konstantin Belousov wrote:
> >> > On Wed, Jun 06, 2012 at 02:23:53PM -0400, John Baldwin wrote:
> >> > > On Wednesday, June 06, 2012 12:51:15 pm Konstantin Belousov wrote:
> >> > > > A positive result from the recent flame-bait on arch@ is the working
> >> > > > implementation of the fast gettimeofday(2) and clock_gettime(2). The
> >> > > > speedup I see is around 6-7x on the 2600K. I think the speedup could
> >> > > > be even bigger on the previous generation of CPUs, where lock
> >> > > > operations and syscall entry are costlier. A sample test runs of
> >> > > > tools/tools/syscall_timing are presented at the end of message.
> >> > >
> >> > > In general this looks good but I see a few nits / races:
> >> > >
> >> > > 1) You don't follow the model of clearing tk_current to 0 while you
> >> > >    are updating the structure that the in-kernel timecounter code
> >> > >    uses.  This also means you have to avoid using a tk_current of 0
> >> > >    and that userland has to keep spinning as long as tk_current is 0.
> >> > >    Without this I believe userland can read a partially updated
> >> > >    structure.
> >> > I changed the code to be much more similar to the kern_tc.c. I (re)added
> >> > the generation field, which is set to 0 upon kernel touching timehands.
> >>
> >> Thank you.  BTW, I think we should use atomic_load_acq_int() on both accesses
> >> to th_gen (and the in-kernel binuptime should do the same).  I realize this
> >> requires using rmb before the while condition in userland since we can't
> >> use atomic_load_acq_int() here.  I think it should also use
> >> atomic_store_rel_int() for both stores to th_gen during the tc_windup()
> >> callback.
> > This is done. On the other hand, I removed a store_rel from updating
> > tk_current, since it is after enabling store to th_gen, and the order
> > there does not matter.
> >
> > I also did some restructuring of the userspace, removing layers that
> > Bruce did not liked. Now top-level functions directly call binuptime().
> > I also shortened the preliminary operations by caching timekeep pointer.
> > Its double-initialization is safe.
> >
> > Latest version is at
> > http://people.freebsd.org/~kib/misc/moronix.4.patch
> >
> > I probably move all shared page helpers to separate file from kern_exec.c,
> > but this will happen after moronix is committed.
>
> Stepping back for a moment.. why even have a shared page at all, in
> common MI code?
The decision to use a shared page is delegated to MD code, but MI code
handles most of the details, since there is not much difference if the
shared page is used.

>
> The AMD64 kernel can simply make a page readable from within kernel
> space since it's page level protected.
All arches which use a shared page use it this way now. See below.

>
> The i386 kernel needs the same treatment.  We can save one clock cycle
> from address generation by switching to page protection for the kernel
> and using a full 4GB %cs/%ds/etc.  With that fix we could do the same
> there.  I've been meaning to "fix" this for about 8 years now.
Sorry, I do not follow. Aren't we already using 4GB segments on i386?

>
> There would have been no need to allocate "space" in userland for
> things like signal trampolines because it could be executed directly
> from a kernel page by unprivileged user code.
This is how it is done already. But the shared page is mapped at a
fixed location in usermode, which simplifies things for debugging at
least.

>
> Things like allocating a shared page could be a MD backend decision
> for architectures that don't have page level access control for where
> the kernel lives.
This is exactly how it is done now. The per-ABI struct sysentvec has a flag
indicating whether the shared page is needed for the ABI, and where to map it.
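
As a rough illustration of that split (not the actual FreeBSD
definitions), a per-ABI descriptor can carry a flag plus a mapping
address, and the MI code only has to test the flag. The names abi_desc,
ABI_HAS_SHARED_PAGE and abi_shared_page() are hypothetical stand-ins
for struct sysentvec and its fields:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: this ABI wants a shared page mapped. */
#define ABI_HAS_SHARED_PAGE 0x1u

/* Hypothetical stand-in for a per-ABI descriptor such as struct sysentvec. */
struct abi_desc {
	const char *name;
	uint32_t    flags;            /* MD/ABI code sets ABI_HAS_SHARED_PAGE */
	uintptr_t   shared_page_base; /* fixed usermode address of the page */
	size_t      shared_page_len;
};

/*
 * MI-side helper: report where the shared page should be mapped for this
 * ABI, or return false if the ABI did not ask for one.
 */
static bool
abi_shared_page(const struct abi_desc *abi, uintptr_t *base, size_t *len)
{
	if ((abi->flags & ABI_HAS_SHARED_PAGE) == 0)
		return (false);
	*base = abi->shared_page_base;
	*len = abi->shared_page_len;
	return (true);
}
```

An ABI that leaves the flag clear never gets the page mapped, and the
MI caller needs no per-architecture knowledge beyond the descriptor.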

>
> Things like tc_fill_vdso_timehands() could go away if userland could
> be allowed to directly read the kernel's version.  With a little
> linker magic, the 'struct timehands' stuff could be marshaled into a
> page and the auxinfo point to it.
I dislike the idea of directly exporting a kernel structure into
userland, since this makes it impossible to modify the kernel side of
things. IMO a rarely executed translation is not a problem, and I can
control the ABI, at least until I find time to implement a VDSO, where
the problem of ABI stability for kernel->user transport will be solved
completely.
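
A minimal sketch of the translation idea in C11, under assumptions that
are not taken from the patch. The kernel keeps a private structure it
may change freely and copies only the needed fields into a fixed
exported layout. A generation of 0 marks an update in progress, so a
reader retries until it sees the same non-zero generation before and
after its copy. All names here (k_timehands, u_timehands, export_fill,
export_read) are hypothetical, and C11 atomics stand in for the
atomic_load_acq_int()/atomic_store_rel_int()/rmb primitives mentioned
in the quoted text.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical kernel-private state; its layout may change at any time. */
struct k_timehands {
	uint64_t scale;
	uint64_t offset_count;
	uint64_t boot_sec;         /* kernel-only, never exported */
	void    *private_counter;  /* kernel-only, never exported */
};

/* Hypothetical exported layout; this is the only ABI userland sees. */
struct u_timehands {
	uint32_t         abi_version;
	_Atomic uint32_t gen;      /* 0 means "update in progress" */
	_Atomic uint64_t scale;
	_Atomic uint64_t offset_count;
};

/* Plain copy handed back to the caller. */
struct snapshot {
	uint64_t scale;
	uint64_t offset_count;
};

/* Writer: translate kernel state into the exported layout. */
static void
export_fill(struct u_timehands *u, const struct k_timehands *k,
    uint32_t *next_gen)
{
	/* Invalidate first so readers spin instead of mixing old and new. */
	atomic_store_explicit(&u->gen, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&u->scale, k->scale, memory_order_relaxed);
	atomic_store_explicit(&u->offset_count, k->offset_count,
	    memory_order_relaxed);

	/* Never publish 0 as a valid generation. */
	if (++*next_gen == 0)
		++*next_gen;
	atomic_store_explicit(&u->gen, *next_gen, memory_order_release);
}

/* Reader: retry until one stable, non-zero generation brackets the copy. */
static struct snapshot
export_read(struct u_timehands *u)
{
	struct snapshot s;
	uint32_t gen;

	do {
		gen = atomic_load_explicit(&u->gen, memory_order_acquire);
		s.scale = atomic_load_explicit(&u->scale,
		    memory_order_relaxed);
		s.offset_count = atomic_load_explicit(&u->offset_count,
		    memory_order_relaxed);
		/* Plays the role of the rmb before the loop condition. */
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&u->gen, memory_order_relaxed));
	return (s);
}
```

Because only u_timehands is visible to userland, fields can be added to
or removed from k_timehands without touching the exported ABI. Note
that export_read() spins for as long as the generation stays 0, so the
structure must be filled at least once before it is exposed.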

