From: Konstantin Belousov <kostikbel@gmail.com>
To: John Baldwin
Cc: Gianni, Alan Cox, Alexander Kabaev, Attilio Rao, freebsd-arch@freebsd.org
Date: Mon, 4 Jun 2012 21:19:17 +0300
Subject: Re: Fwd: [RFC] Kernel shared variables
Message-ID: <20120604181917.GD85127@deviant.kiev.zoral.com.ua>
In-Reply-To: <201206041101.57486.jhb@freebsd.org>
References: <20120603051904.GG2358@deviant.kiev.zoral.com.ua> <20120603184315.T856@besplex.bde.org> <201206041101.57486.jhb@freebsd.org>

On Mon, Jun 04, 2012 at 11:01:57AM -0400, John Baldwin wrote:
> On Sunday, June 03, 2012 6:49:27 am Bruce Evans wrote:
> > On Sun, 3 Jun 2012, Konstantin Belousov wrote:
> >
> > > On Sun, Jun 03, 2012 at 07:28:09AM +1000, Bruce Evans wrote:
> > >> On Sat, 2 Jun 2012, Konstantin Belousov wrote:
> > >>> ...
> > >>> In fact, I think that if the whole goal is only fast clocks, then we
> > >>> do not need any additional system mechanisms, since we can easily export
> > >>> coefficients for rdtsc formula already. E.g. we can put it into elf auxv,
> > >>> which is ugly but bearable.
> > >>
> > >> How do you get the timehands offsets? These only need to be updated
> > >> every second or so, or when used, but how can the application know
> > >> when they need to be updated if this is not done automatically in the
> > >> kernel by writing to a shared page? I can only think of the
> > >> application arranging an alarm signal every second or so and updating
> > >> then. No good for libraries.
> > > What are timehands offsets? Do you mean things like leap seconds?
> >
> > Yes.
> > binuptime() is:
> >
> > % void
> > % binuptime(struct bintime *bt)
> > % {
> > %     struct timehands *th;
> > %     u_int gen;
> > %
> > %     do {
> > %         th = timehands;
> > %         gen = th->th_generation;
> > %         *bt = th->th_offset;
> > %         bintime_addx(bt, th->th_scale * tc_delta(th));
> > %     } while (gen == 0 || gen != th->th_generation);
> > % }
> >
> > Without the kernel providing th->th_offset, you have to do lots of ntp
> > handling for yourself (compatibly with the kernel) just to get an
> > accuracy of 1 second. Leap seconds don't affect CLOCK_MONOTONIC, but
> > they do affect CLOCK_REALTIME which is the clock id used by
> > gettimeofday(). For the former, you only have to advance the offset
> > for yourself occasionally (compatibly with the kernel) and manage
> > (compatibly with the kernel, especially in the long term) ntp slewing
> > and other syscall/sysctl kernel activity that micro-adjusts th->th_scale.
>
> I think duplicating this logic in userland would just be wasteful. I have
> a private fast gettimeofday() at my current job and it works by exporting
> the current timehands structure (well, the equivalent) to userland. The
> userland bits then fetch a copy of the details and do the same as bintime().
> (I move the math (bintime_addx() and the multiply)) out of the loop however.

Yesterday I started an implementation which uses a shared page to export
a variant of timehands, and uses auxv to provide libc with a pointer to
the timehands when rdtsc is reasonable to use. I have almost finished
both the 32-bit and 64-bit userspace parts, but kernel-side work is
still left.

Is your implementation ready, or close to being ready, for commit? In
other words, should I drop my effort or continue?

>
> > > This is indeed problematic for auxv. For auxv it could be solved by
> > > providing offset for next recheck using syscalls, and making libc code
> > > respect this offset. But, I do think that vdso in shared page
> > > is the right solution, not auxv.
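A userland version of the binuptime() loop quoted above, reading a single exported timehands copy and doing the multiply and carry outside the retry loop as John describes, might look like the minimal sketch below. The shared-page layout, the struct and function names, and the modelling of the raw counter read (rdtsc in this thread) as a load from a 32-bit location are all assumptions for illustration, not the code of either implementation discussed here. The writer is assumed to follow the kernel's tc_windup() convention: store 0 into the generation, update the fields, then store a new non-zero generation. Strictly speaking, the plain payload reads race with the writer under the C11 memory model; a production version would use relaxed atomic loads for them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Seconds plus a 64-bit binary fraction of a second, as in struct bintime. */
struct bintime {
    int64_t  sec;
    uint64_t frac;
};

/* Hypothetical layout of the single timehands copy on the shared page. */
struct shared_timehands {
    _Atomic uint32_t th_gen;     /* 0 while the kernel is updating */
    uint64_t th_scale;           /* roughly 2^64 / counter frequency */
    uint32_t th_offset_count;    /* counter value at the last windup */
    uint32_t th_counter_mask;    /* valid bits of the counter */
    struct bintime th_offset;    /* uptime at the last windup */
};

static void
bintime_addx(struct bintime *bt, uint64_t x)
{
    uint64_t u = bt->frac;

    bt->frac += x;
    if (u > bt->frac)            /* fraction wrapped: carry one second */
        bt->sec++;
}

/*
 * Hypothetical userland analogue of binuptime().  "counter" stands in
 * for the raw hardware counter read (rdtsc in this thread).
 */
void
user_binuptime(struct shared_timehands *th, const volatile uint32_t *counter,
    struct bintime *bt)
{
    uint32_t gen, delta;
    uint64_t scale;

    assert(th != NULL && counter != NULL && bt != NULL);
    do {
        gen = atomic_load_explicit(&th->th_gen, memory_order_acquire);
        *bt = th->th_offset;
        scale = th->th_scale;
        delta = (*counter - th->th_offset_count) & th->th_counter_mask;
        /* Order the reads above before the generation re-check. */
        atomic_thread_fence(memory_order_acquire);
    } while (gen == 0 ||
        gen != atomic_load_explicit(&th->th_gen, memory_order_relaxed));

    /* Multiply and carry once, outside the retry loop. */
    bintime_addx(bt, scale * delta);
}
```

Because the arithmetic happens after the loop, a retry caused by a concurrent update only costs re-copying a few words.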
> >
> > timehands in a shared pages is close to working. th_generation protects
> > things in the same way as in the kernel, modulo assumptions that writes
> > are ordered.
>
> It would work fine. And in fact, having multiple timehands is actually a
> bug, not a feature. It lets you compute bogus timestamps if you get preempted
> at the wrong time and end up with time jumping around. At Yahoo! we reduced
> the number of timehands structures down to 2 or some such, and I'm now of
> the opinion we should just have one and dispense with the entire array.
>
> For my userland case I only export a single timehands copy.

Well, I have to use two copies due to the time_t ABI differences: one for
32-bit and one for 64-bit processes.

>
> > >> rdtsc is also very unportable, even on CPUs that have it. But all other
> > >> x86 timecounter hardware is too slow if you want gettimeofday() to be fast
> > >> and as accurate as it is now.
>
> For all the hardware where people run mysql and similar software that calls
> gettimeofday() a lot, rdtsc() works just fine.

I also try to mimic the kernel code as closely as possible, so there are
two possible tsc timecounters; the selection is managed by the kernel, but
the code lives in libc or possibly in a vdso. I do not see an immediate
need for a vdso just for gettimeofday(2) and clock_gettime(2), although
having a vdso provide unwinding tables for signal trampolines is _very_
desirable.

>
> > > !rdtsc hardware probably cannot be used at all due to need to provide
> > > usermode access to device registers. The mere presence of rdtsc does not
> > > mean that usermode indeed can use it, it should be decided by kernel
> > > based on the current in-kernel time source. If rdtsc is not usable, the
> > > corresponding data should not be exported, or implementation should go
> > > directly into syscall or whatever.
>
> Yes, the patches I have only work if the kernel uses the TSC as its main
> timecounter as well.
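The thread leaves open how libc decides whether the fast path is allowed. A minimal sketch, assuming the kernel publishes a small selector next to the exported data: SHARED_TC_NONE when its timecounter is not TSC-based (or when a NULL pointer means nothing was exported), otherwise which of the two TSC-based counters is active, presumably a full TSC counter and a right-shifted low-resolution variant. The enum, struct and function names are hypothetical. On x86 the raw value would come from rdtsc (for example through the compiler's __rdtsc() intrinsic); here it is passed in as a parameter so the sketch builds anywhere.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical selector published by the kernel next to the exported data. */
enum shared_tc_algo {
    SHARED_TC_NONE = 0,    /* kernel timecounter is not TSC-based */
    SHARED_TC_TSC,         /* low 32 bits of the TSC */
    SHARED_TC_TSC_LOW      /* TSC shifted right by "shift" bits */
};

struct shared_tc {
    uint32_t algo;         /* one of enum shared_tc_algo */
    uint32_t shift;        /* used only by SHARED_TC_TSC_LOW */
};

/* Derive the 32-bit count the kernel's timecounter would report. */
static int
shared_tc_count(const struct shared_tc *tc, uint64_t tsc, uint32_t *count)
{
    if (tc == NULL)
        return (-1);                 /* nothing exported */
    switch (tc->algo) {
    case SHARED_TC_TSC:
        *count = (uint32_t)tsc;
        return (0);
    case SHARED_TC_TSC_LOW:
        assert(tc->shift < 64);
        *count = (uint32_t)(tsc >> tc->shift);
        return (0);
    default:
        return (-1);                 /* not usable from userland */
    }
}

/*
 * Returns 1 when the fast path produced a count, 0 when it fell back to
 * the clock_gettime(2) syscall (result in *ts), and -1 on error.
 */
int
read_clock(const struct shared_tc *tc, uint64_t tsc, uint32_t *count,
    struct timespec *ts)
{
    if (shared_tc_count(tc, tsc, count) == 0)
        return (1);
    if (clock_gettime(CLOCK_MONOTONIC, ts) != 0)
        return (-1);
    return (0);
}
```

Falling back to the syscall whenever the selector is SHARED_TC_NONE follows the rule stated above: userland only takes the rdtsc path when the kernel's own time source is the TSC.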
>
> > But then applications would:
> > - use gettimeofday() more than they should ("it works on Linux"), even
> >   more than now since when "it works on FreeBSD-x86" too
> > - just be slow when gettimeofday() is slow
> > - kludge around gettimeofday() being slow like they do now
> > - kludge around gettimeofday() being slow not like they do now (use more
> >   complications to probe it being slow).
>
> Some applications really need fine-grained timing with as little overhead
> as possible.
>
> --
> John Baldwin