Date: Fri, 22 Jun 2012 13:55:56 +0300 From: Alexander Motin <mav@FreeBSD.org> To: Konstantin Belousov <kostikbel@gmail.com> Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org Subject: Re: svn commit: r237433 - in head/sys: amd64/include arm/include conf i386/include ia64/include kern mips/include pc98/include powerpc/include sparc64/include sys x86/include x86/x86 Message-ID: <4FE44F3C.5040108@FreeBSD.org> In-Reply-To: <20120622104626.GE2337@deviant.kiev.zoral.com.ua> References: <201206220706.q5M76fbO062751@svn.freebsd.org> <4FE42812.3050807@FreeBSD.org> <20120622082502.GB2337@deviant.kiev.zoral.com.ua> <4FE432C4.7000608@FreeBSD.org> <20120622102342.GD2337@deviant.kiev.zoral.com.ua> <20120622104626.GE2337@deviant.kiev.zoral.com.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
On 06/22/12 13:46, Konstantin Belousov wrote: > On Fri, Jun 22, 2012 at 01:23:42PM +0300, Konstantin Belousov wrote: >> On Fri, Jun 22, 2012 at 11:54:28AM +0300, Alexander Motin wrote: >>> On 22.06.2012 11:25, Konstantin Belousov wrote: >>>> On Fri, Jun 22, 2012 at 11:08:50AM +0300, Alexander Motin wrote: >>>>> On 06/22/12 10:06, Konstantin Belousov wrote: >>>>>> Author: kib >>>>>> Date: Fri Jun 22 07:06:40 2012 >>>>>> New Revision: 237433 >>>>>> URL: http://svn.freebsd.org/changeset/base/237433 >>>>>> >>>>>> Log: >>>>>> Implement mechanism to export some kernel timekeeping data to >>>>>> usermode, using shared page. The structures and functions have vdso >>>>>> prefix, to indicate the intended location of the code in some future. >>>>>> >>>>>> The versioned per-algorithm data is exported in the format of struct >>>>>> vdso_timehands, which mostly repeats the content of in-kernel struct >>>>>> timehands. Usermode reading of the structure can be lockless. >>>>>> Compatibility export for 32bit processes on 64bit host is also >>>>>> provided. Kernel also provides usermode with indication about >>>>>> currently used timecounter, so that libc can fall back to syscall if >>>>>> configured timecounter is unknown to usermode code. >>>>>> >>>>>> The shared data updates are initiated both from the tc_windup(), where >>>>>> a fast task is queued to do the update, and from sysctl handlers which >>>>>> change timecounter. A manual override switch >>>>>> kern.timecounter.fast_gettime allows to turn off the mechanism. >>>>>> >>>>>> Only x86 architectures export the real algorithm data, and there, only >>>>>> for tsc timecounter. HPET counters page could be exported as well, but >>>>>> I prefer to not further glue the kernel and libc ABI there until >>>>>> proper vdso-based solution is developed. >>>>>> >>>>>> Minimal stubs neccessary for non-x86 architectures to still compile >>>>>> are provided. >>>>>> >>>>>> Discussed with: bde >>>>>> Reviewed by: jhb >>>>>> Tested by: flo >>>>>> MFC after: 1 month >>>>> >>>>> >>>>>> @@ -1360,6 +1367,7 @@ tc_windup(void) >>>>>> #endif >>>>>> >>>>>> timehands = th; >>>>>> + taskqueue_enqueue_fast(taskqueue_fast,&tc_windup_push_vdso_task); >>>>>> } >>>>>> >>>>>> /* Report or change the active timecounter hardware. */ >>>>> >>>>> This taskqueue_enqueue_fast() will schedule extra thread to run each >>>>> time hardclock() fires. One thread may be not a big problem, but >>>>> together with callout swi and possible other threads woken up there it >>>>> will wake up several other CPU cores from sleep just to put them back in >>>>> few microseconds. Now davide@ and me are trying to fix that by avoiding >>>>> callout SWI use for simple tasks. Please, let's not create another >>>>> problem same time. >>>> >>>> The patch was public for quite a time. If you have some comments about >>>> it, it would be much more productive to let me know about them before >>>> the commit, not after. >>> >>> I'm sorry, I haven't seen it. My mad. >>> >>>> Anyway, what is your proposal for 'let's not create another problem >>>> same time' part of the message ? It was discussed, as a possibility, >>>> to have permanent mapping for the shared page in the KVA and to perform >>>> lock-less update of the struct vdso_timehands directly from hardclock >>>> handler. My opinion was that amount of work added by tc_windup >>>> eventhandler is not trivial, so it is better to be postponed to >>>> less critical context. It also slightly more safe to not perform >>>> lockless update for vdso_timehands, since otherwise module load which >>>> register exec handler could cause transient gettimeofday() failure >>>> in usermode. >>>> >>>> This might boil down to the fact that tc_windup function is called >>>> too often, in fact. Also, packing execution of tc_windup eventhandler >>>> together with the clock swi is fine from my POV. >>> >>> I have nothing against using shared pages. On UP system I would probably >>> have not so much against several threads. But on SMP system it will >>> cause at least one, but in many cases two extra CPUs to be woken up. >>> There are two or more threads to run on hardclock(): this taskqueue, >>> callout swi and some thread(s) woken from callouts. Scheduler has no >>> idea how heavy they are. So it will try to move each of them to separate >>> idle CPU. Does the amount of work done in event handlers worth extra >>> Watts consumed by rapidly waking CPUs? As quite rare person running >>> FreeBSD on laptop, I am not sure. I am not sure even that on >>> desktop/server this won't kill all benefit of fast clocks by limiting >>> TurboBoost. >> >> Patch below would probably work, but I cannot test it right now on real >> hardware due to ACPI issue. It worked for me in qemu. >> >> commit 4f2ffd93b36d20eae61495776fc6b0855745fd7f >> Author: Konstantin Belousov <kib@freebsd.org> >> Date: Fri Jun 22 13:19:22 2012 +0300 >> >> Use persistent kernel mapping of the shared page, and update the >> vdso_timehands from hardclock, instead of scheduling task. > > Slightly improved version. Since tc_fill_vdso_timehands is now > called from hardclock context, thee is no need to spin waiting for > valid current generation of timehands. I can't evaluate how much "hackish" this is from memory management side, but I'm glad there is some viable solution. Thank you! -- Alexander Motin
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4FE44F3C.5040108>