Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 22 Jun 2012 11:54:28 +0300
From:      Alexander Motin <mav@FreeBSD.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org
Subject:   Re: svn commit: r237433 - in head/sys: amd64/include arm/include conf i386/include ia64/include kern mips/include pc98/include powerpc/include sparc64/include sys x86/include x86/x86
Message-ID:  <4FE432C4.7000608@FreeBSD.org>
In-Reply-To: <20120622082502.GB2337@deviant.kiev.zoral.com.ua>
References:  <201206220706.q5M76fbO062751@svn.freebsd.org> <4FE42812.3050807@FreeBSD.org> <20120622082502.GB2337@deviant.kiev.zoral.com.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On 22.06.2012 11:25, Konstantin Belousov wrote:
> On Fri, Jun 22, 2012 at 11:08:50AM +0300, Alexander Motin wrote:
>> On 06/22/12 10:06, Konstantin Belousov wrote:
>>> Author: kib
>>> Date: Fri Jun 22 07:06:40 2012
>>> New Revision: 237433
>>> URL: http://svn.freebsd.org/changeset/base/237433
>>>
>>> Log:
>>>    Implement mechanism to export some kernel timekeeping data to
>>>    usermode, using shared page.  The structures and functions have vdso
>>>    prefix, to indicate the intended location of the code in some future.
>>>
>>>    The versioned per-algorithm data is exported in the format of struct
>>>    vdso_timehands, which mostly repeats the content of in-kernel struct
>>>    timehands. Usermode reading of the structure can be lockless.
>>>    Compatibility export for 32bit processes on 64bit host is also
>>>    provided. Kernel also provides usermode with indication about
>>>    currently used timecounter, so that libc can fall back to syscall if
>>>    configured timecounter is unknown to usermode code.
>>>
>>>    The shared data updates are initiated both from the tc_windup(), where
>>>    a fast task is queued to do the update, and from sysctl handlers which
>>>    change timecounter. A manual override switch
>>>    kern.timecounter.fast_gettime allows to turn off the mechanism.
>>>
>>>    Only x86 architectures export the real algorithm data, and there, only
>>>    for tsc timecounter. HPET counters page could be exported as well, but
>>>    I prefer to not further glue the kernel and libc ABI there until
>>>    proper vdso-based solution is developed.
>>>
>>>    Minimal stubs neccessary for non-x86 architectures to still compile
>>>    are provided.
>>>
>>>    Discussed with:	bde
>>>    Reviewed by:	jhb
>>>    Tested by:	flo
>>>    MFC after:	1 month
>>
>>
>>> @@ -1360,6 +1367,7 @@ tc_windup(void)
>>>   #endif
>>>
>>>   	timehands = th;
>>> +	taskqueue_enqueue_fast(taskqueue_fast,&tc_windup_push_vdso_task);
>>>   }
>>>
>>>   /* Report or change the active timecounter hardware. */
>>
>> This taskqueue_enqueue_fast() will schedule extra thread to run each
>> time hardclock() fires. One thread may be not a big problem, but
>> together with callout swi and possible other threads woken up there it
>> will wake up several other CPU cores from sleep just to put them back in
>> few microseconds. Now davide@ and me are trying to fix that by avoiding
>> callout SWI use for simple tasks. Please, let's not create another
>> problem same time.
>
> The patch was public for quite a time. If you have some comments about
> it, it would be much more productive to let me know about them before
> the commit, not after.

I'm sorry, I haven't seen it. My mad.

> Anyway, what is your proposal for 'let's not create another problem
> same time' part of the message ? It was discussed, as a possibility,
> to have permanent mapping for the shared page in the KVA and to perform
> lock-less update of the struct vdso_timehands directly from hardclock
> handler. My opinion was that amount of work added by tc_windup
> eventhandler is not trivial, so it is better to be postponed to
> less critical context. It also slightly more safe to not perform
> lockless update for vdso_timehands, since otherwise module load which
> register exec handler could cause transient gettimeofday() failure
> in usermode.
>
> This might boil down to the fact that tc_windup function is called
> too often, in fact. Also, packing execution of tc_windup eventhandler
> together with the clock swi is fine from my POV.

I have nothing against using shared pages. On UP system I would probably 
have not so much against several threads. But on SMP system it will 
cause at least one, but in many cases two extra CPUs to be woken up. 
There are two or more threads to run on hardclock(): this taskqueue, 
callout swi and some thread(s) woken from callouts. Scheduler has no 
idea how heavy they are. So it will try to move each of them to separate 
idle CPU. Does the amount of work done in event handlers worth extra 
Watts consumed by rapidly waking CPUs? As quite rare person running 
FreeBSD on laptop, I am not sure. I am not sure even that on 
desktop/server this won't kill all benefit of fast clocks by limiting 
TurboBoost.

-- 
Alexander Motin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4FE432C4.7000608>