Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 22 Jun 2012 13:55:56 +0300
From:      Alexander Motin <mav@FreeBSD.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject:   Re: svn commit: r237433 - in head/sys: amd64/include arm/include conf i386/include ia64/include kern mips/include pc98/include powerpc/include sparc64/include sys x86/include x86/x86
Message-ID:  <4FE44F3C.5040108@FreeBSD.org>
In-Reply-To: <20120622104626.GE2337@deviant.kiev.zoral.com.ua>
References:  <201206220706.q5M76fbO062751@svn.freebsd.org> <4FE42812.3050807@FreeBSD.org> <20120622082502.GB2337@deviant.kiev.zoral.com.ua> <4FE432C4.7000608@FreeBSD.org> <20120622102342.GD2337@deviant.kiev.zoral.com.ua> <20120622104626.GE2337@deviant.kiev.zoral.com.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On 06/22/12 13:46, Konstantin Belousov wrote:
> On Fri, Jun 22, 2012 at 01:23:42PM +0300, Konstantin Belousov wrote:
>> On Fri, Jun 22, 2012 at 11:54:28AM +0300, Alexander Motin wrote:
>>> On 22.06.2012 11:25, Konstantin Belousov wrote:
>>>> On Fri, Jun 22, 2012 at 11:08:50AM +0300, Alexander Motin wrote:
>>>>> On 06/22/12 10:06, Konstantin Belousov wrote:
>>>>>> Author: kib
>>>>>> Date: Fri Jun 22 07:06:40 2012
>>>>>> New Revision: 237433
>>>>>> URL: http://svn.freebsd.org/changeset/base/237433
>>>>>>
>>>>>> Log:
>>>>>>    Implement mechanism to export some kernel timekeeping data to
>>>>>>    usermode, using shared page.  The structures and functions have vdso
>>>>>>    prefix, to indicate the intended location of the code in some future.
>>>>>>
>>>>>>    The versioned per-algorithm data is exported in the format of struct
>>>>>>    vdso_timehands, which mostly repeats the content of in-kernel struct
>>>>>>    timehands. Usermode reading of the structure can be lockless.
>>>>>>    Compatibility export for 32bit processes on 64bit host is also
>>>>>>    provided. Kernel also provides usermode with indication about
>>>>>>    currently used timecounter, so that libc can fall back to syscall if
>>>>>>    configured timecounter is unknown to usermode code.
>>>>>>
>>>>>>    The shared data updates are initiated both from the tc_windup(), where
>>>>>>    a fast task is queued to do the update, and from sysctl handlers which
>>>>>>    change timecounter. A manual override switch
>>>>>>    kern.timecounter.fast_gettime allows to turn off the mechanism.
>>>>>>
>>>>>>    Only x86 architectures export the real algorithm data, and there, only
>>>>>>    for tsc timecounter. HPET counters page could be exported as well, but
>>>>>>    I prefer to not further glue the kernel and libc ABI there until
>>>>>>    proper vdso-based solution is developed.
>>>>>>
>>>>>>    Minimal stubs neccessary for non-x86 architectures to still compile
>>>>>>    are provided.
>>>>>>
>>>>>>    Discussed with:	bde
>>>>>>    Reviewed by:	jhb
>>>>>>    Tested by:	flo
>>>>>>    MFC after:	1 month
>>>>>
>>>>>
>>>>>> @@ -1360,6 +1367,7 @@ tc_windup(void)
>>>>>>   #endif
>>>>>>
>>>>>>   	timehands = th;
>>>>>> +	taskqueue_enqueue_fast(taskqueue_fast,&tc_windup_push_vdso_task);
>>>>>>   }
>>>>>>
>>>>>>   /* Report or change the active timecounter hardware. */
>>>>>
>>>>> This taskqueue_enqueue_fast() will schedule extra thread to run each
>>>>> time hardclock() fires. One thread may be not a big problem, but
>>>>> together with callout swi and possible other threads woken up there it
>>>>> will wake up several other CPU cores from sleep just to put them back in
>>>>> few microseconds. Now davide@ and me are trying to fix that by avoiding
>>>>> callout SWI use for simple tasks. Please, let's not create another
>>>>> problem same time.
>>>>
>>>> The patch was public for quite a time. If you have some comments about
>>>> it, it would be much more productive to let me know about them before
>>>> the commit, not after.
>>>
>>> I'm sorry, I haven't seen it. My mad.
>>>
>>>> Anyway, what is your proposal for 'let's not create another problem
>>>> same time' part of the message ? It was discussed, as a possibility,
>>>> to have permanent mapping for the shared page in the KVA and to perform
>>>> lock-less update of the struct vdso_timehands directly from hardclock
>>>> handler. My opinion was that amount of work added by tc_windup
>>>> eventhandler is not trivial, so it is better to be postponed to
>>>> less critical context. It also slightly more safe to not perform
>>>> lockless update for vdso_timehands, since otherwise module load which
>>>> register exec handler could cause transient gettimeofday() failure
>>>> in usermode.
>>>>
>>>> This might boil down to the fact that tc_windup function is called
>>>> too often, in fact. Also, packing execution of tc_windup eventhandler
>>>> together with the clock swi is fine from my POV.
>>>
>>> I have nothing against using shared pages. On UP system I would probably
>>> have not so much against several threads. But on SMP system it will
>>> cause at least one, but in many cases two extra CPUs to be woken up.
>>> There are two or more threads to run on hardclock(): this taskqueue,
>>> callout swi and some thread(s) woken from callouts. Scheduler has no
>>> idea how heavy they are. So it will try to move each of them to separate
>>> idle CPU. Does the amount of work done in event handlers worth extra
>>> Watts consumed by rapidly waking CPUs? As quite rare person running
>>> FreeBSD on laptop, I am not sure. I am not sure even that on
>>> desktop/server this won't kill all benefit of fast clocks by limiting
>>> TurboBoost.
>>
>> Patch below would probably work, but I cannot test it right now on real
>> hardware due to ACPI issue. It worked for me in qemu.
>>
>> commit 4f2ffd93b36d20eae61495776fc6b0855745fd7f
>> Author: Konstantin Belousov <kib@freebsd.org>
>> Date:   Fri Jun 22 13:19:22 2012 +0300
>>
>>      Use persistent kernel mapping of the shared page, and update the
>>      vdso_timehands from hardclock, instead of scheduling task.
>
> Slightly improved version. Since tc_fill_vdso_timehands is now
> called from hardclock context, thee is no need to spin waiting for
> valid current generation of timehands.

I can't evaluate how much "hackish" this is from memory management side, 
but I'm glad there is some viable solution.

Thank you!

-- 
Alexander Motin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4FE44F3C.5040108>