Message-ID: <4FE44F3C.5040108@FreeBSD.org>
Date: Fri, 22 Jun 2012 13:55:56 +0300
From: Alexander Motin
To: Konstantin Belousov
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject: Re: svn commit: r237433 - in head/sys: amd64/include arm/include conf i386/include ia64/include kern mips/include pc98/include powerpc/include sparc64/include sys x86/include x86/x86
In-Reply-To: <20120622104626.GE2337@deviant.kiev.zoral.com.ua>

On 06/22/12 13:46, Konstantin Belousov wrote:
> On Fri, Jun 22, 2012 at 01:23:42PM +0300, Konstantin Belousov wrote:
>> On Fri, Jun 22, 2012 at 11:54:28AM +0300, Alexander Motin wrote:
>>> On 22.06.2012 11:25, Konstantin Belousov wrote:
>>>> On Fri, Jun 22, 2012 at 11:08:50AM +0300, Alexander Motin wrote:
>>>>> On 06/22/12 10:06, Konstantin Belousov wrote:
>>>>>> Author: kib
>>>>>> Date: Fri Jun 22 07:06:40 2012
>>>>>> New Revision: 237433
>>>>>> URL: http://svn.freebsd.org/changeset/base/237433
>>>>>>
>>>>>> Log:
>>>>>> Implement mechanism to export some kernel timekeeping data to
>>>>>> usermode, using
>>>>>> shared page. The structures and functions have vdso
>>>>>> prefix, to indicate the intended location of the code in some future.
>>>>>>
>>>>>> The versioned per-algorithm data is exported in the format of struct
>>>>>> vdso_timehands, which mostly repeats the content of in-kernel struct
>>>>>> timehands. Usermode reading of the structure can be lockless.
>>>>>> Compatibility export for 32bit processes on 64bit host is also
>>>>>> provided. Kernel also provides usermode with indication about
>>>>>> currently used timecounter, so that libc can fall back to syscall if
>>>>>> configured timecounter is unknown to usermode code.
>>>>>>
>>>>>> The shared data updates are initiated both from the tc_windup(), where
>>>>>> a fast task is queued to do the update, and from sysctl handlers which
>>>>>> change timecounter. A manual override switch
>>>>>> kern.timecounter.fast_gettime allows turning off the mechanism.
>>>>>>
>>>>>> Only x86 architectures export the real algorithm data, and there, only
>>>>>> for tsc timecounter. HPET counters page could be exported as well, but
>>>>>> I prefer to not further glue the kernel and libc ABI there until
>>>>>> proper vdso-based solution is developed.
>>>>>>
>>>>>> Minimal stubs necessary for non-x86 architectures to still compile
>>>>>> are provided.
>>>>>>
>>>>>> Discussed with: bde
>>>>>> Reviewed by: jhb
>>>>>> Tested by: flo
>>>>>> MFC after: 1 month
>>>>>
>>>>>
>>>>>> @@ -1360,6 +1367,7 @@ tc_windup(void)
>>>>>>  #endif
>>>>>>
>>>>>>         timehands = th;
>>>>>> +       taskqueue_enqueue_fast(taskqueue_fast,&tc_windup_push_vdso_task);
>>>>>>  }
>>>>>>
>>>>>>  /* Report or change the active timecounter hardware. */
>>>>>
>>>>> This taskqueue_enqueue_fast() will schedule an extra thread to run each
>>>>> time hardclock() fires. One thread may not be a big problem, but
>>>>> together with callout swi and possible other threads woken up there it
>>>>> will wake up several other CPU cores from sleep just to put them back in
>>>>> a few microseconds.
>>>>> Now davide@ and I are trying to fix that by avoiding
>>>>> callout SWI use for simple tasks. Please, let's not create another
>>>>> problem at the same time.
>>>>
>>>> The patch was public for quite a time. If you have some comments about
>>>> it, it would be much more productive to let me know about them before
>>>> the commit, not after.
>>>
>>> I'm sorry, I haven't seen it. My bad.
>>>
>>>> Anyway, what is your proposal for the 'let's not create another problem
>>>> at the same time' part of the message? It was discussed, as a possibility,
>>>> to have permanent mapping for the shared page in the KVA and to perform
>>>> lock-less update of the struct vdso_timehands directly from hardclock
>>>> handler. My opinion was that the amount of work added by the tc_windup
>>>> eventhandler is not trivial, so it is better postponed to a
>>>> less critical context. It is also slightly safer to not perform
>>>> lockless update for vdso_timehands, since otherwise a module load which
>>>> registers an exec handler could cause transient gettimeofday() failure
>>>> in usermode.
>>>>
>>>> This might boil down to the fact that tc_windup function is called
>>>> too often, in fact. Also, packing execution of tc_windup eventhandler
>>>> together with the clock swi is fine from my POV.
>>>
>>> I have nothing against using shared pages. On a UP system I would probably
>>> not have much against several threads. But on an SMP system it will
>>> cause at least one, but in many cases two extra CPUs to be woken up.
>>> There are two or more threads to run on hardclock(): this taskqueue,
>>> callout swi and some thread(s) woken from callouts. The scheduler has no
>>> idea how heavy they are. So it will try to move each of them to a separate
>>> idle CPU. Is the amount of work done in event handlers worth the extra
>>> watts consumed by rapidly waking CPUs? As one of the rather rare people
>>> running FreeBSD on a laptop, I am not sure.
>>> I am not even sure that on
>>> desktop/server this won't kill all benefit of fast clocks by limiting
>>> TurboBoost.
>>
>> Patch below would probably work, but I cannot test it right now on real
>> hardware due to an ACPI issue. It worked for me in qemu.
>>
>> commit 4f2ffd93b36d20eae61495776fc6b0855745fd7f
>> Author: Konstantin Belousov
>> Date: Fri Jun 22 13:19:22 2012 +0300
>>
>> Use persistent kernel mapping of the shared page, and update the
>> vdso_timehands from hardclock, instead of scheduling task.
>
> Slightly improved version. Since tc_fill_vdso_timehands is now
> called from hardclock context, there is no need to spin waiting for
> valid current generation of timehands.

I can't evaluate how "hackish" this is from the memory management
side, but I'm glad there is some viable solution. Thank you!

-- 
Alexander Motin
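
For readers following the thread: the commit log says usermode reading of the structure can be lockless, and the last quoted paragraph refers to waiting for a valid generation of timehands. One common way to get that behaviour is a generation-counter pattern, sketched below in C11. This is an illustration only, not the code from r237433. The struct demo_timehands layout, the demo_* function names, the retry limit, and the convention that generation 0 means "update in progress" are all assumptions made for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_TRIES 1000	/* hypothetical bound before giving up */

/* Hypothetical stand-in; NOT the real struct vdso_timehands layout. */
struct demo_timehands {
	atomic_uint		gen;	/* 0: update in progress (assumed) */
	_Atomic uint64_t	scale;
	_Atomic uint64_t	offset_count;
};

struct demo_snapshot {
	uint64_t	scale;
	uint64_t	offset_count;
};

static void
demo_init(struct demo_timehands *th)
{
	atomic_init(&th->gen, 0);
	atomic_init(&th->scale, 0);
	atomic_init(&th->offset_count, 0);
}

/* Single writer: invalidate, update payload, publish a new generation. */
static void
demo_publish(struct demo_timehands *th, uint64_t scale, uint64_t offset_count)
{
	unsigned int next;

	next = atomic_load_explicit(&th->gen, memory_order_relaxed) + 1;
	if (next == 0)
		next = 1;	/* skip the reserved value on wraparound */
	atomic_store_explicit(&th->gen, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&th->scale, scale, memory_order_relaxed);
	atomic_store_explicit(&th->offset_count, offset_count,
	    memory_order_relaxed);
	atomic_store_explicit(&th->gen, next, memory_order_release);
}

/*
 * Lockless reader: copy the payload, then accept it only if the
 * generation was nonzero and did not change during the copy.
 * Returns 0 on success, -1 if the caller should use a slow path.
 */
static int
demo_read(struct demo_timehands *th, struct demo_snapshot *out)
{
	unsigned int gen;
	int tries;

	assert(th != NULL && out != NULL);
	for (tries = 0; tries < DEMO_MAX_TRIES; tries++) {
		gen = atomic_load_explicit(&th->gen, memory_order_acquire);
		if (gen == 0)
			continue;	/* writer is mid-update, retry */
		out->scale = atomic_load_explicit(&th->scale,
		    memory_order_relaxed);
		out->offset_count = atomic_load_explicit(&th->offset_count,
		    memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		if (gen == atomic_load_explicit(&th->gen,
		    memory_order_relaxed))
			return (0);
	}
	return (-1);
}
```

A caller would treat a -1 return from demo_read() as the signal to take the slow path, in the same spirit as the commit log's description of libc falling back to the syscall when it cannot use the exported data.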