From owner-freebsd-arch@FreeBSD.ORG  Sun Jun  3 05:19:11 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id DD0C6106564A;
	Sun,  3 Jun 2012 05:19:11 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id 5DDF48FC0A;
	Sun,  3 Jun 2012 05:19:11 +0000 (UTC)
Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q535J5Qr016796;
	Sun, 3 Jun 2012 08:19:05 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id
	q535J4m1082109; Sun, 3 Jun 2012 08:19:04 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q535J4j0082108; 
	Sun, 3 Jun 2012 08:19:04 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Sun, 3 Jun 2012 08:19:04 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <20120603051904.GG2358@deviant.kiev.zoral.com.ua>
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<20120601193522.GA2358@deviant.kiev.zoral.com.ua>
	<CAJ-FndC71=3Jo+BxQi==gCoLipBxj8X8XMBydjvrcKeGw+WOnA@mail.gmail.com>
	<20120602164847.GB2358@deviant.kiev.zoral.com.ua>
	<CAJ-FndAXFwuEspq+QeF0Hv1dr8JjREP=c=g3-abP=eoZ-D4hEg@mail.gmail.com>
	<CAJ-FndCpztSWyJo2hRVs5qu+vQOj9E1mPBhfVOxM_OC2eNac6A@mail.gmail.com>
	<20120602171632.GC2358@deviant.kiev.zoral.com.ua>
	<20120603063330.H3418@besplex.bde.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="eEhvUqzJgUABKnxr"
Content-Disposition: inline
In-Reply-To: <20120603063330.H3418@besplex.bde.org>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: Gianni <gianni@freebsd.org>, Alan Cox <alc@rice.edu>,
	Alexander Kabaev <kan@freebsd.org>, Attilio Rao <attilio@freebsd.org>,
	Konstantin Belousov <kib@freebsd.org>, freebsd-arch@freebsd.org
Subject: Re: Fwd: [RFC] Kernel shared variables
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 03 Jun 2012 05:19:11 -0000


--eEhvUqzJgUABKnxr
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sun, Jun 03, 2012 at 07:28:09AM +1000, Bruce Evans wrote:
> On Sat, 2 Jun 2012, Konstantin Belousov wrote:
>
> >On Sat, Jun 02, 2012 at 06:00:06PM +0100, Attilio Rao wrote:
> >>...
> >>2012/6/2 Konstantin Belousov <kostikbel@gmail.com>:
> >>>On Sat, Jun 02, 2012 at 02:01:35PM +0100, Attilio Rao wrote:
> >[Tried to trim the text]
>
> [Trimmed more]
>
> >>>Right, exactly, and this is why I object to the "offsets" approach.
> >>>It basically moves us to the old times of the "jump tables" shared
> >>>libraries, that fortunately was never a case for FreeBSD even when
> >>>a.out was used.
> >>
> >>I'm objecting to this either.
> >My english is not good enough to understand this. Do you agree or disagree
> >with my statement that 'indexes' make it very hard to maintain ABI ?
>
> Syscall numbers are basically indexes, and work OK (because there aren't
> many of them even after ~30-35 years of accumulating them).
>
> >...
> >>The gettimeofday() implementation is a different story than what is asked
> >>here.
> >
> >But the goal is to have fast clocks, right ? What else is planned ?
> >
> >In fact, I think that if the whole goal is only fast clocks, then we
> >do not need any additional system mechanisms, since we can easily export
> >coefficients for rdtsc formula already. E.g. we can put it into elf auxv,
> >which is ugly but bearable.
>
> How do you get the timehands offsets?  These only need to be updated
> every second or so, or when used, but how can the application know
> when they need to be updated if this is not done automatically in the
> kernel by writing to a shared page?  I can only think of the
> application arranging an alarm signal every second or so and updating
> then.  No good for libraries.
What are timehands offsets? Do you mean things like leap seconds?
This is indeed problematic for auxv. For auxv it could be solved by
providing an offset for the next recheck using syscalls, and making the
libc code respect this offset. But I do think that a vdso in a shared
page is the right solution, not auxv.

>
> rdtsc is also very unportable, even on CPUs that have it.  But all other
> x86 timecounter hardware is too slow if you want gettimeofday() to be fast
> and as accurate as it is now.
Non-rdtsc hardware probably cannot be used at all, due to the need to
provide usermode access to device registers. The mere presence of rdtsc
does not mean that usermode can indeed use it; that should be decided by
the kernel based on the current in-kernel time source. If rdtsc is not
usable, the corresponding data should not be exported, or the
implementation should go directly into a syscall or whatever.

In fact, I would be very grateful if an expert in time-keeping provided
a concise description of the algorithm for translating rdtsc output into
a struct timeval, also enumerating the required parameters.

--eEhvUqzJgUABKnxr
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAk/K88gACgkQC3+MBN1Mb4glQwCg1YIEeb2XDWk6r2fPtZ1/5rB0
yfYAoIXaW0zTrBFZOBQHEVFDhV1t/pNY
=N/wE
-----END PGP SIGNATURE-----

--eEhvUqzJgUABKnxr--

From owner-freebsd-arch@FreeBSD.ORG  Sun Jun  3 07:19:07 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id C561F106564A;
	Sun,  3 Jun 2012 07:19:07 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id A3D148FC12;
	Sun,  3 Jun 2012 07:19:06 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id KAA28084;
	Sun, 03 Jun 2012 10:19:04 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1Sb55c-000M72-5L; Sun, 03 Jun 2012 10:19:04 +0300
Message-ID: <4FCB0FE5.4050607@FreeBSD.org>
Date: Sun, 03 Jun 2012 10:19:01 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:12.0) Gecko/20120503 Thunderbird/12.0.1
MIME-Version: 1.0
To: Attilio Rao <attilio@FreeBSD.org>, Mitsuru IWASAKI <iwasaki@jp.freebsd.org>
References: <20120603.002554.119853142.iwasaki@jp.FreeBSD.org>
	<CAJ-FndAfm4_XqFSwBqXK=cgWkE6YVrtkS5BbcH7zcRd-100xTw@mail.gmail.com>
In-Reply-To: <CAJ-FndAfm4_XqFSwBqXK=cgWkE6YVrtkS5BbcH7zcRd-100xTw@mail.gmail.com>
X-Enigmail-Version: 1.5pre
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: freebsd-acpi@FreeBSD.org, freebsd-arch@FreeBSD.org
Subject: cpu stopping [Was: preparation for x86/acpica/acpi_wakeup.c]
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 03 Jun 2012 07:19:07 -0000

on 03/06/2012 00:39 Attilio Rao said the following:
> The first thing to consider is that right now we only have 2 states
> for CPUs: started and stopped. These states are controlled by
> started_cpus and stopped_cpus masks respectively. It seems you really
> want to add an intermediate level among the 2 where you have: started
> -> suspended -> started -> suspended ... -> stopped and you need to
> expand the mechanism for dealing with started and stopped cpus to do
> that. I'm pretty sure this will be very helpful also for other
> architectures that want to do the same.

As the first thing I must admit that I haven't looked at the patch :-)

But really I don't see why we need to differentiate between stopped and
suspended state as both of them ultimately mean exactly the same thing - CPUs
are spinning on some condition (and they are in a well-defined place and state).

My view of how this should work is:
- there can be only one master CPU that controls all other (slave) CPUs
- the master sets entry and exit hooks
- the master signals slaves to enter the stop state
- the slaves execute the enter hook and start spinning on the release condition
- the master does whatever it wants to do in this special system state
- the master signals the slaves to resume
- the slaves exit the spin loop and execute the exit hook

We have almost all of this in place.  Only now we have different IPIs and
different IPI handlers to do the job (cpustop_handler and cpususpend_handler).
I think that the hooks model should be more universal.
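
A minimal sketch of the hook model listed above, assuming a single
handler shared by all stop-like IPIs.  The names stop_request,
generic_stop_handler and the demo hooks are hypothetical and are not
taken from the tree; the two masks only mimic the role of stopped_cpus
and started_cpus.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical hook type: called on the slave CPU with its CPU id. */
typedef void (*cpu_hook_t)(int cpu);

/* Hypothetical request filled in by the master before it signals the slaves. */
struct stop_request {
	cpu_hook_t	entry_hook;	/* run before spinning, may be NULL */
	cpu_hook_t	exit_hook;	/* run after release, may be NULL */
	atomic_uint	stopped_mask;	/* bit set while a slave is spinning */
	atomic_uint	release_mask;	/* master sets a bit to release a slave */
};

/* One handler for every stop-like request; only the hooks differ. */
static void
generic_stop_handler(struct stop_request *req, int cpu)
{
	unsigned int bit = 1u << cpu;

	if (req->entry_hook != NULL)
		req->entry_hook(cpu);
	/* Tell the master that this CPU is parked. */
	atomic_fetch_or(&req->stopped_mask, bit);

	/* Spin on the release condition. */
	while ((atomic_load(&req->release_mask) & bit) == 0)
		;

	atomic_fetch_and(&req->release_mask, ~bit);
	atomic_fetch_and(&req->stopped_mask, ~bit);
	if (req->exit_hook != NULL)
		req->exit_hook(cpu);
}

/* Demo hooks that only count calls, standing in for real save/restore work. */
static int demo_entry_calls, demo_exit_calls;
static void demo_entry(int cpu) { (void)cpu; demo_entry_calls++; }
static void demo_exit(int cpu) { (void)cpu; demo_exit_calls++; }
```

With this shape, suspend and plain stop would differ only in which
hooks the master installs.  The hard-stop case is deliberately left out
of the sketch.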

In my opinion, what really deserves a completely independent path is the
hard-stop case, because it can be invoked nested within the other cases,
e.g. in exotic situations like a breakpoint, a trap or a panic in the
suspend or the normal stop code paths.

-- 
Andriy Gapon

From owner-freebsd-arch@FreeBSD.ORG  Sun Jun  3 09:54:10 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 6A6AC106564A;
	Sun,  3 Jun 2012 09:54:10 +0000 (UTC)
	(envelope-from asmrookie@gmail.com)
Received: from mail-lpp01m010-f54.google.com (mail-lpp01m010-f54.google.com
	[209.85.215.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 7908D8FC08;
	Sun,  3 Jun 2012 09:54:09 +0000 (UTC)
Received: by laai10 with SMTP id i10so3072293laa.13
	for <multiple recipients>; Sun, 03 Jun 2012 02:54:08 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type;
	bh=Pr4Hnj2HezsD1rehQuPsqCPcosSqnis1+4C86gL9b50=;
	b=WvH6NwwC/Ix+vXjpZx/aZxicqBDNEqCfqd2xK20qQaEu33kbIpese2sz68YUzszipm
	3iC62advSifR+flumJlcijLxoann4oMUSJsWiS4aBf1ThZqh66wqr8Z1ckOfknXmrR0R
	27lE/v8a/9PikykdUrdKwkIumen/4o6Al3ppoTRGfJ3rTh5QFT+LgZUsJz5153RxV5sf
	yOE9HT4zKJNDeaekISQHfWln1LkxLOieH/iXjzjMcnlV5c/wufDh8xutg6NdjNFcEl/G
	wdiwwi9aeF6F2kPIE7h5r8gLTYxgm1nj+EJS1DY4atHi1ME3wsoP/kYCKZjtY6EqlCHl
	fvIA==
MIME-Version: 1.0
Received: by 10.152.103.11 with SMTP id fs11mr8689233lab.23.1338717248070;
	Sun, 03 Jun 2012 02:54:08 -0700 (PDT)
Sender: asmrookie@gmail.com
Received: by 10.112.27.65 with HTTP; Sun, 3 Jun 2012 02:54:07 -0700 (PDT)
In-Reply-To: <4FCB0FE5.4050607@FreeBSD.org>
References: <20120603.002554.119853142.iwasaki@jp.FreeBSD.org>
	<CAJ-FndAfm4_XqFSwBqXK=cgWkE6YVrtkS5BbcH7zcRd-100xTw@mail.gmail.com>
	<4FCB0FE5.4050607@FreeBSD.org>
Date: Sun, 3 Jun 2012 10:54:07 +0100
X-Google-Sender-Auth: qSLZcHBUQgn9exkzIhnujbmC38s
Message-ID: <CAJ-FndAnx=UnxJCwLPtze7tu72wT4b+e2T_tHH+pup-VaxfiTw@mail.gmail.com>
From: Attilio Rao <attilio@freebsd.org>
To: Andriy Gapon <avg@freebsd.org>
Content-Type: text/plain; charset=UTF-8
Cc: freebsd-acpi@freebsd.org, Mitsuru IWASAKI <iwasaki@jp.freebsd.org>,
	freebsd-arch@freebsd.org
Subject: Re: cpu stopping [Was: preparation for x86/acpica/acpi_wakeup.c]
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 03 Jun 2012 09:54:10 -0000

2012/6/3 Andriy Gapon <avg@freebsd.org>:
> on 03/06/2012 00:39 Attilio Rao said the following:
>> The first thing to consider is that right now we only have 2 states
>> for CPUs: started and stopped. These states are controlled by
>> started_cpus and stopped_cpus masks respectively. It seems you really
>> want to add an intermediate level among the 2 where you have: started
>> -> suspended -> started -> suspended ... -> stopped and you need to
>> expand the mechanism for dealing with started and stopped cpus to do
>> that. I'm pretty sure this will be very helpful also for other
>> architectures that want to do the same.
>
> As the first thing I must admit that I haven't looked at the patch :-)
>
>
> But really I don't see why we need to differentiate between stopped and
> suspended state as both of them ultimately mean exactly the same thing - CPUs
> are spinning on some condition (and they are in a well-defined place and state).

This is debatable and I'm not sure I agree.
At some point we may want to implement on-the-fly suspension of CPUs,
which is a different event than "stopping" (where stopping would be
"permanent stopping" and suspending would be "suspension that can be
recovered from").

The important thing about this is that we need to expand our model in
a way that makes it simple to add more CPU states than just
started/stopped. Right now we don't have any architecture for this in
place.

> My view of how this should work is:
> - there can be only one master CPU that controls all other (slave) CPUs
> - the master sets entry and exit hooks
> - the master signals slaves to enter the stop state
> - the slaves execute the enter hook and start spinning on the release condition
> - the master does whatever it wants to do in this special system state
> - the master signals the slaves to resume
> - the slaves exit the spin loop and execute the exit hook
>
> We have almost all of this in place.  Only now we have different IPIs and
> different IPI handlers to do the job (cpustop_handler and cpususpend_handler).
> I think that the hooks model should be more universal.

By hook, do you mean something like a rendezvous handler? Otherwise I'm
not sure I understand.

> In my opinion, what really would deserve a completely independent path is the
> hard-stop case.  As this can be invoked nested to the other cases.  E.g. exotic
> situations like a breakpoint or a trap or a panic in the suspend or the normal
> stop code paths.

What I'm really interested in is expanding our model so that it can
handle multiple CPU states. Then it is just a matter of adding the
right states, and the rest is trivial work.

However, as already mentioned, I'm not sure I would equate suspended
with stopped.
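
A minimal sketch of what a multi-state model could look like, assuming
the three states named in this thread and assuming that "stopped" is
terminal, as in the "permanent stopping" reading above.  The enum, the
transition table and cpu_set_state() are hypothetical names, not
existing kernel interfaces.

```c
#include <stdbool.h>

/* Hypothetical per-CPU states; more can be added before CPUST_NSTATES. */
enum cpu_state {
	CPUST_STARTED,
	CPUST_SUSPENDED,
	CPUST_STOPPED,
	CPUST_NSTATES
};

/* Allowed transitions, indexed as [from][to]; unlisted entries are false. */
static const bool cpu_transition_ok[CPUST_NSTATES][CPUST_NSTATES] = {
	[CPUST_STARTED]   = { [CPUST_SUSPENDED] = true, [CPUST_STOPPED] = true },
	[CPUST_SUSPENDED] = { [CPUST_STARTED]   = true, [CPUST_STOPPED] = true },
	/* CPUST_STOPPED has no outgoing transitions in this sketch. */
};

/* Move a CPU to a new state if the table allows it; report success. */
static bool
cpu_set_state(enum cpu_state *cur, enum cpu_state next)
{
	if (!cpu_transition_ok[*cur][next])
		return (false);
	*cur = next;
	return (true);
}
```

Adding a state then means adding one enum value and one table row,
without touching the code that performs the transition.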

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-arch@FreeBSD.ORG  Sun Jun  3 10:49:46 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 4E753106566B;
	Sun,  3 Jun 2012 10:49:46 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail01.syd.optusnet.com.au (mail01.syd.optusnet.com.au
	[211.29.132.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 295408FC1E;
	Sun,  3 Jun 2012 10:49:43 +0000 (UTC)
Received: from c122-106-171-232.carlnfd1.nsw.optusnet.com.au
	(c122-106-171-232.carlnfd1.nsw.optusnet.com.au [122.106.171.232])
	by mail01.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q53AnRFV000363
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sun, 3 Jun 2012 20:49:29 +1000
Date: Sun, 3 Jun 2012 20:49:27 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Konstantin Belousov <kostikbel@gmail.com>
In-Reply-To: <20120603051904.GG2358@deviant.kiev.zoral.com.ua>
Message-ID: <20120603184315.T856@besplex.bde.org>
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<20120601193522.GA2358@deviant.kiev.zoral.com.ua>
	<CAJ-FndC71=3Jo+BxQi==gCoLipBxj8X8XMBydjvrcKeGw+WOnA@mail.gmail.com>
	<20120602164847.GB2358@deviant.kiev.zoral.com.ua>
	<CAJ-FndAXFwuEspq+QeF0Hv1dr8JjREP=c=g3-abP=eoZ-D4hEg@mail.gmail.com>
	<CAJ-FndCpztSWyJo2hRVs5qu+vQOj9E1mPBhfVOxM_OC2eNac6A@mail.gmail.com>
	<20120602171632.GC2358@deviant.kiev.zoral.com.ua>
	<20120603063330.H3418@besplex.bde.org>
	<20120603051904.GG2358@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Gianni <gianni@FreeBSD.org>, Alan Cox <alc@rice.edu>,
	Alexander Kabaev <kan@FreeBSD.org>, Attilio Rao <attilio@FreeBSD.org>,
	Konstantin Belousov <kib@FreeBSD.org>, freebsd-arch@FreeBSD.org
Subject: Re: Fwd: [RFC] Kernel shared variables
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 03 Jun 2012 10:49:46 -0000

On Sun, 3 Jun 2012, Konstantin Belousov wrote:

> On Sun, Jun 03, 2012 at 07:28:09AM +1000, Bruce Evans wrote:
>> On Sat, 2 Jun 2012, Konstantin Belousov wrote:
>>> ...
>>> In fact, I think that if the whole goal is only fast clocks, then we
>>> do not need any additional system mechanisms, since we can easily export
>>> coefficients for rdtsc formula already. E.g. we can put it into elf auxv,
>>> which is ugly but bearable.
>>
>> How do you get the timehands offsets?  These only need to be updated
>> every second or so, or when used, but how can the application know
>> when they need to be updated if this is not done automatically in the
>> kernel by writing to a shared page?  I can only think of the
>> application arranging an alarm signal every second or so and updating
>> then.  No good for libraries.
> What are timehands offsets? Do you mean things like leap seconds?

Yes.  binuptime() is:

% void
% binuptime(struct bintime *bt)
% {
% 	struct timehands *th;
% 	u_int gen;
% 
% 	do {
% 		th = timehands;
% 		gen = th->th_generation;
% 		*bt = th->th_offset;
% 		bintime_addx(bt, th->th_scale * tc_delta(th));
% 	} while (gen == 0 || gen != th->th_generation);
% }

Without the kernel providing th->th_offset, you have to do lots of ntp
handling for yourself (compatibly with the kernel) just to get an
accuracy of 1 second.  Leap seconds don't affect CLOCK_MONOTONIC, but
they do affect CLOCK_REALTIME which is the clock id used by
gettimeofday().  For the former, you only have to advance the offset
for yourself occasionally (compatibly with the kernel) and manage
(compatibly with the kernel, especially in the long term) ntp slewing
and other syscall/sysctl kernel activity that micro-adjusts th->th_scale.

> This is indeed problematic for auxv. For auxv it could be solved by
> providing an offset for the next recheck using syscalls, and making the
> libc code respect this offset. But I do think that a vdso in a shared
> page is the right solution, not auxv.

timehands in a shared page is close to working.  th_generation protects
things in the same way as in the kernel, modulo assumptions that writes
are ordered.
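
A minimal userland sketch of the generation check, assuming a single
writer and a single exported timehands.  struct shared_timehands and
the sth_*() functions are hypothetical names.  C11 atomics and fences
stand in for the write-ordering assumption mentioned above; the kernel
code does not use them.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical mirror of the fields a shared page might export. */
struct shared_timehands {
	_Atomic uint32_t generation;	/* 0 means "update in progress" */
	_Atomic uint64_t offset_sec;	/* whole seconds of the offset */
	_Atomic uint64_t offset_frac;	/* 1/2^64 fractions of a second */
};

/* Writer side (the kernel in the real design): invalidate, update, republish. */
static void
sth_publish(struct shared_timehands *th, uint64_t sec, uint64_t frac)
{
	uint32_t gen;

	gen = atomic_load_explicit(&th->generation, memory_order_relaxed);
	atomic_store_explicit(&th->generation, 0, memory_order_relaxed);
	/* Keep the data stores from being ordered before the invalidation. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&th->offset_sec, sec, memory_order_relaxed);
	atomic_store_explicit(&th->offset_frac, frac, memory_order_relaxed);
	if (++gen == 0)			/* never publish generation 0 */
		gen = 1;
	atomic_store_explicit(&th->generation, gen, memory_order_release);
}

/* Reader side (libc in the real design): retry until a stable snapshot. */
static void
sth_read(struct shared_timehands *th, uint64_t *sec, uint64_t *frac)
{
	uint32_t gen;

	do {
		gen = atomic_load_explicit(&th->generation,
		    memory_order_acquire);
		*sec = atomic_load_explicit(&th->offset_sec,
		    memory_order_relaxed);
		*frac = atomic_load_explicit(&th->offset_frac,
		    memory_order_relaxed);
		/* Keep the data loads from being ordered after the recheck. */
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 || gen != atomic_load_explicit(&th->generation,
	    memory_order_relaxed));
}
```

The reader never writes to the shared structure, which is what allows
the page to be mapped read-only into userland.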

>> rdtsc is also very unportable, even on CPUs that have it.  But all other
>> x86 timecounter hardware is too slow if you want gettimeofday() to be fast
>> and as accurate as it is now.
> Non-rdtsc hardware probably cannot be used at all, due to the need to
> provide usermode access to device registers. The mere presence of rdtsc
> does not mean that usermode can indeed use it; that should be decided by
> the kernel based on the current in-kernel time source. If rdtsc is not
> usable, the corresponding data should not be exported, or the
> implementation should go directly into a syscall or whatever.

But then applications would:
- use gettimeofday() more than they should ("it works on Linux"), even
   more than now, since then "it works on FreeBSD-x86" too
- just be slow when gettimeofday() is slow
- kludge around gettimeofday() being slow like they do now
- kludge around gettimeofday() being slow in ways they do not now (add
   more complications to probe whether it is slow).

I found some RedHat documentation for gettimeofday() in VDSO.  It seems
to leave it to the sysadmin to "tune" gettimeofday() using a boot
parameter to configure gettimeofday() being accurate/slow, less-accurate/
less-slow, or inaccurate/fast.  A per-process parameter would be more
correct and harder to use (add mounds of autoconfig and runtime code
in every program[mer] that cares to detect and use it).

> In fact, I would be very grateful if an expert in time-keeping provided
> a concise description of the algorithm for translating rdtsc output into
> a struct timeval, also enumerating the required parameters.

See above.  You just scale

     tc_delta(th) == (uint32_t)(rdtsc() - rdtsc_offset) when th is for TSC,

using a carefully managed fixed point scale factor.  The delta is
reduced to 32 bits so that the scaling can be efficient.  The result
is a bintime fraction which is added to a bintime offset.  Both offsets
are even more carefully managed, and everything is protected by
th_generation, and for optimality there are multiple timehands so that
th_generation very rarely changes underneath you.  The resulting bintime
is then converted to a timeval or timespec as required.  This gives
uptimes.  Another offset is added for real times.  Times in seconds are
handled more directly; it is assumed that time_t is atomic so that
th_generation is not needed for protecting them.
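
A minimal sketch of the scaling step described above, assuming a scale
factor of 2^64 / frequency and a 64.64 fixed-point time.  The struct
and function names are hypothetical.  The microsecond conversion is
modelled on the usual bintime-to-timeval arithmetic and is not copied
from kern_tc.c.  NTP adjustment of the scale and the th_generation
protection are left out.

```c
#include <stdint.h>

/* Hypothetical 64.64 fixed-point time: seconds plus 1/2^64 fractions. */
struct sketch_bintime {
	int64_t		sec;
	uint64_t	frac;
};

/* Scale factor of about 2^64 / freq, built from 2^63 to avoid overflow. */
static uint64_t
sketch_scale(uint64_t freq_hz)
{
	return (((UINT64_C(1) << 63) / freq_hz) * 2);
}

/*
 * Add the scaled counter delta to the offset.  The delta is reduced to
 * 32 bits; the product is only meaningful while delta < frequency, that
 * is, for less than one second after the offset was last updated.
 */
static struct sketch_bintime
sketch_tsc_to_bintime(struct sketch_bintime offset, uint64_t scale,
    uint64_t tsc_now, uint64_t tsc_offset)
{
	uint32_t delta = (uint32_t)(tsc_now - tsc_offset);
	uint64_t old_frac = offset.frac;

	offset.frac += scale * delta;
	if (offset.frac < old_frac)	/* carry out of the fraction */
		offset.sec++;
	return (offset);
}

/* Convert the fraction to microseconds using its top 32 bits. */
static void
sketch_bintime_to_usec(struct sketch_bintime bt, int64_t *sec, int32_t *usec)
{
	*sec = bt.sec;
	*usec = (int32_t)(((uint64_t)1000000 * (uint32_t)(bt.frac >> 32)) >> 32);
}
```

For a 1 GHz counter and a delta of 500000000 counts, the fraction comes
out just below one half, so the microsecond field truncates to 499999
rather than 500000.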

The TSC frequency is limited to about 4 GHz, so the above tc_delta()
works for about 4 seconds after rdtsc_offset is updated.  But the
bintime fraction only works for 1 second.  If either of these wraps,
then the result is still later than the update time; however, it
may be earlier than a previous result.  So the update must occur at
least once per second for the TSC.  Otherwise, negative time
differences occur (the final result is in advance of th_offset since
the bintime fraction is >= 0, but will be before a previous final
result if the bintime fraction wraps).  Negative time differences are
worse than lost "ticks" that cause all results to be in the past.
The updates are broken by at least stopping in ddb and perhaps by
suspend/resume.  The correct fix is probably to update (or just zap)
the timecounter as the first step of resuming from ddb or sleep (this
must be done before any other timecounter call).  Note that times
going backwards cannot be detected in binuptime(), etc., since to detect
it you would have to write the previous time, but that would require
pessimal locking that is intentionally left out.

Timecounter internals like th_offsets are currently private in kern_tc.c.
I don't like exposing them for this, or cloning them for FFCLOCK.

Bruce

From owner-freebsd-arch@FreeBSD.ORG  Sun Jun  3 14:42:47 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9135A106566B;
	Sun,  3 Jun 2012 14:42:47 +0000 (UTC)
	(envelope-from iwasaki@jp.FreeBSD.org)
Received: from locore.org (ns01.locore.org [218.45.21.227])
	by mx1.freebsd.org (Postfix) with ESMTP id 236248FC0C;
	Sun,  3 Jun 2012 14:42:47 +0000 (UTC)
Received: from localhost (celeron.v4.locore.org [192.168.0.10])
	by locore.org (8.14.5/8.14.5/iwasaki) with ESMTP/inet id q53EgiCq031408;
	Sun, 3 Jun 2012 23:42:44 +0900 (JST)
	(envelope-from iwasaki@jp.FreeBSD.org)
Date: Sun, 03 Jun 2012 23:42:43 +0900 (JST)
Message-Id: <20120603.234243.28389486.iwasaki@jp.FreeBSD.org>
To: avg@FreeBSD.org
From: Mitsuru IWASAKI <iwasaki@jp.FreeBSD.org>
In-Reply-To: <4FCB0FE5.4050607@FreeBSD.org>
References: <20120603.002554.119853142.iwasaki@jp.FreeBSD.org>
	<CAJ-FndAfm4_XqFSwBqXK=cgWkE6YVrtkS5BbcH7zcRd-100xTw@mail.gmail.com>
	<4FCB0FE5.4050607@FreeBSD.org>
X-Mailer: Mew version 3.3 on Emacs 20.7 / Mule 4.0 (HANANOEN)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: attilio@FreeBSD.org, freebsd-acpi@FreeBSD.org, freebsd-arch@FreeBSD.org
Subject: Re: cpu stopping
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 03 Jun 2012 14:42:47 -0000

Hi, thanks for the comments.

> As the first thing I must admit that I haven't looked at the patch :-)

Never mind :) What I'm trying to do in the patches is just to unify
the amd64/i386-independent part (acpi_wakeup.c) to ease code maintenance,
so please let's commit it first, and then start redesigning
cpususpend_handler().

> But really I don't see why we need to differentiate between stopped and
> suspended state as both of them ultimately mean exactly the same thing - CPUs
> are spinning on some condition (and they are in a well-defined place and state).

Yes, the amd64/i386 cpususpend_handler() is actually very similar to
cpustop_handler(); only some resume-related procedures are added for suspend.

> My view of how this should work is:
> - there can be only one master CPU that controls all other (slave) CPUs
> - the master sets entry and exit hooks

Entry hook for suspending might be
----
                ctx_fpusave(suspfpusave[cpu]);
                wbinvd();
                CPU_SET_ATOMIC(cpu, &stopped_cpus);
----

and for stopping is
----
        /* Indicate that we are stopped */
        CPU_SET_ATOMIC(cpu, &stopped_cpus);
----

Correct?
I think the stopping hook can be replaced with the suspending hook.


Exit hook for suspending is
----
                pmap_init_pat();
                load_cr3(susppcbs[cpu]->pcb_cr3);
                initializecpu();
                PCPU_SET(switchtime, 0);
                PCPU_SET(switchticks, ticks);
[snip]
        /* Resume MCA and local APIC */
        mca_resume();
        lapic_setup(0);
----

For stopping it should be
----
        if (cpu == 0 && cpustop_restartfunc != NULL) {
                cpustop_restartfunc();
                cpustop_restartfunc = NULL;
        }
----

> - the master signals slaves to enter the stop state
> - the slaves execute the enter hook and start spinning on the release condition
> - the master does whatever it wants to do in this special system state
> - the master signals the slaves to resume
> - the slaves exit the spin loop and execute the exit hook

I think it would be possible.  However, I personally think that
x86/x86/mp_machdep.c has higher priority and would be more effective
than merging cpususpend/stop_handler().

Thanks

From owner-freebsd-arch@FreeBSD.ORG  Sun Jun  3 19:02:02 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id BD3B41065672
	for <freebsd-arch@freebsd.org>; Sun,  3 Jun 2012 19:02:02 +0000 (UTC)
	(envelope-from matthewstory@gmail.com)
Received: from mail-ob0-f182.google.com (mail-ob0-f182.google.com
	[209.85.214.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 7C68E8FC0C
	for <freebsd-arch@freebsd.org>; Sun,  3 Jun 2012 19:02:02 +0000 (UTC)
Received: by obcni5 with SMTP id ni5so7921841obc.13
	for <freebsd-arch@freebsd.org>; Sun, 03 Jun 2012 12:02:02 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:date:message-id:subject:from:to:content-type;
	bh=0jdHHvmFDrycVQfGFLm6bz/vexsA4didlqLd1cNGU2Y=;
	b=VdoWc4erh/u7c7W16aZuAcOfiJCKknyJPW6Udg/oRo5z/MKXL6i4oYAFxGc/iErbVV
	UhqLOiaEivAwLy3z1Tf6n6Wtp+iwfNJxUbEQ1Ry7TkKiBy702YDxKVjldXc0TR5MMOKg
	631JDhuT8D0R6/fpcgFlPYiAhCs8LTHVQULVnaMWBvvbUVWave5pc6MwtzmU6Nc2s/Wg
	DzHQl5J6PHb/xyEl/ZhVM9DvQpt0opI0Pf63cWOk3qh0P7QT7KFYKAcrrpQ6PDGy69hW
	gtk9K5dfE3crL8xTQJV3YeW4WgkLS2cVxpQZT4B18n4ShAIuMdrwaa5DH1HYK21fu9w2
	iCEQ==
MIME-Version: 1.0
Received: by 10.60.3.40 with SMTP id 8mr9506170oez.31.1338750122095; Sun, 03
	Jun 2012 12:02:02 -0700 (PDT)
Received: by 10.76.116.68 with HTTP; Sun, 3 Jun 2012 12:02:02 -0700 (PDT)
Date: Sun, 3 Jun 2012 15:02:02 -0400
Message-ID: <CAB+9ogfQcp6gr8GH9ubFPRSmGvpVKfJ+EF2=-JH8dJTbukoNJw@mail.gmail.com>
From: Matthew Story <matthewstory@gmail.com>
To: freebsd-arch@freebsd.org
Content-Type: multipart/mixed; boundary=e89a8f83a34578732404c1960d10
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Subject: ASCII Notes from FreeBSD Network Summit at BSDCan
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 03 Jun 2012 19:02:02 -0000

--e89a8f83a34578732404c1960d10
Content-Type: text/plain; charset=ISO-8859-1

gnn asked me to forward these along to arch.  The notes are as literal a
copy of the whiteboard session as I could work into ASCII.

-- 
regards,
matt

--e89a8f83a34578732404c1960d10
Content-Type: text/plain; charset=US-ASCII; name="network-whiteboard-part1.txt"
Content-Disposition: attachment; filename="network-whiteboard-part1.txt"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_h30h8hf30

KiBtYnVmCiAgLT4gdmFyaWFibGUKICAtPiBtdGFnCiAgLT4gb2ZmbG9hZGluZwogIC0+IGluZGly
ZWN0aW9uCiogbDIvbDMgc3BsaXQKKiBpZm5ldCByZWRlc2lnbgogICAgLT4gcXVldWUKICAgIC0+
IGluZGlyZWN0aW9uCiAgICAtPiBkZWR1cGUgMTBHCiAgICAtPiB2YXJpYWJsZSBzaXplCiogY2hl
Y2tzdW0KKiBJT1ggcm9hZG1hcAoqIG5ldG1hcAoqIGxhdGVuY3kvYncgbWVhc3VyZQoqIE5JQy9T
dGFjayBsb2FkIGRpc3RyaWJ1dGlvbgoKbWJ1ZiBwcm9ibGVtcyBpbmRpcmVjdGlvbjoKICAgIC0+
IDIgdHlwZXMgb2YgbWJ1ZnMKICAgICAgICAtPiB2ZXJ5IHNtYWxsCiAgICAgICAgLT4gdmVyeSBs
YXJnZQogICAgLT4gdG9vIG11Y2ggaW5kaXJlY3Rpb24KICAgICAgICAtPiBKZWZmUiBwYXRjaD8K
ICAgICAgICAgICAgLT4gdmFyaWFibGUtc2l6ZSBtYnVmIHBhdGNoCiAgICAgICAgICAgICAgICAt
PiBhbnlvbmUgb3duPwogICAgICAgICAgICAgICAgICAgIC0+IG5vCiAgICAgICAgICAgICAgICAt
PiB5b3UgZG9uJ3QgaGF2ZSB0byBoYXZlIGluZGlyZWN0aW9uCiAgICAgICAgICAgICAgICAtPiBu
byBjbHVzdGVycyByZXF1aXJlZAogICAgICAgICAgICAgICAgICAgIC0+IHN1cHBvcnQgZm9yIGNs
dXN0ZXJzIHJlbWFpbnMsCiAgICAgICAgICAgICAgICAgICAgICAgbmVjZXNzYXJ5IGZvciBhcmNo
IHcvbyBhY2Nlc3MgdG8gYWxsIG1lbW9yeQogICAgICAgICAgICAgICAgLT4gcGF0Y2ggaXMgc3Bl
Y2lmaWMsIGFueSBvdGhlciBjb25jZXJucz8KICAgICAgICAgICAgICAgICAgICAtPiBJTyBWZWN0
b3IgZGVzaWduLCBzY2F0dGVyL2dhdGhlciAoc29tZSBzb3J0IG9rIGlvdmVjKQogICAgICAgICAg
ICAgICAgICAgIC0+IGJhdGNoaW5nPwogICAgICAgICAgICAgICAgICAgICAgICAtPiBzYWNyaWZp
Y2UgbGVzcyBpbmRpcmVjdGlvbiBpbiBoZWFkZXIsCiAgICAgICAgICAgICAgICAgICAgICAgICAg
IGZvciBtb3JlIGluZGlyZWN0aW9uIGluIG1ldGEtZGF0YQogICAgICAgICAgICAgICAgICAgICAg
ICAtPiBvciBpcyBpdCBqdXN0IG1vdmluZyB0aGUgaW5kaXJlY3Rpb24/CiAgICAgICAgICAgICAg
ICAgICAgLT4gc3RyaXBwaW5nIGhlYWRlcnMKICAgICAgICAgICAgICAgICAgICAtPiBoZWFkZXIg
YXQgZW5kPyAKICAgICAgICAgICAgICAgICAgICAtPiBzaXplIGNob2ljZXMKICAgICAgICAgICAg
ICAgICAgICAtPiBwcm9maWxpbmcKICAgICAgICAgICAgICAgICAgICAtPiBwcml2YXRlIGFsbG9j
YXRpb24KICAgICAgICAgICAgICAgIC0+IGFueW9uZSBvd24/CiAgICAgICAgICAgICAgICAgICAg
LT4geWVzLCBycnNACgpXaGF0IGRvIHdlIHdhbnQgdG8gc3RvcmUgaW4gdmFyaWFibGUgbWJ1ZnMK
ClZMQU4gSUQgKGV0YykKUSBpbiBRIGluIC4uLgpNQUMgQWRkcmVzcwpNUExTCkZsb3cgSUQgKyB0
eXBlCkZJQgo4MDIuMTEgLS0tLS0+IFFvUyAoM2IpLCBBZ2UgKDhiKSwgU2VxICgxOGIpLCB2aWV3
IFRJRCAoNGIpLCBSYXRlIGNvbnRyb2wgKDE2QikKSW50ZXJmYWNlSUQgKyBnZW5lcmF0aW9uCkZp
cmV3YWxsIFJ1bGVzIDggLSAxNkIgKGp1bmlwZXIpCihjYW4ndCByZWFkKQpQYWNrZXQgVGltZXN0
YW1wICg2NGIpCkxvY2FsIERhdGEgKENQVSwgZXRjKQpKb3VybmFsIG9mIHVzZSAodHJhY2UpCklQ
U2VjIC0+IGRhdGEgJiByZWZlcmVuY2UKSGVhZGVyIHBhcnNlIHN0YXRlCk1BQyBsYWJlbHMKVklN
QUdFPyAocG9pbnRlcikKVFNPCkNoZWNrc3VtCkNBUlAsIExBR0cKQUxUUSB0YWcK
--e89a8f83a34578732404c1960d10
Content-Type: text/plain; charset=US-ASCII; name="clusters.txt"
Content-Disposition: attachment; filename="clusters.txt"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_h30h8z9q1

Ky0tLS0tLS0tLS0tLS0tLS0tLS0tLS0rICAgICAgKy0tLS0tLS0tLS0tLS0tLS0tLS0tKwp8IGhl
YWRlciAgICAgcG9pbnQgdG8gICstLS0tLS0rPiBjbHVzdGVyICAgICAgICAgICB8CnwgICAgICAg
ICAgICAgICAgICBvciAgKy0rICAgIHwgICAgICAgICAgICAgICAgICAgIHwKKy0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0rIHwgICAgfCAgICAgICAgICAgICAgICAgICAgfAp8IGRhdGEgICAgICAgICAg
ICAgICAgPCstKyAgICB8ICAgICAgICAgICAgICAgICAgICB8CnwgICAgICAgICAgICAgICAgICAg
ICAgfCAgICAgIHwgICAgICAgICAgICAgICAgICAgIHwKfCAgICAgICAgICAgICAgICAgICAgICB8
ICAgICAgKy0tLS0tLS0tLS0tLS0tLS0tLS0tKworLS0tLS0tLS0tLS0tLS0tLS0tLS0tLSsKLiAg
ICAgICAgIHwgIHwgICAgICAgICAuIAouICAgICAgICAgfCAgfCAgICAgICAgIC4gIHByb3Bvc2Vk
IHNvbHV0aW9uIC4uLgouICAgICAgICB2fHZ2fHYgICAgICAgIC4KLiAgICAgICAgIHZ2dnYgICAg
ICAgICAuIAouICAgICAgICAgIHZ2ICAgICAgICAgIC4KLiAuIC4gLiAuIC4uLi4gLiAuIC4gLiAu
Cg==
--e89a8f83a34578732404c1960d10--

From owner-freebsd-arch@FreeBSD.ORG  Sun Jun  3 19:35:26 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 53EEA106566B;
	Sun,  3 Jun 2012 19:35:26 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 3964C8FC1B;
	Sun,  3 Jun 2012 19:35:25 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id WAA01101;
	Sun, 03 Jun 2012 22:35:16 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1SbGa4-000Mor-3y; Sun, 03 Jun 2012 22:35:16 +0300
Message-ID: <4FCBBC72.8070209@FreeBSD.org>
Date: Sun, 03 Jun 2012 22:35:14 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:12.0) Gecko/20120503 Thunderbird/12.0.1
MIME-Version: 1.0
To: Attilio Rao <attilio@FreeBSD.org>
References: <20120603.002554.119853142.iwasaki@jp.FreeBSD.org>
	<CAJ-FndAfm4_XqFSwBqXK=cgWkE6YVrtkS5BbcH7zcRd-100xTw@mail.gmail.com>
	<4FCB0FE5.4050607@FreeBSD.org>
	<CAJ-FndAnx=UnxJCwLPtze7tu72wT4b+e2T_tHH+pup-VaxfiTw@mail.gmail.com>
In-Reply-To: <CAJ-FndAnx=UnxJCwLPtze7tu72wT4b+e2T_tHH+pup-VaxfiTw@mail.gmail.com>
X-Enigmail-Version: 1.5pre
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: freebsd-acpi@FreeBSD.org, Mitsuru IWASAKI <iwasaki@jp.freebsd.org>,
	freebsd-arch@FreeBSD.org
Subject: Re: cpu stopping [Was: preparation for x86/acpica/acpi_wakeup.c]
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 03 Jun 2012 19:35:26 -0000

on 03/06/2012 12:54 Attilio Rao said the following:
> 2012/6/3 Andriy Gapon <avg@freebsd.org>:
>> on 03/06/2012 00:39 Attilio Rao said the following:
>>> The first thing to consider is that right now we only have 2 states
>>> for CPUs: started and stopped. These states are controlled by
>>> started_cpus and stopped_cpus masks respectively. It seems you really
>>> want to add an intermediate level among the 2 where you have: started
>>> -> suspended -> started -> suspended ... -> stopped and you need to
>>> expand the mechanism for dealing with started and stopped cpus to do
>>> that. I'm pretty sure this will be very helpful also for other
>>> architectures that want to do the same.
>>
>> As the first thing I must admit that I haven't looked at the patch :-)
>>
>>
>> But really I don't see why we need to differentiate between stopped and
>> suspended state as both of them ultimately mean exactly the same thing - CPUs
>> are spinning on some condition (and they are in a well-defined place and state).
> 
> This is debatable and I'm not sure I agree.
> At some point we may want to implement CPU on-the-fly suspension for
> CPUs which is a different event than "stopping" (where stopping will
> be "permanent stopping" and suspending will be "possible to recover
> suspension").

Right, but that should operate on the level above the current code.
I.e. first stop all slave CPUs, then set the state of a target CPU (which
includes a global view of that state), then resume all other CPUs.

> The important thing about this is that we need to expand our model in
> a way that it makes simple to add more states to the CPUs than simple
> started/stopped. Right now we don't have any architecture for this in
> place.

I can't disagree with this, but I think that the current IPI-to-stop code is not
a place for that.  It's too low level.

>> My view of how this should work is:
>> - there can be only one master CPU that controls all other (slave) CPUs
>> - the master sets entry and exit hooks
>> - the master signals slaves to enter the stop state
>> - the slaves execute the enter hook and start spinning on the release condition
>> - the master does whatever it wants to do in this special system state
>> - the master signals the slaves to resume
>> - the slaves exit the spin loop and execute the exit hook
>>
>> We have almost all of this in place.  Only now we have different IPIs and
>> different IPI handlers to do the job (cpustop_handler and cpususpend_handler).
>> I think that the hooks model should be more universal.
> 
> By hook, do you mean something like a rendezvous handler? I'm not sure I
> understand otherwise.

Maybe, perhaps.  I meant just a couple of function pointers.
cpustop_restartfunc seems to be a better analogy.

>> In my opinion, what really would deserve a completely independent path is the
>> hard-stop case, as it can be invoked nested inside the other cases, e.g. in
>> exotic situations like a breakpoint, a trap, or a panic in the suspend or the
>> normal stop code paths.
> 
> What I'm really interested in is expanding our model in a way that it can
> handle multiple CPU states. Then it is just a matter of adding the
> right states and it is all trivial work.
> 
> However, as already mentioned, I'm not sure I would equate
> suspended with stopped.


-- 
Andriy Gapon

From owner-freebsd-arch@FreeBSD.ORG  Sun Jun  3 19:45:41 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 24093106564A;
	Sun,  3 Jun 2012 19:45:41 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id DBFA78FC1F;
	Sun,  3 Jun 2012 19:45:39 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id WAA01145;
	Sun, 03 Jun 2012 22:45:34 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1SbGk1-000MpY-T6; Sun, 03 Jun 2012 22:45:33 +0300
Message-ID: <4FCBBEDD.5000604@FreeBSD.org>
Date: Sun, 03 Jun 2012 22:45:33 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:12.0) Gecko/20120503 Thunderbird/12.0.1
MIME-Version: 1.0
To: Mitsuru IWASAKI <iwasaki@jp.FreeBSD.org>
References: <20120603.002554.119853142.iwasaki@jp.FreeBSD.org>
	<CAJ-FndAfm4_XqFSwBqXK=cgWkE6YVrtkS5BbcH7zcRd-100xTw@mail.gmail.com>
	<4FCB0FE5.4050607@FreeBSD.org>
	<20120603.234243.28389486.iwasaki@jp.FreeBSD.org>
In-Reply-To: <20120603.234243.28389486.iwasaki@jp.FreeBSD.org>
X-Enigmail-Version: 1.5pre
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: attilio@FreeBSD.org, freebsd-acpi@FreeBSD.org, freebsd-arch@FreeBSD.org
Subject: Re: cpu stopping
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 03 Jun 2012 19:45:41 -0000

on 03/06/2012 17:42 Mitsuru IWASAKI said the following:
> Hi, thanks for comments.
> 
>> As the first thing I must admit that I haven't looked at the patch :-)
> 
> Never mind :) What I'm trying to do in the patches is just to unify the
> amd64/i386-independent part (acpi_wakeup.c) for code maintenance,
> so please let's commit it first, then start redesigning
> cpususpend_handler().

In no way am I trying to delay your work :)
I just shared my view on the design of the cpu stopping code.

>> But really I don't see why we need to differentiate between stopped and
>> suspended state as both of them ultimately mean exactly the same thing - CPUs
>> are spinning on some condition (and they are in a well-defined place and state).
> 
> Yes, amd64/i386 cpususpend_handler() is actually very similar to
> cpustop_handler(); some resume-related procedures are added for suspend.
> 
>> My view of how this should work is:
>> - there can be only one master CPU that controls all other (slave) CPUs
>> - the master sets entry and exit hooks
> 
> Entry hook for suspending might be
> ----
>                 ctx_fpusave(suspfpusave[cpu]);
>                 wbinvd();
>                 CPU_SET_ATOMIC(cpu, &stopped_cpus);
> ----
> 
> and for stopping is
> ----
>         /* Indicate that we are stopped */
>         CPU_SET_ATOMIC(cpu, &stopped_cpus);
> ----
> 
> Correct?

Yes.  The only nit is that CPU_SET_ATOMIC(cpu, &stopped_cpus) could be part of
the wait loop prologue.  No need to duplicate it in each hook.
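
A minimal sketch of the hook model with the common prologue, written as a
userland simulation rather than kernel code.  Every name in it (cpu_park,
struct cpu_stop_hooks, the two masks, the demo hooks) is hypothetical and is
not taken from mp_machdep.c; it only illustrates a pair of function pointers
wrapped around a shared wait loop that sets the "stopped" bit itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the mp_machdep.c code. */
typedef void (*cpu_hook_t)(unsigned cpu);

struct cpu_stop_hooks {
	cpu_hook_t enter;	/* run by a slave before it starts spinning */
	cpu_hook_t leave;	/* run by a slave after it has been released */
};

static atomic_uint stopped_mask;	/* bit n set: CPU n is parked */
static atomic_uint release_mask;	/* bit n set: CPU n may resume */
static const struct cpu_stop_hooks *active_hooks;	/* set by the master */

/* What a slave would run from its IPI handler. */
static void
cpu_park(unsigned cpu)
{
	const struct cpu_stop_hooks *h = active_hooks;
	unsigned bit = 1u << cpu;

	if (h != NULL && h->enter != NULL)
		h->enter(cpu);

	/* Common prologue: announce the stop here, not in every hook. */
	atomic_fetch_or(&stopped_mask, bit);

	/* Spin on the release condition. */
	while ((atomic_load(&release_mask) & bit) == 0)
		;

	/* Common epilogue: clear both bits, then run the exit hook. */
	atomic_fetch_and(&release_mask, ~bit);
	atomic_fetch_and(&stopped_mask, ~bit);

	if (h != NULL && h->leave != NULL)
		h->leave(cpu);
}

/* Demo hooks that only count how often they were called. */
static int enter_calls, leave_calls;
static void demo_enter(unsigned cpu) { (void)cpu; enter_calls++; }
static void demo_leave(unsigned cpu) { (void)cpu; leave_calls++; }
static const struct cpu_stop_hooks demo_hooks = { demo_enter, demo_leave };
```

With this shape a suspend-style entry hook would hold only the extra work
(saving FPU state, flushing caches), and a stop-style one could be empty.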

> I think stopping hook can be replaced with suspending hook.

Perhaps... But let's not go into this topic just yet.

> Exit hook for suspending is
> ----
>                 pmap_init_pat();
>                 load_cr3(susppcbs[cpu]->pcb_cr3);
>                 initializecpu();
>                 PCPU_SET(switchtime, 0);
>                 PCPU_SET(switchticks, ticks);
> [snip]
>         /* Resume MCA and local APIC */
>         mca_resume();
>         lapic_setup(0);
> ----
> 
> For stopping should be
> ----
>         if (cpu == 0 && cpustop_restartfunc != NULL) {
>                 cpustop_restartfunc();
>                 cpustop_restartfunc = NULL;
>         }
> ----
> 
>> - the master signals slaves to enter the stop state
>> - the slaves execute the enter hook and start spinning on the release condition
>> - the master does whatever it wants to do in this special system state
>> - the master signals the slaves to resume
>> - the slaves exit the spin loop and execute the exit hook
> 
> I think it would be possible.  However, I personally think that
> x86/x86/mp_machdep.c has higher priority and would be more effective than
> merging cpususpend/stop_handler().

I do not disagree.

-- 
Andriy Gapon

From owner-freebsd-arch@FreeBSD.ORG  Sun Jun  3 20:02:10 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 9FBBC1065745;
	Sun,  3 Jun 2012 20:02:10 +0000 (UTC)
	(envelope-from asmrookie@gmail.com)
Received: from mail-lpp01m010-f54.google.com (mail-lpp01m010-f54.google.com
	[209.85.215.54])
	by mx1.freebsd.org (Postfix) with ESMTP id A3F3A8FC1A;
	Sun,  3 Jun 2012 20:02:09 +0000 (UTC)
Received: by laai10 with SMTP id i10so3262639laa.13
	for <multiple recipients>; Sun, 03 Jun 2012 13:02:01 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=dUPpHTdTJTdWNZBt33McuZiSIzerq38ci95MZLizuIg=;
	b=ysizMijAvLhD1tY3B6qSu1V1CfNTU92DMr4EGBk+nYjYr+s9XZKnUc3M17gT2pzYJp
	O9KLbi3YI5KIdPtoD7CEgU9c2/kHdPLM8qpSL/UWydFcoePPSjW14l87IOZUO1Xypl6V
	te7tQYsfeztiGf8RKcONNtnAYOUw0o0A2rt/NuwapqjqQuJmXDOtCx7Qon4NUPcsyE5t
	mWRiiO0b6sCMu6jHeJuh56ipp6ki4HPslwVWjGUxtYcGGxBCsBD3n/wUl4/Xm22kWWfu
	xUDIR2KWaNdHleZLuC2aGDAC3D5Dkxtoqlf9P8ffzGdi//Skmz5ROJ1LnDj5BL/Ej7fV
	rewA==
MIME-Version: 1.0
Received: by 10.152.103.11 with SMTP id fs11mr9876014lab.23.1338753721887;
	Sun, 03 Jun 2012 13:02:01 -0700 (PDT)
Sender: asmrookie@gmail.com
Received: by 10.112.27.65 with HTTP; Sun, 3 Jun 2012 13:02:01 -0700 (PDT)
In-Reply-To: <4FCBBC72.8070209@FreeBSD.org>
References: <20120603.002554.119853142.iwasaki@jp.FreeBSD.org>
	<CAJ-FndAfm4_XqFSwBqXK=cgWkE6YVrtkS5BbcH7zcRd-100xTw@mail.gmail.com>
	<4FCB0FE5.4050607@FreeBSD.org>
	<CAJ-FndAnx=UnxJCwLPtze7tu72wT4b+e2T_tHH+pup-VaxfiTw@mail.gmail.com>
	<4FCBBC72.8070209@FreeBSD.org>
Date: Sun, 3 Jun 2012 21:02:01 +0100
X-Google-Sender-Auth: xSOxkHrJHCFZkowbHu9KLCDOs4E
Message-ID: <CAJ-FndAqFOy-jxXzxX_+RFBqD1k+v62tDJzFic_==SWODi3VuQ@mail.gmail.com>
From: Attilio Rao <attilio@freebsd.org>
To: Andriy Gapon <avg@freebsd.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-acpi@freebsd.org, Mitsuru IWASAKI <iwasaki@jp.freebsd.org>,
	freebsd-arch@freebsd.org
Subject: Re: cpu stopping [Was: preparation for x86/acpica/acpi_wakeup.c]
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 03 Jun 2012 20:02:10 -0000

2012/6/3 Andriy Gapon <avg@freebsd.org>:
> on 03/06/2012 12:54 Attilio Rao said the following:
>> 2012/6/3 Andriy Gapon <avg@freebsd.org>:
>>> on 03/06/2012 00:39 Attilio Rao said the following:
>>>> The first thing to consider is that right now we only have 2 states
>>>> for CPUs: started and stopped. These states are controlled by
>>>> started_cpus and stopped_cpus masks respectively. It seems you really
>>>> want to add an intermediate level among the 2 where you have: started
>>>> -> suspended -> started -> suspended ... -> stopped and you need to
>>>> expand the mechanism for dealing with started and stopped cpus to do
>>>> that. I'm pretty sure this will be very helpful also for other
>>>> architectures that want to do the same.
>>>
>>> As the first thing I must admit that I haven't looked at the patch :-)
>>>
>>>
>>> But really I don't see why we need to differentiate between stopped and
>>> suspended state as both of them ultimately mean exactly the same thing -
>>> CPUs are spinning on some condition (and they are in a well-defined place
>>> and state).
>>
>> This is debatable and I'm not sure I agree.
>> At some point we may want to implement CPU on-the-fly suspension for
>> CPUs which is a different event than "stopping" (where stopping will
>> be "permanent stopping" and suspending will be "possible to recover
>> suspension").
>
> Right, but that should operate on the level above the current code.
> I.e. first stop all slave CPUs, then set the state of a target CPU (which
> includes a global view of that state), then resume all other CPUs.
>
>> The important thing about this is that we need to expand our model in
>> a way that it makes simple to add more states to the CPUs than simple
>> started/stopped. Right now we don't have any architecture for this in
>> place.
>
> I can't disagree with this, but I think that the current IPI-to-stop code
> is not a place for that.  It's too low level.

Yeah, I was referring in particular to the handling of the masks and a
few other things (stoppcbs, which could be rebased as suspendpcbs for
that, etc.).
The point I'm really trying to make is: our model is very heavily biased
toward the on/off case (started/stopped) and we need to abstract this and
have a framework for adding several CPU states.
After you have an abstracted model, you can add several states easily.
This is not simple work, and it is even less simple for
synchronization, which right now is very much simplified/unhandled. I
would be very happy if you or Mitsuru plan to work on that.
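
One possible shape for such a framework, as a rough sketch only: a per-CPU
state word plus a table of allowed transitions, so that adding a state means
adding a row.  The enum, the table, NCPU and cpu_set_state() are all
hypothetical names, and treating "stopped" as terminal and reachable from
both other states is an assumption made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum cpu_state { CPU_STARTED, CPU_SUSPENDED, CPU_STOPPED, CPU_NSTATES };

/*
 * allowed[from][to].  The CPU_STOPPED row is left zeroed: in this sketch
 * stopping is permanent (an assumption, not existing kernel policy).
 */
static const bool allowed[CPU_NSTATES][CPU_NSTATES] = {
	[CPU_STARTED]   = { [CPU_SUSPENDED] = true, [CPU_STOPPED] = true },
	[CPU_SUSPENDED] = { [CPU_STARTED] = true, [CPU_STOPPED] = true },
};

#define NCPU 4
static _Atomic int cpu_states[NCPU];	/* zero-initialized: CPU_STARTED */

/* Move a CPU to a new state; returns false if the transition is refused. */
static bool
cpu_set_state(unsigned cpu, enum cpu_state to)
{
	int from;

	if (cpu >= NCPU || (unsigned)to >= CPU_NSTATES)
		return (false);

	from = atomic_load(&cpu_states[cpu]);
	do {
		/* Re-checked on every retry, since 'from' may have changed. */
		if (!allowed[from][to])
			return (false);
	} while (!atomic_compare_exchange_weak(&cpu_states[cpu], &from,
	    (int)to));
	return (true);
}
```

The compare-and-swap only keeps the state word itself consistent; the harder
synchronization with the CPU actually changing state is not addressed here.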

Attilio


--=20
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-arch@FreeBSD.ORG  Mon Jun  4 17:50:11 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 00AC7106564A;
	Mon,  4 Jun 2012 17:50:10 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net
	[IPv6:2001:470:1f10:75::2])
	by mx1.freebsd.org (Postfix) with ESMTP id BD3B18FC12;
	Mon,  4 Jun 2012 17:50:10 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id 287E1B990;
	Mon,  4 Jun 2012 13:50:10 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-arch@freebsd.org
Date: Mon, 4 Jun 2012 10:53:51 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p13; KDE/4.5.5; amd64; ; )
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<20120602171632.GC2358@deviant.kiev.zoral.com.ua>
	<CAJ-FndCh77syp+860LaCbgQ6eiQAq_OMM98RxqxmCv+YKENXoA@mail.gmail.com>
In-Reply-To: <CAJ-FndCh77syp+860LaCbgQ6eiQAq_OMM98RxqxmCv+YKENXoA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="utf-8"
Content-Transfer-Encoding: 7bit
Message-Id: <201206041053.51802.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
	(bigwig.baldwin.cx); Mon, 04 Jun 2012 13:50:10 -0400 (EDT)
Cc: Gianni <gianni@freebsd.org>, Alan Cox <alc@rice.edu>,
	Alexander Kabaev <kan@freebsd.org>, Attilio Rao <attilio@freebsd.org>,
	Konstantin Belousov <kib@freebsd.org>,
	Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Fwd: [RFC] Kernel shared variables
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Jun 2012 17:50:11 -0000

On Saturday, June 02, 2012 1:27:58 pm Attilio Rao wrote:
> >> The gettimeofday() implementation is a different story than what is asked
> >> here.
> >
> > But the goal is to have fast clocks, right ? What else is planned ?
> >
> > In fact, I think that if the whole goal is only fast clocks, then we
> > do not need any additional system mechanisms, since we can easily export
> > coefficients for rdtsc formula already. E.g. we can put it into elf auxv,
> > which is ugly but bearable.
> 
> Not sure if there is anything else besides gettimeofday() that we want
> right now, in particular on global basis.
> I just mean to say that I don't think Giovanni put a lot of effort in
> correctness/robustness of gettimeofday userland implementation, so we
> should not judge that part of the patch too tightly.

I think this is an important question actually.  Is there anything that really 
needs to be here besides gettimeofday()?  I mean, is there any real-world 
application that needs to call getpid() or getppid() a bunch of times?  Things 
that are static like that the application can easily cache (and should if it
actually needs it).  gettimeofday() is different because it is dynamic.
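
For illustration, caching a static value in the application can be as small
as the sketch below.  my_getpid() and cached_pid are hypothetical names; a
real program would also have to reset the cache in the child after fork(),
which this sketch does not do.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

static pid_t cached_pid;	/* 0 means "not fetched yet" */

/* Returns the process ID, making the system call only on first use. */
static pid_t
my_getpid(void)
{
	if (cached_pid == 0)
		cached_pid = getpid();
	return (cached_pid);
}
```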

> >> > Interesting question is how much shared the shared page needs be.
> >> > Obvious needs are shared between all same-ABI processes, but I can also
> >> > easily see a need for the per-process private information be present in
> >> > the 'private-shared' page. For silly but typical example, useful for
> >> > moronix-style benchmarks, see getpid().
> >>
> >> Really the performance benefit of having a fast getpid() is marginal
> >> compared to heavily used things like gettimeofday(). I cannot think
> >> of a per-process page implementing a fast syscall that can bring many
> >> performance advantages.
> >
> > This is completely true, but there may be other process-private data that
> > could benefit from the low access cost. I just do not know right now.
> 
> I don't know either, thus I don't think there is a big urgency for
> per-process shared pages at all.

I can't think of anything useful.

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Mon Jun  4 17:50:12 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 291EA1065675;
	Mon,  4 Jun 2012 17:50:12 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net
	[IPv6:2001:470:1f10:75::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 8B5178FC1D;
	Mon,  4 Jun 2012 17:50:11 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id E8B70B99B;
	Mon,  4 Jun 2012 13:50:10 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-arch@freebsd.org
Date: Mon, 4 Jun 2012 11:01:57 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p13; KDE/4.5.5; amd64; ; )
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<20120603051904.GG2358@deviant.kiev.zoral.com.ua>
	<20120603184315.T856@besplex.bde.org>
In-Reply-To: <20120603184315.T856@besplex.bde.org>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201206041101.57486.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
	(bigwig.baldwin.cx); Mon, 04 Jun 2012 13:50:11 -0400 (EDT)
Cc: Gianni <gianni@freebsd.org>, Alan Cox <alc@rice.edu>,
	Alexander Kabaev <kan@freebsd.org>, Attilio Rao <attilio@freebsd.org>,
	Konstantin Belousov <kib@freebsd.org>,
	Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Fwd: [RFC] Kernel shared variables
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Jun 2012 17:50:12 -0000

On Sunday, June 03, 2012 6:49:27 am Bruce Evans wrote:
> On Sun, 3 Jun 2012, Konstantin Belousov wrote:
> 
> > On Sun, Jun 03, 2012 at 07:28:09AM +1000, Bruce Evans wrote:
> >> On Sat, 2 Jun 2012, Konstantin Belousov wrote:
> >>> ...
> >>> In fact, I think that if the whole goal is only fast clocks, then we
> >>> do not need any additional system mechanisms, since we can easily export
> >>> coefficients for rdtsc formula already. E.g. we can put it into elf auxv,
> >>> which is ugly but bearable.
> >>
> >> How do you get the timehands offsets?  These only need to be updated
> >> every second or so, or when used, but how can the application know
> >> when they need to be updated if this is not done automatically in the
> >> kernel by writing to a shared page?  I can only think of the
> >> application arranging an alarm signal every second or so and updating
> >> then.  No good for libraries.
> > What are timehands offsets? Do you mean things like leap seconds?
> 
> Yes.  binuptime() is:
> 
> % void
> % binuptime(struct bintime *bt)
> % {
> % 	struct timehands *th;
> % 	u_int gen;
> % 
> % 	do {
> % 		th = timehands;
> % 		gen = th->th_generation;
> % 		*bt = th->th_offset;
> % 		bintime_addx(bt, th->th_scale * tc_delta(th));
> % 	} while (gen == 0 || gen != th->th_generation);
> % }
> 
> Without the kernel providing th->th_offset, you have to do lots of ntp
> handling for yourself (compatibly with the kernel) just to get an
> accuracy of 1 second.  Leap seconds don't affect CLOCK_MONOTONIC, but
> they do affect CLOCK_REALTIME which is the clock id used by
> gettimeofday().  For the former, you only have to advance the offset
> for yourself occasionally (compatibly with the kernel) and manage
> (compatibly with the kernel, especially in the long term) ntp slewing
> and other syscall/sysctl kernel activity that micro-adjusts th->th_scale.

I think duplicating this logic in userland would just be wasteful.  I have
a private fast gettimeofday() at my current job and it works by exporting
the current timehands structure (well, the equivalent) to userland.  The
userland bits then fetch a copy of the details and do the same as bintime().
(I move the math (bintime_addx() and the multiply) out of the loop, however.)
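
A rough sketch of what such a userland reader could look like, assuming a
single exported timehands copy.  The structure layout, the field names,
read_counter and user_binuptime() are all hypothetical; this is not the
private patch mentioned above.  The plain reads of the non-atomic fields are
the usual generation-counter idiom and are formally a data race in C11; the
sketch also assumes the writer orders its stores around the generation update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct ubintime {
	int64_t  sec;
	uint64_t frac;		/* binary fraction of a second, 2^-64 units */
};

/* Hypothetical layout of the one timehands copy exported to userland. */
struct shared_timehands {
	atomic_uint     gen;		/* 0 while the kernel is updating */
	struct ubintime offset;		/* time at the last update */
	uint64_t        scale;		/* 2^64 / counter frequency */
	uint32_t        counter_base;	/* counter value at the last update */
	uint32_t        counter_mask;
};

/* Hypothetical counter source, e.g. a thin wrapper around rdtsc. */
static uint32_t (*read_counter)(void);

static void
user_binuptime(struct shared_timehands *th, struct ubintime *bt)
{
	uint64_t scale, x;
	uint32_t delta;
	unsigned gen;

	/* Only copies happen inside the retry loop. */
	do {
		gen = atomic_load_explicit(&th->gen, memory_order_acquire);
		*bt = th->offset;
		scale = th->scale;
		delta = (read_counter() - th->counter_base) &
		    th->counter_mask;
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&th->gen, memory_order_relaxed));

	/* The multiply and the add-with-carry run once, after the loop. */
	x = scale * delta;
	bt->frac += x;
	if (bt->frac < x)
		bt->sec++;
}

/* Fake counter so the sketch can be exercised without real hardware. */
static uint32_t fake_now;
static uint32_t fake_counter(void) { return (fake_now); }
```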

> > This is indeed problematic for auxv. For auxv it could be solved by
> > providing an offset for the next recheck using syscalls, and making libc
> > code respect this offset. But I do think that a vdso in a shared page
> > is the right solution, not auxv.
> 
> timehands in a shared pages is close to working.  th_generation protects
> things in the same way as in the kernel, modulo assumptions that writes
> are ordered.

It would work fine.  And in fact, having multiple timehands is actually a
bug, not a feature.  It lets you compute bogus timestamps if you get preempted
at the wrong time and end up with time jumping around.  At Yahoo! we reduced
the number of timehands structures down to 2 or some such, and I'm now of
the opinion we should just have one and dispense with the entire array.

For my userland case I only export a single timehands copy.

> >> rdtsc is also very unportable, even on CPUs that have it.  But all other
> >> x86 timecounter hardware is too slow if you want gettimeofday() to be fast
> >> and as accurate as it is now.

For all the hardware where people run mysql and similar software that calls
gettimeofday() a lot, rdtsc() works just fine.

> > !rdtsc hardware probably cannot be used at all due to the need to provide
> > usermode access to device registers. The mere presence of rdtsc does not
> > mean that usermode indeed can use it; that should be decided by the kernel
> > based on the current in-kernel time source. If rdtsc is not usable, the
> > corresponding data should not be exported, or the implementation should go
> > directly into a syscall or whatever.

Yes, the patches I have only work if the kernel uses the TSC as its main
timecounter as well.
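
The userland side of that restriction could be sketched as below.  The
tsc_is_timecounter flag, struct shared_time_info and both function names are
hypothetical, and the fast path is only a stub that counts calls so the
example compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical flag the kernel would publish next to the time data. */
struct shared_time_info {
	bool tsc_is_timecounter;
};

/* Stub standing in for the real TSC-based fast path. */
static int fast_path_calls;

static int
fast_gettimeofday(struct timeval *tv)
{
	fast_path_calls++;
	tv->tv_sec = 0;
	tv->tv_usec = 0;
	return (0);
}

/* Use the fast path only when the kernel says the TSC is its timecounter. */
static int
my_gettimeofday(const struct shared_time_info *info, struct timeval *tv)
{
	if (info != NULL && info->tsc_is_timecounter)
		return (fast_gettimeofday(tv));
	return (gettimeofday(tv, NULL));	/* ordinary system call */
}
```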

> But then applications would:
> - use gettimeofday() more than they should ("it works on Linux"), even
>    more than now, since then "it works on FreeBSD-x86" too
> - just be slow when gettimeofday() is slow
> - kludge around gettimeofday() being slow like they do now
> - kludge around gettimeofday() being slow not like they do now (use more
>    complications to probe it being slow).

Some applications really need fine-grained timing with as little overhead
as possible.

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Mon Jun  4 18:19:30 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 3C44010657DB;
	Mon,  4 Jun 2012 18:19:30 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id 817478FC0A;
	Mon,  4 Jun 2012 18:19:29 +0000 (UTC)
Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q54IJIpJ045754;
	Mon, 4 Jun 2012 21:19:18 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id
	q54IJH3K092775; Mon, 4 Jun 2012 21:19:17 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q54IJHlL092774; 
	Mon, 4 Jun 2012 21:19:17 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Mon, 4 Jun 2012 21:19:17 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Message-ID: <20120604181917.GD85127@deviant.kiev.zoral.com.ua>
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<20120603051904.GG2358@deviant.kiev.zoral.com.ua>
	<20120603184315.T856@besplex.bde.org>
	<201206041101.57486.jhb@freebsd.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="mSxgbZZZvrAyzONB"
Content-Disposition: inline
In-Reply-To: <201206041101.57486.jhb@freebsd.org>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: Gianni <gianni@freebsd.org>, Alan Cox <alc@rice.edu>,
	Alexander Kabaev <kan@freebsd.org>,
	Attilio Rao <attilio@freebsd.org>, freebsd-arch@freebsd.org
Subject: Re: Fwd: [RFC] Kernel shared variables
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Jun 2012 18:19:30 -0000


--mSxgbZZZvrAyzONB
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Jun 04, 2012 at 11:01:57AM -0400, John Baldwin wrote:
> On Sunday, June 03, 2012 6:49:27 am Bruce Evans wrote:
> > On Sun, 3 Jun 2012, Konstantin Belousov wrote:
> >=20
> > > On Sun, Jun 03, 2012 at 07:28:09AM +1000, Bruce Evans wrote:
> > >> On Sat, 2 Jun 2012, Konstantin Belousov wrote:
> > >>> ...
> > >>> In fact, I think that if the whole goal is only fast clocks, then we
> > >>> do not need any additional system mechanisms, since we can easily export
> > >>> coefficients for rdtsc formula already. E.g. we can put it into elf auxv,
> > >>> which is ugly but bearable.
> > >>
> > >> How do you get the timehands offsets?  These only need to be updated
> > >> every second or so, or when used, but how can the application know
> > >> when they need to be updated if this is not done automatically in the
> > >> kernel by writing to a shared page?  I can only think of the
> > >> application arranging an alarm signal every second or so and updating
> > >> then.  No good for libraries.
> > > What is timehands offsets ? Do you mean things like leap seconds ?
> >
> > Yes.  binuptime() is:
> >
> > % void
> > % binuptime(struct bintime *bt)
> > % {
> > % 	struct timehands *th;
> > % 	u_int gen;
> > %
> > % 	do {
> > % 		th = timehands;
> > % 		gen = th->th_generation;
> > % 		*bt = th->th_offset;
> > % 		bintime_addx(bt, th->th_scale * tc_delta(th));
> > % 	} while (gen == 0 || gen != th->th_generation);
> > % }
> >
> > Without the kernel providing th->th_offset, you have to do lots of ntp
> > handling for yourself (compatibly with the kernel) just to get an
> > accuracy of 1 second.  Leap seconds don't affect CLOCK_MONOTONIC, but
> > they do affect CLOCK_REALTIME which is the clock id used by
> > gettimeofday().  For the former, you only have to advance the offset
> > for yourself occasionally (compatibly with the kernel) and manage
> > (compatibly with the kernel, especially in the long term) ntp slewing
> > and other syscall/sysctl kernel activity that micro-adjusts th->th_scale.
>
> I think duplicating this logic in userland would just be wasteful.  I have
> a private fast gettimeofday() at my current job and it works by exporting
> the current timehands structure (well, the equivalent) to userland.  The
> userland bits then fetch a copy of the details and do the same as bintime().
> (I move the math (bintime_addx() and the multiply)) out of the loop however.
Yesterday I started an implementation which uses a shared page to export
some variant of timehands, and uses auxv to provide libc with a pointer
to timehands when rdtsc is reasonable.

I have almost finished both the 32-bit and 64-bit userspace, but there is
kernel-side work left. Is your implementation ready, or close to ready,
for commit? In other words, should I drop the effort or continue?

>
> > > This is indeed problematic for auxv. For auxv it could be solved by
> > > providing offset for next recheck using syscalls, and making libc code to
> > > respect this offset. But, I do think that vdso in shared page
> > > is the right solution, not auxv.
> >
> > timehands in a shared pages is close to working.  th_generation protects
> > things in the same way as in the kernel, modulo assumptions that writes
> > are ordered.
>
> It would work fine.  And in fact, having multiple timehands is actually a
> bug, not a feature.  It lets you compute bogus timestamps if you get preempted
> at the wrong time and end up with time jumping around.  At Yahoo! we reduced
> the number of timehands structures down to 2 or some such, and I'm now of
> the opinion we should just have one and dispense with the entire array.
>
> For my userland case I only export a single timehands copy.
Well, I have to use two copies due to time_t ABI differences, one for
32-bit and one for 64-bit.

>
> > >> rdtsc is also very unportable, even on CPUs that have it.  But all other
> > >> x86 timecounter hardware is too slow if you want gettimeofday() to be fast
> > >> and as accurate as it is now.
>
> For all the hardware where people run mysql and similar software that calls
> getimeofday() a lot, rdtsc() works just fine.
I also try to mimic the kernel code as closely as possible, so there are
two possible tsc counters. Selection is managed by the kernel, but the code
lives in libc or possibly a vdso. I do not see an immediate use for a vdso
just for gettimeofday(2) and clock_gettime(2), although having a vdso
to provide unwinding tables for signal trampolines is _very_ desirable.

>
> > > !rdtsc hardware is probably cannot be used at all due to need to provide
> > > usermode access to device registers. The mere presence of rdtsc does not
> > > means that usermode indeed can use it, it should be decided by kernel
> > > based on the current in-kernel time source. If rdtsc is not usable, the
> > > corresponding data should not be exported, or implementation should go
> > > directly into syscall or whatever.
>
> Yes, the patches I have only work if the kernel uses the TSC as its main
> timecounter as well.
>
> > But then applications would:
> > - use gettimeofday() more than they should ("it works on Linux"), even
> >    more than now since when "it works on FreeBSD-x86" too
> > - just be slow when gettimeofday() is slow
> > - kludge around gettimeofday() being slow like they do now
> > - kludge around gettimeofday() being slow not like they do now (use more
> >    complications to probe it being slow).
>
> Some applications really need fine-grained timing with as little overhead
> as possible.
>
> --
> John Baldwin

--mSxgbZZZvrAyzONB
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAk/M/CUACgkQC3+MBN1Mb4ghXwCgkPtKRATwrzKbJDD0j9LeoqLR
0/MAnRtpx6mS4HOad3y/lgGdV2bducK9
=zlG/
-----END PGP SIGNATURE-----

--mSxgbZZZvrAyzONB--

From owner-freebsd-arch@FreeBSD.ORG  Mon Jun  4 20:51:10 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 89F79106564A;
	Mon,  4 Jun 2012 20:51:10 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail16.syd.optusnet.com.au (mail16.syd.optusnet.com.au
	[211.29.132.197])
	by mx1.freebsd.org (Postfix) with ESMTP id 020DD8FC12;
	Mon,  4 Jun 2012 20:51:09 +0000 (UTC)
Received: from c122-106-171-232.carlnfd1.nsw.optusnet.com.au
	(c122-106-171-232.carlnfd1.nsw.optusnet.com.au [122.106.171.232])
	by mail16.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q54Kp06q023119
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Tue, 5 Jun 2012 06:51:01 +1000
Date: Tue, 5 Jun 2012 06:51:00 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: John Baldwin <jhb@FreeBSD.org>
In-Reply-To: <201206041101.57486.jhb@freebsd.org>
Message-ID: <20120605054930.H3236@besplex.bde.org>
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<20120603051904.GG2358@deviant.kiev.zoral.com.ua>
	<20120603184315.T856@besplex.bde.org>
	<201206041101.57486.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Gianni <gianni@FreeBSD.org>, Alan Cox <alc@rice.edu>,
	Alexander Kabaev <kan@FreeBSD.org>, Attilio Rao <attilio@FreeBSD.org>,
	Konstantin Belousov <kib@FreeBSD.org>, freebsd-arch@FreeBSD.org,
	Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Fwd: [RFC] Kernel shared variables
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Jun 2012 20:51:10 -0000

On Mon, 4 Jun 2012, John Baldwin wrote:

> On Sunday, June 03, 2012 6:49:27 am Bruce Evans wrote:
>> On Sun, 3 Jun 2012, Konstantin Belousov wrote:
>>> What is timehands offsets ? Do you mean things like leap seconds ?
>>
>> Yes.  binuptime() is:
>>
>> % void
>> % binuptime(struct bintime *bt)
>> % {
>> % 	struct timehands *th;
>> % 	u_int gen;
>> %
>> % 	do {
>> % 		th = timehands;
>> % 		gen = th->th_generation;
>> % 		*bt = th->th_offset;
>> % 		bintime_addx(bt, th->th_scale * tc_delta(th));
>> % 	} while (gen == 0 || gen != th->th_generation);
>> % }
>>
>> Without the kernel providing th->th_offset, you have to do lots of ntp
>> handling for yourself (compatibly with the kernel) just to get an
>> accuracy of 1 second.  Leap seconds don't affect CLOCK_MONOTONIC, but
>> they do affect CLOCK_REALTIME which is the clock id used by
>> gettimeofday().  For the former, you only have to advance the offset
>> for yourself occasionally (compatibly with the kernel) and manage
>> (compatibly with the kernel, especially in the long term) ntp slewing
>> and other syscall/sysctl kernel activity that micro-adjusts th->th_scale.
>
> I think duplicating this logic in userland would just be wasteful.  I have

Sure.  I modestly proposed it.

> a private fast gettimeofday() at my current job and it works by exporting
> the current timehands structure (well, the equivalent) to userland.  The
> userland bits then fetch a copy of the details and do the same as bintime().

How do you keep this up to date, especially for leap seconds?

> (I move the math (bintime_addx() and the multiply)) out of the loop however.

My version has a comment saying to do that, but I just noticed that
it wouldn't work so well -- the timehands fields would have to be
copied to local variables while under protection of the generation
count, so it would give messier code to optimize a case that occurs
_very_ rarely.
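
A minimal sketch of what "copy to locals under the generation count, then
do the math outside the loop" could look like in userland.  The structure
and function names here (th_export, th_snap, th_snapshot, th_apply) are
hypothetical and are not the kernel's layout; the counter value is passed
in by the caller, whereas binuptime() reads it via tc_delta() inside the
loop.  It also assumes a single writer that zeroes gen before updating.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical exported record; names are illustrative only. */
struct th_export {
	_Atomic uint32_t gen;	/* 0 while the writer is mid-update */
	uint64_t off_sec;	/* whole seconds of the offset */
	uint64_t off_frac;	/* fraction of a second, units of 2^-64 s */
	uint64_t scale;		/* 2^-64 s per counter tick */
	uint32_t base;		/* counter value at the last update */
};

/* Plain local copy, stable once th_snapshot() returns. */
struct th_snap {
	uint64_t off_sec, off_frac, scale;
	uint32_t base;
};

/* Copy the fields under the generation check; no arithmetic in the loop. */
void
th_snapshot(struct th_export *th, struct th_snap *out)
{
	uint32_t gen;

	assert(th != NULL && out != NULL);
	do {
		gen = atomic_load_explicit(&th->gen, memory_order_acquire);
		out->off_sec = th->off_sec;
		out->off_frac = th->off_frac;
		out->scale = th->scale;
		out->base = th->base;
		/* Keep the copies above ordered before the re-check. */
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&th->gen, memory_order_relaxed));
}

/* The multiply and the carry (the bintime_addx() step) on the stable copy. */
void
th_apply(const struct th_snap *s, uint32_t counter_now,
    uint64_t *sec, uint64_t *frac)
{
	uint32_t delta = counter_now - s->base;	/* wraps like the counter */
	uint64_t add = s->scale * delta;

	*frac = s->off_frac + add;
	*sec = s->off_sec + (*frac < add);	/* carry out of the fraction */
}
```

The extra th_snap copy is the "messier code" cost: four stores per read
in exchange for not redoing the multiply on a retry, which only matters
on the rare retry.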

>> timehands in a shared pages is close to working.  th_generation protects
>> things in the same way as in the kernel, modulo assumptions that writes
>> are ordered.
>
> It would work fine.  And in fact, having multiple timehands is actually a
> bug, not a feature.  It lets you compute bogus timestamps if you get preempted
> at the wrong time and end up with time jumping around.  At Yahoo! we reduced
> the number of timehands structures down to 2 or some such, and I'm now of
> the opinion we should just have one and dispense with the entire array.

No, it is a feature.  The time should never jump around (backwards), but
it can easily jump forwards.  It makes little difference if preemption
occurs after the timehands have been read, or while reading them but in
such a way that the timehands become stale during preemption but not stale
enough for their generation to change so that you notice that they are
stale -- you get a stale timestamp either way (with staleness approximately
the preemption time).  Times read by different threads can easily have
different staleness according to which timehands they ended up using and
this may be quite different from which timehands they started using and
from which timehands is active after they return.  Perhaps this is what
you mean.  But again, this happens anyway when the preemption occurs after
the timehands have been read.

The main point of timehands was originally to give a copy of the time
that was stable for a time hopefully long enough for the timehands to be
read without them being clobbered by an update.  binuptime() was:

1.59         (phk      26-Mar-98): void
1.113        (phk      07-Feb-02): binuptime(struct bintime *bt)
1.113        (phk      07-Feb-02): {
1.113        (phk      07-Feb-02): 	struct timecounter *tc;
1.113        (phk      07-Feb-02): 
1.113        (phk      07-Feb-02): 	tc = timecounter;
1.113        (phk      07-Feb-02): 	*bt = tc->tc_offset;
1.113        (phk      07-Feb-02): 	bintime_addx(bt, tc->tc_scale * tco_delta(tc));
1.113        (phk      07-Feb-02): }

This has an obvious race if the thread running this is preempted for a long
time, so that the copy of the time is actually not stable for long enough.
This was fixed (except I think in some cases using ddb) by using the
generation count.

With the generation count, multiple timehands are probably unnecessary,
but they reduce locking bugs (no memory ordering for the generation count)
and give the optimization that binuptime() etc. doesn't have to spin
waiting for updates.  Now it is the thread doing the updates that gets
the most advantages from multiple timehands.  It doesn't have to worry
much about locking, or being preempted, or blocking for a long time, since
it knows that binuptime() etc. will keep using a previous generation
safely and not busy-wait for it, provided only that it doesn't block for
so long that the oldest previous generation becomes too old to
work.  2 timehands are probably enough for this, but 1 isn't.
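
A rough sketch of the two-slot arrangement, to make the argument concrete.
Everything here (clockpage, slot, clock_init, clock_update, clock_read) is
a hypothetical userland model, not the kernel code, and it assumes a single
writer.  The writer fills the slot that readers are not using and only then
publishes it, so a reader keeps getting the previous generation instead of
spinning on gen == 0 while the writer is blocked.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NSLOTS 2

struct slot {
	_Atomic uint32_t gen;	/* 0 while this slot is being rewritten */
	uint64_t offset;	/* stand-in for the exported time data */
};

struct clockpage {
	_Atomic uint32_t cur;	/* index of the slot readers should use */
	uint32_t last_gen;	/* writer-private generation counter */
	struct slot slots[NSLOTS];
};

void
clock_init(struct clockpage *cp, uint64_t offset)
{
	assert(cp != NULL);
	for (int i = 0; i < NSLOTS; i++) {
		atomic_init(&cp->slots[i].gen, 0);
		cp->slots[i].offset = 0;
	}
	cp->slots[0].offset = offset;
	cp->last_gen = 1;
	atomic_init(&cp->slots[0].gen, 1);
	atomic_init(&cp->cur, 0);
}

/* Single writer: rewrite the slot readers are not using, then publish it. */
void
clock_update(struct clockpage *cp, uint64_t new_offset)
{
	uint32_t next =
	    (atomic_load_explicit(&cp->cur, memory_order_relaxed) + 1) % NSLOTS;
	struct slot *s = &cp->slots[next];

	atomic_store_explicit(&s->gen, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	s->offset = new_offset;
	if (++cp->last_gen == 0)	/* 0 is reserved for "in progress" */
		cp->last_gen = 1;
	atomic_store_explicit(&s->gen, cp->last_gen, memory_order_release);
	atomic_store_explicit(&cp->cur, next, memory_order_release);
}

/* Reader: retries only if the slot it picked is recycled under it. */
uint64_t
clock_read(struct clockpage *cp)
{
	struct slot *s;
	uint32_t gen;
	uint64_t off;

	do {
		s = &cp->slots[atomic_load_explicit(&cp->cur,
		    memory_order_acquire)];
		gen = atomic_load_explicit(&s->gen, memory_order_acquire);
		off = s->offset;
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&s->gen, memory_order_relaxed));
	return (off);
}
```

With NSLOTS set to 1 the same code still compiles, but the writer would
zero the generation of the only slot, and every reader would then spin
until the writer finishes.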

> For my userland case I only export a single timehands copy.

So readers block for a long time if the writer is updating and the
writer blocks?  Works best for UP :-).  Actually, there are problems
in the kernel even for UP.  Consider the writer doing an update and
being preempted by ddb, and ddb using binuptime(), though it shouldn't.
This is deadlock if there is only 1 timehands.  My version runs the
update as a normal interrupt handler so that it can be interrupted
by fast interrupt handlers.  This gives similar problems -- fast
interrupt handlers shouldn't call binuptime() either (this can
deadlock in the timecounter hardware function for at least the
i8254 timecounter), but they do and this is useful for things like
timestamps from serial hardware.  Multiple timehands at least limit
this problem.  Applications have similar problems (more like my
kernel version, since applications can't get access as exclusive
as a fast interrupt handler can).

>>>> rdtsc is also very unportable, even on CPUs that have it.  But all other
>>>> x86 timecounter hardware is too slow if you want gettimeofday() to be fast
>>>> and as accurate as it is now.
>
> For all the hardware where people run mysql and similar software that calls
> getimeofday() a lot, rdtsc() works just fine.

That wasn't the case until recently (except 10-15 years ago for UP with
no SMM).  Someone just fixed the rdtsc()-based time function in dtrace.  It
tries to add a per-cpu rdtsc() offset, but the offset was backwards.  It
takes P-state invariance and maybe more for the offset to be 0 and
not drift.

>>> !rdtsc hardware is probably cannot be used at all due to need to provide
>>> usermode access to device registers. The mere presence of rdtsc does not
>>> means that usermode indeed can use it, it should be decided by kernel
>>> based on the current in-kernel time source. If rdtsc is not usable, the
>>> corresponding data should not be exported, or implementation should go
>>> directly into syscall or whatever.
>
> Yes, the patches I have only work if the kernel uses the TSC as its main
> timecounter as well.

The detail I miss most is the TSC being available for use in userland
even if it is not the primary timecounter.  Maybe its quality is
enough for the application, or the application can fix it up using
per-cpu offsets.

Bruce

From owner-freebsd-arch@FreeBSD.ORG  Mon Jun  4 21:16:12 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2B1FF1065672;
	Mon,  4 Jun 2012 21:16:12 +0000 (UTC)
	(envelope-from giovanni.trematerra@gmail.com)
Received: from mail-qa0-f49.google.com (mail-qa0-f49.google.com
	[209.85.216.49])
	by mx1.freebsd.org (Postfix) with ESMTP id 8D0B48FC17;
	Mon,  4 Jun 2012 21:16:11 +0000 (UTC)
Received: by qabj40 with SMTP id j40so2205084qab.15
	for <multiple recipients>; Mon, 04 Jun 2012 14:16:11 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=47d5Z5JgH9ySqYO2XKNw1/VzpGOBNoLmoXkzW0H/sQ4=;
	b=vfoNg9FxHW016viznrJ7FWcwnWn2FHqz+57u/FedhZT0zKs8zFGUu2SKBbcdhgt21/
	BymWkbKcqUHWOgBIcCGKmPX6F8xDrFLLjLYwUa3sO8bYGM+5tf6uC9/YIlPQ+Wg/77qU
	Y0uktFvXhURuYhyt/WV1EizZOQMrxllFrR5Zgr/J2Z0RFYiS08/MA3y/6635azStnGFJ
	VTEUdaD6kPwJnpU754WjNjXZIwry4wUphANHxiJ2b/pBHvHbcffuclP+Z8uAOSSpXZgL
	NPhthxgEk8aP/dVC5QC7LtpUFRoVJF5UjUfAWib6ayr5/Xee8fsXgov8c9efX/dJV+E8
	I3ag==
MIME-Version: 1.0
Received: by 10.224.202.8 with SMTP id fc8mr14783196qab.40.1338844570879; Mon,
	04 Jun 2012 14:16:10 -0700 (PDT)
Sender: giovanni.trematerra@gmail.com
Received: by 10.229.160.20 with HTTP; Mon, 4 Jun 2012 14:16:10 -0700 (PDT)
In-Reply-To: <20120604181917.GD85127@deviant.kiev.zoral.com.ua>
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<20120603051904.GG2358@deviant.kiev.zoral.com.ua>
	<20120603184315.T856@besplex.bde.org>
	<201206041101.57486.jhb@freebsd.org>
	<20120604181917.GD85127@deviant.kiev.zoral.com.ua>
Date: Mon, 4 Jun 2012 23:16:10 +0200
X-Google-Sender-Auth: c7e69wjeNf-PVXITIT-QPF4N8XM
Message-ID: <CACfq0933BwaverZinGvKtErPvdZp+4jQRUFQukK9V_QemRsW9g@mail.gmail.com>
From: Giovanni Trematerra <gianni@freebsd.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: Alan Cox <alc@rice.edu>, Alexander Kabaev <kan@freebsd.org>,
	Attilio Rao <attilio@freebsd.org>, freebsd-arch@freebsd.org
Subject: Re: Fwd: [RFC] Kernel shared variables
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Jun 2012 21:16:12 -0000

On Mon, Jun 4, 2012 at 8:19 PM, Konstantin Belousov <kostikbel@gmail.com> wrote:
> On Mon, Jun 04, 2012 at 11:01:57AM -0400, John Baldwin wrote:
>> On Sunday, June 03, 2012 6:49:27 am Bruce Evans wrote:
>> > On Sun, 3 Jun 2012, Konstantin Belousov wrote:

>> I think duplicating this logic in userland would just be wasteful.  I have
>> a private fast gettimeofday() at my current job and it works by exporting
>> the current timehands structure (well, the equivalent) to userland.  The
>> userland bits then fetch a copy of the details and do the same as bintime().
>> (I move the math (bintime_addx() and the multiply)) out of the loop however.
> I started yesterday an implementation which uses shared page to export
> some variant of timehands, and uses auxv to provide the libc with a pointer
> to timehands when rdtsc is reasonable.
>
> I almost finished both 32bit and 64bit userspace, but there is
> kernel-side work left. Is your implementation ready or close to be ready
> for commit ? In other words, should I drop the efforts, or continue ?
>

Hey, wait. What are you doing?
This is completely unfair. You didn't even review my patch.
I really don't understand how you can completely ignore me and start
implementing yesterday something you didn't care about for more than 3 years.
It cost me a lot of time and energy, and I think I deserve more respect than
just being ignored.

--
Gianni

From owner-freebsd-arch@FreeBSD.ORG  Mon Jun  4 21:30:16 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id DC4501065678;
	Mon,  4 Jun 2012 21:30:15 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net
	[IPv6:2001:470:1f10:75::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 96ECE8FC14;
	Mon,  4 Jun 2012 21:30:15 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id ECFD9B94F;
	Mon,  4 Jun 2012 17:30:14 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Date: Mon, 4 Jun 2012 17:22:07 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p13; KDE/4.5.5; amd64; ; )
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<201206041101.57486.jhb@freebsd.org>
	<20120604181917.GD85127@deviant.kiev.zoral.com.ua>
In-Reply-To: <20120604181917.GD85127@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Message-Id: <201206041722.07269.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
	(bigwig.baldwin.cx); Mon, 04 Jun 2012 17:30:15 -0400 (EDT)
Cc: Gianni <gianni@freebsd.org>, Alan Cox <alc@rice.edu>,
	Alexander Kabaev <kan@freebsd.org>,
	Attilio Rao <attilio@freebsd.org>, freebsd-arch@freebsd.org
Subject: Re: Fwd: [RFC] Kernel shared variables
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Jun 2012 21:30:16 -0000

On Monday, June 04, 2012 2:19:17 pm Konstantin Belousov wrote:
> On Mon, Jun 04, 2012 at 11:01:57AM -0400, John Baldwin wrote:
> > On Sunday, June 03, 2012 6:49:27 am Bruce Evans wrote:
> > > On Sun, 3 Jun 2012, Konstantin Belousov wrote:
> > > 
> > > > On Sun, Jun 03, 2012 at 07:28:09AM +1000, Bruce Evans wrote:
> > > >> On Sat, 2 Jun 2012, Konstantin Belousov wrote:
> > > >>> ...
> > > >>> In fact, I think that if the whole goal is only fast clocks, then we
> > > >>> do not need any additional system mechanisms, since we can easily export
> > > >>> coefficients for rdtsc formula already. E.g. we can put it into elf auxv,
> > > >>> which is ugly but bearable.
> > > >>
> > > >> How do you get the timehands offsets?  These only need to be updated
> > > >> every second or so, or when used, but how can the application know
> > > >> when they need to be updated if this is not done automatically in the
> > > >> kernel by writing to a shared page?  I can only think of the
> > > >> application arranging an alarm signal every second or so and updating
> > > >> then.  No good for libraries.
> > > > What is timehands offsets ? Do you mean things like leap seconds ?
> > > 
> > > Yes.  binuptime() is:
> > > 
> > > % void
> > > % binuptime(struct bintime *bt)
> > > % {
> > > % 	struct timehands *th;
> > > % 	u_int gen;
> > > % 
> > > % 	do {
> > > % 		th = timehands;
> > > % 		gen = th->th_generation;
> > > % 		*bt = th->th_offset;
> > > % 		bintime_addx(bt, th->th_scale * tc_delta(th));
> > > % 	} while (gen == 0 || gen != th->th_generation);
> > > % }
> > > 
> > > Without the kernel providing th->th_offset, you have to do lots of ntp
> > > handling for yourself (compatibly with the kernel) just to get an
> > > accuracy of 1 second.  Leap seconds don't affect CLOCK_MONOTONIC, but
> > > they do affect CLOCK_REALTIME which is the clock id used by
> > > gettimeofday().  For the former, you only have to advance the offset
> > > for yourself occasionally (compatibly with the kernel) and manage
> > > (compatibly with the kernel, especially in the long term) ntp slewing
> > > and other syscall/sysctl kernel activity that micro-adjusts th->th_scale.
> > 
> > I think duplicating this logic in userland would just be wasteful.  I have
> > a private fast gettimeofday() at my current job and it works by exporting
> > the current timehands structure (well, the equivalent) to userland.  The
> > userland bits then fetch a copy of the details and do the same as bintime().
> > (I move the math (bintime_addx() and the multiply)) out of the loop however.
> I started yesterday an implementation which uses shared page to export
> some variant of timehands, and uses auxv to provide the libc with a pointer
> to timehands when rdtsc is reasonable.
> 
> I almost finished both 32bit and 64bit userspace, but there is
> kernel-side work left. Is your implementation ready or close to be ready
> for commit ? In other words, should I drop the efforts, or continue ?

No, mine is not general purpose.  I'll see if I can make a public patch of what
it looks like.

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Mon Jun  4 21:30:16 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5B8D4106567F;
	Mon,  4 Jun 2012 21:30:16 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net
	[IPv6:2001:470:1f10:75::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 2B7358FC08;
	Mon,  4 Jun 2012 21:30:16 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id 86A4FB95B;
	Mon,  4 Jun 2012 17:30:15 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Giovanni Trematerra <gianni@freebsd.org>
Date: Mon, 4 Jun 2012 17:23:49 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p13; KDE/4.5.5; amd64; ; )
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<20120604181917.GD85127@deviant.kiev.zoral.com.ua>
	<CACfq0933BwaverZinGvKtErPvdZp+4jQRUFQukK9V_QemRsW9g@mail.gmail.com>
In-Reply-To: <CACfq0933BwaverZinGvKtErPvdZp+4jQRUFQukK9V_QemRsW9g@mail.gmail.com>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201206041723.49562.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
	(bigwig.baldwin.cx); Mon, 04 Jun 2012 17:30:15 -0400 (EDT)
Cc: Alan Cox <alc@rice.edu>, Alexander Kabaev <kan@freebsd.org>,
	Attilio Rao <attilio@freebsd.org>, freebsd-arch@freebsd.org,
	Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Fwd: [RFC] Kernel shared variables
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Jun 2012 21:30:16 -0000

On Monday, June 04, 2012 5:16:10 pm Giovanni Trematerra wrote:
> On Mon, Jun 4, 2012 at 8:19 PM, Konstantin Belousov <kostikbel@gmail.com> wrote:
> > On Mon, Jun 04, 2012 at 11:01:57AM -0400, John Baldwin wrote:
> >> On Sunday, June 03, 2012 6:49:27 am Bruce Evans wrote:
> >> > On Sun, 3 Jun 2012, Konstantin Belousov wrote:
> 
> >> I think duplicating this logic in userland would just be wasteful.  I have
> >> a private fast gettimeofday() at my current job and it works by exporting
> >> the current timehands structure (well, the equivalent) to userland.  The
> >> userland bits then fetch a copy of the details and do the same as bintime().
> >> (I move the math (bintime_addx() and the multiply)) out of the loop however.
> > I started yesterday an implementation which uses shared page to export
> > some variant of timehands, and uses auxv to provide the libc with a pointer
> > to timehands when rdtsc is reasonable.
> >
> > I almost finished both 32bit and 64bit userspace, but there is
> > kernel-side work left. Is your implementation ready or close to be ready
> > for commit ? In other words, should I drop the efforts, or continue ?
> >
> 
> Hey wait, What are you doing?
> This is completely unfair. You didn't even review my patch.
> I really don't understand your way to completely ignore me and start implement
> yesterday something you didn't care about for more than 3 years.
> It costs me a lot of time and energy and I think I deserve more respect that
> just be ignored.

In fairness, I would not be able to use your version of gettimeofday().  My
application requires something where we can interpolate based on the value of
rdtsc().

Also, I don't really see the need to export anything other than the details to
make gettimeofday() faster.  I don't see a practical need for using shared
variables for getpid(), getpgid(), getppid(), getuid(), or the like.

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Mon Jun  4 21:30:17 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 22B1C106566C;
	Mon,  4 Jun 2012 21:30:17 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net
	[IPv6:2001:470:1f10:75::2])
	by mx1.freebsd.org (Postfix) with ESMTP id BCE4A8FC0A;
	Mon,  4 Jun 2012 21:30:16 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id 1ABA8B9A3;
	Mon,  4 Jun 2012 17:30:16 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Bruce Evans <brde@optusnet.com.au>
Date: Mon, 4 Jun 2012 17:30:05 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p13; KDE/4.5.5; amd64; ; )
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<201206041101.57486.jhb@freebsd.org>
	<20120605054930.H3236@besplex.bde.org>
In-Reply-To: <20120605054930.H3236@besplex.bde.org>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201206041730.05478.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
	(bigwig.baldwin.cx); Mon, 04 Jun 2012 17:30:16 -0400 (EDT)
Cc: Gianni <gianni@freebsd.org>, Alan Cox <alc@rice.edu>,
	Alexander Kabaev <kan@freebsd.org>, Attilio Rao <attilio@freebsd.org>,
	Konstantin Belousov <kib@freebsd.org>, freebsd-arch@freebsd.org,
	Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Fwd: [RFC] Kernel shared variables
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Jun 2012 21:30:17 -0000

On Monday, June 04, 2012 4:51:00 pm Bruce Evans wrote:
> On Mon, 4 Jun 2012, John Baldwin wrote:
> > On Sunday, June 03, 2012 6:49:27 am Bruce Evans wrote:
> >> On Sun, 3 Jun 2012, Konstantin Belousov wrote:
> >>> What is timehands offsets ? Do you mean things like leap seconds ?
> >>
> >> Yes.  binuptime() is:
> >>
> >> % void
> >> % binuptime(struct bintime *bt)
> >> % {
> >> % 	struct timehands *th;
> >> % 	u_int gen;
> >> %
> >> % 	do {
> >> % 		th = timehands;
> >> % 		gen = th->th_generation;
> >> % 		*bt = th->th_offset;
> >> % 		bintime_addx(bt, th->th_scale * tc_delta(th));
> >> % 	} while (gen == 0 || gen != th->th_generation);
> >> % }
> >>
> >> Without the kernel providing th->th_offset, you have to do lots of ntp
> >> handling for yourself (compatibly with the kernel) just to get an
> >> accuracy of 1 second.  Leap seconds don't affect CLOCK_MONOTONIC, but
> >> they do affect CLOCK_REALTIME which is the clock id used by
> >> gettimeofday().  For the former, you only have to advance the offset
> >> for yourself occasionally (compatibly with the kernel) and manage
> >> (compatibly with the kernel, especially in the long term) ntp slewing
> >> and other syscall/sysctl kernel activity that micro-adjusts th->th_scale.
> >
> > I think duplicating this logic in userland would just be wasteful.  I have
> 
> Sure.  I modestly proposed it.
> 
> > a private fast gettimeofday() at my current job and it works by exporting
> > the current timehands structure (well, the equivalent) to userland.  The
> > userland bits then fetch a copy of the details and do the same as bintime().
> 
> How do you keep this up to date, especially for leap seconds?

I added a hack to tc_windup() where it updates the shared copy of the variables
with the results of the tc_windup() call each time it is invoked.
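
A minimal sketch of what such a publishing hook could look like, assuming
a single exported snapshot guarded by a generation count. The structure
layout, the sth_ field names and shared_th_publish() are hypothetical and
are not taken from the private implementation described above; the
ordering follows the usual seqlock pattern expressed with C11 atomics.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical userland-visible snapshot.  The field names mirror
 * struct timehands, but this layout is an assumption of the sketch.
 */
struct shared_timehands {
	_Atomic uint32_t sth_generation;   /* 0 means "update in progress" */
	_Atomic uint64_t sth_scale;        /* 2^-64 s per counter tick */
	_Atomic uint32_t sth_offset_count; /* counter value at last windup */
	_Atomic uint64_t sth_offset_sec;   /* seconds at last windup */
	_Atomic uint64_t sth_offset_frac;  /* 2^-64 s fraction at last windup */
};

/* Would be called at the end of a windup with the fresh values. */
static void
shared_th_publish(struct shared_timehands *sth, uint64_t scale,
    uint32_t count, uint64_t sec, uint64_t frac)
{
	uint32_t ogen;

	ogen = atomic_load_explicit(&sth->sth_generation,
	    memory_order_relaxed);

	/* Mark the snapshot invalid so that concurrent readers retry. */
	atomic_store_explicit(&sth->sth_generation, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&sth->sth_scale, scale, memory_order_relaxed);
	atomic_store_explicit(&sth->sth_offset_count, count,
	    memory_order_relaxed);
	atomic_store_explicit(&sth->sth_offset_sec, sec, memory_order_relaxed);
	atomic_store_explicit(&sth->sth_offset_frac, frac,
	    memory_order_relaxed);

	/* Publish under a new generation, skipping the reserved value 0. */
	if (++ogen == 0)
		ogen = 1;
	atomic_store_explicit(&sth->sth_generation, ogen,
	    memory_order_release);
}
```

Only the stores above sit between the two generation writes, which is
why the window in which readers have to retry can stay short.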

> My version has a comment saying to do that, but I just noticed that
> it wouldn't work so well -- the timehands fields would have to be
> copied to local variables while under protection of the generation
> count, so it would give messier code to optimize a case that occurs
> _very_ rarely.

It's not that messy in my experience.
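
For illustration, a reader along those lines might look roughly like
this. It is a sketch under assumptions: the snapshot layout, the
bintime_sketch type and read_counter() are hypothetical, and
read_counter() is only a settable stand-in for the low 32 bits of
rdtsc so that the example stays portable. The kernel's tc_delta() also
applies the counter mask, which is omitted here.

```c
#include <stdatomic.h>
#include <stdint.h>

struct bintime_sketch {
	uint64_t sec;
	uint64_t frac;                     /* units of 2^-64 s */
};

/* Hypothetical exported snapshot, same idea as struct timehands. */
struct shared_timehands {
	_Atomic uint32_t sth_generation;   /* 0 means "update in progress" */
	_Atomic uint64_t sth_scale;
	_Atomic uint32_t sth_offset_count;
	_Atomic uint64_t sth_offset_sec;
	_Atomic uint64_t sth_offset_frac;
};

/* Stand-in for the hardware counter; a real reader would use rdtsc. */
static uint32_t fake_counter;

static uint32_t
read_counter(void)
{
	return (fake_counter);
}

static void
shared_binuptime(struct shared_timehands *sth, struct bintime_sketch *bt)
{
	uint64_t scale, sec, frac, x;
	uint32_t gen, delta;

	/* Copy the fields to locals under the generation count. */
	do {
		gen = atomic_load_explicit(&sth->sth_generation,
		    memory_order_acquire);
		scale = atomic_load_explicit(&sth->sth_scale,
		    memory_order_relaxed);
		sec = atomic_load_explicit(&sth->sth_offset_sec,
		    memory_order_relaxed);
		frac = atomic_load_explicit(&sth->sth_offset_frac,
		    memory_order_relaxed);
		delta = read_counter() - atomic_load_explicit(
		    &sth->sth_offset_count, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 || gen != atomic_load_explicit(
	    &sth->sth_generation, memory_order_relaxed));

	/* The multiply and the add happen once, outside the retry loop. */
	x = scale * delta;
	bt->sec = sec;
	bt->frac = frac + x;
	if (bt->frac < x)
		bt->sec++;                 /* carry, as in bintime_addx() */
}
```

The cost relative to the in-kernel loop is the handful of local copies;
the arithmetic itself is unchanged.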

> >> timehands in a shared pages is close to working.  th_generation protects
> >> things in the same way as in the kernel, modulo assumptions that writes
> >> are ordered.
> >
> > It would work fine.  And in fact, having multiple timehands is actually a
> > bug, not a feature.  It lets you compute bogus timestamps if you get preempted
> > at the wrong time and end up with time jumping around.  At Yahoo! we reduced
> > the number of timehands structures down to 2 or some such, and I'm now of
> > the opinion we should just have one and dispense with the entire array.
> 
> No, it is a feature.  The time should never jump around (backwards), but
> it can easily jump forwards.  It makes little difference if preemption
> occurs after the timehands have been read, or while reading them but in
> such a way that the timehands become stale during preemption but not stale
> enough for their generation to change so that you notice that they are
> stale -- you get a stale timestamp either way (with staleness approximately
> the preemption time).  Times read by different threads can easily have
> different staleness according to which timehands they ended up using and
> this may be quite different from which timehands they started using and
> from which timehands is active after they return.  Perhaps this is what
> you mean.  But again, this happens anyway when the preemption occurs after
> the timehands have been read.

Time definitely jumped backwards at Yahoo!.  The problem case was when NTP
was adjusting the time, so if you used a timehands structure that was a
few generations old (stale), you could have a fairly large component that
was (delta * scale).  If the scale had slowed down in subsequent updates,
then the computed time would jump out into the future.  On the next time
update with a newer timehands, the effective base was less than the previous
calculation thought it should have been, and the scale was smaller, so the
end result if the TSC had not advanced very far was for the new time to be
less than the previous time, and thus time jumped backwards.
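
The sequence above can be reproduced with a small arithmetic model. This
is a sketch with made-up numbers, not a trace from a real system: the
base and scale use plain integer units (think picoseconds and
picoseconds per tick) instead of 64-bit binary fractions, and
th_sketch, th_time() and th_windup() are hypothetical names.

```c
#include <stdint.h>

/* Simplified timehands: time = base + scale * (count - offset_count). */
struct th_sketch {
	uint64_t offset_count;  /* counter value when this entry was made */
	uint64_t base;          /* time at offset_count */
	uint64_t scale;         /* time units per counter tick */
};

static uint64_t
th_time(const struct th_sketch *th, uint64_t count)
{
	return (th->base + th->scale * (count - th->offset_count));
}

/* Windup: capture the time with the current scale, then switch scale. */
static struct th_sketch
th_windup(const struct th_sketch *prev, uint64_t count, uint64_t new_scale)
{
	struct th_sketch next;

	next.offset_count = count;
	next.base = th_time(prev, count);
	next.scale = new_scale;
	return (next);
}
```

With th0 = {0, 0, 1010}, a windup at count 1000 that slows the scale to
1000 gives th1.base = 1010000, and a second windup at count 2000 gives
th2.base = 2010000. A reader still holding th0 at count 2500 computes
1010 * 2500 = 2525000. A later reader using th2 at count 2510 computes
2010000 + 1000 * 510 = 2520000, which is 5000 units earlier than the
timestamp already handed out.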

> The main point of timehands was originally to give a copy of the time
> that was stable for a time hopefully long enough for the timehands to be
> read without them being clobbered by an update.  binuptime() was:
> 
> 1.59         (phk      26-Mar-98): void
> 1.113        (phk      07-Feb-02): binuptime(struct bintime *bt)
> 1.113        (phk      07-Feb-02): {
> 1.113        (phk      07-Feb-02): 	struct timecounter *tc;
> 1.113        (phk      07-Feb-02): 
> 1.113        (phk      07-Feb-02): 	tc = timecounter;
> 1.113        (phk      07-Feb-02): 	*bt = tc->tc_offset;
> 1.113        (phk      07-Feb-02): 	bintime_addx(bt, tc->tc_scale * tco_delta(tc));
> 1.113        (phk      07-Feb-02): }
> 
> This has an obvious race if the thread running this is preempted for a long
> time, so that the copy of the time is actually not stable for long enough.
> This was fixed (except I think in some cases using ddb) by using the
> generation count.

The problem with having too many timehands structures is you can get a stable
timehands structure that is too stale.

> > For my userland case I only export a single timehands copy.
> 
> So readers block for a long time if the writer is updating and the
> writer blocks?  Works best for UP :-).

The update to the shared timehands structure does not take a long time,
specifically for userland it does not require all of tc_windup()'s
execution time, merely the time to update the values.

> > For all the hardware where people run mysql and similar software that calls
> > gettimeofday() a lot, rdtsc() works just fine.
> 
> That wasn't the case until recently (except 10-15 years ago for UP with
> no SMM).  Someone just fixed rdtsc()-based time function in dtrace.  It
> tries to add a per-cpu rdtsc() offset, but the offset was backwards.  It
> takes P-state invariance and maybe more for the offset to be 0 and
> not drift.

I do have the luxury of using fairly modern Intel CPUs at work, and all of them
have invariant TSCs.

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Mon Jun  4 22:12:27 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 0B9111065673;
	Mon,  4 Jun 2012 22:12:27 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id 9554D8FC1B;
	Mon,  4 Jun 2012 22:12:26 +0000 (UTC)
Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q54MC30n091092;
	Tue, 5 Jun 2012 01:12:03 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id
	q54MC2ej094220; Tue, 5 Jun 2012 01:12:02 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q54MC284094219; 
	Tue, 5 Jun 2012 01:12:02 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Tue, 5 Jun 2012 01:12:02 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Giovanni Trematerra <gianni@freebsd.org>
Message-ID: <20120604221202.GG85127@deviant.kiev.zoral.com.ua>
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<20120603051904.GG2358@deviant.kiev.zoral.com.ua>
	<20120603184315.T856@besplex.bde.org>
	<201206041101.57486.jhb@freebsd.org>
	<20120604181917.GD85127@deviant.kiev.zoral.com.ua>
	<CACfq0933BwaverZinGvKtErPvdZp+4jQRUFQukK9V_QemRsW9g@mail.gmail.com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="Y+xroYBkGM9OatJL"
Content-Disposition: inline
In-Reply-To: <CACfq0933BwaverZinGvKtErPvdZp+4jQRUFQukK9V_QemRsW9g@mail.gmail.com>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: Alan Cox <alc@rice.edu>, Alexander Kabaev <kan@freebsd.org>,
	Attilio Rao <attilio@freebsd.org>, freebsd-arch@freebsd.org
Subject: Re: Fwd: [RFC] Kernel shared variables
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Jun 2012 22:12:27 -0000


--Y+xroYBkGM9OatJL
Content-Type: text/plain; charset=koi8-r
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Jun 04, 2012 at 11:16:10PM +0200, Giovanni Trematerra wrote:
> On Mon, Jun 4, 2012 at 8:19 PM, Konstantin Belousov <kostikbel@gmail.com> wrote:
> > On Mon, Jun 04, 2012 at 11:01:57AM -0400, John Baldwin wrote:
> >> On Sunday, June 03, 2012 6:49:27 am Bruce Evans wrote:
> >> > On Sun, 3 Jun 2012, Konstantin Belousov wrote:
>
> >> I think duplicating this logic in userland would just be wasteful.  I have
> >> a private fast gettimeofday() at my current job and it works by exporting
> >> the current timehands structure (well, the equivalent) to userland.  The
> >> userland bits then fetch a copy of the details and do the same as bintime().
> >> (I move the math (bintime_addx() and the multiply)) out of the loop however.
> > I started yesterday an implementation which uses shared page to export
> > some variant of timehands, and uses auxv to provide the libc with a pointer
> > to timehands when rdtsc is reasonable.
> >
> > I almost finished both 32bit and 64bit userspace, but there is
> > kernel-side work left. Is your implementation ready or close to be ready
> > for commit ? In other words, should I drop the efforts, or continue ?
> >
>
> Hey wait, What are you doing?
> This is completely unfair. You didn't even review my patch.
I did. I am quite saddened if you did not notice that I did review your
patch.

> I really don't understand your way to completely ignore me and start implement
> yesterday something you didn't care about for more than 3 years.
> It costs me a lot of time and energy and I think I deserve more respect that
> just be ignored.

I did not ignore the problem for 3 years. In fact, I did some, IMO
non-trivial, development moving the whole issue forward. In particular, I
developed the shared page infrastructure that is currently used (yes,
we already have a properly implemented shared page and a sub-allocator
of memory from it). I also made the relevant rtld and libc changes; in
particular, libc now has full access to auxv and uses it. So I consider
this statement a form of insult.

I indeed never had much desire to delve into the timekeeping code.
But the periodically recurring discussions, and the final flamefest about
the issue, made me realize that I had spent more effort discussing the
'shared page' 'idea' than it would take to implement fast gettimeofday()
and clock_gettime() using the existing infrastructure.

Having my hands somewhat deep in everything ABI/ELF, I very much want
to avoid painting myself into a corner with unsustainable decisions that
make ABI maintenance problematic. So I decided to save my time and
implement it 'properly', to close the question and possibly remove the
item from the ideas page.

Please note that what I do now is still not a vdso. It does allow the
vdso to plug into the framework later, but currently I only plan to
reuse shared page and auxv transport to implement gettimeofday without
usermode->kernelmode trip.
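
As a rough illustration of the auxv transport, libc could look up a
pointer published by the kernel by walking the auxiliary vector. This is
only a sketch: struct auxv_entry is a simplified stand-in for the ELF
auxiliary vector entry type, and AT_TIMEKEEP_SKETCH is a hypothetical
tag value invented for the example, not an existing constant.

```c
#include <stddef.h>
#include <stdint.h>

#define	AT_NULL_SKETCH		0	/* terminator, as in a real auxv */
#define	AT_TIMEKEEP_SKETCH	1000	/* hypothetical tag for this sketch */

/* Simplified auxiliary vector entry: a tag plus a pointer-sized value. */
struct auxv_entry {
	long	 a_type;
	void	*a_ptr;
};

/*
 * Return the pointer stored under 'type', or NULL if the kernel did
 * not provide one, in which case the caller keeps using the syscall.
 */
static void *
auxv_find_ptr(const struct auxv_entry *auxv, long type)
{
	for (; auxv->a_type != AT_NULL_SKETCH; auxv++) {
		if (auxv->a_type == type)
			return (auxv->a_ptr);
	}
	return (NULL);
}
```

A missing entry is the natural way to signal that rdtsc is not usable on
the machine, so the fast path needs no separate capability check.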

If you want to do a full-fledged vdso at once, I will be very pleased
and will probably support this work technically. For the posted patch,
my response is a NACK.

--Y+xroYBkGM9OatJL
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAk/NMrIACgkQC3+MBN1Mb4i3NACePjulGq8ZJL/dXcHjRCmvf3M7
1EIAnjkQGFTHATGrwScdXfF08wQ19zzp
=FgPt
-----END PGP SIGNATURE-----

--Y+xroYBkGM9OatJL--

From owner-freebsd-arch@FreeBSD.ORG  Mon Jun  4 22:42:59 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 391F31065670;
	Mon,  4 Jun 2012 22:42:59 +0000 (UTC)
	(envelope-from giovanni.trematerra@gmail.com)
Received: from mail-qa0-f47.google.com (mail-qa0-f47.google.com
	[209.85.216.47])
	by mx1.freebsd.org (Postfix) with ESMTP id 9E9CC8FC14;
	Mon,  4 Jun 2012 22:42:58 +0000 (UTC)
Received: by qabg1 with SMTP id g1so1910309qab.13
	for <multiple recipients>; Mon, 04 Jun 2012 15:42:57 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:content-transfer-encoding;
	bh=NhlwsO0jlVnsBdzv3UwsBnq8l/AWNtwdk9Bs4WjfqFA=;
	b=MjHpZ8KNIGKk2J1PzwPs5hU9YqUWLjSDoagbq/n0KEjyuk8PBt3WRvm9YCEMq86vo6
	iHCofrZP4hiST1apkeJ/qwd8RXi5cDjW+2dFq+kKXxQn6P32I4LyuRCp/bActbLe/mZf
	hQP3GGA4LQ+Dau0IPnYrhYX6qoe7TxHMC1m9FmBRZNCKS4ex/bklqczJ3PTZ0/CF61ZU
	FneWSLEmSLeJ2aIMPQD6Hh8aPD4geCogHpuGsVV2rX5qBaDRgAUUHUuqvYL+2TwEjXjd
	OTeRSXwVV7r08MLQ2oXJ1+zzrziCWDQJI+BjaGaSIbFEAFM9XaI8auoQPZ8ksiRumx4x
	qAHw==
MIME-Version: 1.0
Received: by 10.229.137.14 with SMTP id u14mr4365698qct.87.1338849777748; Mon,
	04 Jun 2012 15:42:57 -0700 (PDT)
Received: by 10.229.160.20 with HTTP; Mon, 4 Jun 2012 15:42:57 -0700 (PDT)
In-Reply-To: <20120604221202.GG85127@deviant.kiev.zoral.com.ua>
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<20120603051904.GG2358@deviant.kiev.zoral.com.ua>
	<20120603184315.T856@besplex.bde.org>
	<201206041101.57486.jhb@freebsd.org>
	<20120604181917.GD85127@deviant.kiev.zoral.com.ua>
	<CACfq0933BwaverZinGvKtErPvdZp+4jQRUFQukK9V_QemRsW9g@mail.gmail.com>
	<20120604221202.GG85127@deviant.kiev.zoral.com.ua>
Date: Tue, 5 Jun 2012 00:42:57 +0200
Message-ID: <CACfq0932i9sKzwcjkZ2DoWqCcmh3U_CNK=V6e160iH3_hUH85Q@mail.gmail.com>
From: Giovanni Trematerra <giovanni.trematerra@gmail.com>
To: Konstantin Belousov <kostikbel@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: Alan Cox <alc@rice.edu>, Alexander Kabaev <kan@freebsd.org>,
	Attilio Rao <attilio@freebsd.org>, freebsd-arch@freebsd.org
Subject: Re: Fwd: [RFC] Kernel shared variables
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Jun 2012 22:42:59 -0000

On Tue, Jun 5, 2012 at 12:12 AM, Konstantin Belousov
<kostikbel@gmail.com> wrote:
> On Mon, Jun 04, 2012 at 11:16:10PM +0200, Giovanni Trematerra wrote:
>> On Mon, Jun 4, 2012 at 8:19 PM, Konstantin Belousov <kostikbel@gmail.com> wrote:
>> > On Mon, Jun 04, 2012 at 11:01:57AM -0400, John Baldwin wrote:
>> >> On Sunday, June 03, 2012 6:49:27 am Bruce Evans wrote:
>> >> > On Sun, 3 Jun 2012, Konstantin Belousov wrote:
>>
>> >> I think duplicating this logic in userland would just be wasteful.  I have
>> >> a private fast gettimeofday() at my current job and it works by exporting
>> >> the current timehands structure (well, the equivalent) to userland.  The
>> >> userland bits then fetch a copy of the details and do the same as bintime().
>> >> (I move the math (bintime_addx() and the multiply)) out of the loop however.
>> > I started yesterday an implementation which uses shared page to export
>> > some variant of timehands, and uses auxv to provide the libc with a pointer
>> > to timehands when rdtsc is reasonable.
>> >
>> > I almost finished both 32bit and 64bit userspace, but there is
>> > kernel-side work left. Is your implementation ready or close to be ready
>> > for commit ? In other words, should I drop the efforts, or continue ?
>> >
>>
>> Hey wait, What are you doing?
>> This is completely unfair. You didn't even review my patch.
> I did. I am quite saddened if you did not note that I did reviewed your
> patch.
>
>> I really don't understand your way to completely ignore me and start implement
>> yesterday something you didn't care about for more than 3 years.
>> It costs me a lot of time and energy and I think I deserve more respect that
>> just be ignored.
>
> I did not ignored the problem for 3 years. In fact, I did some, IMO
> non-trivial development moving the whole issue forward. In particular, I
> developed the shared page infrastructure that are currently used (yes,
> we already do have properly implemented shared page and sub-allocator
> of memory from it). I did some relevant rtld and libc changes, in
> particular, libc now have full access and uses auxv. So I consider this
> statement as a form of insult.
>

Really? My apologies if you felt insulted. I didn't do it on purpose.
Honestly, I don't think there will be other occasions to hurt your
feelings.

>
> If you want to do full-flesged vdso at once, I will be very much pleasured
> and probably support this work technically.

Thank you for your offer. I appreciate it, but I'm not going to work
on it anymore.

--
Gianni

From owner-freebsd-arch@FreeBSD.ORG  Mon Jun  4 23:26:40 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id CDB041065670;
	Mon,  4 Jun 2012 23:26:40 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au
	[211.29.132.184])
	by mx1.freebsd.org (Postfix) with ESMTP id 473E58FC08;
	Mon,  4 Jun 2012 23:26:40 +0000 (UTC)
Received: from c122-106-171-232.carlnfd1.nsw.optusnet.com.au
	(c122-106-171-232.carlnfd1.nsw.optusnet.com.au [122.106.171.232])
	by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q54NQasv002000
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Tue, 5 Jun 2012 09:26:37 +1000
Date: Tue, 5 Jun 2012 09:26:36 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: John Baldwin <jhb@freebsd.org>
In-Reply-To: <201206041730.05478.jhb@freebsd.org>
Message-ID: <20120605075448.B3655@besplex.bde.org>
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<201206041101.57486.jhb@freebsd.org>
	<20120605054930.H3236@besplex.bde.org>
	<201206041730.05478.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Gianni <gianni@freebsd.org>, Alan Cox <alc@rice.edu>,
	Alexander Kabaev <kan@freebsd.org>, Attilio Rao <attilio@freebsd.org>,
	Konstantin Belousov <kib@freebsd.org>, freebsd-arch@freebsd.org,
	Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Fwd: [RFC] Kernel shared variables
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Jun 2012 23:26:40 -0000

On Mon, 4 Jun 2012, John Baldwin wrote:

> On Monday, June 04, 2012 4:51:00 pm Bruce Evans wrote:
>> On Mon, 4 Jun 2012, John Baldwin wrote:
>>> ...
>>> a private fast gettimeofday() at my current job and it works by exporting
>>> the current timehands structure (well, the equivalent) to userland.  The
>>> userland bits then fetch a copy of the details and do the same as bintime().
>>
>> How do you keep this up to date, especially for leap seconds?
>
> I added a hack to tc_windup() where it updates the shared copy of the variables
> with the results of the tc_windup() call each time it is invoked.
>
>> My version has a comment saying to do that, but I just noticed that
>> it wouldn't work so well -- the timehands fields would have to be
>> copied to local variables while under protection of the generation
>> count, so it would give messier code to optimize a case that occurs
>> _very_ rarely.
>
> It's not that messy in my experience.

Just 3-4 lines.  With only 16 copies of them in kern_tc.c.  I doubt that
you provide all of these :-).  But full clock_gettime() support requires
more than half of these :-(.  Maybe even all of the FFCLOCK parts for
full compatibility :-(.

>>>> timehands in a shared pages is close to working.  th_generation protects
>>>> things in the same way as in the kernel, modulo assumptions that writes
>>>> are ordered.
>>>
>>> It would work fine.  And in fact, having multiple timehands is actually a
>>> bug, not a feature.  It lets you compute bogus timestamps if you get preempted
>>> at the wrong time and end up with time jumping around.  At Yahoo! we reduced
>>> the number of timehands structures down to 2 or some such, and I'm now of
>>> the opinion we should just have one and dispense with the entire array.
>>
>> No, it is a feature.  The time should never jump around (backwards), but
>> it can easily jump forwards.  It makes little difference if preemption
>> occurs after the timehands have been read, or while reading them but in
>> such a way that the timehands become stale during preemption but not stale
>> enough for their generation to change so that you notice that they are
>> stale -- you get a stale timestamp either way (with staleness approximately
>> the preemption time).  Times read by different threads can easily have
>> different staleness according to which timehands they ended up using and
>> this may be quite different from which timehands they started using and
>> from which timehands is active after they return.  Perhaps this is what
>> you mean.  But again, this happens anyway when the preemption occurs after
>> the timehands have been read.
>
> Time definitely jumped backwards at Yahoo!.  The problem case was when NTP
> was adjusting the time, so if you used a timehands structure that was a
> few generations old (stale), you could have a fairly large component that
> was (delta * scale).
> If the scale had slowed down in subsequent updates,
> then the computed time would jump out into the future.  On the next time
> update with a newer timehands, the effective base was less than the previous
> calculation thought it should have been, and the scale was smaller, so the
> end result if the TSC had not advanced very far was for the new time to be
> less than the previous time, and thus time jumped backwards.

Hmm, changing th_scale in tc_windup() indeed seems to be quite broken,
and reducing to 1 timehands might work around this.  tc_windup() captures
the time using the current scale, so any future reads on the new
timehands will be monotonic, but current and future reads on other
timehands may be too far ahead.  Current and future reads on the
new timehands are prevented by the generation count -- this is why
reducing to 1 timehands might work.  Reducing the number of timehands
to >= 2 only reduces the maximum non-monotonicity.

Someone should test this using adjtime(2).  I think you can use it to
slew much faster than ntpd will let you.

To fix this, just kill all the timehands by setting their generation
count to 0 iff changing the scale.  This preserves the optimization
except when the scale changes, unless ntp (kernel PLL, etc.) changes
it almost every time.  ntpd only changes things every few seconds or
minutes, and I hope the kernel doesn't need to change the frequency
often.
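
A single-threaded model of that fix, as a sketch only. The th_sketch
type and th_kill_if_scale_changed() are hypothetical; a real change in
tc_windup() would also need the same write ordering as the rest of the
generation protocol, which is deliberately left out here.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced timehands: only the two fields the fix touches. */
struct th_sketch {
	uint32_t th_generation;  /* 0 tells readers to retry */
	uint64_t th_scale;
};

/*
 * If the scale is about to change, zero the generation of every
 * timehands so that no reader can finish on an entry with the old
 * scale.  Returns 1 if the entries were killed, 0 if left alone.
 */
static int
th_kill_if_scale_changed(struct th_sketch *ths, size_t n,
    uint64_t old_scale, uint64_t new_scale)
{
	size_t i;

	if (new_scale == old_scale)
		return (0);
	for (i = 0; i < n; i++)
		ths[i].th_generation = 0;
	return (1);
}
```

The caller would then fill in the next timehands and give it a non-zero
generation as usual. When the scale is unchanged nothing is touched, so
the multiple-timehands optimization is kept for the common case.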

BTW, the ntp part of the locking is quite broken.  You can see this
in kern_adjtime() where it uses Giant locking to try to protect the
non-atomic write to time_adjtime and other less critical variables.
kern_ntptime.c still says that almost everything must be locked by
splclock(), but that is null.  Actually, almost everything must be
locked by something that locks out tc_windup() or a little more.
Sched locking might have done it for hardclock() and tc_windup(), but
no mutex except Giant has ever been used in kern_ntptime.c.

There is also essentially null locking for pps calls from non-clock
fast interrupt handlers.  The worst case in a useful configuration
seems to be 1 CPU executing ntp_update_second() via a fast interrupt
handler, and another CPU executing hardpps() via another fast interrupt
handler.  A non-useful configuration might have more than 1 other CPU
executing hardpps().

>>> For my userland case I only export a single timehands copy.
>>
>> So readers block for a long time if the writer is updating and the
>> writer blocks?  Works best for UP :-).
>
> The update to the shared timehands structure does not take a long time,
> specifically for userland it does not require all of tc_windup()'s
> execution time, merely the time to update the values.

But whenever it is preempted, it may take a long time.  You have no
control short of privileged rtprio and nice to prevent preemption.
OTOH, tc_windup() is run from a fast interrupt handler.  Almost nothing
can prevent it being called without much delay or preempt it (sched
locking of it used to delay it enough to cause significant lock
contention).

Bruce

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 07:49:34 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2371D1065675;
	Tue,  5 Jun 2012 07:49:34 +0000 (UTC)
	(envelope-from pawel@dawidek.net)
Received: from mail.dawidek.net (60.wheelsystems.com [83.12.187.60])
	by mx1.freebsd.org (Postfix) with ESMTP id C2FCB8FC12;
	Tue,  5 Jun 2012 07:49:33 +0000 (UTC)
Received: from localhost (58.wheelsystems.com [83.12.187.58])
	by mail.dawidek.net (Postfix) with ESMTPSA id 27352FE1;
	Tue,  5 Jun 2012 09:49:32 +0200 (CEST)
Date: Tue, 5 Jun 2012 09:47:42 +0200
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: "Andrey A. Chernov" <ache@FreeBSD.org>
Message-ID: <20120605074741.GA1391@garage.freebsd.pl>
References: <201206042134.q54LYoVJ067685@svn.freebsd.org>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="5mCyUwZo2JvN/JJP"
Content-Disposition: inline
In-Reply-To: <201206042134.q54LYoVJ067685@svn.freebsd.org>
X-OS: FreeBSD 10.0-CURRENT amd64
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org,
	src-committers@freebsd.org, freebsd-arch@FreeBSD.org
Subject: Re: svn commit: r236582 - head/lib/libc/stdlib
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 07:49:34 -0000


--5mCyUwZo2JvN/JJP
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Jun 04, 2012 at 09:34:49PM +0000, Andrey A. Chernov wrote:
> Author: ache
> Date: Mon Jun  4 21:34:49 2012
> New Revision: 236582
> URL: http://svn.freebsd.org/changeset/base/236582
>
> Log:
>   1) IEEE Std 1003.1-2008, "errno" section, is explicit that
>
>   "The setting of errno after a successful call to a function is
>   unspecified unless the description of that function specifies that
>   errno shall not be modified."

Very interesting. However, free(3) is always successful. Maybe we need
more context here, but the sentence above might be about functions
that can either succeed or fail. Such functions do set errno on
failure, but we don't know what they do to errno on success: they
sometimes interact with errno, while free(3) never does.

I am aware that my interpretation might be too wishful. It is pretty
obviously right to save the errno value when calling a function that can
fail: when we save errno we don't know whether the call will fail, so
we have to do that. But requiring errno to be saved around a call to a
void function that can't fail is simply silly and complicates the code
without a reason.
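
For reference, this is roughly the pattern under discussion, written as
a small wrapper. It is a sketch: free_preserving_errno() is a
hypothetical name and is not code from the commit.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Call free() without letting it change the caller-visible errno,
 * for code that must report an earlier error after cleaning up.
 */
static void
free_preserving_errno(void *p)
{
	int saved_errno;

	saved_errno = errno;
	free(p);
	errno = saved_errno;
}
```

Every cleanup path that frees memory before returning an error would
need these two extra lines or a wrapper like this one.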

>   However, free() in IEEE Std 1003.1-2008 does not mention its interaction
>   with errno, so MAY modify it after successful call
>   (it depends on particular free() implementation, OS-specific, etc.).

Expecting documentation to describe interaction with some global
variable that it doesn't need is pretty silly too (ok, errno is special,
but still). It makes sense to describe all the cases when the function
actually is sometimes using the global variable, but for a function that
never fails and should never touch the global it doesn't make sense.
Maybe that's why it doesn't mention interaction with errno?

I agree that the standards aren't clear, but if saving errno around
free(3) is the way to go, then I'm sure we have many more problems in
our code; even if it is not supposed to be portable it should be correct
- we never know who will take the code and port it, or when.

I guess what I'm trying to say here is that this is a much bigger change
than it looks and we should probably agree on some global rule here.

--=20
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://tupytaj.pl

--5mCyUwZo2JvN/JJP
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)

iEYEARECAAYFAk/NuZ0ACgkQForvXbEpPzSfyACeK8eSY42ZOt2Sl1X4SOxGXsdC
WvIAoOFeogjkUqP7aMxtyL4lqO4yUNyp
=sCiA
-----END PGP SIGNATURE-----

--5mCyUwZo2JvN/JJP--

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 12:25:36 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id AFE8A106564A
	for <arch@freebsd.org>; Tue,  5 Jun 2012 12:25:36 +0000 (UTC)
	(envelope-from des@des.no)
Received: from smtp.des.no (smtp.des.no [194.63.250.102])
	by mx1.freebsd.org (Postfix) with ESMTP id 409178FC15
	for <arch@freebsd.org>; Tue,  5 Jun 2012 12:25:36 +0000 (UTC)
Received: from ds4.des.no (smtp.des.no [194.63.250.102])
	by smtp.des.no (Postfix) with ESMTP id CAE656F43
	for <arch@freebsd.org>; Tue,  5 Jun 2012 12:25:34 +0000 (UTC)
Received: by ds4.des.no (Postfix, from userid 1001)
	id 6F9DD95EF; Tue,  5 Jun 2012 14:25:34 +0200 (CEST)
From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
To: arch@freebsd.org
Date: Tue, 05 Jun 2012 14:25:33 +0200
Message-ID: <86bokyvtc2.fsf@ds4.des.no>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.3 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: 
Subject: KTR_SPAREx
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 12:25:36 -0000

While working on Capsicum last year, I noticed that some of the spare
KTR types are (ab)used for different purposes by different parts of the
code.  KTR_SPARE[234] are all documented as "/* XXX Used by cxgb */",
but KTR_SPARE2, for instance, is widely used for clock events.  Here is
a complete list:

sys/sys/ktr.h: #define  KTR_SPARE2      0x00000800              /* XXX Used=
 by cxgb */
sys/sys/ktr.h: #define  KTR_SPARE3      0x00008000              /* XXX Used=
 by cxgb */
sys/sys/ktr.h: #define  KTR_SPARE4      0x00010000              /* XXX Used=
 by cxgb */
sys/geom/sched/gs_scheduler.h: #define  KTR_GSCHED      KTR_SPARE4
sys/kern/kern_clocksource.c:    CTR4(KTR_SPARE2, "ipi  at %d:    now  %d.%0=
8x%08x",
sys/kern/kern_clocksource.c:    CTR4(KTR_SPARE2, "handle at %d:  now  %d.%0=
8x%08x",
sys/kern/kern_clocksource.c:            CTR2(KTR_SPARE2, "skip   at %d: %d"=
, curcpu, skip);
sys/kern/kern_clocksource.c:    CTR5(KTR_SPARE2, "next at %d:    next %d.%0=
8x%08x by %d",
sys/kern/kern_clocksource.c:    CTR4(KTR_SPARE2, "intr at %d:    now  %d.%0=
8x%08x",
sys/kern/kern_clocksource.c:                    CTR5(KTR_SPARE2, "load p at=
 %d:   now %d.%08x first in %d.%08x",
sys/kern/kern_clocksource.c:            CTR5(KTR_SPARE2, "load at %d:    ne=
xt %d.%08x%08x eq %d",
sys/kern/kern_clocksource.c:    CTR4(KTR_SPARE2, "idle at %d:    now  %d.%0=
8x%08x",
sys/kern/kern_clocksource.c:    CTR4(KTR_SPARE2, "active at %d:  now  %d.%0=
8x%08x",
sys/kern/kern_clocksource.c:    CTR4(KTR_SPARE2, "set_cyc at %d:  now  %d.%=
08x%08x",
sys/kern/kern_clocksource.c:    CTR4(KTR_SPARE2, "set_cyc at %d:  t  %d.%08=
x%08x",
sys/kern/kern_clocksource.c:    CTR3(KTR_SPARE2, "new co at %d:    on %d in=
 %d",
sys/amd64/amd64/machdep.c:      CTR2(KTR_SPARE2, "cpu_idle(%d) at %d",
sys/amd64/amd64/machdep.c:      CTR2(KTR_SPARE2, "cpu_idle(%d) at %d done",
sys/dev/cxgb/cxgb_osdep.h: #define      KTR_CXGB        KTR_SPARE2
sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_hal.h: #define KTR_IW_CXGB KTR_SPARE4
sys/dev/cxgb/ulp/tom/cxgb_defs.h: #define       KTR_TOM KTR_SPARE2
sys/dev/cxgb/ulp/tom/cxgb_defs.h: #define       KTR_TCB KTR_SPARE3
sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c:     CTR2(KTR_SPARE2, "wr_ack: snd_una=
=3D%u credits=3D%d", snd_una, credits);
sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c:             CTR1(KTR_SPARE2, "wr_ack: s=
bdrop(%d)", bytes);
sys/dev/gem/if_gem.c: #define   KTR_GEM         KTR_SPARE2
sys/dev/drm2/drmP.h: #define    KTR_DRM_REG     KTR_SPARE3
sys/dev/hme/if_hme.c: #define   KTR_HME         KTR_SPARE2      /* XXX */
sys/dev/cas/if_cas.c: #define   KTR_CAS         KTR_SPARE2
sys/dev/ath/if_ath.c: #define   ATH_KTR_INTR    KTR_SPARE4
sys/dev/ath/if_ath.c: #define   ATH_KTR_ERR     KTR_SPARE3
sys/dev/ath/if_ath_rx.c: #define        ATH_KTR_INTR    KTR_SPARE4
sys/dev/ath/if_ath_rx.c: #define        ATH_KTR_ERR     KTR_SPARE3
sys/i386/xen/xen_machdep.c:     CTR0(KTR_SPARE2, "ni_cli disabling interrup=
ts");
sys/i386/xen/xen_machdep.c:     CTR2(KTR_SPARE2, "%x xen_restore_flags efla=
gs %x", rebp(), eflags);
sys/i386/xen/xen_machdep.c:     CTR1(KTR_SPARE2, "%x xen_cli disabling inte=
rrupts", rebp());
sys/i386/xen/xen_machdep.c:     CTR1(KTR_SPARE2, "%x xen_sti enabling inter=
rupts", rebp());
sys/i386/i386/machdep.c:        CTR2(KTR_SPARE2, "cpu_idle(%d) at %d",
sys/i386/i386/machdep.c:        CTR2(KTR_SPARE2, "cpu_idle(%d) at %d done",
sys/powerpc/powerpc/cpu.c:      CTR2(KTR_SPARE2, "cpu_idle(%d) at %d",
sys/powerpc/powerpc/cpu.c:      CTR2(KTR_SPARE2, "cpu_idle(%d) at %d done",
sys/pc98/pc98/machdep.c:        CTR2(KTR_SPARE2, "cpu_idle(%d) at %d",
sys/pc98/pc98/machdep.c:        CTR2(KTR_SPARE2, "cpu_idle(%d) at %d done",
sys/sparc64/sparc64/pmap.c:             CTR5(KTR_SPARE2,
sys/sparc64/sparc64/tsb.c:              CTR5(KTR_SPARE2,
sys/sparc64/include/bus.h: #define      KTR_BUS                         KTR=
_SPARE2

Most of this is in device drivers, which should use KTR_DEV.  There is
one major use of KTR_SPAREx in common code: KTR_SPARE2 is used for clock
events.  It is also used incorrectly by the sparc64 pmap core (there is
a separate KTR_PMAP for that).

I suggest that we

1) rename one of the spare KTRs to KTR_CLOCK and use that for clock
   events.  I already have a patch for that.

2) eliminate all other use of KTR_SPARE[0-9] in non-device code.  I
   think the existing KTRs should already cover most cases.

3) modify device drivers to use KTR_DEV for events that aren't covered
   by existing, more specific KTRs, which is almost none.  For instance,
   there is no reason why cxgb shouldn't just use KTR_NET.

DES
--=20
Dag-Erling Sm=C3=B8rgrav - des@des.no

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 13:07:10 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 563611065674;
	Tue,  5 Jun 2012 13:07:10 +0000 (UTC) (envelope-from des@des.no)
Received: from smtp.des.no (smtp.des.no [194.63.250.102])
	by mx1.freebsd.org (Postfix) with ESMTP id 1052A8FC17;
	Tue,  5 Jun 2012 13:07:09 +0000 (UTC)
Received: from ds4.des.no (smtp.des.no [194.63.250.102])
	by smtp.des.no (Postfix) with ESMTP id 19EBE6F72;
	Tue,  5 Jun 2012 13:07:09 +0000 (UTC)
Received: by ds4.des.no (Postfix, from userid 1001)
	id D331B95FE; Tue,  5 Jun 2012 15:07:08 +0200 (CEST)
From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
To: John Baldwin <jhb@freebsd.org>
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<20120602171632.GC2358@deviant.kiev.zoral.com.ua>
	<CAJ-FndCh77syp+860LaCbgQ6eiQAq_OMM98RxqxmCv+YKENXoA@mail.gmail.com>
	<201206041053.51802.jhb@freebsd.org>
Date: Tue, 05 Jun 2012 15:07:08 +0200
In-Reply-To: <201206041053.51802.jhb@freebsd.org> (John Baldwin's message of
	"Mon, 4 Jun 2012 10:53:51 -0400")
Message-ID: <86y5o1vrer.fsf@ds4.des.no>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.3 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: Gianni <gianni@freebsd.org>, Alan Cox <alc@rice.edu>,
	Alexander Kabaev <kan@freebsd.org>, Attilio Rao <attilio@freebsd.org>,
	Konstantin Belousov <kib@freebsd.org>, freebsd-arch@freebsd.org,
	Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Fwd: [RFC] Kernel shared variables
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 13:07:10 -0000

John Baldwin <jhb@freebsd.org> writes:
> I think this is an important question actually.  Is there anything
> that really needs to be here besides gettimeofday()?  I mean, is there
> any real-world application that needs to call getpid() or getppid() a
> bunch of times?

Yes, for fork detection when accessing resources shared between
descendants of the process that allocated them.

DES
--=20
Dag-Erling Sm=C3=B8rgrav - des@des.no

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 13:09:24 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 7E97C106566B;
	Tue,  5 Jun 2012 13:09:24 +0000 (UTC) (envelope-from ache@vniz.net)
Received: from vniz.net (vniz.net [194.87.13.69])
	by mx1.freebsd.org (Postfix) with ESMTP id E639D8FC1D;
	Tue,  5 Jun 2012 13:09:23 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
	by vniz.net (8.14.5/8.14.5) with ESMTP id q55D9MpJ014011;
	Tue, 5 Jun 2012 17:09:22 +0400 (MSK) (envelope-from ache@vniz.net)
Received: (from ache@localhost)
	by localhost (8.14.5/8.14.5/Submit) id q55D9MQe014010;
	Tue, 5 Jun 2012 17:09:22 +0400 (MSK) (envelope-from ache)
Date: Tue, 5 Jun 2012 17:09:22 +0400
From: Andrey Chernov <ache@FreeBSD.ORG>
To: Pawel Jakub Dawidek <pjd@FreeBSD.ORG>
Message-ID: <20120605130922.GE13306@vniz.net>
Mail-Followup-To: Andrey Chernov <ache@freebsd.org>,
	Pawel Jakub Dawidek <pjd@FreeBSD.ORG>, src-committers@FreeBSD.ORG,
	svn-src-all@FreeBSD.ORG, svn-src-head@FreeBSD.ORG,
	freebsd-arch@FreeBSD.ORG
References: <201206042134.q54LYoVJ067685@svn.freebsd.org>
	<20120605074741.GA1391@garage.freebsd.pl>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="gj572EiMnwbLXET9"
Content-Disposition: inline
In-Reply-To: <20120605074741.GA1391@garage.freebsd.pl>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: svn-src-head@FreeBSD.ORG, svn-src-all@FreeBSD.ORG,
	src-committers@FreeBSD.ORG, freebsd-arch@FreeBSD.ORG
Subject: Re: svn commit: r236582 - head/lib/libc/stdlib
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 13:09:24 -0000


--gj572EiMnwbLXET9
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Jun 05, 2012 at 09:47:42AM +0200, Pawel Jakub Dawidek wrote:
> >   "The setting of errno after a successful call to a function is
> >   unspecified unless the description of that function specifies that
> >   errno shall not be modified."
>=20
> Very interesting. However free(3) is always successful. Maybe we need
> more context here, but the sentence above might talk about functions
> that can either succeed or fail and such functions do set errno on
> failure, but we don't know what they do to errno on success - they
> sometimes interact with the errno, free(3) never does.

According to the Austin Group interpretation, this sentence also applies
to functions which always succeed; please see
http://austingroupbugs.net/view.php?id=3D385

> I am aware that my interpretation might be too wishful, but it is
> pretty obvious that errno must be saved when calling a function that
> can possibly fail - when we save errno we don't know if it will fail
> or not, so we have to do that, but requiring errno to be saved when
> calling a void function that can't fail is simply silly and
> complicates the code without a reason.

It can still fail due to internal errors; it just does not return a
failure.
For internal errors POSIX states that errno state is unspecified.

> I agree that the standards aren't clear, but if saving errno around
> free(3) is the way to go, then I'm sure we have many more problems in
> our code; even if it is not supposed to be portable it should be
> correct - we never know who will take the code and port it, or when.

Currently they are pretty clear on that point, although I agree that if
POSIX said it should not modify errno, life would be easier. Let's see
what they do next, since they are already aware of this specific
problem.

> I guess what I'm trying to say here is that this is a much bigger
> change than it looks and we should probably agree on some global rule
> here.

=2E..which does not violate standards.

--=20
http://ache.vniz.net/

--gj572EiMnwbLXET9
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)

iEYEARECAAYFAk/OBQIACgkQVg5YK5ZEdN3tRwCfSZV9vBpAGgmbFiu6NQuciGF1
ussAn3c6HZUcV5JLevuVuJGCnrw/PpBI
=sd4B
-----END PGP SIGNATURE-----

--gj572EiMnwbLXET9--

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 13:10:08 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 27A0B1065670;
	Tue,  5 Jun 2012 13:10:08 +0000 (UTC) (envelope-from des@des.no)
Received: from smtp.des.no (smtp.des.no [194.63.250.102])
	by mx1.freebsd.org (Postfix) with ESMTP id CAEC28FC0A;
	Tue,  5 Jun 2012 13:10:07 +0000 (UTC)
Received: from ds4.des.no (smtp.des.no [194.63.250.102])
	by smtp.des.no (Postfix) with ESMTP id 2130B6F77;
	Tue,  5 Jun 2012 13:10:07 +0000 (UTC)
Received: by ds4.des.no (Postfix, from userid 1001)
	id DB8CD9600; Tue,  5 Jun 2012 15:10:06 +0200 (CEST)
From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
References: <201206042134.q54LYoVJ067685@svn.freebsd.org>
	<20120605074741.GA1391@garage.freebsd.pl>
Date: Tue, 05 Jun 2012 15:10:06 +0200
In-Reply-To: <20120605074741.GA1391@garage.freebsd.pl> (Pawel Jakub Dawidek's
	message of "Tue, 5 Jun 2012 09:47:42 +0200")
Message-ID: <86txypvr9t.fsf@ds4.des.no>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.3 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org,
	src-committers@freebsd.org,
	"Andrey A. Chernov" <ache@FreeBSD.org>, freebsd-arch@FreeBSD.org
Subject: Re: svn commit: r236582 - head/lib/libc/stdlib
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 13:10:08 -0000

Pawel Jakub Dawidek <pjd@FreeBSD.org> writes:
> Very interesting. However free(3) is always successful. Maybe we need
> more context here, but the sentence above might talk about functions
> that can either succeed or fail and such functions do set errno on
> failure, but we don't know what they do to errno on success - they
> sometimes interact with the errno, free(3) never does.

Even if free() itself never fails, it might have side effects such as
unmapping a slab, logging a KTR event etc. which can modify errno.

DES
--=20
Dag-Erling Sm=C3=B8rgrav - des@des.no

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 13:35:36 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 31FA91065674;
	Tue,  5 Jun 2012 13:35:36 +0000 (UTC) (envelope-from ache@vniz.net)
Received: from vniz.net (vniz.net [194.87.13.69])
	by mx1.freebsd.org (Postfix) with ESMTP id 534128FC0C;
	Tue,  5 Jun 2012 13:35:12 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
	by vniz.net (8.14.5/8.14.5) with ESMTP id q55DZA9X014565;
	Tue, 5 Jun 2012 17:35:10 +0400 (MSK) (envelope-from ache@vniz.net)
Received: (from ache@localhost)
	by localhost (8.14.5/8.14.5/Submit) id q55DZ8EY014564;
	Tue, 5 Jun 2012 17:35:08 +0400 (MSK) (envelope-from ache)
Date: Tue, 5 Jun 2012 17:35:08 +0400
From: Andrey Chernov <ache@FreeBSD.ORG>
To: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
Message-ID: <20120605133508.GA14460@vniz.net>
Mail-Followup-To: Andrey Chernov <ache@freebsd.org>,
	=?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>,
	Pawel Jakub Dawidek <pjd@FreeBSD.ORG>, svn-src-head@FreeBSD.ORG,
	svn-src-all@FreeBSD.ORG, src-committers@FreeBSD.ORG,
	freebsd-arch@FreeBSD.ORG
References: <201206042134.q54LYoVJ067685@svn.freebsd.org>
	<20120605074741.GA1391@garage.freebsd.pl>
	<86txypvr9t.fsf@ds4.des.no>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <86txypvr9t.fsf@ds4.des.no>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: svn-src-head@FreeBSD.ORG, svn-src-all@FreeBSD.ORG,
	src-committers@FreeBSD.ORG,
	Pawel Jakub Dawidek <pjd@FreeBSD.ORG>, freebsd-arch@FreeBSD.ORG
Subject: Re: svn commit: r236582 - head/lib/libc/stdlib
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 13:35:36 -0000

On Tue, Jun 05, 2012 at 03:10:06PM +0200, Dag-Erling Sm??rgrav wrote:
> Pawel Jakub Dawidek <pjd@FreeBSD.org> writes:
> > Very interesting. However free(3) is always successful. Maybe we need
> > more context here, but the sentence above might talk about functions
> > that can either succeed or fail and such functions do set errno on
> > failure, but we don't know what they do to errno on success - they
> > sometimes interact with the errno, free(3) never does.
> 
> Even if free() itself never fails, it might have side effects such as
> unmapping a slab, logging a KTR event etc. which can modify errno.

I totally agree. Even if our free() is cleaned up in this sense or saves
errno internally, we need code which does not rely on some particular
implementation but works in general with any standard-conformant
free().
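
To illustrate what such implementation-independent code could look like,
here is a minimal sketch. The name free_keep_errno() is made up for this
example; it is not part of libc and not what r236582 itself contains. It
only assumes that free() is allowed to leave errno in any state.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper (illustration only): release a buffer while
 * keeping the caller's errno intact, whatever free() does to it.
 */
static void
free_keep_errno(void *ptr)
{
	int saved_errno = errno;	/* remember the error being reported */

	free(ptr);			/* allowed to change errno */
	errno = saved_errno;		/* put the original value back */
}
```

An error path that has to release a buffer before returning -1 would
call free_keep_errno(buf) instead of free(buf), so the errno set by the
failing call is still there for the caller to inspect.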

-- 
http://ache.vniz.net/

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 13:39:02 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 6AFF81065675;
	Tue,  5 Jun 2012 13:39:02 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id 017DB8FC1B;
	Tue,  5 Jun 2012 13:39:01 +0000 (UTC)
Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q55Dcg18059915;
	Tue, 5 Jun 2012 16:38:42 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id
	q55DcfIw099602; Tue, 5 Jun 2012 16:38:41 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q55DcefT099601; 
	Tue, 5 Jun 2012 16:38:40 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Tue, 5 Jun 2012 16:38:40 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Message-ID: <20120605133840.GK85127@deviant.kiev.zoral.com.ua>
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<201206041101.57486.jhb@freebsd.org>
	<20120604181917.GD85127@deviant.kiev.zoral.com.ua>
	<201206041722.07269.jhb@freebsd.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="PEfPc/DjvCj+JzNg"
Content-Disposition: inline
In-Reply-To: <201206041722.07269.jhb@freebsd.org>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: freebsd-arch@freebsd.org
Subject: Re: Fwd: [RFC] Kernel shared variables
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 13:39:02 -0000


--PEfPc/DjvCj+JzNg
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Jun 04, 2012 at 05:22:07PM -0400, John Baldwin wrote:
> On Monday, June 04, 2012 2:19:17 pm Konstantin Belousov wrote:
> > On Mon, Jun 04, 2012 at 11:01:57AM -0400, John Baldwin wrote:
> > > On Sunday, June 03, 2012 6:49:27 am Bruce Evans wrote:
> > > > On Sun, 3 Jun 2012, Konstantin Belousov wrote:
> > > >=20
> > > > > On Sun, Jun 03, 2012 at 07:28:09AM +1000, Bruce Evans wrote:
> > > > >> On Sat, 2 Jun 2012, Konstantin Belousov wrote:
> > > > >>> ...
> > > > >>> In fact, I think that if the whole goal is only fast clocks, th=
en we
> > > > >>> do not need any additional system mechanisms, since we can easi=
ly export
> > > > >>> coefficients for rdtsc formula already. E.g. we can put it into=
 elf auxv,
> > > > >>> which is ugly but bearable.
> > > > >>
> > > > >> How do you get the timehands offsets?  These only need to be upd=
ated
> > > > >> every second or so, or when used, but how can the application kn=
ow
> > > > >> when they need to be updated if this is not done automatically i=
n the
> > > > >> kernel by writing to a shared page?  I can only think of the
> > > > >> application arranging an alarm signal every second or so and upd=
ating
> > > > >> then.  No good for libraries.
> > > > > What is timehands offsets ? Do you mean things like leap seconds ?
> > > >=20
> > > > Yes.  binuptime() is:
> > > >=20
> > > > % void
> > > > % binuptime(struct bintime *bt)
> > > > % {
> > > > % 	struct timehands *th;
> > > > % 	u_int gen;
> > > > %=20
> > > > % 	do {
> > > > % 		th =3D timehands;
> > > > % 		gen =3D th->th_generation;
> > > > % 		*bt =3D th->th_offset;
> > > > % 		bintime_addx(bt, th->th_scale * tc_delta(th));
> > > > % 	} while (gen =3D=3D 0 || gen !=3D th->th_generation);
> > > > % }
> > > >=20
> > > > Without the kernel providing th->th_offset, you have to do lots of =
ntp
> > > > handling for yourself (compatibly with the kernel) just to get an
> > > > accuracy of 1 second.  Leap seconds don't affect CLOCK_MONOTONIC, b=
ut
> > > > they do affect CLOCK_REALTIME which is the clock id used by
> > > > gettimeofday().  For the former, you only have to advance the offset
> > > > for yourself occasionally (compatibly with the kernel) and manage
> > > > (compatibly with the kernel, especially in the long term) ntp slewi=
ng
> > > > and other syscall/sysctl kernel activity that micro-adjusts th->th_=
scale.
> > >=20
> > > I think duplicating this logic in userland would just be wasteful.  I=
 have
> > > a private fast gettimeofday() at my current job and it works by expor=
ting
> > > the current timehands structure (well, the equivalent) to userland.  =
The
> > > userland bits then fetch a copy of the details and do the same as bin=
time().
> > > (I move the math (bintime_addx() and the multiply)) out of the loop h=
owever.
> > I started yesterday an implementation which uses shared page to export
> > some variant of timehands, and uses auxv to provide the libc with a poi=
nter
> > to timehands when rdtsc is reasonable.
> >=20
> > I almost finished both 32bit and 64bit userspace, but there is
> > kernel-side work left. Is your implementation ready or close to be ready
> > for commit ? In other words, should I drop the efforts, or continue ?
>=20
> No, mine is not general purpose.  I'll see if I can make a public patch o=
f what
> it looks like.

My first version that seems to work on amd64 is at
http://people.freebsd.org/~kib/misc/moronix.1.patch

The plugs do allow for the new gettimeofday code to be replaced by a
vdso version in the future.

This is definitely WIP; in particular, memory barrier handling in
__vdso_gettimeofday and in the tc_windup updater is missing.
Also, clock_gettime() support would require an ABI change.

I only compiled the amd64 kernel; i386 is probably broken, and other
architectures are definitely broken.

--PEfPc/DjvCj+JzNg
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAk/OC+AACgkQC3+MBN1Mb4gFvgCg7kdxK3EZJGiLz8SDf3/xTkEg
XA8An0Mb5+KWdwgLW+SjCaI7UFY3ufJS
=Ev9z
-----END PGP SIGNATURE-----

--PEfPc/DjvCj+JzNg--

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 14:32:30 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 24CFF106564A
	for <arch@freebsd.org>; Tue,  5 Jun 2012 14:32:30 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id 6A1F78FC1A
	for <arch@freebsd.org>; Tue,  5 Jun 2012 14:32:29 +0000 (UTC)
Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q55EWGdu067703;
	Tue, 5 Jun 2012 17:32:16 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id
	q55EWGf6099940; Tue, 5 Jun 2012 17:32:16 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q55EWFBL099939; 
	Tue, 5 Jun 2012 17:32:15 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Tue, 5 Jun 2012 17:32:15 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
Message-ID: <20120605143215.GL85127@deviant.kiev.zoral.com.ua>
References: <86bokyvtc2.fsf@ds4.des.no>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="bFUYW7mPOLJ+Jd2A"
Content-Disposition: inline
In-Reply-To: <86bokyvtc2.fsf@ds4.des.no>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: arch@freebsd.org
Subject: Re: KTR_SPAREx
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 14:32:30 -0000


--bFUYW7mPOLJ+Jd2A
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Jun 05, 2012 at 02:25:33PM +0200, Dag-Erling Sm??rgrav wrote:
> While working on Capsicum last year, I noticed that some of the spare
> KTR types are (ab)used for different purposes by different parts of the
> code.  KTR_SPARE[234] are all documented as "/* XXX Used by cxgb */",
> but KTR_SPARE2, for instance, is widely used for clock events.  Here is
> a complete list:
>=20
> sys/sys/ktr.h: #define  KTR_SPARE2      0x00000800              /* XXX Us=
ed by cxgb */
> sys/sys/ktr.h: #define  KTR_SPARE3      0x00008000              /* XXX Us=
ed by cxgb */
> sys/sys/ktr.h: #define  KTR_SPARE4      0x00010000              /* XXX Us=
ed by cxgb */
> sys/geom/sched/gs_scheduler.h: #define  KTR_GSCHED      KTR_SPARE4
> sys/kern/kern_clocksource.c:    CTR4(KTR_SPARE2, "ipi  at %d:    now  %d.=
%08x%08x",
> sys/kern/kern_clocksource.c:    CTR4(KTR_SPARE2, "handle at %d:  now  %d.=
%08x%08x",
> sys/kern/kern_clocksource.c:            CTR2(KTR_SPARE2, "skip   at %d: %=
d", curcpu, skip);
> sys/kern/kern_clocksource.c:    CTR5(KTR_SPARE2, "next at %d:    next %d.=
%08x%08x by %d",
> sys/kern/kern_clocksource.c:    CTR4(KTR_SPARE2, "intr at %d:    now  %d.=
%08x%08x",
> sys/kern/kern_clocksource.c:                    CTR5(KTR_SPARE2, "load p =
at %d:   now %d.%08x first in %d.%08x",
> sys/kern/kern_clocksource.c:            CTR5(KTR_SPARE2, "load at %d:    =
next %d.%08x%08x eq %d",
> sys/kern/kern_clocksource.c:    CTR4(KTR_SPARE2, "idle at %d:    now  %d.=
%08x%08x",
> sys/kern/kern_clocksource.c:    CTR4(KTR_SPARE2, "active at %d:  now  %d.=
%08x%08x",
> sys/kern/kern_clocksource.c:    CTR4(KTR_SPARE2, "set_cyc at %d:  now  %d=
.%08x%08x",
> sys/kern/kern_clocksource.c:    CTR4(KTR_SPARE2, "set_cyc at %d:  t  %d.%=
08x%08x",
> sys/kern/kern_clocksource.c:    CTR3(KTR_SPARE2, "new co at %d:    on %d =
in %d",
> sys/amd64/amd64/machdep.c:      CTR2(KTR_SPARE2, "cpu_idle(%d) at %d",
> sys/amd64/amd64/machdep.c:      CTR2(KTR_SPARE2, "cpu_idle(%d) at %d done=
",
> sys/dev/cxgb/cxgb_osdep.h: #define      KTR_CXGB        KTR_SPARE2
> sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_hal.h: #define KTR_IW_CXGB KTR_SPARE4
> sys/dev/cxgb/ulp/tom/cxgb_defs.h: #define       KTR_TOM KTR_SPARE2
> sys/dev/cxgb/ulp/tom/cxgb_defs.h: #define       KTR_TCB KTR_SPARE3
> sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c:     CTR2(KTR_SPARE2, "wr_ack: snd_una=
=3D%u credits=3D%d", snd_una, credits);
> sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c:             CTR1(KTR_SPARE2, "wr_ack:=
 sbdrop(%d)", bytes);
> sys/dev/gem/if_gem.c: #define   KTR_GEM         KTR_SPARE2
> sys/dev/drm2/drmP.h: #define    KTR_DRM_REG     KTR_SPARE3
> sys/dev/hme/if_hme.c: #define   KTR_HME         KTR_SPARE2      /* XXX */
> sys/dev/cas/if_cas.c: #define   KTR_CAS         KTR_SPARE2
> sys/dev/ath/if_ath.c: #define   ATH_KTR_INTR    KTR_SPARE4
> sys/dev/ath/if_ath.c: #define   ATH_KTR_ERR     KTR_SPARE3
> sys/dev/ath/if_ath_rx.c: #define        ATH_KTR_INTR    KTR_SPARE4
> sys/dev/ath/if_ath_rx.c: #define        ATH_KTR_ERR     KTR_SPARE3
> sys/i386/xen/xen_machdep.c:     CTR0(KTR_SPARE2, "ni_cli disabling interr=
upts");
> sys/i386/xen/xen_machdep.c:     CTR2(KTR_SPARE2, "%x xen_restore_flags ef=
lags %x", rebp(), eflags);
> sys/i386/xen/xen_machdep.c:     CTR1(KTR_SPARE2, "%x xen_cli disabling in=
terrupts", rebp());
> sys/i386/xen/xen_machdep.c:     CTR1(KTR_SPARE2, "%x xen_sti enabling int=
errupts", rebp());
> sys/i386/i386/machdep.c:        CTR2(KTR_SPARE2, "cpu_idle(%d) at %d",
> sys/i386/i386/machdep.c:        CTR2(KTR_SPARE2, "cpu_idle(%d) at %d done=
",
> sys/powerpc/powerpc/cpu.c:      CTR2(KTR_SPARE2, "cpu_idle(%d) at %d",
> sys/powerpc/powerpc/cpu.c:      CTR2(KTR_SPARE2, "cpu_idle(%d) at %d done=
",
> sys/pc98/pc98/machdep.c:        CTR2(KTR_SPARE2, "cpu_idle(%d) at %d",
> sys/pc98/pc98/machdep.c:        CTR2(KTR_SPARE2, "cpu_idle(%d) at %d done=
",
> sys/sparc64/sparc64/pmap.c:             CTR5(KTR_SPARE2,
> sys/sparc64/sparc64/tsb.c:              CTR5(KTR_SPARE2,
> sys/sparc64/include/bus.h: #define      KTR_BUS                         K=
TR_SPARE2
>=20
> Most of this is in device drivers, which should use KTR_DEV.  There is
> one major use of KTR_SPAREx in common code: KTR_SPARE2 is used for clock
> events.  It is also used incorrectly by the sparc64 pmap core (there is
> a separate KTR_PMAP for that).
>=20
> I suggest that we
>=20
> 1) rename one of the spare KTRs to KTR_CLOCK and use that for clock
>    events.  I already have a patch for that.
>=20
> 2) eliminate all other use of KTR_SPARE[0-9] in non-device code.  I
>    think the existing KTRs should already cover most cases.
>=20
> 3) modify device drivers to use KTR_DEV for events that aren't covered
>    by existing, more specific KTRs, which is almost none.  For instance,
>    there is no reason why cxgb shouldn't just use KTR_NET.
Moving all device drivers to KTR_DEV makes the KTR unusable for device
driver debugging. When looking at the drm2 and gem traces, I do not want
to see other devices' tracepoints. The amount of data from GEM is huge,
and mixing it with unrelated debugging recycles the KTR ring faster,
aside from adding noise that makes the log meaningless.

--bFUYW7mPOLJ+Jd2A
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAk/OGG8ACgkQC3+MBN1Mb4j1uQCgpc0bZke1nm1HxOMv4QRMdyZP
nCAAoN2XUHUgTUNM8FXQc1bf50Co5ivR
=mbRt
-----END PGP SIGNATURE-----

--bFUYW7mPOLJ+Jd2A--

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 14:40:33 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D1B2D1065673
	for <arch@freebsd.org>; Tue,  5 Jun 2012 14:40:33 +0000 (UTC)
	(envelope-from des@des.no)
Received: from smtp.des.no (smtp.des.no [194.63.250.102])
	by mx1.freebsd.org (Postfix) with ESMTP id 92B7F8FC15
	for <arch@freebsd.org>; Tue,  5 Jun 2012 14:40:33 +0000 (UTC)
Received: from ds4.des.no (smtp.des.no [194.63.250.102])
	by smtp.des.no (Postfix) with ESMTP id 92DCE6FE8;
	Tue,  5 Jun 2012 14:40:32 +0000 (UTC)
Received: by ds4.des.no (Postfix, from userid 1001)
	id 6B55D960F; Tue,  5 Jun 2012 16:40:32 +0200 (CEST)
From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
To: Konstantin Belousov <kostikbel@gmail.com>
References: <86bokyvtc2.fsf@ds4.des.no>
	<20120605143215.GL85127@deviant.kiev.zoral.com.ua>
Date: Tue, 05 Jun 2012 16:40:32 +0200
In-Reply-To: <20120605143215.GL85127@deviant.kiev.zoral.com.ua> (Konstantin
	Belousov's message of "Tue, 5 Jun 2012 17:32:15 +0300")
Message-ID: <86pq9dvn33.fsf@ds4.des.no>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.3 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: arch@freebsd.org
Subject: Re: KTR_SPAREx
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 14:40:34 -0000

Konstantin Belousov <kostikbel@gmail.com> writes:
> Moving all device drivers to KTR_DEV makes the KTR unusable for device
> driver debugging. When looking at the drm2 and gem traces, I do not want
> to see other devices' tracepoints. The amount of data from GEM is huge,
> and mixing it with unrelated debugging recycles the KTR ring faster,
> aside from adding noise that makes the log meaningless.

We only have a limited number of KTR types - 32, to be precise.  We
can't spare one for each driver, and there's no reason why *your* driver
(for any value of "you") should get its own while everybody else shares
KTR_DEV.

If you think KTR_DEV is too noisy, add sysctls to enable or disable
tracing on a per-device basis.  It should be quite easy to generalize.
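
As a rough illustration of that idea (a userland sketch, not kernel code:
the softc layout, dev_trace() and the KTR_DEV value are assumptions made
for the example, and in a real driver the flag would be exported through
a per-device sysctl), each device carries its own enable flag that is
checked in addition to the global mask:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define KTR_DEV 0x00000004u	/* assumed value, for illustration only */

/* Hypothetical per-device state; not taken from any real driver. */
struct dev_softc {
	const char *name;
	bool trace_enabled;	/* would be backed by a per-device sysctl */
};

static unsigned int ktr_mask = KTR_DEV;	/* stand-in for the global mask */
static int ring_entries;		/* entries that reached the "ring" */

/* Record an event only if the class and this device are both enabled. */
static void
dev_trace(const struct dev_softc *sc, const char *msg)
{
	assert(sc != NULL);
	if ((ktr_mask & KTR_DEV) == 0 || !sc->trace_enabled)
		return;
	ring_entries++;
	printf("%s: %s\n", sc->name, msg);
}

/* Two devices share KTR_DEV; only the one being debugged is enabled. */
static int
demo_per_device_trace(void)
{
	struct dev_softc drm = { "drm0", true };
	struct dev_softc ath = { "ath0", false };

	ring_entries = 0;
	dev_trace(&drm, "gem object created");
	dev_trace(&ath, "rx interrupt");	/* filtered out */
	dev_trace(&drm, "gem object freed");
	return (ring_entries);
}
```

Only the device whose flag is set contributes entries, so a busy
neighbour sharing KTR_DEV no longer recycles the ring.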

(I still haven't gotten around to implementing a similar infrastructure
for network interfaces...)

DES
--=20
Dag-Erling Sm=C3=B8rgrav - des@des.no

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 14:49:46 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DBC311065687
	for <arch@freebsd.org>; Tue,  5 Jun 2012 14:49:46 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id 3E4EB8FC19
	for <arch@freebsd.org>; Tue,  5 Jun 2012 14:49:44 +0000 (UTC)
Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q55EncnA070300;
	Tue, 5 Jun 2012 17:49:38 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id
	q55Enc67000151; Tue, 5 Jun 2012 17:49:38 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q55Encuk000150; 
	Tue, 5 Jun 2012 17:49:38 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Tue, 5 Jun 2012 17:49:38 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
Message-ID: <20120605144938.GN85127@deviant.kiev.zoral.com.ua>
References: <86bokyvtc2.fsf@ds4.des.no>
	<20120605143215.GL85127@deviant.kiev.zoral.com.ua>
	<86pq9dvn33.fsf@ds4.des.no>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="CQDko/0aYvuiEzgn"
Content-Disposition: inline
In-Reply-To: <86pq9dvn33.fsf@ds4.des.no>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: arch@freebsd.org
Subject: Re: KTR_SPAREx
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 14:49:46 -0000


--CQDko/0aYvuiEzgn
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Jun 05, 2012 at 04:40:32PM +0200, Dag-Erling Sm=C3=B8rgrav wrote:
> Konstantin Belousov <kostikbel@gmail.com> writes:
> > Moving all device drivers to KTR_DEV makes the KTR unusable for device
> > driver debugging. When looking at the drm2 and gem traces, I do not want
> > to see other devices' tracepoints. The amount of data from GEM is huge,
> > and mixing it with unrelated debugging recycles the KTR ring faster,
> > aside from adding noise that makes the log meaningless.
>=20
> We only have a limited number of KTR types - 32, to be precise.  We
> can't spare one for each driver, and there's no reason why *your* driver
> (for any value of "you") should get its own while everybody else shares
> KTR_DEV.
I want to have only *my* driver trace points in the ring, by whatever
means. Breaking it right now would mean that I cannot do any GEM
debugging.

>=20
> If you think KTR_DEV is too noisy, add sysctls to enable or disable
> tracing on a per-device basis.  It should be quite easy to generalize.
So you are planning to break some useful, if possibly accidental,
functionality and delegate the work of repairing it to somebody else?

>=20
> (I still haven't gotten around to implementing a similar infrastructure
> for network interfaces...)
>=20
> DES
> --=20
> Dag-Erling Sm=C3=B8rgrav - des@des.no

--CQDko/0aYvuiEzgn
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAk/OHIEACgkQC3+MBN1Mb4ipIwCeNnkTQuffMM3uGSnbZt2zY5pU
rl0AoMUPQEGfFH93xKOOz/jGwcHZ5BIQ
=gP0D
-----END PGP SIGNATURE-----

--CQDko/0aYvuiEzgn--

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 15:02:17 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0BDD41065670
	for <freebsd-arch@freebsd.org>; Tue,  5 Jun 2012 15:02:17 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net
	[IPv6:2001:470:1f10:75::2])
	by mx1.freebsd.org (Postfix) with ESMTP id D142E8FC16
	for <freebsd-arch@freebsd.org>; Tue,  5 Jun 2012 15:02:16 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id 441E1B95D;
	Tue,  5 Jun 2012 11:02:16 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-arch@freebsd.org
Date: Tue, 5 Jun 2012 09:47:48 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p13; KDE/4.5.5; amd64; ; )
References: <86bokyvtc2.fsf@ds4.des.no>
In-Reply-To: <86bokyvtc2.fsf@ds4.des.no>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Message-Id: <201206050947.48750.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
	(bigwig.baldwin.cx); Tue, 05 Jun 2012 11:02:16 -0400 (EDT)
Cc: Dag-Erling =?utf-8?q?Sm=C3=B8rgrav?= <des@des.no>
Subject: Re: KTR_SPAREx
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 15:02:17 -0000

On Tuesday, June 05, 2012 8:25:33 am Dag-Erling Sm=C3=B8rgrav wrote:
> Most of this is in device drivers, which should use KTR_DEV.  There is
> one major use of KTR_SPAREx in common code: KTR_SPARE2 is used for clock
> events.  It is also used incorrectly by the sparc64 pmap core (there is
> a separate KTR_PMAP for that).
>=20
> I suggest that we
>=20
> 1) rename one of the spare KTRs to KTR_CLOCK and use that for clock
>    events.  I already have a patch for that.
>=20
> 2) eliminate all other use of KTR_SPARE[0-9] in non-device code.  I
>    think the existing KTRs should already cover most cases.
>=20
> 3) modify device drivers to use KTR_DEV for events that aren't covered
>    by existing, more specific KTRs, which is almost none.  For instance,
>    there is no reason why cxgb shouldn't just use KTR_NET.

There is a reason in that you may want to only get those specific events and
not drown in noise from the network stack itself for example.  What I tend =
to
do in drivers where I want to do this is have something like this:

#if 0
#define KTR_CXGB KTR_DEV
#else
#define KTR_CXGB 0
#endif

and then use 'KTR_CXGB' instead of 'KTR_DEV' or 'KTR_SPARE2' explicitly.  I=
t=20
looks like most of the drivers are already doing this and if it is #if 0'd =
by
default, then I would just let them be.  The two CTR()s in tom/cxgb_cpl_io.c
should probably be using KTR_TOM instead of KTR_SPARE2 directly.
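
A minimal userland sketch of why the #if 0 form costs nothing (the CTR0
macro below is a simplified stand-in for the real one, and the KTR_DEV
value and KTR_COMPILE setting are assumed for the example): with the
class defined as 0, the compile-time test is constant false and the
trace point drops out.

```c
#include <assert.h>
#include <stdio.h>

#define KTR_DEV		0x00000004u	/* assumed value, for illustration */
#define KTR_COMPILE	(KTR_DEV)	/* classes built into this build */

/* Change to "#if 1" to build the driver's trace points in. */
#if 0
#define KTR_CXGB KTR_DEV
#else
#define KTR_CXGB 0
#endif

static unsigned int ktr_mask = KTR_DEV;	/* run-time mask */
static int trace_count;

/* Simplified stand-in for a CTRn() macro: compile-time test, then run-time. */
#define CTR0(m, msg) do {						\
	if ((KTR_COMPILE & (m)) != 0 && (ktr_mask & (m)) != 0) {	\
		trace_count++;						\
		printf("trace: %s\n", (msg));				\
	}								\
} while (0)

/* Returns how many of the two trace points actually fired. */
static int
emit_events(void)
{
	assert(KTR_COMPILE != 0);
	trace_count = 0;
	CTR0(KTR_CXGB, "cxgb event");		/* class is 0: never fires */
	CTR0(KTR_DEV, "generic device event");	/* fires */
	return (trace_count);
}
```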

As a long term goal I would like to switch to using individual ints instead=
 of=20
a 32-bit bitmask as that would let us add new trace classes with ease.  I=20
haven't figured out a design for that that I fully like yet however.
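
Purely as an illustration of what "individual ints" could look like (this
is one possible shape, not a design anyone in this thread has proposed;
KTR_MAXCLASS and both helper names are hypothetical), each class gets a
small integer id and the enabled set is kept in an array of words:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define KTR_MAXCLASS	256	/* hypothetical upper bound on class ids */
#define WORD_BITS	(sizeof(unsigned long) * CHAR_BIT)

/* One bit per class id, spread over as many words as needed. */
static unsigned long ktr_enabled[(KTR_MAXCLASS + WORD_BITS - 1) / WORD_BITS];

/* Enable or disable tracing for a single class id. */
static void
ktr_class_set(unsigned int id, bool on)
{
	assert(id < KTR_MAXCLASS);
	if (on)
		ktr_enabled[id / WORD_BITS] |= 1UL << (id % WORD_BITS);
	else
		ktr_enabled[id / WORD_BITS] &= ~(1UL << (id % WORD_BITS));
}

/* Test whether a class id is currently enabled. */
static bool
ktr_class_enabled(unsigned int id)
{
	assert(id < KTR_MAXCLASS);
	return (((ktr_enabled[id / WORD_BITS] >> (id % WORD_BITS)) & 1UL) != 0);
}
```

Adding a class then means picking the next free id rather than finding a
free bit among 32.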

=2D-=20
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 15:02:18 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B188E106566B;
	Tue,  5 Jun 2012 15:02:18 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net
	[IPv6:2001:470:1f10:75::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 826C18FC08;
	Tue,  5 Jun 2012 15:02:18 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id D5EA7B91A;
	Tue,  5 Jun 2012 11:02:17 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: "Dag-Erling =?utf-8?q?Sm=C3=B8rgrav?=" <des@des.no>
Date: Tue, 5 Jun 2012 10:08:29 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p13; KDE/4.5.5; amd64; ; )
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<201206041053.51802.jhb@freebsd.org> <86y5o1vrer.fsf@ds4.des.no>
In-Reply-To: <86y5o1vrer.fsf@ds4.des.no>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Message-Id: <201206051008.29568.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
	(bigwig.baldwin.cx); Tue, 05 Jun 2012 11:02:17 -0400 (EDT)
Cc: Gianni <gianni@freebsd.org>, Alan Cox <alc@rice.edu>,
	Alexander Kabaev <kan@freebsd.org>, Attilio Rao <attilio@freebsd.org>,
	Konstantin Belousov <kib@freebsd.org>, freebsd-arch@freebsd.org,
	Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Fwd: [RFC] Kernel shared variables
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 15:02:18 -0000

On Tuesday, June 05, 2012 9:07:08 am Dag-Erling Sm=C3=B8rgrav wrote:
> John Baldwin <jhb@freebsd.org> writes:
> > I think this is an important question actually.  Is there anything
> > that really needs to be here besides gettimeofday()?  I mean, is there
> > any real-world application that needs to call getpid() or getppid() a
> > bunch of times?
>=20
> Yes, for fork detection when accessing resources shared between
> descendants of the process that allocated them.

So you call getpid() on each access to a shared resource?
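
For reference, the pattern in question looks roughly like this (a sketch
only; struct shared_res and the helper names are made up for
illustration): the owner's pid is recorded once, and every access
compares it against getpid().

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical shared resource that remembers which process set it up. */
struct shared_res {
	pid_t owner;
};

static void
res_init(struct shared_res *r)
{
	assert(r != NULL);
	r->owner = getpid();
}

/* True if the caller is not the process that initialised the resource. */
static bool
res_forked(const struct shared_res *r)
{
	return (getpid() != r->owner);
}

/*
 * Fork once and report whether the child detected the fork while the
 * parent did not.
 */
static bool
demo_fork_detection(void)
{
	struct shared_res r;
	int status;
	pid_t pid;

	res_init(&r);
	pid = fork();
	if (pid == -1)
		return (false);
	if (pid == 0)
		_exit(res_forked(&r) ? 0 : 1);
	if (waitpid(pid, &status, 0) != pid)
		return (false);
	return (!res_forked(&r) && WIFEXITED(status) &&
	    WEXITSTATUS(status) == 0);
}
```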

=2D-=20
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 15:06:36 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id C3FCE1065670
	for <arch@freebsd.org>; Tue,  5 Jun 2012 15:06:36 +0000 (UTC)
	(envelope-from des@des.no)
Received: from smtp.des.no (smtp.des.no [194.63.250.102])
	by mx1.freebsd.org (Postfix) with ESMTP id 84CC38FC12
	for <arch@freebsd.org>; Tue,  5 Jun 2012 15:06:36 +0000 (UTC)
Received: from ds4.des.no (smtp.des.no [194.63.250.102])
	by smtp.des.no (Postfix) with ESMTP id 7032A600C;
	Tue,  5 Jun 2012 15:06:35 +0000 (UTC)
Received: by ds4.des.no (Postfix, from userid 1001)
	id 3CB259616; Tue,  5 Jun 2012 17:06:35 +0200 (CEST)
From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
To: Konstantin Belousov <kostikbel@gmail.com>
References: <86bokyvtc2.fsf@ds4.des.no>
	<20120605143215.GL85127@deviant.kiev.zoral.com.ua>
	<86pq9dvn33.fsf@ds4.des.no>
	<20120605144938.GN85127@deviant.kiev.zoral.com.ua>
Date: Tue, 05 Jun 2012 17:06:34 +0200
In-Reply-To: <20120605144938.GN85127@deviant.kiev.zoral.com.ua> (Konstantin
	Belousov's message of "Tue, 5 Jun 2012 17:49:38 +0300")
Message-ID: <86lik1vlvp.fsf@ds4.des.no>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.3 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: arch@freebsd.org
Subject: Re: KTR_SPAREx
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 15:06:36 -0000

Konstantin Belousov <kostikbel@gmail.com> writes:
> Dag-Erling Sm=C3=B8rgrav <des@des.no> writes:
>> We only have a limited number of KTR types - 32, to be precise.  We
>> can't spare one for each driver, and there's no reason why *your* driver
>> (for any value of "you") should get its own while everybody else shares
>> KTR_DEV.
> I want to have only *my* driver trace points in the ring, by whatever
> means. Breaking it right now would mean that I cannot do any GEM
> debugging.

Well, so does everybody else.  Here is a list of files that use the same
KTR that you use for GEM (KTR_SPARE2):

sys/kern/kern_clocksource.c
sys/amd64/amd64/machdep.c
sys/dev/cxgb/cxgb_osdep.h
sys/dev/cxgb/ulp/tom/cxgb_defs.h
sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c
sys/dev/gem/if_gem.c
sys/dev/hme/if_hme.c
sys/dev/cas/if_cas.c
sys/i386/xen/xen_machdep.c
sys/i386/i386/machdep.c
sys/powerpc/powerpc/cpu.c
sys/pc98/pc98/machdep.c
sys/sparc64/sparc64/pmap.c
sys/sparc64/sparc64/tsb.c
sys/sparc64/include/bus.h

Note that sys/*/*/machdep.c issue a KTR_SPARE2 event every time the CPU
enters or exits the idle thread.

> > If you think KTR_DEV is too noisy, add sysctls to enable or disable
> > tracing on a per-device basis.  It should be quite easy to generalize.
> So you are planning to break some useful, if possibly accidental,
> functionality and delegate the work of repairing it to somebody else?

It's already broken, and you're one of the people responsible for
breaking it.  I'm trying to fix it.

DES
--=20
Dag-Erling Sm=C3=B8rgrav - des@des.no

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 15:18:55 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 71AE81065670
	for <arch@freebsd.org>; Tue,  5 Jun 2012 15:18:55 +0000 (UTC)
	(envelope-from asmrookie@gmail.com)
Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com
	[209.85.217.182])
	by mx1.freebsd.org (Postfix) with ESMTP id E0EF18FC1F
	for <arch@freebsd.org>; Tue,  5 Jun 2012 15:18:54 +0000 (UTC)
Received: by lbon10 with SMTP id n10so5079617lbo.13
	for <arch@freebsd.org>; Tue, 05 Jun 2012 08:18:53 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=MHENfAEasMRpGfIstiRUJCKWvR7ZEbTuk4j/iUM9wFs=;
	b=A/4yJKtw07vu5KBI2n14+3AZju0hqGudY+FqTstSr7tkfyb5pWklTPUR3Ts/mqaZDE
	x+skyRSxWUpZ+ASX1lHoXvSjlTaWyhNR6NVqwG2g36n1xmnTCVdAD/PtJtIOCvvy5AZv
	kwMzQuPwUXh34XBEwKJozzJ5QXIWFc/RSvxRgCvHKU+nP8R76FRTUrzxB4TdtqH4Qpqd
	xchDcfT+0WdTRuTjYI3LSxoM5lxOYa8HnXv2QGM9evjHKTDjNYNTLo5td3zCwEGRqOjA
	6+ahlmdtFHyTerNmrx1000lLKmUKQ9Pgm6mI3OnRD7ysmICCJMnZ8cXzOVsxcYNd4hah
	oXNQ==
MIME-Version: 1.0
Received: by 10.152.104.171 with SMTP id gf11mr17537340lab.5.1338909533497;
	Tue, 05 Jun 2012 08:18:53 -0700 (PDT)
Sender: asmrookie@gmail.com
Received: by 10.112.27.65 with HTTP; Tue, 5 Jun 2012 08:18:53 -0700 (PDT)
In-Reply-To: <86bokyvtc2.fsf@ds4.des.no>
References: <86bokyvtc2.fsf@ds4.des.no>
Date: Tue, 5 Jun 2012 16:18:53 +0100
X-Google-Sender-Auth: yoGy18sgeGBgc5IfgGCTkwhD24c
Message-ID: <CAJ-FndAMsoB1RAyS-Pa1JCv7W0qsviRxtShZ3uk_Tpd+J_EBaQ@mail.gmail.com>
From: Attilio Rao <attilio@freebsd.org>
To: =?UTF-8?Q?Dag=2DErling_Sm=C3=B8rgrav?= <des@des.no>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: arch@freebsd.org
Subject: Re: KTR_SPAREx
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 15:18:55 -0000

2012/6/5 Dag-Erling Sm=C3=B8rgrav <des@des.no>:
> While working on Capsicum last year, I noticed that some of the spare
> KTR types are (ab)used for different purposes by different parts of the
> code. =C2=A0KTR_SPARE[234] are all documented as "/* XXX Used by cxgb */"=
,
> but KTR_SPARE2, for instance, is widely used for clock events. =C2=A0Here=
 is
> a complete list:

The truth is, KTR is meant to be a mechanism for tracing events
"on the fly", but the very limited set of event masks/classes it
provides makes this completely useless.
I don't recall a single case where I did not have to patch the KTR knobs
manually to do actual debugging.

What I really would like to see is:
- Of course, remove the bogus usage of KTR_SPAREx in the drivers
- Make the mask of events much bigger than the current one
- Increase the number of KTR_SPAREs available (16 would be OK)
- By default, have KTR_SPARE0-15 enabled in the kernel along with the KTR
option, or maybe only while the kernel is still in the debugging phase
(but leave in a knob for disabling them)
- Use the dynamic masking system to select just the SPARE you are
interested in. This way your driver can simply use a KTR_SPARE for
development and you select the right one at run time.
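
A small userland sketch of the last point (the trace() helper is
hypothetical and ktr_mask stands in for the kernel's run-time mask; the
three class values are the ones from sys/sys/ktr.h listed earlier in
this thread): setting the mask to a single spare class keeps only that
class's events.

```c
#include <assert.h>
#include <stdio.h>

#define KTR_SPARE2	0x00000800u
#define KTR_SPARE3	0x00008000u
#define KTR_SPARE4	0x00010000u

static unsigned int ktr_mask;	/* stand-in for the run-time mask */

/* Log msg if its class is selected; returns 1 if logged, 0 otherwise. */
static int
trace(unsigned int cls, const char *msg)
{
	assert(cls != 0);
	if ((ktr_mask & cls) == 0)
		return (0);
	printf("%s\n", msg);
	return (1);
}

/* Select one spare class and count how many events get through. */
static int
demo_select_spare(void)
{
	int logged = 0;

	ktr_mask = KTR_SPARE3;	/* only the class under development */
	logged += trace(KTR_SPARE2, "clock event");
	logged += trace(KTR_SPARE3, "my driver event");
	logged += trace(KTR_SPARE4, "other driver event");
	return (logged);
}
```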

Attilio


--=20
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 15:20:02 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3A8A4106564A
	for <arch@freebsd.org>; Tue,  5 Jun 2012 15:20:02 +0000 (UTC)
	(envelope-from asmrookie@gmail.com)
Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com
	[209.85.217.182])
	by mx1.freebsd.org (Postfix) with ESMTP id A73818FC0A
	for <arch@freebsd.org>; Tue,  5 Jun 2012 15:20:01 +0000 (UTC)
Received: by lbon10 with SMTP id n10so5080875lbo.13
	for <arch@freebsd.org>; Tue, 05 Jun 2012 08:20:00 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=7zHR9gqF6/POmh+5xGUSHk6TJRkKwirfO5Sewq48GJg=;
	b=P/AFgL1uecmxbDpA3bPz4lOJjDMQBQ+TAVyPdO3Q5Imh0xZoYk+xExa0gpF39U4Jzn
	5FjNJJAxk5dhOSY3mJwIVphn9wU+RAAJc/2lE0nhJDZ1IGqi7ujbNwWD80s3w0zPwtAd
	eqh36srOF6cEGb5KrULG4+uRIi67OAdKW1twmMIwknnFB2Miq2J/Rcju3vjspxZr63KG
	LfRtximyUGcs14ftGZKKwILDEAa5fIeaCa9lbaXP+jDtfkASR4OcITCihP9Hjc0sqb1q
	Ds0ymugdg3v4q+NKQEKob7i2/YEAJGrG7vYCmTRpADywMzJOrbIVQrdmOueEVJ1KFomn
	XGCg==
MIME-Version: 1.0
Received: by 10.112.42.34 with SMTP id k2mr8325423lbl.0.1338909600597; Tue, 05
	Jun 2012 08:20:00 -0700 (PDT)
Sender: asmrookie@gmail.com
Received: by 10.112.27.65 with HTTP; Tue, 5 Jun 2012 08:20:00 -0700 (PDT)
In-Reply-To: <CAJ-FndAMsoB1RAyS-Pa1JCv7W0qsviRxtShZ3uk_Tpd+J_EBaQ@mail.gmail.com>
References: <86bokyvtc2.fsf@ds4.des.no>
	<CAJ-FndAMsoB1RAyS-Pa1JCv7W0qsviRxtShZ3uk_Tpd+J_EBaQ@mail.gmail.com>
Date: Tue, 5 Jun 2012 16:20:00 +0100
X-Google-Sender-Auth: h5OrgPT3WFakYIK2E8w_Xp9Q5o4
Message-ID: <CAJ-FndAadYbqiWUTupXLEcRMkYYL50Ssehi8f8vv6YXvQzy4OA@mail.gmail.com>
From: Attilio Rao <attilio@freebsd.org>
To: =?UTF-8?Q?Dag=2DErling_Sm=C3=B8rgrav?= <des@des.no>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: arch@freebsd.org
Subject: Re: KTR_SPAREx
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 15:20:02 -0000

2012/6/5 Attilio Rao <attilio@freebsd.org>:
> 2012/6/5 Dag-Erling Sm=C3=B8rgrav <des@des.no>:
>> While working on Capsicum last year, I noticed that some of the spare
>> KTR types are (ab)used for different purposes by different parts of the
>> code. =C2=A0KTR_SPARE[234] are all documented as "/* XXX Used by cxgb */=
",
>> but KTR_SPARE2, for instance, is widely used for clock events. =C2=A0Her=
e is
>> a complete list:
>
> The truth is, KTR is meant to be a mechanism for tracing events
> "on the fly", but the very limited set of event masks/classes it
> provides makes this completely useless.
> I don't recall a single case where I did not have to patch the KTR knobs
> manually to do actual debugging.
>
> What I really would like to see is:
> - Of course, remove the bogus usage of KTR_SPAREx in the drivers
> - Make the mask of events much bigger than the current one
> - Increase the number of KTR_SPAREs available (16 would be OK)
> - By default, have KTR_SPARE0-15 enabled in the kernel along with the KTR
> option, or maybe only while the kernel is still in the debugging phase
> (but leave in a knob for disabling them)
> - Use the dynamic masking system to select just the SPARE you are
> interested in. This way your driver can simply use a KTR_SPARE for
> development and you select the right one at run time.

Forgot to mention, even if this is mostly unrelated to your point: we
should do a better job of further breaking down the current set of KTR
classes on a per-subsystem basis. KTR_VFS and KTR_VM (among others) are
far too broad right now.

Attilio


--=20
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 15:44:39 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 166C710656E9;
	Tue,  5 Jun 2012 15:44:39 +0000 (UTC) (envelope-from des@des.no)
Received: from smtp.des.no (smtp.des.no [194.63.250.102])
	by mx1.freebsd.org (Postfix) with ESMTP id C54408FC16;
	Tue,  5 Jun 2012 15:44:38 +0000 (UTC)
Received: from ds4.des.no (smtp.des.no [194.63.250.102])
	by smtp.des.no (Postfix) with ESMTP id BF56D603A;
	Tue,  5 Jun 2012 15:44:37 +0000 (UTC)
Received: by ds4.des.no (Postfix, from userid 1001)
	id 93533961D; Tue,  5 Jun 2012 17:44:37 +0200 (CEST)
From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
To: John Baldwin <jhb@freebsd.org>
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<201206041053.51802.jhb@freebsd.org> <86y5o1vrer.fsf@ds4.des.no>
	<201206051008.29568.jhb@freebsd.org>
Date: Tue, 05 Jun 2012 17:44:37 +0200
In-Reply-To: <201206051008.29568.jhb@freebsd.org> (John Baldwin's message of
	"Tue, 5 Jun 2012 10:08:29 -0400")
Message-ID: <86haupvk4a.fsf@ds4.des.no>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.3 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: Gianni <gianni@freebsd.org>, Alan Cox <alc@rice.edu>,
	Alexander Kabaev <kan@freebsd.org>, Attilio Rao <attilio@freebsd.org>,
	Konstantin Belousov <kib@freebsd.org>, freebsd-arch@freebsd.org,
	Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Fwd: [RFC] Kernel shared variables
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 15:44:39 -0000

John Baldwin <jhb@freebsd.org> writes:
> So you call getpid() on each access to a shared resource?

I don't, but I've seen code that does, under the assumption that all the
world is Linux and getpid() is free.  Here's a sample from RHEL6 on a
3.1 GHz i5, using raise(0) as a baseline:

getpid(): 10,000,000 iterations in 24,400 ms
gettimeofday(0, 0): 10,000,000 iterations in 54,104 ms
raise(0): 10,000,000 iterations in 1,284,593 ms

The difference between the first two is due to the fact that while
getpid() just returns a constant, gettimeofday(0, 0) performs two
comparisons first.  Passing an actual struct timeval to gettimeofday()
slows it down by a factor of about 6.

(strace confirms that no system calls occur for either getpid() or
gettimeofday(0, 0))

Here is the same program running on FreeBSD 9.0-RELEASE in VirtualBox on
an otherwise idle 3.4 GHz i7:

getpid(): 10,000,000 iterations in 777,251 ms
gettimeofday(0, 0): 10,000,000 iterations in 799,808 ms
raise(0): 10,000,000 iterations in 2,142,275 ms

DES
--=20
Dag-Erling Smørgrav - des@des.no

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 16:22:22 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id DF196106566C;
	Tue,  5 Jun 2012 16:22:22 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net
	[IPv6:2001:470:1f10:75::2])
	by mx1.freebsd.org (Postfix) with ESMTP id AE2A28FC0A;
	Tue,  5 Jun 2012 16:22:22 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id 2775FB91A;
	Tue,  5 Jun 2012 12:22:22 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: "Dag-Erling =?utf-8?q?Sm=C3=B8rgrav?=" <des@des.no>
Date: Tue, 5 Jun 2012 12:22:12 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p13; KDE/4.5.5; amd64; ; )
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<201206051008.29568.jhb@freebsd.org> <86haupvk4a.fsf@ds4.des.no>
In-Reply-To: <86haupvk4a.fsf@ds4.des.no>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Message-Id: <201206051222.12627.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
	(bigwig.baldwin.cx); Tue, 05 Jun 2012 12:22:22 -0400 (EDT)
Cc: Gianni <gianni@freebsd.org>, Alan Cox <alc@rice.edu>,
	Alexander Kabaev <kan@freebsd.org>, Attilio Rao <attilio@freebsd.org>,
	Konstantin Belousov <kib@freebsd.org>, freebsd-arch@freebsd.org,
	Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Fwd: [RFC] Kernel shared variables
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 16:22:23 -0000

On Tuesday, June 05, 2012 11:44:37 am Dag-Erling Smørgrav wrote:
> John Baldwin <jhb@freebsd.org> writes:
> > So you call getpid() on each access to a shared resource?
>
> I don't, but I've seen code that does, under the assumption that all the
> world is Linux and getpid() is free.  Here's a sample from RHEL6 on a
> 3.1 GHz i5, using raise(0) as a baseline:
>
> getpid(): 10,000,000 iterations in 24,400 ms
> gettimeofday(0, 0): 10,000,000 iterations in 54,104 ms
> raise(0): 10,000,000 iterations in 1,284,593 ms
>
> The difference between the first two is due to the fact that while
> getpid() just returns a constant, gettimeofday(0, 0) performs two
> comparisons first.  Passing an actual struct timeval to gettimeofday()
> slows it down by a factor of about 6.
>
> (strace confirms that no system calls occur for either getpid() or
> gettimeofday(0, 0))
>
> Here is the same program running on FreeBSD 9.0-RELEASE in VirtualBox on
> an otherwise idle 3.4 GHz i7:
>
> getpid(): 10,000,000 iterations in 777,251 ms
> gettimeofday(0, 0): 10,000,000 iterations in 799,808 ms
> raise(0): 10,000,000 iterations in 2,142,275 ms

Yes, we know getpid() is slow, I think the question is does it matter that
it's slow in something other than a microbenchmark.  Can you name the
application that you've seen use getpid()?

=2D-=20
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 16:56:11 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id C513F106567B;
	Tue,  5 Jun 2012 16:56:11 +0000 (UTC)
	(envelope-from luigi@onelab2.iet.unipi.it)
Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238])
	by mx1.freebsd.org (Postfix) with ESMTP id 782AF8FC0C;
	Tue,  5 Jun 2012 16:56:11 +0000 (UTC)
Received: by onelab2.iet.unipi.it (Postfix, from userid 275)
	id 9F9CA7300B; Tue,  5 Jun 2012 19:14:46 +0200 (CEST)
Date: Tue, 5 Jun 2012 19:14:46 +0200
From: Luigi Rizzo <rizzo@iet.unipi.it>
To: John Baldwin <jhb@freebsd.org>
Message-ID: <20120605171446.GA28387@onelab2.iet.unipi.it>
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<201206051008.29568.jhb@freebsd.org> <86haupvk4a.fsf@ds4.des.no>
	<201206051222.12627.jhb@freebsd.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <201206051222.12627.jhb@freebsd.org>
User-Agent: Mutt/1.4.2.3i
Cc: Gianni <gianni@freebsd.org>, Alan Cox <alc@rice.edu>,
	Alexander Kabaev <kan@freebsd.org>, Attilio Rao <attilio@freebsd.org>,
	Konstantin Belousov <kib@freebsd.org>, freebsd-arch@freebsd.org,
	Konstantin Belousov <kostikbel@gmail.com>,
	Dag-Erling Sm??rgrav <des@des.no>
Subject: Fast vs slow syscalls (Re: Fwd: [RFC] Kernel shared variables)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 16:56:11 -0000

On Tue, Jun 05, 2012 at 12:22:12PM -0400, John Baldwin wrote:
> On Tuesday, June 05, 2012 11:44:37 am Dag-Erling Smørgrav wrote:
> > John Baldwin <jhb@freebsd.org> writes:
> > > So you call getpid() on each access to a shared resource?
> > 
> > I don't, but I've seen code that does, under the assumption that all the
> > world is Linux and getpid() is free.  Here's a sample from RHEL6 on a
> > 3.1 GHz i5, using raise(0) as a baseline:
> > 
> > getpid(): 10,000,000 iterations in 24,400 ms
> > gettimeofday(0, 0): 10,000,000 iterations in 54,104 ms
> > raise(0): 10,000,000 iterations in 1,284,593 ms
> > 
> > The difference between the first two is due to the fact that while
> > getpid() just returns a constant, gettimeofday(0, 0) performs two
> > comparisons first.  Passing an actual struct timeval to gettimeofday()
> > slows it down by a factor of about 6.
> > 
> > (strace confirms that no system calls occur for either getpid() or
> > gettimeofday(0, 0))
> > 
> > Here is the same program running on FreeBSD 9.0-RELEASE in VirtualBox on
> > an otherwise idle 3.4 GHz i7:
> > 
> > getpid(): 10,000,000 iterations in 777,251 ms
> > gettimeofday(0, 0): 10,000,000 iterations in 799,808 ms
> > raise(0): 10,000,000 iterations in 2,142,275 ms
> 
> Yes, we know getpid() is slow, I think the question is does it matter that 
> it's slow in something other than a microbenchmark.  Can you name the 
> application that you've seen use getpid()?

i think the important question is, for any function X:
    Q1	"does it require horrible hacks or a huge amount of work
	to make X syscall-free ?"
rather than
    Q2	"does it matter to make X fast"

If the answer to Q1 is "no" then there is no question
we should try to implement it.

Clearly the answer changes depending on the infrastructure we
have in place (e.g. without some shared kernel page we could not
export gettimeofday() calibration data, or PID numbers, etc).

And if we really want to educate people to use syscalls in a sensible
way (which I do see as a valuable goal, just not always)
we could always use an environment variable, LIBC_OPTIONS,
which enables or disables certain optimizations, similar
to MALLOC_OPTIONS.
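
To make the idea concrete, here is a minimal sketch of how such a
variable could be parsed.  Everything in it is hypothetical: LIBC_OPTIONS
does not exist, the flag letters 'P' and 'T' and the names libc_opts and
libc_opts_parse are invented for illustration, and the "uppercase turns
on, lowercase turns off" convention is only assumed to be in the spirit
of MALLOC_OPTIONS.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option set; the flag letters are invented for illustration. */
struct libc_opts {
	bool fast_getpid;		/* 'P' enables, 'p' disables */
	bool fast_gettimeofday;		/* 'T' enables, 't' disables */
};

/*
 * Parse a flag string such as "Pt".  An uppercase letter turns an
 * optimization on and the matching lowercase letter turns it off.
 * Unknown letters are ignored, and a NULL string leaves the current
 * settings untouched.
 */
static void
libc_opts_parse(struct libc_opts *opts, const char *s)
{
	if (s == NULL)
		return;
	for (; *s != '\0'; s++) {
		bool on = isupper((unsigned char)*s) != 0;

		switch (tolower((unsigned char)*s)) {
		case 'p':
			opts->fast_getpid = on;
			break;
		case 't':
			opts->fast_gettimeofday = on;
			break;
		default:
			break;
		}
	}
}
```

A library could call libc_opts_parse(&opts, getenv("LIBC_OPTIONS")) once
at startup, so that a missing variable simply keeps the defaults.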

cheers
luigi


From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 17:26:22 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4EBA21065672;
	Tue,  5 Jun 2012 17:26:22 +0000 (UTC)
	(envelope-from freebsd-listen@fabiankeil.de)
Received: from smtprelay06.ispgateway.de (smtprelay06.ispgateway.de
	[80.67.31.104])
	by mx1.freebsd.org (Postfix) with ESMTP id 0A2438FC19;
	Tue,  5 Jun 2012 17:26:22 +0000 (UTC)
Received: from [87.79.196.217] (helo=fabiankeil.de)
	by smtprelay06.ispgateway.de with esmtpsa (TLSv1:AES128-SHA:128)
	(Exim 4.68) (envelope-from <freebsd-listen@fabiankeil.de>)
	id 1SbxTH-0008PR-Pd; Tue, 05 Jun 2012 19:23:07 +0200
Date: Tue, 5 Jun 2012 19:15:45 +0200
From: Fabian Keil <freebsd-listen@fabiankeil.de>
To: George Neville-Neil <gnn@freebsd.org>
Message-ID: <20120605191545.65779e1e@fabiankeil.de>
In-Reply-To: <F68B592D-234D-4E75-BECC-6B9295779C37@freebsd.org>
References: <86wr40tfhf.wl%gnn@neville-neil.com>
	<20120528190300.3a43fc8d@fabiankeil.de>
	<F68B592D-234D-4E75-BECC-6B9295779C37@freebsd.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=PGP-SHA1;
	boundary="Sig_/Lte0l0kRmu9E+IA_EkjArnW";
	protocol="application/pgp-signature"
X-Df-Sender: Nzc1MDY3
Cc: arch@freebsd.org
Subject: Re: RFC: A trial io provider for DTrace...
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 17:26:22 -0000

--Sig_/Lte0l0kRmu9E+IA_EkjArnW
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable

George Neville-Neil <gnn@freebsd.org> wrote:

> On May 28, 2012, at 13:03 , Fabian Keil wrote:

> >> Remember you need to be root to use DTrace.
> >
> > Do you intend to eventually commit your patch to get dtrace working
> > with sudo? I've been using it since you posted it last October and
> > haven't seen any issues.
> > http://lists.freebsd.org/pipermail/freebsd-current/2011-October/028120.html
> >
>
> Sorry, what I meant was that you needed root privilege to run DTrace,
> sudo will give you that.

I got that, but was under the impression that the patch
was still necessary to get dtrace working with sudo and
thus was surprised that it hadn't been committed yet.

Apparently I missed the memo that the problem has already
been fixed differently and the patch is no longer required.

Fabian

--Sig_/Lte0l0kRmu9E+IA_EkjArnW
Content-Type: application/pgp-signature; name=signature.asc
Content-Disposition: attachment; filename=signature.asc

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)

iEYEARECAAYFAk/OPsgACgkQBYqIVf93VJ3DUgCfXiSZjUbQEKXruMiHKUXpUesO
pyQAnjQ+/hVdoHfPqpqmmPr7hFCVHL22
=LRiD
-----END PGP SIGNATURE-----

--Sig_/Lte0l0kRmu9E+IA_EkjArnW--

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 17:40:16 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DEDEF10656E8;
	Tue,  5 Jun 2012 17:40:16 +0000 (UTC) (envelope-from des@des.no)
Received: from smtp.des.no (smtp.des.no [194.63.250.102])
	by mx1.freebsd.org (Postfix) with ESMTP id 37A748FC12;
	Tue,  5 Jun 2012 17:40:15 +0000 (UTC)
Received: from ds4.des.no (smtp.des.no [194.63.250.102])
	by smtp.des.no (Postfix) with ESMTP id 3E1F760A6;
	Tue,  5 Jun 2012 17:40:14 +0000 (UTC)
Received: by ds4.des.no (Postfix, from userid 1001)
	id D346E962F; Tue,  5 Jun 2012 19:40:13 +0200 (CEST)
From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
To: John Baldwin <jhb@freebsd.org>
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<201206051008.29568.jhb@freebsd.org> <86haupvk4a.fsf@ds4.des.no>
	<201206051222.12627.jhb@freebsd.org>
Date: Tue, 05 Jun 2012 19:40:13 +0200
In-Reply-To: <201206051222.12627.jhb@freebsd.org> (John Baldwin's message of
	"Tue, 5 Jun 2012 12:22:12 -0400")
Message-ID: <868vg1verm.fsf@ds4.des.no>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.3 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: Gianni <gianni@freebsd.org>, Alan Cox <alc@rice.edu>,
	Alexander Kabaev <kan@freebsd.org>, Attilio Rao <attilio@freebsd.org>,
	Konstantin Belousov <kib@freebsd.org>, freebsd-arch@freebsd.org,
	Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Fwd: [RFC] Kernel shared variables
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 17:40:17 -0000

John Baldwin <jhb@freebsd.org> writes:
> Yes, we know getpid() is slow, I think the question is does it matter that
> it's slow in something other than a microbenchmark.  Can you name the
> application that you've seen use getpid()?

I've seen it in a proprietary multi-platform shared memory library.

Closer to home, I believe sqlite3 does the same thing, and we do this
ourselves, albeit on a smaller, non-performance-critical scale, e.g. in
the pidfile API and (IIRC) in nsswitch and the resolver.

BTW, raise(0) was a poor choice of baseline since it actually calls
getpid(), which makes no difference on Linux but does on FreeBSD.  The
actual numbers for FreeBSD are:

getpid(): 10,000,000 iterations in 784,638 ms
gettimeofday(0, 0): 10,000,000 iterations in 801,375 ms
kill(pid, 0): 10,000,000 iterations in 1,190,791 ms

DES
--=20
Dag-Erling Smørgrav - des@des.no

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 18:37:10 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6E4D41065673;
	Tue,  5 Jun 2012 18:37:10 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail05.syd.optusnet.com.au (mail05.syd.optusnet.com.au
	[211.29.132.186])
	by mx1.freebsd.org (Postfix) with ESMTP id EFC478FC1B;
	Tue,  5 Jun 2012 18:37:09 +0000 (UTC)
Received: from c122-106-171-232.carlnfd1.nsw.optusnet.com.au
	(c122-106-171-232.carlnfd1.nsw.optusnet.com.au [122.106.171.232])
	by mail05.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q55IassO020480
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 6 Jun 2012 04:36:56 +1000
Date: Wed, 6 Jun 2012 04:36:54 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Luigi Rizzo <rizzo@iet.unipi.it>
In-Reply-To: <20120605171446.GA28387@onelab2.iet.unipi.it>
Message-ID: <20120606040931.F1050@besplex.bde.org>
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<201206051008.29568.jhb@freebsd.org> <86haupvk4a.fsf@ds4.des.no>
	<201206051222.12627.jhb@freebsd.org>
	<20120605171446.GA28387@onelab2.iet.unipi.it>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Mailman-Approved-At: Tue, 05 Jun 2012 18:47:47 +0000
Cc: Gianni <gianni@FreeBSD.org>, John Baldwin <jhb@FreeBSD.org>,
	Alan Cox <alc@rice.edu>, Alexander Kabaev <kan@FreeBSD.org>,
	Attilio Rao <attilio@FreeBSD.org>,
	Konstantin Belousov <kib@FreeBSD.org>, freebsd-arch@FreeBSD.org,
	Konstantin Belousov <kostikbel@gmail.com>,
	Dag-Erling Sm??rgrav <des@des.no>
Subject: Re: Fast vs slow syscalls (Re: Fwd: [RFC] Kernel shared variables)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 18:37:10 -0000

On Tue, 5 Jun 2012, Luigi Rizzo wrote:

> On Tue, Jun 05, 2012 at 12:22:12PM -0400, John Baldwin wrote:
>> On Tuesday, June 05, 2012 11:44:37 am Dag-Erling Smørgrav wrote:
>>> John Baldwin <jhb@freebsd.org> writes:
>>>> So you call getpid() on each access to a shared resource?
>>>
>>> I don't, but I've seen code that does, under the assumption that all the
>>> world is Linux and getpid() is free.  Here's a sample from RHEL6 on a
>>> 3.1 GHz i5, using raise(0) as a baseline:
>>>
>>> getpid(): 10,000,000 iterations in 24,400 ms
>>> gettimeofday(0, 0): 10,000,000 iterations in 54,104 ms
>>> raise(0): 10,000,000 iterations in 1,284,593 ms

That's one slow system or broken units.  24.4 seconds for 10 million
"syscalls" in the fastest case?  If the comma is really a decimal
point, then 24.4 milliseconds makes sense, but then the number of
iterations would be only 10, with the second comma being a syntax
error.  If ms actually means microseconds, then someone should fix
ping(1) to stop pretending that it is 1000 times as fast as it is.

After adjusting by factors of 1000 here and there, this format is still
hard to parse.  I like the format of nsec/operation.  10 million
operations in 24400 moroseconds seems to scale to 2.44 nsec/call (if 1
moro = 1 micro).  But that is impossibly fast, unless getpid() is
inlined to a load of the shared variable (it may also need the load to
be moved outside the loop).  I can't see any reasonable adjustment that
gives 24.4 nsec/call.
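
As a rough sketch of the nsec/call arithmetic, the C helpers below divide
an elapsed time by the iteration count and time a getpid() loop with
clock_gettime().  The names ns_per_call, now_ns and bench_getpid are made
up for illustration, and reading "ms" as microseconds is an assumption,
not something the quoted output states.

```c
/* Ask for the POSIX.1-2008 API so clock_gettime() is declared. */
#define _POSIX_C_SOURCE 200809L

#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Convert a total elapsed time in nanoseconds to nanoseconds per call. */
static double
ns_per_call(uint64_t elapsed_ns, uint64_t iterations)
{
	if (iterations == 0)
		return (0.0);
	return ((double)elapsed_ns / (double)iterations);
}

/* Read the monotonic clock and return it in nanoseconds. */
static uint64_t
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec);
}

/* Time `iterations` calls to getpid() and return the cost of one call. */
static double
bench_getpid(uint64_t iterations)
{
	volatile pid_t sink;	/* keeps the result from being discarded */
	uint64_t i, start;

	sink = 0;
	start = now_ns();
	for (i = 0; i < iterations; i++)
		sink = getpid();
	(void)sink;
	return (ns_per_call(now_ns() - start, iterations));
}
```

Under that assumption, ns_per_call(24400000, 10000000) gives 2.44 for the
Linux getpid() line, and ns_per_call(777251000, 10000000) gives about
77.7 for the FreeBSD one.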

>>> The difference between the first two is due to the fact that while
>>> getpid() just returns a constant, gettimeofday(0, 0) performs two
>>> comparisons first.  Passing an actual struct timeval to gettimeofday()
>>> slows it down by a factor of about 6.
>>>
>>> (strace confirms that no system calls occur for either getpid() or
>>> gettimeofday(0, 0))
>>>
>>> Here is the same program running on FreeBSD 9.0-RELEASE in VirtualBox on
>>> an otherwise idle 3.4 GHz i7:
>>>
>>> getpid(): 10,000,000 iterations in 777,251 ms
>>> gettimeofday(0, 0): 10,000,000 iterations in 799,808 ms
>>> raise(0): 10,000,000 iterations in 2,142,275 ms

2142.275 seconds is really slow.

>> Yes, we know getpid() is slow, I think the question is does it matter that
>> it's slow in something other than a microbenchmark.  Can you name the
>> application that you've seen use getpid()?
>
> i think the important question is, for any function X:
>    Q1	"does it require horrible hacks or a huge amount of work
> 	to make X syscall-free ?"
> rather than
>    Q2	"does it matter to make X fast"

s/huge amount/any/

Work is all the programming work to implement it and maintain it forever.

> If the answer to Q1 is "no" then there is no question
> we should try to implement it.

The answer is sure to be "no", but you should try to implement it to
see if it is easier or works better than expected.

Bruce

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 18:57:38 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 4F3351065670;
	Tue,  5 Jun 2012 18:57:38 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail07.syd.optusnet.com.au (mail07.syd.optusnet.com.au
	[211.29.132.188])
	by mx1.freebsd.org (Postfix) with ESMTP id D4F048FC1E;
	Tue,  5 Jun 2012 18:57:37 +0000 (UTC)
Received: from c122-106-171-232.carlnfd1.nsw.optusnet.com.au
	(c122-106-171-232.carlnfd1.nsw.optusnet.com.au [122.106.171.232])
	by mail07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q55IvTj9015074
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 6 Jun 2012 04:57:30 +1000
Date: Wed, 6 Jun 2012 04:57:29 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Andrey Chernov <ache@FreeBSD.org>
In-Reply-To: <20120605130922.GE13306@vniz.net>
Message-ID: <20120606043731.D1124@besplex.bde.org>
References: <201206042134.q54LYoVJ067685@svn.freebsd.org>
	<20120605074741.GA1391@garage.freebsd.pl>
	<20120605130922.GE13306@vniz.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org,
	src-committers@FreeBSD.org,
	Pawel Jakub Dawidek <pjd@FreeBSD.org>, freebsd-arch@FreeBSD.org
Subject: Re: svn commit: r236582 - head/lib/libc/stdlib
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 18:57:38 -0000

On Tue, 5 Jun 2012, Andrey Chernov wrote:

> On Tue, Jun 05, 2012 at 09:47:42AM +0200, Pawel Jakub Dawidek wrote:
>>>   "The setting of errno after a successful call to a function is
>>>   unspecified unless the description of that function specifies that
>>>   errno shall not be modified."
>>
>> Very interesting. However free(3) is always successful. Maybe we need
>> more context here, but the sentence above might talk about functions
>> that can either succeed or fail and such functions do set errno on
>> failure, but we don't know what they do to errno on success - they
>> sometimes interact with the errno, free(3) never does.
>
> According to the Austin Group interpretation, this sentence talks about
> functions which always succeed too, please see
> http://austingroupbugs.net/view.php?id=385

This has very little to do with POSIX.  It is a basic part of Standard
C that the C library may, at its option, clobber errno, gratuitously
or otherwise.  From n869.txt:

        [#3]  The  value of errno is zero at program startup, but is
        never set to zero by any library function.159)  The value of
        errno  may  be  set  to  nonzero  by a library function call
        whether or not there is an error, provided the use of  errno
        is not documented in the description of the function in this
        International Standard.

Use of errno is not documented for free(); thus free() is permitted to
clobber errno.

POSIX may require errno to not be clobbered, especially for its functions.
It probably shouldn't do this for Standard C library functions like free(),
since this would be an extension and any use of the extension would give
unnecessarily unportable code.

>> I am aware that my interpretation might be too wishful, but it is pretty
>> obvious to save errno value when calling a function that can eventually
>> fail - when we save the errno we don't know if it will fail or not, so
>> we have to do that, but requiring to save errno when calling a void
>> function that can't fail is simply silly and complicates the code
>> without a reason.

This has very little to do with success or failure.  It does complicate
the code for callers, but actually simplifies the library.  Since most
library functions aren't required to preserve errno, they can call each
other without having to save and restore errno.
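
For a caller that cannot assume free() leaves errno alone, the extra
work amounts to something like the sketch below.  The wrapper name
free_keep_errno is hypothetical and not part of any libc; it only
illustrates the save/restore pattern under discussion.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of libc: release a buffer while keeping
 * the caller's errno intact, since Standard C allows free() to change it.
 */
static void
free_keep_errno(void *ptr)
{
	int saved_errno = errno;

	free(ptr);
	errno = saved_errno;
}
```

Error paths that must report an earlier failure after releasing their
buffers are the typical place where such a wrapper would be used.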

> It can still fail due to internal errors, it just does not return failure.
> For internal errors POSIX states that errno state is unspecified.
>
>> I agree that the standards aren't clear, but if saving errno around
>> free(3) is the way to go, then I'm sure we have much more problems in
>> our code, even if it is not supposed to be portable it should be correct
>> - we never know who and when will take the code and port it.
>
> Currently they are pretty clear on that point, although I agree that if
> POSIX said it should not modify errno, life would be easier. Let's watch
> what they do next, since they are already aware of this specific
> problem.

They are perfectly clear.

>> I guess what I'm trying to say here is that this is much bigger change
>> than it looks and we should probably agree on some global rule here.
>
> ...which does not violate standards.

Yes, its completion is a very large and ugly change.  realpath() is a
POSIX interface, so any code that implements or uses it can safely assume
POSIX requirements.  But non-POSIX code can only safely assume Standard
C requirements.  OTOH, the library can assume anything that it wants and
implements for itself.  Since it is the implementation, it can make
free() easy to use for itself, with any extensions that aren't incompatible
with Standard C.  Since free() is allowed to clobber errno, it is also
allowed to do a null clobber as a compatible extension.

Bruce

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 18:44:53 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id F28541065672;
	Tue,  5 Jun 2012 18:44:52 +0000 (UTC)
	(envelope-from luigi@onelab2.iet.unipi.it)
Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238])
	by mx1.freebsd.org (Postfix) with ESMTP id A30FC8FC20;
	Tue,  5 Jun 2012 18:44:52 +0000 (UTC)
Received: by onelab2.iet.unipi.it (Postfix, from userid 275)
	id 3A39C7300A; Tue,  5 Jun 2012 21:03:34 +0200 (CEST)
Date: Tue, 5 Jun 2012 21:03:34 +0200
From: Luigi Rizzo <rizzo@iet.unipi.it>
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <20120605190334.GB29067@onelab2.iet.unipi.it>
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<201206051008.29568.jhb@freebsd.org> <86haupvk4a.fsf@ds4.des.no>
	<201206051222.12627.jhb@freebsd.org>
	<20120605171446.GA28387@onelab2.iet.unipi.it>
	<20120606040931.F1050@besplex.bde.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20120606040931.F1050@besplex.bde.org>
User-Agent: Mutt/1.4.2.3i
X-Mailman-Approved-At: Tue, 05 Jun 2012 19:09:36 +0000
Cc: Gianni <gianni@FreeBSD.org>, John Baldwin <jhb@FreeBSD.org>,
	Alan Cox <alc@rice.edu>, Alexander Kabaev <kan@FreeBSD.org>,
	Attilio Rao <attilio@FreeBSD.org>,
	Konstantin Belousov <kib@FreeBSD.org>, freebsd-arch@FreeBSD.org,
	Konstantin Belousov <kostikbel@gmail.com>,
	Dag-Erling Sm??rgrav <des@des.no>
Subject: Re: Fast vs slow syscalls (Re: Fwd: [RFC] Kernel shared variables)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 18:44:53 -0000

On Wed, Jun 06, 2012 at 04:36:54AM +1000, Bruce Evans wrote:
> On Tue, 5 Jun 2012, Luigi Rizzo wrote:
...
> >>Yes, we know getpid() is slow, I think the question is does it matter that
> >>it's slow in something other than a microbenchmark.  Can you name the
> >>application that you've seen use getpid()?
> >
> >i think the important question is, for any function X:
> >   Q1	"does it require horrible hacks or a huge amount of work
> >	to make X syscall-free ?"
> >rather than
> >   Q2	"does it matter to make X fast"
> 
> s/huge amount/any/
> 
> Work is all the programming work to implement it and maintain it forever.

well, some work has a return in terms of fun, beauty, pride
so the balance is still favourable.

cheers
luigi

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 19:18:39 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 988711065673;
	Tue,  5 Jun 2012 19:18:39 +0000 (UTC)
	(envelope-from adrian.chadd@gmail.com)
Received: from mail-pb0-f54.google.com (mail-pb0-f54.google.com
	[209.85.160.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 653698FC15;
	Tue,  5 Jun 2012 19:18:39 +0000 (UTC)
Received: by pbbro2 with SMTP id ro2so8367613pbb.13
	for <multiple recipients>; Tue, 05 Jun 2012 12:18:38 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type;
	bh=bxQJST5Khq0Vf22zizVcAw4ZL6jRkQ1IJtPSQo0K5sE=;
	b=iCaaTo0BJQMJozginIS4a6y0m/9aLr4jAQR6nPdoTTbJCNIwF0nAQ4FMDWfrp2mrbU
	+H13BsCCYGBEK2+KyLxDSSlv85yqHsfK2pLvdd8vFHIi/Lsl6kP+a6r8hu6zQw1iRzby
	Eh64zrQPfP4J3x1TNlU0Mo/eI/qnXBKnyHukGEVQ+yM+U79+DCUkdediDuzG1d935QKS
	UUZsahrmM3wEhcaNN7iH+QlO5SIw4X7ZLCPJTAlF6BeB5WlFBn9dvVyBCa1VDgWs8jq+
	PD6fAdA9fLedYAhE4Ljk6joIE0O7vZovJuJZJeTwUEN3BI/FE+NkL/WtCyVjmj/mHoY5
	yrCA==
MIME-Version: 1.0
Received: by 10.68.211.170 with SMTP id nd10mr15393689pbc.68.1338923918908;
	Tue, 05 Jun 2012 12:18:38 -0700 (PDT)
Sender: adrian.chadd@gmail.com
Received: by 10.143.91.18 with HTTP; Tue, 5 Jun 2012 12:18:38 -0700 (PDT)
In-Reply-To: <CAJ-FndAadYbqiWUTupXLEcRMkYYL50Ssehi8f8vv6YXvQzy4OA@mail.gmail.com>
References: <86bokyvtc2.fsf@ds4.des.no>
	<CAJ-FndAMsoB1RAyS-Pa1JCv7W0qsviRxtShZ3uk_Tpd+J_EBaQ@mail.gmail.com>
	<CAJ-FndAadYbqiWUTupXLEcRMkYYL50Ssehi8f8vv6YXvQzy4OA@mail.gmail.com>
Date: Tue, 5 Jun 2012 12:18:38 -0700
X-Google-Sender-Auth: DSeJkA5dK22r5PTP8ZQcZH3U_pA
Message-ID: <CAJ-Vmo=F-=1UYcW0xBiNbjHa8CN7S=iDJ_bQDn4ESS3CumJf_A@mail.gmail.com>
From: Adrian Chadd <adrian@freebsd.org>
To: Attilio Rao <attilio@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
Cc: =?ISO-8859-1?Q?Dag=2DErling_Sm=F8rgrav?= <des@des.no>, arch@freebsd.org
Subject: Re: KTR_SPAREx
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 19:18:39 -0000

Hi,

I'm very tempted to make if_ath use KTR_DEV, but then have an extra
ath sysctl which does something like:

if (sc->sc_ktr_enable)
    KTR();
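
A user-space sketch of the gating idea above, for illustration only. It
does not use the real ktr(9) macros: TOY_KTR_DEV, toy_ktr_mask, toy_softc
and TOY_TRACE are hypothetical stand-ins, and sc_ktr_enable is assumed to
be the per-device flag that the sysctl would toggle.

```c
#include <stdbool.h>
#include <stdio.h>

#define TOY_KTR_DEV 0x1u	/* hypothetical stand-in for a shared class bit */

static unsigned toy_ktr_mask = TOY_KTR_DEV;	/* global class mask */
static unsigned toy_ktr_events;			/* events actually logged */

struct toy_softc {
	bool sc_ktr_enable;	/* per-device knob, as a sysctl might set it */
};

/* Log only if the class is enabled globally AND this device opted in. */
#define TOY_TRACE(sc, cls, msg) do {					\
	if ((toy_ktr_mask & (cls)) != 0 && (sc)->sc_ktr_enable) {	\
		toy_ktr_events++;					\
		fprintf(stderr, "trace: %s\n", (msg));			\
	}								\
} while (0)
```

A gate like this only silences the one driver; it does nothing about
other consumers that log under the same class.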


Adrian

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 19:41:05 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CCEDF1065749;
	Tue,  5 Jun 2012 19:41:05 +0000 (UTC) (envelope-from ache@vniz.net)
Received: from vniz.net (vniz.net [194.87.13.69])
	by mx1.freebsd.org (Postfix) with ESMTP id 2C4BE8FC19;
	Tue,  5 Jun 2012 19:41:04 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
	by vniz.net (8.14.5/8.14.5) with ESMTP id q55Jf2ab021290;
	Tue, 5 Jun 2012 23:41:03 +0400 (MSK) (envelope-from ache@vniz.net)
Received: (from ache@localhost)
	by localhost (8.14.5/8.14.5/Submit) id q55Jf25B021289;
	Tue, 5 Jun 2012 23:41:02 +0400 (MSK) (envelope-from ache)
Date: Tue, 5 Jun 2012 23:41:02 +0400
From: Andrey Chernov <ache@FreeBSD.ORG>
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <20120605194102.GA21173@vniz.net>
Mail-Followup-To: Andrey Chernov <ache@freebsd.org>,
	Bruce Evans <brde@optusnet.com.au>, svn-src-head@FreeBSD.ORG,
	svn-src-all@FreeBSD.ORG, src-committers@FreeBSD.ORG,
	Pawel Jakub Dawidek <pjd@FreeBSD.ORG>, freebsd-arch@FreeBSD.ORG
References: <201206042134.q54LYoVJ067685@svn.freebsd.org>
	<20120605074741.GA1391@garage.freebsd.pl>
	<20120605130922.GE13306@vniz.net>
	<20120606043731.D1124@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20120606043731.D1124@besplex.bde.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: svn-src-head@FreeBSD.ORG, svn-src-all@FreeBSD.ORG,
	src-committers@FreeBSD.ORG,
	Pawel Jakub Dawidek <pjd@FreeBSD.ORG>, freebsd-arch@FreeBSD.ORG
Subject: Re: svn commit: r236582 - head/lib/libc/stdlib
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 19:41:05 -0000

On Wed, Jun 06, 2012 at 04:57:29AM +1000, Bruce Evans wrote:
> POSIX may require errno to not be clobbered, especially for its functions.
> It probably shouldn't do this for Standard C library functions like free(),
> since this would be an extension and any use of the extension would give
> unnecessarily unportable code.

POSIX now acts as if it owns all Standard C functions. See the
"Resolved state" text for the upcoming standard:

"At line 30583 [XSH free DESCRIPTION], add a paragraph with CX shading:

The free() function shall not modify errno if ptr is a null pointer
or a pointer previously returned as if by malloc() and not yet 
deallocated.

At line 30591 [APPLICATION USAGE], add a new paragraph:

Because the free() function does not modify errno for valid pointers, it
is safe to use it in cleanup code without corrupting earlier errors, ..."

> OTOH, the library can assume anything that it wants and
> implements for itself, since it is the implementation so it can make
> free() easy to use for itself, with any extensions that aren't incompatible
> with Standard C.  Since free() is allowed to clobber errno, it is also
> allowed to do a null clobber as a compatible extension.

Yes, it is safe for free() itself to save errno and still stay compliant
with both current and upcoming POSIX and with Standard C. But any code
that relies on that is compliant with the upcoming POSIX only. Since people
don't want mass changes in that area, this is a compromise that is
acceptable to me (provided free() itself saves and restores errno, of
course).

-- 
http://ache.vniz.net/

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 20:11:05 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 7D2BF1065673;
	Tue,  5 Jun 2012 20:11:05 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail05.syd.optusnet.com.au (mail05.syd.optusnet.com.au
	[211.29.132.186])
	by mx1.freebsd.org (Postfix) with ESMTP id 0813E8FC18;
	Tue,  5 Jun 2012 20:11:04 +0000 (UTC)
Received: from c122-106-171-232.carlnfd1.nsw.optusnet.com.au
	(c122-106-171-232.carlnfd1.nsw.optusnet.com.au [122.106.171.232])
	by mail05.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q55KB1HC016442
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 6 Jun 2012 06:11:02 +1000
Date: Wed, 6 Jun 2012 06:11:01 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Andrey Chernov <ache@FreeBSD.ORG>
In-Reply-To: <20120605194102.GA21173@vniz.net>
Message-ID: <20120606054555.U1456@besplex.bde.org>
References: <201206042134.q54LYoVJ067685@svn.freebsd.org>
	<20120605074741.GA1391@garage.freebsd.pl>
	<20120605130922.GE13306@vniz.net>
	<20120606043731.D1124@besplex.bde.org>
	<20120605194102.GA21173@vniz.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: src-committers@FreeBSD.ORG, Pawel Jakub Dawidek <pjd@FreeBSD.ORG>,
	svn-src-all@FreeBSD.ORG, freebsd-arch@FreeBSD.ORG, svn-src-head@FreeBSD.ORG
Subject: Re: svn commit: r236582 - head/lib/libc/stdlib
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 20:11:05 -0000

On Tue, 5 Jun 2012, Andrey Chernov wrote:

> On Wed, Jun 06, 2012 at 04:57:29AM +1000, Bruce Evans wrote:
>> POSIX may require errno to not be clobbered, especially for its functions.
>> It probably shouldn't do this for Standard C library functions like free(),
>> since this would be an extension and any use of the extension would give
>> unnecessarily unportable code.
>
> POSIX feels itself like they own all Standard C functions now. See

Not really.  They can extend anything they want...

> "Resolved state" text for upcoming standard there:
>
> "At line 30583 [XSH free DESCRIPTION], add a paragraph with CX shading:
>
> The free() function shall not modify errno if ptr is a null pointer
> or a pointer previously returned as if by malloc() and not yet
> deallocated.

...but they have to mark it as an extension, as they do here.

> At line 30591 [APPLICATION USAGE], add a new paragraph:
>
> Because the free() function does not modify errno for valid pointers, it
> is safe to use it in cleanup code without corrupting earlier errors, ..."

This is essentially unusable (so a bad idea).  Instead of unconditionally
saving and restoring errno around calls to free(), portable POSIX code
can soon use a messy ifdef to avoid doing this in some cases, but still
has to do it in other cases.  The result is just bloat and complexity
at the source level:

#if _POSIX_VERSION < mumble
 	int sverrno;
#endif
 	...
 	if (wantfree)
#if _POSIX_VERSION < mumble
 	{			/* I made these braces conditional ... */
 		sverrno = errno;
#endif
 		free(p);
#if _POSIX_VERSION < mumble
 		errno = sverrno;
 	}			/* ... to maximise the ugliness */
#endif

>> OTOH, the library can assume anything that it wants and
>> implements for itself, since it is the implementation so it can make
>> free() easy to use for itself, with any extensions that aren't incompatible
>> with Standard C.  Since free() is allowed to clobber errno, it is also
>> allowed to do a null clobber as a compatible extension.
>
> Yes, it is safe for free() itself to save errno and still stay compliant
> with both current and upcoming POSIX and with Standard C. But any code
> which rely on that is compliant with upcoming POSIX only. Since people
> don't want mass changes in that area, this is some sort of compromise
> acceptable for me (in case free() itself will save/restore errno, of
> course).

libc has lots of magic non-conforming code.  A little more won't hurt.

However, free() is currently not careful about errno.  It begins with
an optional utrace() call, and this can in theory fail with errno ENOMEM
even if there are no bugs in malloc() (all other errors from utrace()
indicate bugs in the caller, assuming that the list of errnos in its man
page is complete).  malloc.c makes a few other sys(lib?)calls and never
saves errno.  I don't know if the others are reachable from free().
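
A toy illustration of the clobbering described above, not taken from
malloc.c. fake_trace() is a hypothetical stand-in for an optional trace
call that fails with ENOMEM; careless_free() and careful_free() are
made-up names showing the difference a save/restore makes.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for an optional trace call that can fail. */
static int
fake_trace(const void *p)
{
	(void)p;
	errno = ENOMEM;		/* simulate a failure that sets errno */
	return (-1);
}

/* Careless variant: the trace failure leaks out through errno. */
static void
careless_free(void *p)
{
	(void)fake_trace(p);
	free(p);
}

/* Careful variant: errno is saved on entry and restored on exit. */
static void
careful_free(void *p)
{
	int saved = errno;

	(void)fake_trace(p);
	free(p);
	errno = saved;
}
```

With the careless variant a caller that had errno == EACCES before the
call is expected to see ENOMEM afterwards; with the careful variant the
caller's value survives.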

Bruce

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 20:14:04 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3A8441065670;
	Tue,  5 Jun 2012 20:14:04 +0000 (UTC)
	(envelope-from asmrookie@gmail.com)
Received: from mail-lpp01m010-f54.google.com (mail-lpp01m010-f54.google.com
	[209.85.215.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 78BF98FC08;
	Tue,  5 Jun 2012 20:14:03 +0000 (UTC)
Received: by laai10 with SMTP id i10so5214325laa.13
	for <multiple recipients>; Tue, 05 Jun 2012 13:14:02 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=48RbuAC1Bs4gm51Wx/oRorkom0LY4BcDYWBFZcd52IQ=;
	b=aDb/RaZPLos5Y9ZBL3uMc2ba/7As8Jz/Zh7iW2Rujg9Fl4cOYHSYl9o0C7/UBMQsQk
	7aTqm3JdRwQ20AbukwszX3Slw8Q2Vd3+uplN3QMAfKLAi1Sb4R8h5JXrIPlEYWHKYXq7
	eGsXlXj7X+P739Qhnh6wyhizLsVLy5eqIvtpP4BAKvth/hKqHBT2el4999+/suC54Obe
	ZhwV9RPh+bMYJv1nPepm7aw1uPegjLM7znMjZiO4aHZQs7iZM0JEHZ+syFzo4slPf3LJ
	fQXAx88Ys/apdYL77kmGFruki17nB53yXINbST9QHUYguz5TyJZ5hJiAIg8A9iIbA8gc
	ky+w==
MIME-Version: 1.0
Received: by 10.112.45.4 with SMTP id i4mr8723394lbm.79.1338927242177; Tue, 05
	Jun 2012 13:14:02 -0700 (PDT)
Sender: asmrookie@gmail.com
Received: by 10.112.27.65 with HTTP; Tue, 5 Jun 2012 13:14:02 -0700 (PDT)
In-Reply-To: <CAJ-Vmo=F-=1UYcW0xBiNbjHa8CN7S=iDJ_bQDn4ESS3CumJf_A@mail.gmail.com>
References: <86bokyvtc2.fsf@ds4.des.no>
	<CAJ-FndAMsoB1RAyS-Pa1JCv7W0qsviRxtShZ3uk_Tpd+J_EBaQ@mail.gmail.com>
	<CAJ-FndAadYbqiWUTupXLEcRMkYYL50Ssehi8f8vv6YXvQzy4OA@mail.gmail.com>
	<CAJ-Vmo=F-=1UYcW0xBiNbjHa8CN7S=iDJ_bQDn4ESS3CumJf_A@mail.gmail.com>
Date: Tue, 5 Jun 2012 21:14:02 +0100
X-Google-Sender-Auth: Hk2hMq-jYJ-9tNhvIusoz3k2wtM
Message-ID: <CAJ-FndAjcfd21xwYHPrSxgz32eHp2xTGRao1Kyqx4yBZTPD94A@mail.gmail.com>
From: Attilio Rao <attilio@freebsd.org>
To: Adrian Chadd <adrian@freebsd.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: =?UTF-8?Q?Dag=2DErling_Sm=C3=B8rgrav?= <des@des.no>, arch@freebsd.org
Subject: Re: KTR_SPAREx
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 20:14:04 -0000

2012/6/5 Adrian Chadd <adrian@freebsd.org>:
> Hi,
>
> I'm very tempted to make if_ath use KTR_DEV, but then have an extra
> ath sysctl which does something like:
>
> if (sc->sc_ktr_enable)
>     KTR();

But the actual problem is that your output will be overwhelmed by the
clutter of all the other KTR_DEV consumers.

We very much need a much finer granularity for KTR classes, and
possibly a way to use it on the fly for kernel development, and I think
what I suggested earlier makes sense.

Attilio


--=20
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 20:30:52 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 5A32A106566C;
	Tue,  5 Jun 2012 20:30:52 +0000 (UTC)
	(envelope-from gleb.kurtsou@gmail.com)
Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com
	[209.85.217.182])
	by mx1.freebsd.org (Postfix) with ESMTP id C05F08FC0A;
	Tue,  5 Jun 2012 20:30:50 +0000 (UTC)
Received: by lbon10 with SMTP id n10so5373119lbo.13
	for <multiple recipients>; Tue, 05 Jun 2012 13:30:49 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=date:from:to:cc:subject:message-id:references:mime-version
	:content-type:content-disposition:content-transfer-encoding
	:in-reply-to:user-agent;
	bh=u8AP51ONs2zfWJ3eMhDrdW9UcthNAbwHhoYUE3ia+vg=;
	b=AC3AEKqo/GksdzFuQBRfpItOG6MX8gxwE+0+fpJlDUEiF4E1aB4MeZCu2JviEODC2f
	AJeQ2b6zhdZjUZekw5WmZZJACP0EdavtY1hzlIyjvGgAZiiWnB6R0fGkXtDih2IQKxNV
	Aq9bzfaEmGh41HKsF51i8/FzY3qLpljZhIaALQg+7yxnebA1sWwhBqblF0kzvqMvt0/f
	qrB1o2QgN3oCtjxIOE6JneH/34fw0gCFjQm8yPfcvcGjmq0QeHcoezN+dkR9Q1oT+0M9
	VMkebalIeG+1ejDI8uHtr/fK0XZcbWLvDkTUeDbk/wxlmprjOghnC8tAQDHgCkxrLlRg
	9sog==
Received: by 10.112.26.165 with SMTP id m5mr8686727lbg.15.1338928249549;
	Tue, 05 Jun 2012 13:30:49 -0700 (PDT)
Received: from localhost ([78.157.92.5])
	by mx.google.com with ESMTPS id hg4sm4139283lab.11.2012.06.05.13.30.47
	(version=SSLv3 cipher=OTHER); Tue, 05 Jun 2012 13:30:48 -0700 (PDT)
Date: Tue, 5 Jun 2012 23:30:42 +0300
From: Gleb Kurtsou <gleb.kurtsou@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Message-ID: <20120605203042.GA4081@reks>
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<201206051008.29568.jhb@freebsd.org> <86haupvk4a.fsf@ds4.des.no>
	<201206051222.12627.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <201206051222.12627.jhb@freebsd.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: Gianni <gianni@freebsd.org>, Alan Cox <alc@rice.edu>,
	Alexander Kabaev <kan@freebsd.org>, Attilio Rao <attilio@freebsd.org>,
	Konstantin Belousov <kib@freebsd.org>, freebsd-arch@freebsd.org,
	Konstantin Belousov <kostikbel@gmail.com>,
	Dag-Erling =?utf-8?B?U23DuHJncmF2?= <des@des.no>
Subject: Re: Fwd: [RFC] Kernel shared variables
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 20:30:52 -0000

On (05/06/2012 12:22), John Baldwin wrote:
> On Tuesday, June 05, 2012 11:44:37 am Dag-Erling Smørgrav wrote:
> > John Baldwin <jhb@freebsd.org> writes:
> > > So you call getpid() on each access to a shared resource?
> > 
> > I don't, but I've seen code that does, under the assumption that all the
> > world is Linux and getpid() is free.  Here's a sample from RHEL6 on a
> > 3.1 GHz i5, using raise(0) as a baseline:
> > 
> > getpid(): 10,000,000 iterations in 24,400 ms
> > gettimeofday(0, 0): 10,000,000 iterations in 54,104 ms
> > raise(0): 10,000,000 iterations in 1,284,593 ms
> > 
> > The difference between the first two is due to the fact that while
> > getpid() just returns a constant, gettimeofday(0, 0) performs two
> > comparisons first.  Passing an actual struct timeval to gettimeofday()
> > slows it down by a factor of about 6.
> > 
> > (strace confirms that no system calls occur for either getpid() or
> > gettimeofday(0, 0))
> > 
> > Here is the same program running on FreeBSD 9.0-RELEASE in VirtualBox on
> > an otherwise idle 3.4 GHz i7:
> > 
> > getpid(): 10,000,000 iterations in 777,251 ms
> > gettimeofday(0, 0): 10,000,000 iterations in 799,808 ms
> > raise(0): 10,000,000 iterations in 2,142,275 ms
> 
> Yes, we know getpid() is slow, I think the question is does it matter that 
> it's slow in something other than a microbenchmark.  Can you name the 
> application that you've seen use getpid()?
> 

arc4random* calls getpid() on every invocation (which is the right thing
to do, imo) to reinitialize the generator after fork.
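
A minimal sketch of that fork check, assuming a toy generator. This is
not the libc arc4random code: struct toy_rng, toy_rng_seed() and
toy_rng_next() are hypothetical names, the seed is a placeholder rather
than real entropy, and xorshift32 stands in for the actual cipher.

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical toy generator; NOT the real arc4random implementation. */
struct toy_rng {
	pid_t    owner;		/* pid that last seeded the state */
	uint32_t state;		/* xorshift32 state, kept non-zero */
	unsigned reseeds;	/* number of (re)seed operations so far */
};

static void
toy_rng_seed(struct toy_rng *r)
{
	/* Placeholder seed: a real generator would fetch entropy here. */
	r->owner = getpid();
	r->state = (uint32_t)r->owner * 2654435761u + 1u;
	if (r->state == 0)
		r->state = 1;
	r->reseeds++;
}

static uint32_t
toy_rng_next(struct toy_rng *r)
{
	/* One getpid() per call: a changed pid means we are a forked child. */
	if (r->reseeds == 0 || r->owner != getpid())
		toy_rng_seed(r);
	r->state ^= r->state << 13;
	r->state ^= r->state >> 17;
	r->state ^= r->state << 5;
	return (r->state);
}
```

The point of the sketch is only that every call pays for one getpid(),
so the cost of getpid() shows up directly in the cost of each random
number.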

As an example, consider a network daemon encrypting/decrypting packets
that is likely to need randomness to encrypt or process a considerable
portion of the data. Much depends on the crypto protocols/algorithms
used, but the scenario is pretty much real-life. It's a good example of
getpid() being actually needed, not just called often because it is cheap.

Thanks,
Gleb.

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 21:01:57 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8706810656B0;
	Tue,  5 Jun 2012 21:01:57 +0000 (UTC) (envelope-from ache@vniz.net)
Received: from vniz.net (vniz.net [194.87.13.69])
	by mx1.freebsd.org (Postfix) with ESMTP id E88D68FC0C;
	Tue,  5 Jun 2012 21:01:56 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
	by vniz.net (8.14.5/8.14.5) with ESMTP id q55L1tOI022765;
	Wed, 6 Jun 2012 01:01:55 +0400 (MSK) (envelope-from ache@vniz.net)
Received: (from ache@localhost)
	by localhost (8.14.5/8.14.5/Submit) id q55L1sFR022764;
	Wed, 6 Jun 2012 01:01:54 +0400 (MSK) (envelope-from ache)
Date: Wed, 6 Jun 2012 01:01:54 +0400
From: Andrey Chernov <ache@FreeBSD.ORG>
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <20120605210154.GA22370@vniz.net>
Mail-Followup-To: Andrey Chernov <ache@freebsd.org>,
	Bruce Evans <brde@optusnet.com.au>, svn-src-head@FreeBSD.ORG,
	svn-src-all@FreeBSD.ORG, src-committers@FreeBSD.ORG,
	Pawel Jakub Dawidek <pjd@FreeBSD.ORG>, freebsd-arch@FreeBSD.ORG
References: <201206042134.q54LYoVJ067685@svn.freebsd.org>
	<20120605074741.GA1391@garage.freebsd.pl>
	<20120605130922.GE13306@vniz.net>
	<20120606043731.D1124@besplex.bde.org>
	<20120605194102.GA21173@vniz.net>
	<20120606054555.U1456@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20120606054555.U1456@besplex.bde.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: svn-src-head@FreeBSD.ORG, svn-src-all@FreeBSD.ORG,
	src-committers@FreeBSD.ORG,
	Pawel Jakub Dawidek <pjd@FreeBSD.ORG>, freebsd-arch@FreeBSD.ORG
Subject: Re: svn commit: r236582 - head/lib/libc/stdlib
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 21:01:57 -0000

On Wed, Jun 06, 2012 at 06:11:01AM +1000, Bruce Evans wrote:
> This is essentially unusable (so a bad idea).  Instead of unconditionally
> saving and restoring errno around calls to free(), portable POSIX code
> can soon use a messy ifdef to avoid doing this in some cases, but still
> has to do it in other cases.  The result is just bloat and complexity
> at the source level:

It looks like they now consider POSIX a moving target, where
compatibility with previous POSIX versions is not essential enough to
care much about. I have no other interpretation of their decision to
suddenly specify free() as not modifying errno. Since they clearly
indicate the code differences between the old and new standard, they are
well aware of them and of the resulting code bloat.

> However, free() is currently not careful about errno.  It begins with
> an optional utrace() call, and this can in theory fail with errno ENOMEM
> even if there are no bugs in malloc() (all other errors from utrace()
> indicate bugs in the caller, assuming that the list of errnos in its man
> page is complete).  malloc.c makes a few other sys(lib?)calls and never
> saves errno.  I don't know if the others are reachable from free().

I filed a PR about that:
http://www.freebsd.org/cgi/query-pr.cgi?pr=168719

-- 
http://ache.vniz.net/

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 21:30:42 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id BB36D106564A;
	Tue,  5 Jun 2012 21:30:42 +0000 (UTC)
	(envelope-from joerg@britannica.bec.de)
Received: from mo6-p00-ob.rzone.de (mo6-p00-ob.rzone.de
	[IPv6:2a01:238:20a:202:5300::1])
	by mx1.freebsd.org (Postfix) with ESMTP id 94EE88FC0C;
	Tue,  5 Jun 2012 21:30:41 +0000 (UTC)
X-RZG-AUTH: :JiIXek6mfvEEUpFQdo7Fj1/zg48CFjWjQv0cW+St/nW/afgnrylsiW+1ZjV+pgsJ
X-RZG-CLASS-ID: mo00
Received: from britannica.bec.de
	(ip-109-45-139-202.web.vodafone.de [109.45.139.202])
	by smtp.strato.de (jored mo73) (RZmta 29.10 DYNA|AUTH)
	with (AES128-SHA encrypted) ESMTPA id A07bfao55ISlKb ;
	Tue, 5 Jun 2012 23:30:37 +0200 (CEST)
Received: by britannica.bec.de (sSMTP sendmail emulation);
	Tue, 05 Jun 2012 23:30:34 +0200
Date: Tue, 5 Jun 2012 23:30:34 +0200
From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: svn-src-all@freebsd.org
Message-ID: <20120605213034.GA25293@britannica.bec.de>
References: <201206042134.q54LYoVJ067685@svn.freebsd.org>
	<20120605074741.GA1391@garage.freebsd.pl>
	<20120605130922.GE13306@vniz.net>
	<20120606043731.D1124@besplex.bde.org>
	<20120605194102.GA21173@vniz.net>
	<20120606054555.U1456@besplex.bde.org>
	<20120605210154.GA22370@vniz.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20120605210154.GA22370@vniz.net>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: src-committers@FreeBSD.ORG, Pawel Jakub Dawidek <pjd@FreeBSD.ORG>,
	freebsd-arch@FreeBSD.ORG, svn-src-head@FreeBSD.ORG
Subject: Re: svn commit: r236582 - head/lib/libc/stdlib
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 21:30:42 -0000

On Wed, Jun 06, 2012 at 01:01:54AM +0400, Andrey Chernov wrote:
> On Wed, Jun 06, 2012 at 06:11:01AM +1000, Bruce Evans wrote:
> > This is essentially unusable (so a bad idea).  Instead of unconditionally
> > saving and restoring errno around calls to free(), portable POSIX code
> > can soon use a messy ifdef to avoid doing this in some cases, but still
> > has to do it in other cases.  The result is just bloat and complexity
> > at the source level:
> 
> It looks like they now consider POSIX as moving target where previous 
> POSIX versions compatibility is not so essential to care about much. I 
> don't have other interpretation of their decision to suddenly accept 
> free() as not modifying errno. Since they clearly indicate code 
> differences for old and new standard, they are well aware of them and of 
> resulting code bloating.

Can you please stop the unjustified rants? The "new" behavior of free(3)
doesn't break any existing code, so it is certainly compatible with
"old" free(3). The "new" behavior can be obtained easily for code that
wants to be portable to "old" implementations using the C preprocessor
and a small inline wrapper. As such, there is no code bloating.
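
One way such a wrapper might look, as a sketch under stated assumptions.
xfree() is a hypothetical name, and FREE_KEEPS_ERRNO_SINCE is a
placeholder for whichever _POSIX_VERSION first guarantees the new
behavior; it is set absurdly high here so that the save/restore path is
always taken unless the builder overrides it.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * ASSUMPTION: placeholder threshold, not a real POSIX constant.
 * Override it once the actual version number is known.
 */
#ifndef FREE_KEEPS_ERRNO_SINCE
#define FREE_KEEPS_ERRNO_SINCE 999999L
#endif

/* Hypothetical wrapper: free() that never changes the caller's errno. */
static inline void
xfree(void *p)
{
#if defined(_POSIX_VERSION) && _POSIX_VERSION >= FREE_KEEPS_ERRNO_SINCE
	free(p);		/* the system free() already preserves errno */
#else
	int saved = errno;

	free(p);
	errno = saved;		/* undo anything free() did to errno */
#endif
}
```

Callers then use xfree() in cleanup paths and the preprocessor test lives
in exactly one place.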

Joerg

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 21:48:14 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 36347106566C;
	Tue,  5 Jun 2012 21:48:14 +0000 (UTC) (envelope-from ache@vniz.net)
Received: from vniz.net (vniz.net [194.87.13.69])
	by mx1.freebsd.org (Postfix) with ESMTP id 9F7C68FC14;
	Tue,  5 Jun 2012 21:48:13 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
	by vniz.net (8.14.5/8.14.5) with ESMTP id q55LmC3a023628;
	Wed, 6 Jun 2012 01:48:12 +0400 (MSK) (envelope-from ache@vniz.net)
Received: (from ache@localhost)
	by localhost (8.14.5/8.14.5/Submit) id q55LmBPR023627;
	Wed, 6 Jun 2012 01:48:11 +0400 (MSK) (envelope-from ache)
Date: Wed, 6 Jun 2012 01:48:11 +0400
From: Andrey Chernov <ache@FreeBSD.ORG>
To: Joerg Sonnenberger <joerg@britannica.bec.de>
Message-ID: <20120605214811.GA23384@vniz.net>
Mail-Followup-To: Andrey Chernov <ache@freebsd.org>,
	Joerg Sonnenberger <joerg@britannica.bec.de>,
	svn-src-all@FreeBSD.ORG, Bruce Evans <brde@optusnet.com.au>,
	svn-src-head@FreeBSD.ORG, src-committers@FreeBSD.ORG,
	Pawel Jakub Dawidek <pjd@FreeBSD.ORG>, freebsd-arch@FreeBSD.ORG
References: <201206042134.q54LYoVJ067685@svn.freebsd.org>
	<20120605074741.GA1391@garage.freebsd.pl>
	<20120605130922.GE13306@vniz.net>
	<20120606043731.D1124@besplex.bde.org>
	<20120605194102.GA21173@vniz.net>
	<20120606054555.U1456@besplex.bde.org>
	<20120605210154.GA22370@vniz.net>
	<20120605213034.GA25293@britannica.bec.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20120605213034.GA25293@britannica.bec.de>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: src-committers@FreeBSD.ORG, Pawel Jakub Dawidek <pjd@FreeBSD.ORG>,
	svn-src-all@FreeBSD.ORG, freebsd-arch@FreeBSD.ORG, svn-src-head@FreeBSD.ORG
Subject: Re: svn commit: r236582 - head/lib/libc/stdlib
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 21:48:14 -0000

On Tue, Jun 05, 2012 at 11:30:34PM +0200, Joerg Sonnenberger wrote:
> On Wed, Jun 06, 2012 at 01:01:54AM +0400, Andrey Chernov wrote:
> > On Wed, Jun 06, 2012 at 06:11:01AM +1000, Bruce Evans wrote:
> > > This is essentially unusable (so a bad idea).  Instead of unconditionally
> > > saving and restoring errno around calls to free(), portable POSIX code
> > > can soon use a messy ifdef to avoid doing this in some cases, but still
> > > has to do it in other cases.  The result is just bloat and complexity
> > > at the source level:
> > 
> > It looks like they now consider POSIX as moving target where previous 
> > POSIX versions compatibility is not so essential to care about much. I 
> > don't have other interpretation of their decision to suddenly accept 
> > free() as not modifying errno. Since they clearly indicate code 
> > differences for old and new standard, they are well aware of them and of 
> > resulting code bloating.
> 
> Can you please stop the unjustified rants? The "new" behavior of free(3)
> doesn't break any existing code, so it is certainly compatible with
> "old" free(3). The "new" behavior can be obtained easily for code that
> wants to be portable to "old" implementations using the C preprocessor
> and a small inline wrapper. As such, there is no code bloating.

Could you please read more carefully, if you decide to stay in this thread?
I already said exactly that a few messages back:

> Yes, it is safe for free() itself to save errno and still stay compliant
> with both current and upcoming POSIX and with Standard C. 

> But any code which rely on that is compliant with upcoming POSIX only. 

It means that when a program wants to conform to both current POSIX and
future POSIX, it must either save errno across free() in every case or
accept the code bloat, which the CPP macro you suggest reduces but does
not eliminate.
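
To make the trade-off concrete, here is a minimal sketch of the wrapper
approach. The macro name FREE_PRESERVES_ERRNO and the function name
xfree() are hypothetical, chosen only for illustration; no standard or
system header is assumed to define them.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * FREE_PRESERVES_ERRNO is a hypothetical feature macro: a build system
 * would define it only on platforms known to keep errno intact in free().
 */
#ifdef FREE_PRESERVES_ERRNO
#define xfree(p) free(p)
#else
/* Portable fallback: save and restore errno around free(). */
static inline void xfree(void *p)
{
    int saved_errno = errno;

    free(p);
    errno = saved_errno;
}
#endif
```

Callers use xfree() everywhere. The save/restore is compiled out only
where the (hypothetical) macro promises the new behavior, so the ifdef
lives in one place but does not disappear.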

And I don't think it is a good decision on the POSIX side, from a
compatibility point of view. Are you trying to attack my personal
opinion, or what?

-- 
http://ache.vniz.net/

From owner-freebsd-arch@FreeBSD.ORG  Tue Jun  5 22:48:06 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 96F8D1065670;
	Tue,  5 Jun 2012 22:48:06 +0000 (UTC)
	(envelope-from grehan@freebsd.org)
Received: from alto.onthenet.com.au (alto.OntheNet.com.au [203.13.68.12])
	by mx1.freebsd.org (Postfix) with ESMTP id 460D78FC0C;
	Tue,  5 Jun 2012 22:48:06 +0000 (UTC)
Received: from dommail.onthenet.com.au (dommail.OntheNet.com.au [203.13.70.57])
	by alto.onthenet.com.au (Postfix) with ESMTPS id DE6C51268B;
	Wed,  6 Jun 2012 08:41:03 +1000 (EST)
Received: from 192-168-1-100.tpgi.com.au (110-174-216-99.static.tpgi.com.au
	[110.174.216.99]) by dommail.onthenet.com.au (MOS 4.2.4-GA)
	with ESMTP id BEH97495 (AUTH peterg@ptree32.com.au);
	Wed, 6 Jun 2012 08:41:01 +1000
Message-ID: <4FCE8AF7.40606@freebsd.org>
Date: Wed, 06 Jun 2012 08:40:55 +1000
From: Peter Grehan <grehan@freebsd.org>
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US;
	rv:1.9.2.28) Gecko/20120306 Thunderbird/3.1.20
MIME-Version: 1.0
To: Attilio Rao <attilio@freebsd.org>
X-Old-Subject: Re: KTR_SPAREx
References: <86bokyvtc2.fsf@ds4.des.no>	<CAJ-FndAMsoB1RAyS-Pa1JCv7W0qsviRxtShZ3uk_Tpd+J_EBaQ@mail.gmail.com>	<CAJ-FndAadYbqiWUTupXLEcRMkYYL50Ssehi8f8vv6YXvQzy4OA@mail.gmail.com>	<CAJ-Vmo=F-=1UYcW0xBiNbjHa8CN7S=iDJ_bQDn4ESS3CumJf_A@mail.gmail.com>
	<CAJ-FndAjcfd21xwYHPrSxgz32eHp2xTGRao1Kyqx4yBZTPD94A@mail.gmail.com>
In-Reply-To: <CAJ-FndAjcfd21xwYHPrSxgz32eHp2xTGRao1Kyqx4yBZTPD94A@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Junkmail: UCE(71)
X-Junkmail-Info: FH_HELO_EQ_D_D_D_D, HELO_DYNAMIC_IPADDR2, SPF_SOFTFAIL,
	TVD_RCVD_IP
X-Junkmail-Status: score=71/51, host=dommail.onthenet.com.au
Cc: =?UTF-8?B?YXY=?= <des@des.no>, Adrian Chadd <adrian@freebsd.org>,
	=?UTF-8?B?RGFnLUVybGluZyBTbcO4cmdy?=, arch@freebsd.org
Subject: {Spam?} Re: KTR_SPAREx
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jun 2012 22:48:06 -0000

> We very much need an much higher granularity on KTR classes and
> possibly a way to use it on-the-fly for kernel development and I think
> what I suggested earlier makes sense.

  Anyone had a look at Dragonfly's ktr ?

    http://gitweb.dragonflybsd.org/dragonfly.git/blob/HEAD:/sys/sys/ktr.h

later,

Peter.

From owner-freebsd-arch@FreeBSD.ORG  Wed Jun  6 08:24:27 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 40843106566C;
	Wed,  6 Jun 2012 08:24:27 +0000 (UTC) (envelope-from des@des.no)
Received: from smtp.des.no (smtp.des.no [194.63.250.102])
	by mx1.freebsd.org (Postfix) with ESMTP id C47898FC16;
	Wed,  6 Jun 2012 08:24:26 +0000 (UTC)
Received: from ds4.des.no (smtp.des.no [194.63.250.102])
	by smtp.des.no (Postfix) with ESMTP id E63F16395;
	Wed,  6 Jun 2012 08:24:19 +0000 (UTC)
Received: by ds4.des.no (Postfix, from userid 1001)
	id 8AC2A96CE; Wed,  6 Jun 2012 10:24:19 +0200 (CEST)
From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
To: Bruce Evans <brde@optusnet.com.au>
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<201206051008.29568.jhb@freebsd.org> <86haupvk4a.fsf@ds4.des.no>
	<201206051222.12627.jhb@freebsd.org>
	<20120605171446.GA28387@onelab2.iet.unipi.it>
	<20120606040931.F1050@besplex.bde.org>
Date: Wed, 06 Jun 2012 10:24:19 +0200
In-Reply-To: <20120606040931.F1050@besplex.bde.org> (Bruce Evans's message of
	"Wed, 6 Jun 2012 04:36:54 +1000 (EST)")
Message-ID: <864nqovoek.fsf@ds4.des.no>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.3 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Mailman-Approved-At: Wed, 06 Jun 2012 12:28:08 +0000
Cc: Gianni <gianni@FreeBSD.org>, John Baldwin <jhb@FreeBSD.org>,
	Alan Cox <alc@rice.edu>, Alexander Kabaev <kan@FreeBSD.org>,
	Attilio Rao <attilio@FreeBSD.org>,
	Konstantin Belousov <kib@FreeBSD.org>, freebsd-arch@FreeBSD.org,
	Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Fast vs slow syscalls (Re: Fwd: [RFC] Kernel shared variables)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Jun 2012 08:24:27 -0000

Bruce Evans <brde@optusnet.com.au> writes:
> Dag-Erling Smørgrav <des@des.no> writes:
> > getpid(): 10,000,000 iterations in 24,400 ms
> > gettimeofday(0, 0): 10,000,000 iterations in 54,104 ms
> > raise(0): 10,000,000 iterations in 1,284,593 ms
> That's one slow system or broken units.

Broken units, these are microseconds not milliseconds.  Sorry.

> After adjusting by factors of 1000 here and there, this format is still
> hard to parse.  I like the format of nsec/operation.  24400 10 million
> operations in 24400 moroseconds seems to scale to 2.44 nsec/call (if 1
> moro = 1 micro).  But that is impossibly fast, unless getpid() is
> inlined to a load of the shared variable (it may also need the load to
> be moved outside the loop).  I can't see any reasonable adjustment that
> gives 24.4 nsec/call.

#define ITERATIONS 10000000

    struct timeval start, end;
    int i;

    gettimeofday(&start, NULL);
    for (i = 0; i < ITERATIONS; ++i)
        getpid();
    gettimeofday(&end, NULL);

On Linux, gcc 4.4.6 compiles this to:

   # gettimeofday(&start, NULL)
   0x000000000040064b <+23>:    lea    -0x20(%rbp),%rax
   0x000000000040064f <+27>:    mov    $0x0,%esi
   0x0000000000400654 <+32>:    mov    %rax,%rdi
   0x0000000000400657 <+35>:    callq  0x400500 <gettimeofday@plt>

   # i = 0
   0x000000000040065c <+40>:    movl   $0x0,-0x4(%rbp)
   0x0000000000400663 <+47>:    jmp    0x40066e <main+58>

   # getpid()
   0x0000000000400665 <+49>:    callq  0x400520 <getpid@plt>

   # ++i
   0x000000000040066a <+54>:    addl   $0x1,-0x4(%rbp)

   # i < ITERATIONS
   0x000000000040066e <+58>:    cmpl   $0x98967f,-0x4(%rbp)
   0x0000000000400675 <+65>:    jle    0x400665 <main+49>

   # gettimeofday(&end, NULL)
   0x0000000000400677 <+67>:    lea    -0x30(%rbp),%rax
   0x000000000040067b <+71>:    mov    $0x0,%esi
   0x0000000000400680 <+76>:    mov    %rax,%rdi
   0x0000000000400683 <+79>:    callq  0x400500 <gettimeofday@plt>

The code generated by gcc 4.2.1 on FreeBSD is almost identical:

   # gettimeofday(&start, NULL)
   0x00000000004006f7 <main+23>: lea    -0x20(%rbp),%rdi
   0x00000000004006fb <main+27>: mov    $0x0,%esi
   0x0000000000400700 <main+32>: callq  0x40057c <gettimeofday@plt>

   # i = 0
   0x0000000000400705 <main+37>: movl   $0x0,-0x4(%rbp)
   0x000000000040070c <main+44>: jmp    0x400717 <main+55>

   # getpid()
   0x000000000040070e <main+46>: callq  0x40059c <getpid@plt>

   # ++i
   0x0000000000400713 <main+51>: addl   $0x1,-0x4(%rbp)

   # i < ITERATIONS
   0x0000000000400717 <main+55>: cmpl   $0x98967f,-0x4(%rbp)
   0x000000000040071e <main+62>: jle    0x40070e <main+46>

   # gettimeofday(&end, NULL)
   0x0000000000400720 <main+64>: lea    -0x30(%rbp),%rdi
   0x0000000000400724 <main+68>: mov    $0x0,%esi
   0x0000000000400729 <main+73>: callq  0x40057c <gettimeofday@plt>

I don't know why gcc 4.4.6 loads &start / &end into %rax before copying
it to %rdi instead of loading it directly into %rdi like 4.2.1 does.  I
used the same command line (gcc -Wall -Wextra syscall.c) in both cases.
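
For anyone who wants to reproduce the numbers, the loop above can be
wrapped into a self-contained helper that reports nsec/call, the format
preferred earlier in the thread. This is only a sketch: the function
names nsec_per_call() and time_getpid() are illustrative and do not come
from the original test program.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <unistd.h>

/* Convert a total elapsed time in microseconds to nanoseconds per call. */
static double nsec_per_call(double total_usec, long iterations)
{
    assert(iterations > 0);
    return total_usec * 1000.0 / (double)iterations;
}

/* Time `iterations` calls of getpid() and return the cost in nsec/call. */
static double time_getpid(long iterations)
{
    struct timeval start, end;
    double usec;
    long i;

    gettimeofday(&start, NULL);
    for (i = 0; i < iterations; ++i)
        (void)getpid();
    gettimeofday(&end, NULL);

    /* Elapsed wall-clock time in microseconds. */
    usec = (double)(end.tv_sec - start.tv_sec) * 1e6 +
        (double)(end.tv_usec - start.tv_usec);
    return nsec_per_call(usec, iterations);
}
```

With the corrected units, nsec_per_call(24400.0, 10000000) works out to
about 2.44 nsec/call for the getpid() figure quoted above.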

DES
-- 
Dag-Erling Smørgrav - des@des.no

From owner-freebsd-arch@FreeBSD.ORG  Wed Jun  6 14:05:42 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A74881065672;
	Wed,  6 Jun 2012 14:05:42 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net
	[IPv6:2001:470:1f10:75::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 7A8BD8FC08;
	Wed,  6 Jun 2012 14:05:42 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id AE995B977;
	Wed,  6 Jun 2012 10:05:41 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-arch@freebsd.org
Date: Wed, 6 Jun 2012 08:06:34 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p13; KDE/4.5.5; amd64; ; )
References: <86bokyvtc2.fsf@ds4.des.no>
	<CAJ-FndAjcfd21xwYHPrSxgz32eHp2xTGRao1Kyqx4yBZTPD94A@mail.gmail.com>
	<4FCE8AF7.40606@freebsd.org>
In-Reply-To: <4FCE8AF7.40606@freebsd.org>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="utf-8"
Content-Transfer-Encoding: 7bit
Message-Id: <201206060806.34245.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
	(bigwig.baldwin.cx); Wed, 06 Jun 2012 10:05:41 -0400 (EDT)
Cc: Attilio Rao <attilio@freebsd.org>, av <des@des.no>,
	Adrian Chadd <adrian@freebsd.org>, Peter Grehan <grehan@freebsd.org>
Subject: Re: {Spam?} Re: KTR_SPAREx
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Jun 2012 14:05:42 -0000

On Tuesday, June 05, 2012 6:40:55 pm Peter Grehan wrote:
> > We very much need an much higher granularity on KTR classes and
> > possibly a way to use it on-the-fly for kernel development and I think
> > what I suggested earlier makes sense.
> 
>   Anyone had a look at Dragonfly's ktr ?
> 
>     http://gitweb.dragonflybsd.org/dragonfly.git/blob/HEAD:/sys/sys/ktr.h

That does seem to be closer to what I would like.

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Wed Jun  6 16:51:30 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1A116106564A
	for <arch@freebsd.org>; Wed,  6 Jun 2012 16:51:30 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id 6E1118FC26
	for <arch@freebsd.org>; Wed,  6 Jun 2012 16:51:28 +0000 (UTC)
Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q56GpGbG040880
	for <arch@freebsd.org>; Wed, 6 Jun 2012 19:51:16 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id
	q56GpFvi022201
	for <arch@freebsd.org>; Wed, 6 Jun 2012 19:51:15 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q56GpFQI022200
	for arch@freebsd.org; Wed, 6 Jun 2012 19:51:15 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Wed, 6 Jun 2012 19:51:15 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: arch@freebsd.org
Message-ID: <20120606165115.GQ85127@deviant.kiev.zoral.com.ua>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="kZU6r8y0YpRwyDfh"
Content-Disposition: inline
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: 
Subject: Fast gettimeofday(2) and clock_gettime(2)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Jun 2012 16:51:30 -0000


--kZU6r8y0YpRwyDfh
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

A positive result from the recent flame-bait on arch@ is a working
implementation of fast gettimeofday(2) and clock_gettime(2). The
speedup I see is around 6-7x on the 2600K. I think the speedup could
be even bigger on the previous generation of CPUs, where lock
operations and syscall entry are costlier. Sample test runs of
tools/tools/syscall_timing are presented at the end of this message.

The patch finds yet another use for the shared page, exporting
time-keeping information for the binuptime(9) algorithm and
re-implementing binuptime(9) in userspace. The kernel tells usermode
whether the rdtsc instruction can be used; there is also a global
override sysctl, kern.timecounter.fast_gettime, to turn it off
regardless of hardware capabilities.
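
The userspace computation can be sketched roughly as follows. This is
not code from the patch: the struct and function names (sketch_bintime,
sketch_timehands, sketch_binuptime) are made up for illustration, the
counter value is passed in rather than read with rdtsc, and the
generation check that guards against concurrent updates is left out.
The arithmetic follows the in-kernel binuptime(9): mask the counter
delta, scale it to a 64-bit binary fraction of a second, and add it to
the stored offset.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds plus a 64-bit binary fraction of a second, like struct bintime. */
struct sketch_bintime {
    int64_t  sec;
    uint64_t frac;
};

/* Hypothetical snapshot of what the kernel would export in the shared page. */
struct sketch_timehands {
    struct sketch_bintime offset;   /* uptime when offset_count was sampled */
    uint64_t scale;                 /* 2^64 / counter frequency */
    uint32_t offset_count;          /* counter value at that moment */
    uint32_t counter_mask;          /* valid bits of the hardware counter */
};

/* Add a fraction to bt, carrying into the seconds on wrap-around. */
static void sketch_bintime_addx(struct sketch_bintime *bt, uint64_t x)
{
    uint64_t old = bt->frac;

    bt->frac += x;
    if (old > bt->frac)
        bt->sec++;
}

/* Compute the uptime for a given raw counter reading. */
static struct sketch_bintime
sketch_binuptime(const struct sketch_timehands *th, uint32_t counter_now)
{
    struct sketch_bintime bt = th->offset;
    uint32_t delta;

    assert(th->counter_mask != 0);
    /* Unsigned subtraction plus mask handles counter wrap-around. */
    delta = (counter_now - th->offset_count) & th->counter_mask;
    /* Assumes delta covers less than one second, so the product fits. */
    sketch_bintime_addx(&bt, th->scale * delta);
    return bt;
}
```

The wall-clock variants would add the boot time to this result.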

The whole struct vdso_timekeep is versioned, as well as each individual
struct vdso_timehands, which should allow future algorithms to be
implemented without breaking binary compatibility.  The code is
structured to eventually move the __vdso_* functions out of libc into
a VDSO, if one ever materializes. This desire explains the vdso prefix
and the header file names.

I implemented and tested the userspace timecounter on amd64, both for
64 and 32 bit binaries; it would probably work for i386 too. Other
architecture maintainers are welcome to add the necessary support.
You need to provide a machine/vdso.h header with definitions of the
VDSO_TIMEHANDS_MD fields for struct vdso_timehands, which should
provide the information userspace needs to implement a fast
tc_get_timecount(). The fields are filled in by the per-arch
cpu_fill_vdso_timehands(9) function. If your architecture supports
32bit compat, there are cpu_fill_vdso_timehands32(9) and
VDSO_TIMEHANDS_MD32 to code as well. After that,
lib/libc/<arch>/sys/__vdso_gettc.c should contain an implementation of
the __vdso_gettc() function, an exact analogue of tc_get_timecount().

Another potential improvement for the patch is to start using the rdtscp
instruction on the CPUs which support it. Then we could correct rdtsc
skews between packages, provided the kernel starts maintaining this
information, instead of refusing to activate the tsc timecounter. In
particular, on one Nehalem box I see the rdtsc SMP test failing, but
Nehalems do have a useful rdtsc, so this could be fixed later.

The patch is available at http://people.freebsd.org/~kib/misc/moronix.2.patch
It is not a commit candidate yet, since non-x86 architectures are not
handled even at compilation, and i386 is not tested.

sandy% /usr/home/pooma/build/bsd/DEV/stuff/tests/syscall_timing_32 gettimeofday
Clock resolution: 0.000000076
test    loop    time    iterations      periteration
gettimeofday    0       1.000994225     21623297        0.000000046
gettimeofday    1       1.000994980     21596492        0.000000046
gettimeofday    2       1.001070595     21598326        0.000000046
gettimeofday    3       1.000922308     21581398        0.000000046
gettimeofday    4       1.000984264     21605539        0.000000046
gettimeofday    5       1.000989697     21601659        0.000000046
gettimeofday    6       1.000996261     21598385        0.000000046
gettimeofday    7       1.001002223     21583933        0.000000046
gettimeofday    8       1.000985847     21599442        0.000000046
gettimeofday    9       1.000994977     21600935        0.000000046
sandy% sudo sysctl kern.timecounter.fast_gettime=0
kern.timecounter.fast_gettime: 1 -> 0
sandy% /usr/home/pooma/build/bsd/DEV/stuff/tests/syscall_timing_32 gettimeofday
Clock resolution: 0.000000076
test    loop    time    iterations      periteration
gettimeofday    0       1.001002747     3219274 0.000000310
gettimeofday    1       1.000971052     3220793 0.000000310
gettimeofday    2       1.001067494     3220768 0.000000310
gettimeofday    3       1.000929999     3220812 0.000000310
gettimeofday    4       1.000996106     3217503 0.000000311
gettimeofday    5       1.001058438     3220346 0.000000310
gettimeofday    6       1.000911510     3217308 0.000000311
gettimeofday    7       1.001085906     3220128 0.000000310
gettimeofday    8       1.000920338     3216582 0.000000311
gettimeofday    9       1.000983577     3219559 0.000000310


--kZU6r8y0YpRwyDfh
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAk/PioMACgkQC3+MBN1Mb4jPzwCfS14QKbr3jY5UhMGJDowJalb/
NrAAoNhv10qQJOytIVY46eOp5IZ3Z9s1
=D2Fs
-----END PGP SIGNATURE-----

--kZU6r8y0YpRwyDfh--

From owner-freebsd-arch@FreeBSD.ORG  Wed Jun  6 17:06:03 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7486A1065686
	for <arch@freebsd.org>; Wed,  6 Jun 2012 17:06:03 +0000 (UTC)
	(envelope-from luigi@onelab2.iet.unipi.it)
Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238])
	by mx1.freebsd.org (Postfix) with ESMTP id E91A28FC14
	for <arch@freebsd.org>; Wed,  6 Jun 2012 17:06:02 +0000 (UTC)
Received: by onelab2.iet.unipi.it (Postfix, from userid 275)
	id F3CA17300A; Wed,  6 Jun 2012 19:24:39 +0200 (CEST)
Date: Wed, 6 Jun 2012 19:24:39 +0200
From: Luigi Rizzo <rizzo@iet.unipi.it>
To: Konstantin Belousov <kostikbel@gmail.com>
Message-ID: <20120606172439.GA42362@onelab2.iet.unipi.it>
References: <20120606165115.GQ85127@deviant.kiev.zoral.com.ua>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20120606165115.GQ85127@deviant.kiev.zoral.com.ua>
User-Agent: Mutt/1.4.2.3i
Cc: arch@freebsd.org
Subject: Re: Fast gettimeofday(2) and clock_gettime(2)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Jun 2012 17:06:03 -0000

On Wed, Jun 06, 2012 at 07:51:15PM +0300, Konstantin Belousov wrote:
> A positive result from the recent flame-bait on arch@ is the working
> implementation of the fast gettimeofday(2) and clock_gettime(2). The

great job! congratulations and thanks for this work!

cheers
luigi

> speedup I see is around 6-7x on the 2600K. I think the speedup could
> be even bigger on the previous generation of CPUs, where lock
> operations and syscall entry are costlier. A sample test runs of
> tools/tools/syscall_timing are presented at the end of message.
> 
> Patch finds yet another use for the shared page, exporting
> time-keeping information for the binuptime(9) algorithm and
> re-implementing binuptime(9) in userspace. Kernel directs usermode
> whether the rdtsc instruction can be used, there is a global override
> sysctl kern.timecounter.fast_gettime to turn it off regardless of
> hardware capabilities.
> 
> The whole struct vdso_timekeep is versioned, as well as individual
> struct vdso_timehands, which should allow to implement future
> algorithms without breaking binary compatibility.  The code is
> structured to eventually move __vdso_* functions out of libc into
> VDSO, if it ever materialize. This desire explains vdso prefix and
> header file names.
> 
> I implemented and tested the userspace timecounter on amd64, both for
> 64 and 32 bit binaries, it would probably work for i386 too. Other
> architecture maintainers are welcome to add neccessary support there.
> You need to provide machine/vdso.h header with definitions of
> VDSO_TIMEHANDS_MD fields for struct vdso_timehands, which should
> provide information for userspace to implement fast
> tc_get_timecount(). The fields are filled in per-arch
> cpu_fill_vdso_timehands(9) function. If your architecture support
> 32bit compat, there are cpu_fill_vdso_timehands32(9) and
> VDSO_TIMEHANDS_MD32 to code as well. After that, the
> lib/libc/<arch>/sys/__vdso_gettc.c should contain an implemention of
> __vdso_gettc() function, exact analogue of tc_get_timecount().
> 
> Another potential improvement for the patch is to start using rdtscp
> instruction on the CPUs which support it. Then we could correct rdtsc
> skews between packages, provided kernel starts maintaining this
> information, instead of refusing to activate tsc timecounter. In
> particular, on one Nehalem box I see the rdtsc SMP test failing, but
> Nehalems do have useful rdtsc, so it is could be fixed later.
> 
> Patch is available at http://people.freebsd.org/~kib/misc/moronix.2.patch
> It is not a commit candidate yet, since non-x86 architectures are not
> handled even at compilation, and i386 is not tested.
> 
> sandy% /usr/home/pooma/build/bsd/DEV/stuff/tests/syscall_timing_32 gettimeofday
> Clock resolution: 0.000000076
> test    loop    time    iterations      periteration
> gettimeofday    0       1.000994225     21623297        0.000000046
> gettimeofday    1       1.000994980     21596492        0.000000046
> gettimeofday    2       1.001070595     21598326        0.000000046
> gettimeofday    3       1.000922308     21581398        0.000000046
> gettimeofday    4       1.000984264     21605539        0.000000046
> gettimeofday    5       1.000989697     21601659        0.000000046
> gettimeofday    6       1.000996261     21598385        0.000000046
> gettimeofday    7       1.001002223     21583933        0.000000046
> gettimeofday    8       1.000985847     21599442        0.000000046
> gettimeofday    9       1.000994977     21600935        0.000000046
> sandy% sudo sysctl kern.timecounter.fast_gettime=0
> kern.timecounter.fast_gettime: 1 -> 0
> sandy% /usr/home/pooma/build/bsd/DEV/stuff/tests/syscall_timing_32 gettimeofday
> Clock resolution: 0.000000076
> test    loop    time    iterations      periteration
> gettimeofday    0       1.001002747     3219274 0.000000310
> gettimeofday    1       1.000971052     3220793 0.000000310
> gettimeofday    2       1.001067494     3220768 0.000000310
> gettimeofday    3       1.000929999     3220812 0.000000310
> gettimeofday    4       1.000996106     3217503 0.000000311
> gettimeofday    5       1.001058438     3220346 0.000000310
> gettimeofday    6       1.000911510     3217308 0.000000311
> gettimeofday    7       1.001085906     3220128 0.000000310
> gettimeofday    8       1.000920338     3216582 0.000000311
> gettimeofday    9       1.000983577     3219559 0.000000310
> 


From owner-freebsd-arch@FreeBSD.ORG  Wed Jun  6 18:23:54 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id B0E701065672
	for <freebsd-arch@freebsd.org>; Wed,  6 Jun 2012 18:23:54 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net
	[IPv6:2001:470:1f10:75::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 86DD38FC14
	for <freebsd-arch@freebsd.org>; Wed,  6 Jun 2012 18:23:54 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id EC55EB918;
	Wed,  6 Jun 2012 14:23:53 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-arch@freebsd.org
Date: Wed, 6 Jun 2012 14:23:53 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p13; KDE/4.5.5; amd64; ; )
References: <20120606165115.GQ85127@deviant.kiev.zoral.com.ua>
In-Reply-To: <20120606165115.GQ85127@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Message-Id: <201206061423.53179.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
	(bigwig.baldwin.cx); Wed, 06 Jun 2012 14:23:54 -0400 (EDT)
Cc: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Fast gettimeofday(2) and clock_gettime(2)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Jun 2012 18:23:54 -0000

On Wednesday, June 06, 2012 12:51:15 pm Konstantin Belousov wrote:
> A positive result from the recent flame-bait on arch@ is the working
> implementation of the fast gettimeofday(2) and clock_gettime(2). The
> speedup I see is around 6-7x on the 2600K. I think the speedup could
> be even bigger on the previous generation of CPUs, where lock
> operations and syscall entry are costlier. A sample test runs of
> tools/tools/syscall_timing are presented at the end of message.

In general this looks good but I see a few nits / races:

1) You don't follow the model of clearing tk_current to 0 while you
   are updating the structure that the in-kernel timecounter code
   uses.  This also means you have to avoid using a tk_current of 0
   and that userland has to keep spinning as long as tk_current is 0.
   Without this I believe userland can read a partially updated
   structure.

2) You read tk->tk_boottime without the tk_current protection in your
   non-uptime routines.  This is racy as the kernel alters the
   boottime when it skews time for large adjustments from ntp, etc.
   To be really safe you need to read the boottime inside the loop
   into a local variable and perhaps use a boolean parameter to decide
   if you should add it to the computed uptime.
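
Both points can be illustrated with a small sketch of the generation
protocol. This is not the patch's code: the struct layout and the names
shared_time, publish_time() and read_time() are hypothetical, and C11
atomics stand in for whatever barriers the kernel and libc would really
use. The writer zeroes the generation while updating and never
publishes 0; the reader spins while it sees 0 and retries if the
generation changed. The boottime is read inside the loop and added only
on request.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared-page layout; field names are illustrative. */
struct shared_time {
    _Atomic uint32_t gen;           /* 0 means "update in progress" */
    _Atomic uint64_t uptime_ns;
    _Atomic uint64_t boottime_ns;
};

/* Writer side (the kernel in the real design); single writer assumed. */
static void publish_time(struct shared_time *st, uint64_t uptime_ns,
    uint64_t boottime_ns)
{
    uint32_t next;

    assert(st != NULL);
    next = atomic_load_explicit(&st->gen, memory_order_relaxed) + 1;
    if (next == 0)
        next = 1;                   /* never publish generation 0 */

    atomic_store_explicit(&st->gen, 0, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&st->uptime_ns, uptime_ns, memory_order_relaxed);
    atomic_store_explicit(&st->boottime_ns, boottime_ns,
        memory_order_relaxed);
    atomic_store_explicit(&st->gen, next, memory_order_release);
}

/* Reader side (userland): returns uptime, optionally plus boottime. */
static uint64_t read_time(struct shared_time *st, bool add_boottime)
{
    uint64_t up, boot;
    uint32_t gen;

    assert(st != NULL);
    for (;;) {
        gen = atomic_load_explicit(&st->gen, memory_order_acquire);
        if (gen == 0)
            continue;               /* writer active: keep spinning */
        up = atomic_load_explicit(&st->uptime_ns, memory_order_relaxed);
        boot = atomic_load_explicit(&st->boottime_ns,
            memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        /* Accept the snapshot only if no update happened meanwhile. */
        if (gen == atomic_load_explicit(&st->gen, memory_order_relaxed))
            return (add_boottime ? up + boot : up);
    }
}
```

A gettimeofday()-style caller would pass true for add_boottime and an
uptime-style caller false, so both share the one protected read loop.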

> sandy% /usr/home/pooma/build/bsd/DEV/stuff/tests/syscall_timing_32 gettimeofday
> Clock resolution: 0.000000076
> test    loop    time    iterations      periteration
> gettimeofday    0       1.000994225     21623297        0.000000046
> gettimeofday    1       1.000994980     21596492        0.000000046
> gettimeofday    2       1.001070595     21598326        0.000000046
> gettimeofday    3       1.000922308     21581398        0.000000046
> gettimeofday    4       1.000984264     21605539        0.000000046
> gettimeofday    5       1.000989697     21601659        0.000000046
> gettimeofday    6       1.000996261     21598385        0.000000046
> gettimeofday    7       1.001002223     21583933        0.000000046
> gettimeofday    8       1.000985847     21599442        0.000000046
> gettimeofday    9       1.000994977     21600935        0.000000046
> sandy% sudo sysctl kern.timecounter.fast_gettime=0

I think this means you can call gettimeofday() in about 46 ns now
vs 310 the "old" way?

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Wed Jun  6 19:03:44 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 3B7FA106564A;
	Wed,  6 Jun 2012 19:03:44 +0000 (UTC)
	(envelope-from iwasaki@jp.FreeBSD.org)
Received: from locore.org (ns01.locore.org [218.45.21.227])
	by mx1.freebsd.org (Postfix) with ESMTP id DCF5F8FC0A;
	Wed,  6 Jun 2012 19:03:43 +0000 (UTC)
Received: from localhost (celeron.v4.locore.org [192.168.0.10])
	by locore.org (8.14.5/8.14.5/iwasaki) with ESMTP/inet id q56J3gH9050606;
	Thu, 7 Jun 2012 04:03:42 +0900 (JST)
	(envelope-from iwasaki@jp.FreeBSD.org)
Date: Thu, 07 Jun 2012 04:03:42 +0900 (JST)
Message-Id: <20120607.040342.73368798.iwasaki@jp.FreeBSD.org>
To: avg@FreeBSD.org
From: Mitsuru IWASAKI <iwasaki@jp.FreeBSD.org>
In-Reply-To: <4FCBBEDD.5000604@FreeBSD.org>
References: <4FCB0FE5.4050607@FreeBSD.org>
	<20120603.234243.28389486.iwasaki@jp.FreeBSD.org>
	<4FCBBEDD.5000604@FreeBSD.org>
X-Mailer: Mew version 3.3 on Emacs 20.7 / Mule 4.0 (HANANOEN)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: attilio@FreeBSD.org, freebsd-acpi@FreeBSD.org, freebsd-arch@FreeBSD.org
Subject: Re: cpu stopping
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Jun 2012 19:03:44 -0000

Hi,

I've created patches with an experimental implementation based on the
discussion so far.

http://people.freebsd.org/~iwasaki/acpi/cpustop_hook-20120606.diff

In acpi_wakeup.c, cpususpend_handler() and susppcbs are replaced with
cpustop_handler() and stoppcbs.

This is for RELENG_9 and only for i386 but I think it's enough for the
start.


From: Andriy Gapon <avg@FreeBSD.org>
Subject: Re: cpu stopping
Date: Sun, 03 Jun 2012 22:45:33 +0300
Message-ID: <4FCBBEDD.5000604@FreeBSD.org>

> > Never mind :) What I'm trying to do in the patches is just to unify
> > amd64/i386 independent part (acpi_wakeup.c) for the code maintenance,
> > so please let's commit it first, then start re-design the
> > cpususpend_handler().
> 
> In no way I am trying to delay your work :)
> Just shared my view on the design of cpu stopping code.

I got it :)

> >> My view of how this should work is:
> >> - there can be only one master CPU that controls all other (slave) CPUs
> >> - the master sets entry and exit hooks
> > 
> > Entry hook for suspending might be
> > ----
> >                 ctx_fpusave(suspfpusave[cpu]);
> >                 wbinvd();
> >                 CPU_SET_ATOMIC(cpu, &stopped_cpus);
> > ----
> > 
> > and for stopping is
> > ----
> >         /* Indicate that we are stopped */
> >         CPU_SET_ATOMIC(cpu, &stopped_cpus);
> > ----
> > 
> > Correct?
> 
> Yes.  The only nit is that CPU_SET_ATOMIC(cpu, &stopped_cpus) could be part of
> the wait loop prologue.  No need to duplicate it in each hook.

OK, I did so.

I hope the patch is not far from your idea.

Thanks!

From owner-freebsd-arch@FreeBSD.ORG  Wed Jun  6 19:50:07 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 43C171065670;
	Wed,  6 Jun 2012 19:50:07 +0000 (UTC) (envelope-from imp@bsdimp.com)
Received: from harmony.bsdimp.com (bsdimp.com [199.45.160.85])
	by mx1.freebsd.org (Postfix) with ESMTP id D355F8FC16;
	Wed,  6 Jun 2012 19:50:06 +0000 (UTC)
Received: from [10.30.101.53] ([209.117.142.2]) (authenticated bits=0)
	by harmony.bsdimp.com (8.14.4/8.14.3) with ESMTP id q56JjBx3036296
	(version=TLSv1/SSLv3 cipher=DHE-DSS-AES128-SHA bits=128 verify=NO);
	Wed, 6 Jun 2012 13:45:12 -0600 (MDT) (envelope-from imp@bsdimp.com)
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: text/plain; charset=us-ascii
From: Warner Losh <imp@bsdimp.com>
In-Reply-To: <201206061423.53179.jhb@freebsd.org>
Date: Wed, 6 Jun 2012 13:45:05 -0600
Content-Transfer-Encoding: 7bit
Message-Id: <78461459-8D90-4AD1-9983-3522E4DA5816@bsdimp.com>
References: <20120606165115.GQ85127@deviant.kiev.zoral.com.ua>
	<201206061423.53179.jhb@freebsd.org>
To: John Baldwin <jhb@FreeBSD.org>
X-Mailer: Apple Mail (2.1084)
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1
	(harmony.bsdimp.com [10.0.0.6]);
	Wed, 06 Jun 2012 13:45:13 -0600 (MDT)
Cc: Konstantin Belousov <kostikbel@gmail.com>, freebsd-arch@FreeBSD.org
Subject: Re: Fast gettimeofday(2) and clock_gettime(2)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Jun 2012 19:50:07 -0000


On Jun 6, 2012, at 12:23 PM, John Baldwin wrote:
> 2) You read tk->tk_boottime without the tk_current protection in your
>   non-uptime routines.  This is racey as the kernel alters the
>   boottime when it skews time for large adjustments from ntp, etc.

One of the 'etc' is leap seconds.

Warner


From owner-freebsd-arch@FreeBSD.ORG  Wed Jun  6 20:16:20 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 01E35106567C;
	Wed,  6 Jun 2012 20:16:20 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id D79468FC1C;
	Wed,  6 Jun 2012 20:16:18 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA09341;
	Wed, 06 Jun 2012 23:16:11 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1ScMeJ-0009IJ-3l; Wed, 06 Jun 2012 23:16:11 +0300
Message-ID: <4FCFBA89.9030105@FreeBSD.org>
Date: Wed, 06 Jun 2012 23:16:09 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:12.0) Gecko/20120503 Thunderbird/12.0.1
MIME-Version: 1.0
To: Mitsuru IWASAKI <iwasaki@jp.FreeBSD.org>
References: <4FCB0FE5.4050607@FreeBSD.org>
	<20120603.234243.28389486.iwasaki@jp.FreeBSD.org>
	<4FCBBEDD.5000604@FreeBSD.org>
	<20120607.040342.73368798.iwasaki@jp.FreeBSD.org>
In-Reply-To: <20120607.040342.73368798.iwasaki@jp.FreeBSD.org>
X-Enigmail-Version: 1.5pre
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: attilio@FreeBSD.org, freebsd-acpi@FreeBSD.org, freebsd-arch@FreeBSD.org
Subject: Re: cpu stopping
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Jun 2012 20:16:20 -0000

on 06/06/2012 22:03 Mitsuru IWASAKI said the following:
> Hi,
> 
> I've created the patches of experimental implementation based on
> discussion so far.
> 
> http://people.freebsd.org/~iwasaki/acpi/cpustop_hook-20120606.diff
> 
> In acpi_wakeup.c, cpususpend_handler() and susppcbs are replaced with
> cpustop_handler() and stoppcbs.
> 
> This is for RELENG_9 and only for i386 but I think it's enough for the
> start.

I think that there is no need for DPCPU here.  All (affected) CPUs should see
the same hook, IMO.  At least I can not imagine the case where something else
would be required.
Also, it might make sense to provide a void pointer as a potential context for
the hook.  As Attilio has said before, this has many similarities to what
smp_rendezvous does, just for different kind of situations.
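
A minimal sketch of the shape I have in mind, in userland C11 so that it can
be compiled and tried outside the kernel.  Every name here (cpustop_hooks,
cpustop_set_hooks, cpustop_handler_sketch, the two masks, the demo_* helpers)
is hypothetical and not taken from the patch; the real code would use
cpuset_t, CPU_SET_ATOMIC() and the existing stopped/started sets.  It only
illustrates one global set of hooks, a void pointer context, and the
"stopped" bit being set in the common wait loop prologue rather than in each
hook.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is from the patch. */
typedef void cpustop_hook_t(int cpu, void *arg);

struct cpustop_hooks {
	cpustop_hook_t *entry;	/* run by each slave before it parks */
	cpustop_hook_t *exit;	/* run by each slave after it is released */
	void *arg;		/* opaque context supplied by the master */
};

/* One global set of hooks, seen by every affected CPU (no DPCPU). */
static struct cpustop_hooks *_Atomic cpustop_hooks_cur;
static atomic_uint stopped_mask;	/* bit n set: CPU n is parked */
static atomic_uint release_mask;	/* bit n set: CPU n may resume */

/* Master side: publish the hooks before stopping the slaves. */
static void
cpustop_set_hooks(struct cpustop_hooks *h)
{
	atomic_store_explicit(&cpustop_hooks_cur, h, memory_order_release);
}

/* Slave side: entry hook, common prologue, wait loop, exit hook. */
static void
cpustop_handler_sketch(int cpu)
{
	struct cpustop_hooks *h;
	unsigned int bit = 1u << cpu;

	h = atomic_load_explicit(&cpustop_hooks_cur, memory_order_acquire);
	if (h != NULL && h->entry != NULL)
		h->entry(cpu, h->arg);

	/* Common prologue: mark ourselves stopped once, not in each hook. */
	atomic_fetch_or(&stopped_mask, bit);

	while ((atomic_load(&release_mask) & bit) == 0)
		;	/* spin until the master releases us */

	atomic_fetch_and(&stopped_mask, ~bit);
	atomic_fetch_and(&release_mask, ~bit);
	if (h != NULL && h->exit != NULL)
		h->exit(cpu, h->arg);
}

/* Demo context and hooks: they only count how often they were called. */
struct demo_ctx {
	int entered;
	int exited;
};

static void
demo_entry(int cpu, void *arg)
{
	(void)cpu;
	((struct demo_ctx *)arg)->entered++;
}

static void
demo_exit(int cpu, void *arg)
{
	(void)cpu;
	((struct demo_ctx *)arg)->exited++;
}

/* Single-threaded demo: release the CPU in advance so the loop exits. */
static void
cpustop_demo(int cpu, struct demo_ctx *ctx)
{
	static struct cpustop_hooks hooks;

	hooks.entry = demo_entry;
	hooks.exit = demo_exit;
	hooks.arg = ctx;
	cpustop_set_hooks(&hooks);
	atomic_fetch_or(&release_mask, 1u << cpu);
	cpustop_handler_sketch(cpu);
	cpustop_set_hooks(NULL);
}
```

For suspend, the entry hook would do the FPU save and wbinvd() and the
context pointer could carry the per-CPU save areas; for a plain stop both
hooks could simply be NULL.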

> From: Andriy Gapon <avg@FreeBSD.org>
> Subject: Re: cpu stopping
> Date: Sun, 03 Jun 2012 22:45:33 +0300
> Message-ID: <4FCBBEDD.5000604@FreeBSD.org>
> 
>>> Never mind :) What I'm trying to do in the patches is just to unify
>>> amd64/i386 independent part (acpi_wakeup.c) for the code maintenance,
>>> so please let's commit it first, then start re-design the
>>> cpususpend_handler().
>>
>> In no way I am trying to delay your work :)
>> Just shared my view on the design of cpu stopping code.
> 
> I got it :)
> 
>>>> My view of how this should work is:
>>>> - there can be only one master CPU that controls all other (slave) CPUs
>>>> - the master sets entry and exit hooks
>>>
>>> Entry hook for suspending might be
>>> ----
>>>                 ctx_fpusave(suspfpusave[cpu]);
>>>                 wbinvd();
>>>                 CPU_SET_ATOMIC(cpu, &stopped_cpus);
>>> ----
>>>
>>> and for stopping is
>>> ----
>>>         /* Indicate that we are stopped */
>>>         CPU_SET_ATOMIC(cpu, &stopped_cpus);
>>> ----
>>>
>>> Correct?
>>
>> Yes.  The only nit is that CPU_SET_ATOMIC(cpu, &stopped_cpus) could be part of
>> the wait loop prologue.  No need to duplicate it in each hook.
> 
> OK, I did so.
> 
> I hope the patch is not far from your idea.
> 
> Thanks!


-- 
Andriy Gapon

From owner-freebsd-arch@FreeBSD.ORG  Wed Jun  6 20:59:46 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 56E061065672;
	Wed,  6 Jun 2012 20:59:46 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id CB9678FC18;
	Wed,  6 Jun 2012 20:59:45 +0000 (UTC)
Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q56Kxct6080208;
	Wed, 6 Jun 2012 23:59:38 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id
	q56KxcOL023466; Wed, 6 Jun 2012 23:59:38 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q56KxckZ023465; 
	Wed, 6 Jun 2012 23:59:38 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Wed, 6 Jun 2012 23:59:38 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Message-ID: <20120606205938.GS85127@deviant.kiev.zoral.com.ua>
References: <20120606165115.GQ85127@deviant.kiev.zoral.com.ua>
	<201206061423.53179.jhb@freebsd.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="qZLIv6EoKi7YuaSc"
Content-Disposition: inline
In-Reply-To: <201206061423.53179.jhb@freebsd.org>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: freebsd-arch@freebsd.org
Subject: Re: Fast gettimeofday(2) and clock_gettime(2)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Jun 2012 20:59:46 -0000


--qZLIv6EoKi7YuaSc
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Jun 06, 2012 at 02:23:53PM -0400, John Baldwin wrote:
> On Wednesday, June 06, 2012 12:51:15 pm Konstantin Belousov wrote:
> > A positive result from the recent flame-bait on arch@ is the working
> > implementation of the fast gettimeofday(2) and clock_gettime(2). The
> > speedup I see is around 6-7x on the 2600K. I think the speedup could
> > be even bigger on the previous generation of CPUs, where lock
> > operations and syscall entry are costlier. A sample test runs of
> > tools/tools/syscall_timing are presented at the end of message.
>=20
> In general this looks good but I see a few nits / races:
>=20
> 1) You don't follow the model of clearing tk_current to 0 while you
>    are updating the structure that the in-kernel timecounter code
>    uses.  This also means you have to avoid using a tk_current of 0
>    and that userland has to keep spinning as long as tk_current is 0.
>    Without this I believe userland can read a partially updated
>    structure.
I changed the code to be much more similar to the kern_tc.c. I (re)added
the generation field, which is set to 0 upon kernel touching timehands.

I think this can only happen if several tc_windup() calls occur in quick
succession, or if the usermode thread is suspended for long enough. BTW,
even the generation could wrap around to its previous value if the thread
is stopped.

There was apparently another issue with version 2. The bcopy() is not
atomic, so potentially libc could read wrong tk_current. I redid
the interface to write to the shared page to allow use of real atomics.

>=20
> 2) You read tk->tk_boottime without the tk_current protection in your
>    non-uptime routines.  This is racey as the kernel alters the
>    boottime when it skews time for large adjustments from ntp, etc.
>    To be really safe you need to read the boottime inside the loop
>    into a local variable and perhaps use a boolean parameter to decide
>    if you should add it to the computed uptime.
I moved the boottime to timehands from timekeep, thank you for the
clarification.

>=20
> > sandy% /usr/home/pooma/build/bsd/DEV/stuff/tests/syscall_timing_32=20
> gettimeofday
> > Clock resolution: 0.000000076
> > test    loop    time    iterations      periteration
> > gettimeofday    0       1.000994225     21623297        0.000000046
> > gettimeofday    1       1.000994980     21596492        0.000000046
> > gettimeofday    2       1.001070595     21598326        0.000000046
> > gettimeofday    3       1.000922308     21581398        0.000000046
> > gettimeofday    4       1.000984264     21605539        0.000000046
> > gettimeofday    5       1.000989697     21601659        0.000000046
> > gettimeofday    6       1.000996261     21598385        0.000000046
> > gettimeofday    7       1.001002223     21583933        0.000000046
> > gettimeofday    8       1.000985847     21599442        0.000000046
> > gettimeofday    9       1.000994977     21600935        0.000000046
> > sandy% sudo sysctl kern.timecounter.fast_gettime=3D0
>=20
> I think this means you can call gettimeofday() in about 46 ns now
> vs 310 the "old" way?

Yes. This is for 32bit, while for 64 bit binaries the numbers are
155->25 ns on the same hw.

Updated patch is at=20
http://people.freebsd.org/~kib/misc/moronix.3.patch

--qZLIv6EoKi7YuaSc
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAk/PxLkACgkQC3+MBN1Mb4jxiwCfcpH7xT549HAK2pcuZFMjR6V7
pjsAoKXKsHQmD+JU5VnKmiUXve1yOlcH
=U/tF
-----END PGP SIGNATURE-----

--qZLIv6EoKi7YuaSc--

From owner-freebsd-arch@FreeBSD.ORG  Wed Jun  6 22:48:21 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 17CFB106566C
	for <freebsd-arch@FreeBSD.org>; Wed,  6 Jun 2012 22:48:21 +0000 (UTC)
	(envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 42ADF8FC0A
	for <freebsd-arch@FreeBSD.org>; Wed,  6 Jun 2012 22:48:20 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id BAA11256;
	Thu, 07 Jun 2012 01:48:18 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1ScP1V-0009SO-WE; Thu, 07 Jun 2012 01:48:18 +0300
Message-ID: <4FCFDE30.4020109@FreeBSD.org>
Date: Thu, 07 Jun 2012 01:48:16 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:12.0) Gecko/20120503 Thunderbird/12.0.1
MIME-Version: 1.0
To: freebsd-arch@FreeBSD.org
References: <4FAC3EAB.6050303@delphij.net> <861umkurt8.fsf@ds4.des.no>
	<CAJ-VmokY+pgcq999NHShbq-3rK3=oeWT2WY7NmTvVdXOHZJhdg@mail.gmail.com>
	<CAF6rxgmDW21aPJ5Mp6Tbk1z02ivw4UPhSaNEX+Wiu7O0v13skA@mail.gmail.com>
	<20120517055425.GA802@infradead.org>
	<4FC762DD.90101@FreeBSD.org> <4FC81D9C.2080801@FreeBSD.org>
	<4FC8E29F.2010806@shatow.net> <4FC95A10.7000806@freebsd.org>
	<4FC9F94B.8060708@FreeBSD.org>
In-Reply-To: <4FC9F94B.8060708@FreeBSD.org>
X-Enigmail-Version: 1.5pre
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: d@delphij.net
Subject: Re: Allow small amount of memory be mlock()'ed by unprivileged
	process?
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Jun 2012 22:48:21 -0000

on 02/06/2012 14:30 Andriy Gapon said the following:
[snip]
> Some further technical observations:
> o  I was overly optimistic about _full_ support for RLIMIT_MEMLOCK - mlockall()
> doesn't support it at the moment and I am not sure if it is easy to implement the
> support for the MCL_FUTURE case.
> 
> o  Currently the default class in default login.conf has memorylocked=unlimited
> - not very smart.
> 
> o  There is also vm.max_wired sysctl (with no equivalent tunable), which
> specifies number of _pages_ that can be wired system wide (by both kernel and
> userland).  But note that the limit applies only to userland requests, the
> kernel is allowed to wire new pages even when the limit is exceeded.  By default
> the limit is set to 1/3 of available pages.
> So watch out for this limit when using ZFS, ZFS can easily starve userland.
> 
> o  I've just discovered :-) that we also have RCTL/RACCT framework (not enabled
> by default) aka "Resource Accounting" / "Resource Limits", which seems to
> parallel the conventional limits in many categories including the locked memory.
>  Not sure why we have that and if the interactions between conventional limits,
> resource limits and privileges would be easy to untangle.
[snip]

In case someone still follows this thread, here is another observation.
While non-privileged users can not explicitly wire/lock memory for their private
use, they are still subject to RLIMIT_MEMLOCK accounting.
E.g. sysctl system call may temporarily wire userspace buffers and that wiring
is checked against the RLIMIT_MEMLOCK limit.  And some sysctl calls may require
quite large buffer sizes, e.g. OIDs under kern.proc when used by e.g. fstat.
I observed the cases when the sysctl wired more than 128KB of memory.  I think
that on larger/busier systems it could be even more.

So, on one hand this vslock-against-RLIMIT_MEMLOCK check is good because it
protects against resource starvation via abuse.
On the other hand, I am not sure if this is a proper use of RLIMIT_MEMLOCK.
After all, vslock-ing by e.g. sysctl is an implementation detail.  The memory is
wired because of how kernel does things, not because a user/process wants to
wire that memory.  Besides the wiring is temporary.  So I am not sure that it is
fair to charge that kind of memory wiring to userland.

In any case, beware that if you decide to lower "locked-in-memory size" limit
(RLIMIT_MEMLOCK), then some sysctls and the tools using them (like fstat) may
start failing.
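
To make the failure mode concrete, here is a small sketch of the arithmetic
involved.  It is not the kernel's vslock() check; memlock_fits() and
memlock_would_fit() are hypothetical helpers, and the amount already wired is
passed in by the caller because this sketch has no way to query it.  Only
getrlimit(2) and RLIMIT_MEMLOCK are real interfaces.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: would wiring 'len' more bytes stay within 'limit',
 * given that 'wired' bytes are already charged to the process?
 */
static int
memlock_fits(rlim_t limit, size_t len, size_t wired)
{
	uintmax_t lim = (uintmax_t)limit;

	if (limit == RLIM_INFINITY)
		return (1);
	if ((uintmax_t)wired > lim)
		return (0);
	return ((uintmax_t)len <= lim - (uintmax_t)wired);
}

/*
 * Same question against the current soft limit of this process.
 * Returns 1 if it fits, 0 if not, -1 if the limit cannot be read.
 */
static int
memlock_would_fit(size_t len, size_t already_wired)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return (-1);
	return (memlock_fits(rl.rlim_cur, len, already_wired));
}
```

With a 64KB limit, memlock_fits(65536, 131072, 0) is 0, which corresponds to
the case above of a sysctl that needs to wire more than 128KB temporarily.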

-- 
Andriy Gapon

From owner-freebsd-arch@FreeBSD.ORG  Thu Jun  7 01:42:06 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 90CC2106566C;
	Thu,  7 Jun 2012 01:42:06 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from fallbackmx07.syd.optusnet.com.au
	(fallbackmx07.syd.optusnet.com.au [211.29.132.9])
	by mx1.freebsd.org (Postfix) with ESMTP id 0B0948FC14;
	Thu,  7 Jun 2012 01:42:05 +0000 (UTC)
Received: from mail34.syd.optusnet.com.au (mail34.syd.optusnet.com.au
	[211.29.133.218])
	by fallbackmx07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q571aGmt020643; Thu, 7 Jun 2012 11:36:16 +1000
Received: from c122-106-171-232.carlnfd1.nsw.optusnet.com.au
	(c122-106-171-232.carlnfd1.nsw.optusnet.com.au [122.106.171.232])
	by mail34.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q571ZnIp015171
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Thu, 7 Jun 2012 11:35:52 +1000
Date: Thu, 7 Jun 2012 11:35:49 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: John Baldwin <jhb@FreeBSD.org>
In-Reply-To: <201206061423.53179.jhb@freebsd.org>
Message-ID: <20120607084229.C1474@besplex.bde.org>
References: <20120606165115.GQ85127@deviant.kiev.zoral.com.ua>
	<201206061423.53179.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Konstantin Belousov <kostikbel@gmail.com>, freebsd-arch@FreeBSD.org
Subject: Re: Fast gettimeofday(2) and clock_gettime(2)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jun 2012 01:42:06 -0000

On Wed, 6 Jun 2012, John Baldwin wrote:

> On Wednesday, June 06, 2012 12:51:15 pm Konstantin Belousov wrote:
>> A positive result from the recent flame-bait on arch@ is the working
>> implementation of the fast gettimeofday(2) and clock_gettime(2). The
>> speedup I see is around 6-7x on the 2600K. I think the speedup could
>> be even bigger on the previous generation of CPUs, where lock
>> operations and syscall entry are costlier. A sample test runs of
>> tools/tools/syscall_timing are presented at the end of message.
>
> In general this looks good but I see a few nits / races:

It is awefully (sic) complete and large.  The patch is almost twice as
large as the entire kern_tc.c in FreeBSD-4, and that was quite bloated.

> 1) You don't follow the model of clearing tk_current to 0 while you
>   are updating the structure that the in-kernel timecounter code
>   uses.  This also means you have to avoid using a tk_current of 0
>   and that userland has to keep spinning as long as tk_current is 0.
>   Without this I believe userland can read a partially updated
>   structure.

I thought that too at first, but after looking at the patch decided
that it may be correct, but is too hard for me to understand.
Urk, we both missed that tk_current is an index into the timehands
array, so it cannot act as a generation count and it seems to be harder
to lock.

> 2) You read tk->tk_boottime without the tk_current protection in your
>   non-uptime routines.  This is racey as the kernel alters the
>   boottime when it skews time for large adjustments from ntp, etc.
>   To be really safe you need to read the boottime inside the loop
>   into a local variable and perhaps use a boolean parameter to decide
>   if you should add it to the computed uptime.

The critical problems seem to be mostly here:

+static void
+timehands_update(void *arg)
+{
+	struct sysentvec *sv;
+	struct vdso_timehands th;
+	uint32_t enabled, idx;
+
+	sv = arg;
+	sx_xlock(&shared_page_alloc_sx);
+	enabled = tc_fill_vdso_timehands(&th);

I think tc_windup() should just write to the shared page using the same
delicate order that it uses for its variables now, but there are callbacks
and fill functions like this.

This fill function seems to be OK, since it copies to a local variable
and checks th_generation to get a consistent snapshot.  Now we have to
copy it to the shared page atomically.

+	idx = sv->sv_timekeep_curr;
+	if (++idx >= VDSO_TH_NUM)
+		idx = 0;
+	sv->sv_timekeep_curr = idx;
+	if (enabled) {
+		shared_page_write(sv->sv_timekeep_off +
+		    sizeof(struct vdso_timekeep) + idx *
+		    sizeof(struct vdso_timehands), sizeof(th), &th);
+	}

Now I seem to understand this.  It has race (1) as you said.  Problems are
limited by it copying to (previously) old timehands which is unlikely
to be in use.  The user must have grabbed the pointer to them 10-100 msec
ago and been preempted and still be using it.  But this is precisely the
corner case that the generation count is supposed to fix.
shared_page_write() is essentially bcopy(), so it writes non-atomically
in any order.

+	shared_page_write(sv->sv_timekeep_off + offsetof(struct vdso_timekeep,
+	    tk_boottime), sizeof(struct bintime), &boottimebin);
+	shared_page_write(sv->sv_timekeep_off + offsetof(struct vdso_timekeep,
+	    tk_enabled), sizeof(uint32_t), &enabled);

Then more large variables are written non-atomically in any order.  The
kernel has bugs in this area too (tc_setclock() hacks on boottimebin and
then does an invalid (possibly concurrent) call to tc_windup()).

+	wmb();

Then things become written if we get this far.

+	shared_page_write(sv->sv_timekeep_off +
+	    offsetof(struct vdso_timekeep, tk_current), sizeof(uint32_t),
+	    &idx);

I don't understand this.  Why isn't it before wmb(), or at least done
atomically?  Ah, it is tk_current.  Writing this as atomically 0 at the
start and then atomically here should be enough (no wmb()), except for
the problems with boottimebin.  Except tk_current is actually the
timehands index and there is no timehands generation in userland.  I
don't understand this.

+	sx_xunlock(&shared_page_alloc_sx);
+}

The enabled flag should be cleared when the timecounter is switched
away from a TSC.  I can't see where that happens.  Also, things should
change if a TSC is switched to another one (TSC-low <-> TSC).  That is
a bit more delicate and not covered by the enabled flag.

% +static int
% +binuptime(struct bintime *bt, struct vdso_timekeep *tk)
% +{
% +	struct vdso_timehands *th;
% +	uint32_t curr;
% +
% +	do {
% +		if (!tk->tk_enabled)
% +			return (ENOSYS);

This should not be acted on before the generation count stabilizes.

% +
% +		/*
% +		 * XXXKIB. The load of tk->tk_current should use
% +		 * atomic_load_acq_32 to provide load barrier. But
% +		 * since tk points to r/o mapped page, x86
% +		 * implementation of atomic_load_acq faults.
% +		 */
% +		curr = tk->tk_current;
% +		rmb();

Memory barriers are intentionally left out in the kernel version.  Isn't
the generation count enough, provided it is stored using atomic_rel?

% +		th = &tk->tk_th[curr];
% +		if (th->th_algo != VDSO_TH_ALGO_1)
% +			return (ENOSYS);

I don't like having 2 conditional tests.  1 more than in the kernel
seems to be needed because all timehands may become unusable here (when
the kernel timecounter hardware stops being a TSC, and this happens
after any previous userland check of the flags).

% +		*bt = th->th_offset;
% +		bintime_addx(bt, th->th_scale * tc_delta(th));
% +	} while (curr != tk->tk_current);

With generation counts, it is only this second access to what was the
generation count that needs to be atomic.  If the other one is stale,
then it is different from this one.

% +	return (0);
% +}

It is a large regression to use the current index instead of the old
timehands.  The old timehands is stable for 10-100 msec after you load
it -- nothing in it, including its generation count changes in that
time.  So the kernel version of the above loop almost never iterates
more than once -- it only iterates if it is preempted for 10-100 msec.
But using the current index, you see this change as soon as the kernel
updates it, and then iterate, and aren't protected by the 10-100 msec
of time based locking.  Even accessing the current index requires more
locking.
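
For reference, the scheme I am describing can be sketched in portable C11
roughly as follows.  This is a sketch under assumptions, not kern_tc.c and
not the patch: th_slot, th_ring, th_cur, th_publish() and th_read() are
hypothetical names, the payload is reduced to two 64-bit fields, and C11
atomics with explicit fences stand in for the kernel's careful store order.
The writer only ever rewrites the slot after the current one, so a reader
holding the current slot retries only if it was delayed long enough for the
ring to come back around to that slot.

```c
#include <stdatomic.h>
#include <stdint.h>

#define TH_NUM 4	/* hypothetical ring size */

struct th_slot {
	_Atomic uint64_t offset;	/* payload */
	_Atomic uint64_t scale;		/* payload */
	atomic_uint gen;		/* 0 while the slot is being rewritten */
};

static struct th_slot th_ring[TH_NUM];
static atomic_uint th_cur;	/* index of the slot readers should use */

/*
 * Writer: rewrite the slot after the current one, then publish it.
 * Must be called at least once before any reader runs, since every
 * slot starts out with generation 0.
 */
static void
th_publish(uint64_t offset, uint64_t scale)
{
	struct th_slot *th;
	unsigned int idx, gen;

	idx = (atomic_load_explicit(&th_cur, memory_order_relaxed) + 1) %
	    TH_NUM;
	th = &th_ring[idx];

	/* Invalidate the slot before touching its payload. */
	gen = atomic_load_explicit(&th->gen, memory_order_relaxed);
	atomic_store_explicit(&th->gen, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&th->offset, offset, memory_order_relaxed);
	atomic_store_explicit(&th->scale, scale, memory_order_relaxed);

	if (++gen == 0)		/* never hand out generation 0 */
		gen = 1;
	atomic_store_explicit(&th->gen, gen, memory_order_release);
	atomic_store_explicit(&th_cur, idx, memory_order_release);
}

/* Reader: snapshot one slot, retrying only if that slot was rewritten. */
static unsigned int
th_read(uint64_t *offset, uint64_t *scale)
{
	struct th_slot *th;
	unsigned int gen;

	do {
		th = &th_ring[atomic_load_explicit(&th_cur,
		    memory_order_acquire)];
		gen = atomic_load_explicit(&th->gen, memory_order_acquire);
		*offset = atomic_load_explicit(&th->offset,
		    memory_order_relaxed);
		*scale = atomic_load_explicit(&th->scale,
		    memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&th->gen, memory_order_relaxed));
	return (gen);
}
```

The second load of the generation is the one that decides whether the
snapshot is kept; a stale first load simply differs from it and forces
another pass.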

Bruce

From owner-freebsd-arch@FreeBSD.ORG  Thu Jun  7 01:35:14 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 14A19106566B;
	Thu,  7 Jun 2012 01:35:14 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from fallbackmx10.syd.optusnet.com.au
	(fallbackmx10.syd.optusnet.com.au [211.29.132.251])
	by mx1.freebsd.org (Postfix) with ESMTP id 6B4878FC20;
	Thu,  7 Jun 2012 01:35:11 +0000 (UTC)
Received: from mail26.syd.optusnet.com.au (mail26.syd.optusnet.com.au
	[211.29.133.167])
	by fallbackmx10.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q56LEbQE027963; Thu, 7 Jun 2012 07:15:06 +1000
Received: from c122-106-171-232.carlnfd1.nsw.optusnet.com.au
	(c122-106-171-232.carlnfd1.nsw.optusnet.com.au [122.106.171.232])
	by mail26.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q56LEIJS013016
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Thu, 7 Jun 2012 07:14:26 +1000
Date: Thu, 7 Jun 2012 07:14:06 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
In-Reply-To: <864nqovoek.fsf@ds4.des.no>
Message-ID: <20120607064951.C1106@besplex.bde.org>
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<201206051008.29568.jhb@freebsd.org> <86haupvk4a.fsf@ds4.des.no>
	<201206051222.12627.jhb@freebsd.org>
	<20120605171446.GA28387@onelab2.iet.unipi.it>
	<20120606040931.F1050@besplex.bde.org> <864nqovoek.fsf@ds4.des.no>
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED; BOUNDARY="0-925939591-1339017246=:1106"
X-Mailman-Approved-At: Thu, 07 Jun 2012 01:49:13 +0000
Cc: Gianni <gianni@FreeBSD.org>, John Baldwin <jhb@FreeBSD.org>,
	Alan Cox <alc@rice.edu>, Alexander Kabaev <kan@FreeBSD.org>,
	Attilio Rao <attilio@FreeBSD.org>,
	Konstantin Belousov <kib@FreeBSD.org>, freebsd-arch@FreeBSD.org,
	Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Fast vs slow syscalls (Re: Fwd: [RFC] Kernel shared variables)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jun 2012 01:35:14 -0000

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

--0-925939591-1339017246=:1106
Content-Type: TEXT/PLAIN; charset=X-UNKNOWN; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE

On Wed, 6 Jun 2012, [utf-8] Dag-Erling Sm=C3=B8rgrav wrote:

> Bruce Evans <brde@optusnet.com.au> writes:
>> Dag-Erling Sm=C3=B8rgrav <des@des.no> writes:
>>> getpid(): 10,000,000 iterations in 24,400 ms
>>> gettimeofday(0, 0): 10,000,000 iterations in 54,104 ms
>>> raise(0): 10,000,000 iterations in 1,284,593 ms
>> That's one slow system or broken units.
>
> Broken units, these are microseconds not milliseconds.  Sorry.
>
>> After adjusting by factors of 1000 here and there, this format is still
>> hard to parse.  I like the format of nsec/operation.  10 million
>> operations in 24400 moroseconds seems to scale to 2.44 nsec/call (if 1
>> moro = 1 micro).  But that is impossibly fast, unless getpid() is
>> inlined to a load of the shared variable (it may also need the load to
>> be moved outside the loop).  I can't see any reasonable adjustment that
>> gives 24.4 nsec/call.
>
> #define ITERATIONS 10000000
>
>    struct timeval start, end;
>    int i;
>
>    gettimeofday(&start, NULL);
>    for (i = 0; i < ITERATIONS; ++i)
>        getpid();
>    gettimeofday(&end, NULL);

Now 2.44 nsec/call makes sense, but you really should add some volatiles
here to ensure that getpid() is not optimized away.  I get 3.48-3.49
nsec/call on an Athlon64 2GHz (the ratio of the times is almost exactly
inversely proportional to the clock frequencies, so the times in cycles
must be almost identical).
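
A minimal sketch of such a loop, under assumptions: nsec_per_call() and
bench_getpid() are hypothetical helper names, and the volatile sink is
just one way to keep the calls alive.  Every getpid() result is stored
to a volatile object, so the compiler cannot discard the calls at any
optimization level, and the elapsed microseconds are scaled to
nsec/call.  It only illustrates the measurement; it is not the program
that produced any of the numbers in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Scale an elapsed time in microseconds to nanoseconds per call. */
static double
nsec_per_call(int64_t elapsed_us, int64_t iterations)
{
	return ((double)elapsed_us * 1000.0 / (double)iterations);
}

/*
 * Time `iterations` calls to getpid() and return nsec/call.
 * The volatile sink forces each result to be stored, so the
 * calls cannot be optimized away.
 */
static double
bench_getpid(int64_t iterations)
{
	struct timeval start, end;
	volatile pid_t sink = 0;
	int64_t i, elapsed_us;

	gettimeofday(&start, NULL);
	for (i = 0; i < iterations; i++)
		sink = getpid();
	gettimeofday(&end, NULL);
	(void)sink;

	elapsed_us = (int64_t)(end.tv_sec - start.tv_sec) * 1000000 +
	    (end.tv_usec - start.tv_usec);
	return (nsec_per_call(elapsed_us, iterations));
}
```

With the figures quoted above, nsec_per_call(24400, 10000000) gives
2.44 nsec/call.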

> On Linux, gcc 4.4.6 compiles this to:
>
>   # gettimeofday(&start, NULL)
>   0x000000000040064b <+23>:    lea    -0x20(%rbp),%rax
>   0x000000000040064f <+27>:    mov    $0x0,%esi
>   0x0000000000400654 <+32>:    mov    %rax,%rdi
>   0x0000000000400657 <+35>:    callq  0x400500 <gettimeofday@plt>
>
>   # i = 0
>   0x000000000040065c <+40>:    movl   $0x0,-0x4(%rbp)
>   0x0000000000400663 <+47>:    jmp    0x40066e <main+58>
>
>   # getpid()
>   0x0000000000400665 <+49>:    callq  0x400520 <getpid@plt>
>
>   # ++i
>   0x000000000040066a <+54>:    addl   $0x1,-0x4(%rbp)
>
>   # i < ITERATIONS
>   0x000000000040066e <+58>:    cmpl   $0x98967f,-0x4(%rbp)
>   0x0000000000400675 <+65>:    jle    0x400665 <main+49>
>
>   # gettimeofday(&end, NULL)
>   0x0000000000400677 <+67>:    lea    -0x30(%rbp),%rax
>   0x000000000040067b <+71>:    mov    $0x0,%esi
>   0x0000000000400680 <+76>:    mov    %rax,%rdi
>   0x0000000000400683 <+79>:    callq  0x400500 <gettimeofday@plt>
>
> The code generated by gcc 4.2.1 on FreeBSD is almost identical:
> ...

So it loops OK, but we can't see what getpid() does.  It must not be
doing much.

> I don't know why gcc 4.4.6 loads &start / &end into %rax before copying
> it to %rdi instead of loading it directly into %rdi like 4.2.1 does.  I
> used the same command line (gcc -Wall -Wextra syscall.c) in both cases.

Probably unimportant (buried in loop overhead).

Program for 3.48-3.49 nsec:

% volatile int gpid;

It isn't volatile, but declaring it volatile prevents gcc-3.3.1 from
optimizing away the whole call to getpid() (optimizing it away reduces the
time to 0.99 nsec = 2 cycles; 2 cycles is the minimum loop overhead on most
current x86).

%
% int
% getpid(void)
% {
% 	return gpid;
% }
%
% main()
% {
% 	int i;
%
% 	for (i = 0; i < 1000000000; i++)
% 		getpid();
% }

Compiling with cc -O -fomit-frame-pointer gives:

% 08048520 <getpid>:
%  8048520:	a1 0c 97 04 08       	mov    0x804970c,%eax
%  8048525:	c3                   	ret
%  8048526:	89 f6                	mov    %esi,%esi
%
% 08048528 <main>:
%  8048528:	55                   	push   %ebp
%  8048529:	89 e5                	mov    %esp,%ebp
%  804852b:	53                   	push   %ebx
%  804852c:	83 ec 04             	sub    $0x4,%esp
%  804852f:	83 e4 f0             	and    $0xfffffff0,%esp
%  8048532:	bb 00 00 00 00       	mov    $0x0,%ebx
%  8048537:	90                   	nop
%  8048538:	e8 e3 ff ff ff       	call   8048520 <getpid>
%  804853d:	43                   	inc    %ebx
%  804853e:	81 fb ff c9 9a 3b    	cmp    $0x3b9ac9ff,%ebx
%  8048544:	7e f2                	jle    8048538 <main+0x10>
%  8048546:	8b 5d fc             	mov    0xfffffffc(%ebp),%ebx
%  8048549:	c9                   	leave
%  804854a:	c3                   	ret
%  804854b:	90                   	nop

-fomit-frame-pointer gives nicer object code but has no effect on the
runtime.

gettimeofday() needs several branches for null pointers, so it is much slower
even before it does useful work.  Your system has an indirection or 2
for shared libraries (1 for the function call and maybe more for the global
pid), so it is doing well for getpid() to be no slower in cycles.  kib's
version has lots of layering (function calls and indirections inherited from
the kernel version where they are more needed) that might make it get to the
useful work at about the same time Linux has done it and returned.

5.4104 nsec/call for gettimeofday() is impossible if there is any
rdtsc() hardware call or much layering.  rdtsc() takes 9-12 cycles on
AthlonXP and Athlon64, but 40+ cycles on Phenom+ and on most (?) Intel
CPUs and on most CPUs where it is P-state invariant (it is apparently
as hard or harder to synchronize in hardware as in software).  So Linux
can't be calling it to get 5.4104 nsec/call.  But calling and using
it should only take another 13-20 nsec at 3 GHz.  Excessive generality
in the software parts probably adds 10-20 nsec to this.  ISTR measuring
29 nsec (60+ cycles) for binuptime() on an Athlon XP.  That's with the hardware
part taking about 12 cycles.  gettimeofday()'s poor API adds a lot to
this.
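
The cycle counts and times above are related by nsec = cycles / GHz.
A small sketch of that arithmetic, with cycles_to_nsec() a hypothetical
helper name used only for illustration:

```c
/* Convert a cycle count to nanoseconds at a given clock rate in GHz. */
static double
cycles_to_nsec(double cycles, double ghz)
{
	return (cycles / ghz);
}
```

At 3 GHz, 40 cycles is about 13.3 nsec and 60 cycles is 20 nsec, which
is consistent with a range like 13-20 nsec; 2 cycles at 2 GHz is 1 nsec.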

Bruce
--0-925939591-1339017246=:1106--

From owner-freebsd-arch@FreeBSD.ORG  Thu Jun  7 03:00:44 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 90C21106566C;
	Thu,  7 Jun 2012 03:00:44 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail06.syd.optusnet.com.au (mail06.syd.optusnet.com.au
	[211.29.132.187])
	by mx1.freebsd.org (Postfix) with ESMTP id 24DD78FC14;
	Thu,  7 Jun 2012 03:00:43 +0000 (UTC)
Received: from c122-106-171-232.carlnfd1.nsw.optusnet.com.au
	(c122-106-171-232.carlnfd1.nsw.optusnet.com.au [122.106.171.232])
	by mail06.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q5730YqF018640
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Thu, 7 Jun 2012 13:00:36 +1000
Date: Thu, 7 Jun 2012 13:00:34 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Konstantin Belousov <kostikbel@gmail.com>
In-Reply-To: <20120606205938.GS85127@deviant.kiev.zoral.com.ua>
Message-ID: <20120607130029.K1962@besplex.bde.org>
References: <20120606165115.GQ85127@deviant.kiev.zoral.com.ua>
	<201206061423.53179.jhb@freebsd.org>
	<20120606205938.GS85127@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-arch@freebsd.org
Subject: Re: Fast gettimeofday(2) and clock_gettime(2)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jun 2012 03:00:44 -0000

On Wed, 6 Jun 2012, Konstantin Belousov wrote:

> On Wed, Jun 06, 2012 at 02:23:53PM -0400, John Baldwin wrote:
>> In general this looks good but I see a few nits / races:
>>
>> 1) You don't follow the model of clearing tk_current to 0 while you
>>    are updating the structure that the in-kernel timecounter code
>>    uses.  This also means you have to avoid using a tk_current of 0
>>    and that userland has to keep spinning as long as tk_current is 0.
>>    Without this I believe userland can read a partially updated
>>    structure.
> I changed the code to be much more similar to the kern_tc.c. I (re)added
> the generation field, which is set to 0 upon kernel touching timehands.

Seems necessary.

> I think this can only happen if tc_windups occurs quite close in
> succession, or usermode thread is suspended for long enough. BTW,
> even generation could loop back to the previous value if thread is
> stopped.

tc_windup() calls in close succession are bugs, since they cycle the timehands
faster than they were designed to be cycled.  We already have too many of these
bugs (where tc_setclock() calls tc_windup().  I didn't notice this
particular problem with it before).  Now I will point out that version
2 of your patch adds more of these calls, apparently to get changes to
happen sooner.  But in sysctl_kern_timecounter_hardware(), such a call
was intentionally left out since it is not needed.  Note that tc_tick
prevents calls to tc_windup() more often than about once per msec if
hz > 1000.

The generation count makes tc_windup() calls in close succession harmless,
except they increase race possibilities by reducing the time-domain
locking.  The generation count is 32 bits, so it can only loop back to
a previous value after 2**32 tc_windup() calls.  This "can't happen".
What can happen is for the timehands to cycle after something is
preempted for 10-100 msec.  Then the generation count allows detection
of the cycling.  It only has an effect in this case.  Otherwise, a
thread can be preempted for 10-100 seconds and start up using a
timehands pointer that it read into a register that long ago, and
safely use the old pointer unless its generation has changed.  Even
switching the timecounter works in that case.  This depends on the
hardware part of the timecounter not going away and the software
keeping most state per-timehands.
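
A minimal sketch of how a reader uses the generation count, written
with C11 atomics rather than the kernel's own primitives.  struct
th_sketch, its two payload fields and th_read() are hypothetical names
for illustration; this is not the kern_tc.c code.  The reader retries
while the generation is 0 (update in progress) or has changed under it
(the timehands was recycled while the thread was preempted):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, cut-down timehands: two payload fields plus a generation. */
struct th_sketch {
	_Atomic uint64_t	th_offset_count;
	_Atomic uint64_t	th_scale;
	_Atomic uint32_t	th_generation;	/* 0 while being rewritten */
};

/*
 * Copy a consistent snapshot of *th into *count and *scale.
 * Loops until the generation is nonzero and unchanged across the reads.
 */
static void
th_read(struct th_sketch *th, uint64_t *count, uint64_t *scale)
{
	uint32_t gen;

	do {
		gen = atomic_load_explicit(&th->th_generation,
		    memory_order_acquire);
		*count = atomic_load_explicit(&th->th_offset_count,
		    memory_order_relaxed);
		*scale = atomic_load_explicit(&th->th_scale,
		    memory_order_relaxed);
		/* Keep the payload loads ahead of the generation re-check. */
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&th->th_generation,
	    memory_order_relaxed));
}
```

A thread that resumes with a stale pointer either sees an unchanged
generation and gets the old, still consistent, values, or sees a
changed one and goes around again.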

> There was apparently another issue with version 2. The bcopy() is not
> atomic, so potentially libc could read wrong tk_current. I redid
> the interface to write to the shared page to allow use of real atomics.

Timecounter code is supposed to be lock-free except for some time-domain
locking.  I only see 1 problem with this: where tc_windup() writes the
generation count and other things without asking for these writes to
be ordered.  In most cases, the time-domain locking prevents problems.
E.g., when the timehands pointer is read, it remains valid for 9+
generations of cycling timehands (9+ to 90+ msec).  It is only when
a thread sleeps for this long while holding and planning to use the old
pointer that it needs the generation count to actually work.  Another
case is if writes are out of order (can't happen on x86), so:

  	/*
  	 * The write to th_generation fails to protect users of th
  	 * via 10-100 msec old pointers if it becomes visible unordered
  	 * after any of the writes done by the bcopy().  Very rare to
  	 * lose here, but th_generation's point is to not lose here.
  	 */
  	th->th_generation = 0;
  	bcopy(tho, th, offsetof(struct timehands, th_generation));

  	// finish writing th except for th_generation
  	th->th_generation = ogen;
  	/*
  	 * The previous write to th_generation fails to protect users
  	 * of th via old pointers if it becomes visible unordered before
  	 * all of the other writes (users see the generation change
  	 * via the old pointer, and now since it has become nonzero
  	 * they use the incompletely written data).  Again, only a problem
  	 * after 10-100 msec.
  	 */

  	timehands = th;
  	/*
  	 * Now users can grab th via timehands.  If timehands became visible
  	 * unordered before all of the other writes except th_generation,
  	 * then users use the incompletely written data.  Now the time
  	 * domain locking doesn't help.
  	 */
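
One way to ask for the ordering that the comments above say is missing,
sketched with C11 atomics.  struct th_sketch, th_publish() and
timehands_sketch are hypothetical names, and the choice of release
stores and a release fence is an assumption about what would be
sufficient, not a description of what tc_windup() does:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, cut-down timehands: two payload fields plus a generation. */
struct th_sketch {
	_Atomic uint64_t	th_offset_count;
	_Atomic uint64_t	th_scale;
	_Atomic uint32_t	th_generation;	/* 0 while being rewritten */
};

/* Stand-in for the global timehands pointer. */
static struct th_sketch *_Atomic timehands_sketch;

static void
th_publish(struct th_sketch *th, uint64_t count, uint64_t scale)
{
	uint32_t ogen;

	ogen = atomic_load_explicit(&th->th_generation, memory_order_relaxed);

	/* Invalidate th, and order that store before the payload writes. */
	atomic_store_explicit(&th->th_generation, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&th->th_offset_count, count,
	    memory_order_relaxed);
	atomic_store_explicit(&th->th_scale, scale, memory_order_relaxed);

	/* New generation, skipping 0; release orders it after the payload. */
	if (++ogen == 0)
		ogen = 1;
	atomic_store_explicit(&th->th_generation, ogen, memory_order_release);

	/* Publish the pointer only after everything above. */
	atomic_store_explicit(&timehands_sketch, th, memory_order_release);
}
```

On x86 these orderings cost nothing extra at run time beyond preventing
compiler reordering; on weakly ordered machines they become real
barriers.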

>> 2) You read tk->tk_boottime without the tk_current protection in your
>>    non-uptime routines.  This is racey as the kernel alters the
>>    boottime when it skews time for large adjustments from ntp, etc.
>>    To be really safe you need to read the boottime inside the loop
>>    into a local variable and perhaps use a boolean parameter to decide
>>    if you should add it to the computed uptime.
> I moved the boottime to timehands from timekeep, thank you for the
> clarification.

This isn't bug for bug compatible with the kernel.  The kernel has a
global boottimebin which affects uses of old timehands the instant
that it is changed (even before tc_windup() is called).

> Updated patch is at
> http://people.freebsd.org/~kib/misc/moronix.3.patch

I had better not be awed by looking at it :-).

Bruce

From owner-freebsd-arch@FreeBSD.ORG  Thu Jun  7 09:12:53 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 47489106566B;
	Thu,  7 Jun 2012 09:12:53 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id A61278FC1A;
	Thu,  7 Jun 2012 09:12:52 +0000 (UTC)
Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q579ChPH062268;
	Thu, 7 Jun 2012 12:12:43 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id
	q579ChHj027961; Thu, 7 Jun 2012 12:12:43 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q579Chjd027960; 
	Thu, 7 Jun 2012 12:12:43 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Thu, 7 Jun 2012 12:12:43 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <20120607091243.GV85127@deviant.kiev.zoral.com.ua>
References: <20120606165115.GQ85127@deviant.kiev.zoral.com.ua>
	<201206061423.53179.jhb@freebsd.org>
	<20120606205938.GS85127@deviant.kiev.zoral.com.ua>
	<20120607130029.K1962@besplex.bde.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="fryGc0vzirnrYIcd"
Content-Disposition: inline
In-Reply-To: <20120607130029.K1962@besplex.bde.org>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: freebsd-arch@freebsd.org
Subject: Re: Fast gettimeofday(2) and clock_gettime(2)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jun 2012 09:12:53 -0000


--fryGc0vzirnrYIcd
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Jun 07, 2012 at 01:00:34PM +1000, Bruce Evans wrote:
> On Wed, 6 Jun 2012, Konstantin Belousov wrote:
>
> >On Wed, Jun 06, 2012 at 02:23:53PM -0400, John Baldwin wrote:
> >>In general this looks good but I see a few nits / races:
> >>
> >>1) You don't follow the model of clearing tk_current to 0 while you
> >>   are updating the structure that the in-kernel timecounter code
> >>   uses.  This also means you have to avoid using a tk_current of 0
> >>   and that userland has to keep spinning as long as tk_current is 0.
> >>   Without this I believe userland can read a partially updated
> >>   structure.
> >I changed the code to be much more similar to the kern_tc.c. I (re)added
> >the generation field, which is set to 0 upon kernel touching timehands.
>
> Seems necessary.
>
> >I think this can only happen if tc_windups occurs quite close in
> >succession, or usermode thread is suspended for long enough. BTW,
> >even generation could loop back to the previous value if thread is
> >stopped.
>
> tc_windup() calls in close succession are bugs, since they cycle the timehands
> faster than they were designed to be cycled.  We already have too many of these
> bugs (where tc_setclock() calls tc_windup().  I didn't notice this
> particular problem with it before).  Now I will point out that version
> 2 of your patch adds more of these calls, apparently to get changes to
> happen sooner.  But in sysctl_kern_timecounter_hardware(), such a call
> was intentionally left out since it is not needed.  Note that tc_tick
> prevents calls to tc_windup() more often than about once per msec if
> hz > 1000.
No, I did not add more tc_windup() calls. I added a recalculation
of the shared page content on the timecounter change, which is not
the same as a tc_windup() call. This is exactly to handle disabling
usermode rdtsc use when the kernel timecounter hardware changes.

>
> The generation count makes tc_windup() calls in close succession harmless,
> except they increase race possibilities by reducing the time-domain
> locking.  The generation count is 32 bits, so it can only loop back to
> a previous value after 2**32 tc_windup() calls.  This "can't happen".
> What can happen is for the timehands to cycle after something is
> preempted for 10-100 msec.  Then the generation count allows detection
> of the cycling.  It only has an effect in this case.  Otherwise, a
> thread can be preempted for 10-100 seconds and start up using a
> timehands pointer that it read into a register that long ago, and
> safely use the old pointer unless its generation has changed.  Even
> switching the timecounter works in that case.  This depends on the
> hardware part of the timecounter not going away and the software
> keeping most state per-timehands.
I reinstated the generation counter for rev. 3.

>
> >There was apparently another issue with version 2. The bcopy() is not
> >atomic, so potentially libc could read wrong tk_current. I redid
> >the interface to write to the shared page to allow use of real atomics.
>
> Timecounter code is supposed to be lock-free except for some time-domain
> locking.  I only see 1 problem with this: where tc_windup() writes the
> generation count and other things without asking for these writes to
> be ordered.  In most cases, the time-domain locking prevents problems.
In fact, on x86 the ordering is strong enough that no barriers are needed,
which is why the problem has gone unnoticed so far.

> E.g., when the timehands pointer is read, it remains valid for 9+
> generations of cycling timehands (9+ to 90+ msec).  It is only when
> a thread sleeps for this long while holding and planning to use the old
> pointer that it needs the generation count to actually work.  Another
> case is if writes are out of order (can't happen on x86), so:
>
>  	/*
>  	 * The write to th_generation fails to protect users of th
>  	 * via 10-100 msec old pointers if it becomes visible unordered
>  	 * after any of the writes done by the bcopy().  Very rare to
>  	 * lose here, but th_generation's point is to not lose here.
>  	 */
>  	th->th_generation = 0;
>  	bcopy(tho, th, offsetof(struct timehands, th_generation));
>
>  	// finish writing th except for th_generation
>  	th->th_generation = ogen;
>  	/*
>  	 * The previous write to th_generation fails to protect users
>  	 * of th via old pointers if it becomes visible unordered before
>  	 * all of the other writes (users see the generation change
>  	 * via the old pointer, and now since it has become nonzero
>  	 * they use the incompletely written data).  Again, only a problem
>  	 * after 10-100 msec.
>  	 */
>
>  	timehands = th;
>  	/*
>  	 * Now users can grab th via timehands.  If timehands became visible
>  	 * unordered before all of the other writes except th_generation,
>  	 * then users use the incompletely written data.  Now the time
>  	 * domain locking doesn't help.
>  	 */
>
> >>2) You read tk->tk_boottime without the tk_current protection in your
> >>   non-uptime routines.  This is racey as the kernel alters the
> >>   boottime when it skews time for large adjustments from ntp, etc.
> >>   To be really safe you need to read the boottime inside the loop
> >>   into a local variable and perhaps use a boolean parameter to decide
> >>   if you should add it to the computed uptime.
> >I moved the boottime to timehands from timekeep, thank you for the
> >clarification.
>
> This isn't bug for bug compatible with the kernel.  The kernel has a
> global boottimebin which affects uses of old timehands the instant
> that it is changed (even before tc_windup() is called).
>
> >Updated patch is at
> >http://people.freebsd.org/~kib/misc/moronix.3.patch
>
> I had better not be awed by looking at it :-).
I will test this with your test code when I return home.

--fryGc0vzirnrYIcd
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAk/QcIsACgkQC3+MBN1Mb4jL9gCeM2BJ7raUIf4lK9/cnn7oOt9L
DZ0AoLk1bHMpwPz6kSv9mSCtMu5jUbRJ
=d4j3
-----END PGP SIGNATURE-----

--fryGc0vzirnrYIcd--

From owner-freebsd-arch@FreeBSD.ORG  Thu Jun  7 10:04:22 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4D1D8106567F;
	Thu,  7 Jun 2012 10:04:22 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id DBBFA8FC1F;
	Thu,  7 Jun 2012 10:04:21 +0000 (UTC)
Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q57A42Q5072985;
	Thu, 7 Jun 2012 13:04:02 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id
	q57A42K6028244; Thu, 7 Jun 2012 13:04:02 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q57A41Lb028243; 
	Thu, 7 Jun 2012 13:04:01 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Thu, 7 Jun 2012 13:04:01 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Dag-Erling Sm??rgrav <des@des.no>
Message-ID: <20120607100401.GW85127@deviant.kiev.zoral.com.ua>
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<201206051008.29568.jhb@freebsd.org> <86haupvk4a.fsf@ds4.des.no>
	<201206051222.12627.jhb@freebsd.org>
	<20120605171446.GA28387@onelab2.iet.unipi.it>
	<20120606040931.F1050@besplex.bde.org> <864nqovoek.fsf@ds4.des.no>
	<20120607064951.C1106@besplex.bde.org> <86sje7sf31.fsf@ds4.des.no>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="PoKbPPFu8MuDl6RC"
Content-Disposition: inline
In-Reply-To: <86sje7sf31.fsf@ds4.des.no>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: John Baldwin <jhb@FreeBSD.org>, freebsd-arch@FreeBSD.org
Subject: Re: Fast vs slow syscalls (Re: Fwd: [RFC] Kernel shared variables)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jun 2012 10:04:22 -0000


--PoKbPPFu8MuDl6RC
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Jun 07, 2012 at 10:26:10AM +0200, Dag-Erling Smørgrav wrote:
> Bruce Evans <brde@optusnet.com.au> writes:
> > Now 2.44 nsec/call makes sense, but you really should add some volatiles
> > here to ensure that getpid() is not optimized away.
>
> As you can see from the disassembly I provided, it isn't.
>
> > So it loops OK, but we can't see what getpid() does.  It must not be
> > doing much.
>
> Umm, yes, that's the whole point of this conversation.  Linux's getpid()
> is not a syscall, but a library function that returns a constant from a
> page shared by the kernel.
>
> > 5.4104 nsec/call for gettimeofday() is impossible if there is any
> > rdtsc() hardware call or much layering.
>
> It's gettimeofday(0, 0), actually, so it doesn't need to read the clock.
> If I pass a struct timeval as the first argument - so it *does* need to
> read the clock - it's a little bit slower but still faster than an
> actual system call.  Here's another run that demonstrates this - a
> little bit slower than previous runs because I have other processes
> running:
>
> getpid(): 10,000,000 iterations in 30,377 us
> gettimeofday(0, 0): 10,000,000 iterations in 55,571 us
> gettimeofday(&tv, 0): 10,000,000 iterations in 302,634 us
So this timing seems to be of the same order of magnitude as the times
I get with the patch, around 25 vs. 30 ns per gettimeofday() call.

Linux seems slower, probably due to the slower CPU?  Mine is 3.4GHz, while
des used a 3.1GHz Linux box.

> kill(pid, 0): 10,000,000 iterations in 1,291,793 us
>
> I can't test a static build since RHEL6 does not provide a static libc.
>
> DES
> -- 
> Dag-Erling Smørgrav - des@des.no

--PoKbPPFu8MuDl6RC
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAk/QfJEACgkQC3+MBN1Mb4itsgCgsxTeKDTcDUfT3Q8hK0aYFBDs
0+sAoMzkk9S8GR9ivMLh2+70M0nWjqOz
=tk9Z
-----END PGP SIGNATURE-----

--PoKbPPFu8MuDl6RC--

From owner-freebsd-arch@FreeBSD.ORG  Thu Jun  7 11:02:53 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id E4BDA106566B;
	Thu,  7 Jun 2012 11:02:53 +0000 (UTC) (envelope-from des@des.no)
Received: from smtp.des.no (smtp.des.no [194.63.250.102])
	by mx1.freebsd.org (Postfix) with ESMTP id 9F4508FC0C;
	Thu,  7 Jun 2012 11:02:53 +0000 (UTC)
Received: from ds4.des.no (smtp.des.no [194.63.250.102])
	by smtp.des.no (Postfix) with ESMTP id 8D02868C0;
	Thu,  7 Jun 2012 11:02:52 +0000 (UTC)
Received: by ds4.des.no (Postfix, from userid 1001)
	id 270AF9A97; Thu,  7 Jun 2012 13:02:51 +0200 (CEST)
From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
To: Konstantin Belousov <kostikbel@gmail.com>
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<201206051008.29568.jhb@freebsd.org> <86haupvk4a.fsf@ds4.des.no>
	<201206051222.12627.jhb@freebsd.org>
	<20120605171446.GA28387@onelab2.iet.unipi.it>
	<20120606040931.F1050@besplex.bde.org> <864nqovoek.fsf@ds4.des.no>
	<20120607064951.C1106@besplex.bde.org> <86sje7sf31.fsf@ds4.des.no>
	<20120607100401.GW85127@deviant.kiev.zoral.com.ua>
Date: Thu, 07 Jun 2012 13:02:51 +0200
In-Reply-To: <20120607100401.GW85127@deviant.kiev.zoral.com.ua> (Konstantin
	Belousov's message of "Thu, 7 Jun 2012 13:04:01 +0300")
Message-ID: <8662b3s7tw.fsf@ds4.des.no>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.3 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: John Baldwin <jhb@FreeBSD.org>, freebsd-arch@FreeBSD.org
Subject: Re: Fast vs slow syscalls (Re: Fwd: [RFC] Kernel shared variables)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jun 2012 11:02:54 -0000

Konstantin Belousov <kostikbel@gmail.com> writes:
> Linux seems slower probably due to slower CPU ? Mine is 3.4Ghz, while
> des used 3.1Ghz for Linux box.

I got better results on the same Linux box yesterday (by about 20%).
I'm not sure what has changed.

DES
-- 
Dag-Erling Smørgrav - des@des.no

From owner-freebsd-arch@FreeBSD.ORG  Thu Jun  7 08:26:18 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 464A2106564A;
	Thu,  7 Jun 2012 08:26:18 +0000 (UTC) (envelope-from des@des.no)
Received: from smtp.des.no (smtp.des.no [194.63.250.102])
	by mx1.freebsd.org (Postfix) with ESMTP id EB3AE8FC08;
	Thu,  7 Jun 2012 08:26:17 +0000 (UTC)
Received: from ds4.des.no (smtp.des.no [194.63.250.102])
	by smtp.des.no (Postfix) with ESMTP id E482C682D;
	Thu,  7 Jun 2012 08:26:10 +0000 (UTC)
Received: by ds4.des.no (Postfix, from userid 1001)
	id 8DD0A9A65; Thu,  7 Jun 2012 10:26:10 +0200 (CEST)
From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
To: Bruce Evans <brde@optusnet.com.au>
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<201206051008.29568.jhb@freebsd.org> <86haupvk4a.fsf@ds4.des.no>
	<201206051222.12627.jhb@freebsd.org>
	<20120605171446.GA28387@onelab2.iet.unipi.it>
	<20120606040931.F1050@besplex.bde.org> <864nqovoek.fsf@ds4.des.no>
	<20120607064951.C1106@besplex.bde.org>
Date: Thu, 07 Jun 2012 10:26:10 +0200
In-Reply-To: <20120607064951.C1106@besplex.bde.org> (Bruce Evans's message of
	"Thu, 7 Jun 2012 07:14:06 +1000 (EST)")
Message-ID: <86sje7sf31.fsf@ds4.des.no>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.3 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Mailman-Approved-At: Thu, 07 Jun 2012 11:13:20 +0000
Cc: Gianni <gianni@FreeBSD.org>, John Baldwin <jhb@FreeBSD.org>,
	Alan Cox <alc@rice.edu>, Alexander Kabaev <kan@FreeBSD.org>,
	Attilio Rao <attilio@FreeBSD.org>,
	Konstantin Belousov <kib@FreeBSD.org>, freebsd-arch@FreeBSD.org,
	Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Fast vs slow syscalls (Re: Fwd: [RFC] Kernel shared variables)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jun 2012 08:26:18 -0000

Bruce Evans <brde@optusnet.com.au> writes:
> Now 2.44 nsec/call makes sense, but you really should add some volatiles
> here to ensure that getpid() is not optimized away.

As you can see from the disassembly I provided, it isn't.

> SO it loops OK, but we can't see what getpid() does.  It must not be
> doing much.

Umm, yes, that's the whole point of this conversation.  Linux's getpid()
is not a syscall, but a library function that returns a constant from a
page shared by the kernel.

> 5.4104 nsec/call for gettimeofday() is impossible if there is any
> rdtsc() hardware call or much layering.

It's gettimeofday(0, 0), actually, so it doesn't need to read the clock.
If I pass a struct timeval as the first argument - so it *does* need to
read the clock - it's a little bit slower but still faster than an
actual system call.  Here's another run that demonstrates this - a
little bit slower than previous runs because I have other processes
running:

getpid(): 10,000,000 iterations in 30,377 us
gettimeofday(0, 0): 10,000,000 iterations in 55,571 us
gettimeofday(&tv, 0): 10,000,000 iterations in 302,634 us
kill(pid, 0): 10,000,000 iterations in 1,291,793 us
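
For reference, the totals above can be converted to an average cost per call. This is only an arithmetic sketch over the four figures listed; the helper names ns_per_call() and print_costs() are hypothetical and are not part of the benchmark program itself.

```c
#include <stdio.h>

/*
 * Hypothetical helper: average cost of one call in nanoseconds, given
 * the total elapsed time in microseconds and the iteration count.
 */
double
ns_per_call(double total_us, double iterations)
{
	return (total_us * 1000.0 / iterations);
}

/* Print the per-call cost for each line of the run quoted above. */
void
print_costs(void)
{
	const double n = 10000000.0;	/* iterations used in every test */

	printf("getpid():             %9.4f ns/call\n", ns_per_call(30377, n));
	printf("gettimeofday(0, 0):   %9.4f ns/call\n", ns_per_call(55571, n));
	printf("gettimeofday(&tv, 0): %9.4f ns/call\n", ns_per_call(302634, n));
	printf("kill(pid, 0):         %9.4f ns/call\n", ns_per_call(1291793, n));
}
```

On these numbers gettimeofday(&tv, 0) costs roughly 30 ns per call against roughly 129 ns for kill(pid, 0), a little over a factor of four.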

I can't test a static build since RHEL6 does not provide a static libc.

DES
-- 
Dag-Erling Smørgrav - des@des.no

From owner-freebsd-arch@FreeBSD.ORG  Thu Jun  7 12:37:51 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id D1C9A106564A
	for <freebsd-arch@freebsd.org>; Thu,  7 Jun 2012 12:37:51 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net
	[IPv6:2001:470:1f10:75::2])
	by mx1.freebsd.org (Postfix) with ESMTP id A68A58FC19
	for <freebsd-arch@freebsd.org>; Thu,  7 Jun 2012 12:37:51 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id 1CF94B922;
	Thu,  7 Jun 2012 08:37:51 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-arch@freebsd.org
Date: Thu, 7 Jun 2012 08:10:08 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p13; KDE/4.5.5; amd64; ; )
References: <20120606165115.GQ85127@deviant.kiev.zoral.com.ua>
	<201206061423.53179.jhb@freebsd.org>
	<20120607084229.C1474@besplex.bde.org>
In-Reply-To: <20120607084229.C1474@besplex.bde.org>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201206070810.08166.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
	(bigwig.baldwin.cx); Thu, 07 Jun 2012 08:37:51 -0400 (EDT)
Cc: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Fast gettimeofday(2) and clock_gettime(2)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jun 2012 12:37:51 -0000

On Wednesday, June 06, 2012 9:35:49 pm Bruce Evans wrote:
> On Wed, 6 Jun 2012, John Baldwin wrote:
> 
> > On Wednesday, June 06, 2012 12:51:15 pm Konstantin Belousov wrote:
> >> A positive result from the recent flame-bait on arch@ is the working
> >> implementation of the fast gettimeofday(2) and clock_gettime(2). The
> >> speedup I see is around 6-7x on the 2600K. I think the speedup could
> >> be even bigger on the previous generation of CPUs, where lock
> >> operations and syscall entry are costlier. A sample test runs of
> >> tools/tools/syscall_timing are presented at the end of message.
> >
> > In general this looks good but I see a few nits / races:
> 
> It is awefully (sic) complete and large.  The patch is almost twice as
> large as the entire kern_tc.c in FreeBSD-4, and that was quite bloated.
> 
> > 1) You don't follow the model of clearing tk_current to 0 while you
> >   are updating the structure that the in-kernel timecounter code
> >   uses.  This also means you have to avoid using a tk_current of 0
> >   and that userland has to keep spinning as long as tk_current is 0.
> >   Without this I believe userland can read a partially updated
> >   structure.
> 
> I thought that too at first, but after looking at the patch decided
> that it may be correct, but is too hard for me to understand.
> Urk, we both missed that tk_current is an index into the timehands
> array, so it cannot act as a generation count and it seems to be harder
> to lock.

Ugh, so it goes a long way to emulate the timehands array in userland.  As I 
mentioned previously, I consider the timehands array to be a bug.  However, I 
do think the generation count in the in-kernel timehands structure is useful 
and should be kept (and follow the same model of setting it to 0 before doing
updates, then updating the structure, then setting the new generation).
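
A minimal sketch of that model, assuming a single writer and using C11 atomics in place of the kernel's atomic(9) primitives. The structure and function names (shared_time, st_init, st_publish, st_read) are hypothetical and are not taken from kern_tc.c or from the patch under review.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared record; gen == 0 means "update in progress". */
struct shared_time {
	atomic_uint		gen;
	_Atomic uint64_t	sec;
	_Atomic uint64_t	frac;
};

void
st_init(struct shared_time *st)
{
	atomic_init(&st->gen, 0);	/* nothing published yet */
	atomic_init(&st->sec, 0);
	atomic_init(&st->frac, 0);
}

/* Writer side (one writer assumed): zero gen, update, set the new gen. */
void
st_publish(struct shared_time *st, uint64_t sec, uint64_t frac)
{
	unsigned int next;

	next = atomic_load_explicit(&st->gen, memory_order_relaxed) + 1;
	if (next == 0)			/* never hand out the reserved value */
		next = 1;
	atomic_store_explicit(&st->gen, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&st->sec, sec, memory_order_relaxed);
	atomic_store_explicit(&st->frac, frac, memory_order_relaxed);
	atomic_store_explicit(&st->gen, next, memory_order_release);
}

/* Reader side: retry while an update is in progress or gen has changed. */
void
st_read(struct shared_time *st, uint64_t *sec, uint64_t *frac)
{
	unsigned int gen;

	do {
		gen = atomic_load_explicit(&st->gen, memory_order_acquire);
		*sec = atomic_load_explicit(&st->sec, memory_order_relaxed);
		*frac = atomic_load_explicit(&st->frac, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&st->gen, memory_order_relaxed));
}
```

A reader that overlaps an update either sees gen == 0 or sees gen change between its two loads, and retries in both cases, so it never returns a partially updated record. Note that st_read() spins until the first st_publish() has completed.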

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Thu Jun  7 12:55:34 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id D4FFD1065674
	for <freebsd-arch@freebsd.org>; Thu,  7 Jun 2012 12:55:34 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net
	[IPv6:2001:470:1f10:75::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 916C78FC18
	for <freebsd-arch@freebsd.org>; Thu,  7 Jun 2012 12:55:34 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id D8930B978;
	Thu,  7 Jun 2012 08:55:33 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Date: Thu, 7 Jun 2012 08:50:55 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p13; KDE/4.5.5; amd64; ; )
References: <20120606165115.GQ85127@deviant.kiev.zoral.com.ua>
	<201206061423.53179.jhb@freebsd.org>
	<20120606205938.GS85127@deviant.kiev.zoral.com.ua>
In-Reply-To: <20120606205938.GS85127@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Message-Id: <201206070850.55751.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
	(bigwig.baldwin.cx); Thu, 07 Jun 2012 08:55:34 -0400 (EDT)
Cc: freebsd-arch@freebsd.org
Subject: Re: Fast gettimeofday(2) and clock_gettime(2)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jun 2012 12:55:35 -0000

On Wednesday, June 06, 2012 4:59:38 pm Konstantin Belousov wrote:
> On Wed, Jun 06, 2012 at 02:23:53PM -0400, John Baldwin wrote:
> > On Wednesday, June 06, 2012 12:51:15 pm Konstantin Belousov wrote:
> > > A positive result from the recent flame-bait on arch@ is the working
> > > implementation of the fast gettimeofday(2) and clock_gettime(2). The
> > > speedup I see is around 6-7x on the 2600K. I think the speedup could
> > > be even bigger on the previous generation of CPUs, where lock
> > > operations and syscall entry are costlier. A sample test runs of
> > > tools/tools/syscall_timing are presented at the end of message.
> > 
> > In general this looks good but I see a few nits / races:
> > 
> > 1) You don't follow the model of clearing tk_current to 0 while you
> >    are updating the structure that the in-kernel timecounter code
> >    uses.  This also means you have to avoid using a tk_current of 0
> >    and that userland has to keep spinning as long as tk_current is 0.
> >    Without this I believe userland can read a partially updated
> >    structure.
> I changed the code to be much more similar to the kern_tc.c. I (re)added
> the generation field, which is set to 0 upon kernel touching timehands.

Thank you.  BTW, I think we should use atomic_load_acq_int() on both accesses 
to th_gen (and the in-kernel binuptime should do the same).  I realize this
requires using rmb before the while condition in userland since we can't
use atomic_load_acq_int() here.  I think it should also use 
atomic_store_rel_int() for both stores to th_gen during the tc_windup()
callback.

> I think this can only happen if tc_windups occurs quite close in
> succession, or usermode thread is suspended for long enough. BTW,
> even generation could loop back to the previous value if thread is
> stopped.

Having the 32-bit generation count roll over should take a long while.
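
As a rough estimate of "a long while", assuming the generation advances once per update and that updates happen about 1000 times per second (an assumed rate, not a figure from this thread), the arithmetic can be sketched as follows. The function names are hypothetical.

```c
/*
 * Seconds until a 32-bit generation count repeats, for a given number
 * of generation increments per second.  The reserved value 0 is
 * ignored here; skipping it changes the result by one part in 2^32.
 */
double
gen_rollover_seconds(double updates_per_second)
{
	return (4294967296.0 / updates_per_second);	/* 2^32 values */
}

/* The same interval expressed in days. */
double
gen_rollover_days(double updates_per_second)
{
	return (gen_rollover_seconds(updates_per_second) / 86400.0);
}
```

Under that assumption gen_rollover_days(1000.0) is about 49.7, so a thread would have to stay stopped for roughly that long before the count could return to the value it last saw.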

> > > sandy% /usr/home/pooma/build/bsd/DEV/stuff/tests/syscall_timing_32 gettimeofday
> > > Clock resolution: 0.000000076
> > > test    loop    time    iterations      periteration
> > > gettimeofday    0       1.000994225     21623297        0.000000046
> > > gettimeofday    1       1.000994980     21596492        0.000000046
> > > gettimeofday    2       1.001070595     21598326        0.000000046
> > > gettimeofday    3       1.000922308     21581398        0.000000046
> > > gettimeofday    4       1.000984264     21605539        0.000000046
> > > gettimeofday    5       1.000989697     21601659        0.000000046
> > > gettimeofday    6       1.000996261     21598385        0.000000046
> > > gettimeofday    7       1.001002223     21583933        0.000000046
> > > gettimeofday    8       1.000985847     21599442        0.000000046
> > > gettimeofday    9       1.000994977     21600935        0.000000046
> > > sandy% sudo sysctl kern.timecounter.fast_gettime=0
> > 
> > I think this means you can call gettimeofday() in about 46 ns now
> > vs 310 the "old" way?
> 
> Yes. This is for 32bit, while for 64 bit binaries the numbers are
> 155->25 ns on the same hw.

Ah, good.  A non-generic hardcoded amd64 version is around 20ns, so
this is comparable.

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Thu Jun  7 16:07:30 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 10FAA1065675;
	Thu,  7 Jun 2012 16:07:30 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net
	[IPv6:2001:470:1f10:75::2])
	by mx1.freebsd.org (Postfix) with ESMTP id D9AFC8FC21;
	Thu,  7 Jun 2012 16:07:29 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id 3FDE7B95E;
	Thu,  7 Jun 2012 12:07:29 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-arch@freebsd.org
Date: Thu, 7 Jun 2012 09:56:02 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p13; KDE/4.5.5; amd64; ; )
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
In-Reply-To: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201206070956.03129.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
	(bigwig.baldwin.cx); Thu, 07 Jun 2012 12:07:29 -0400 (EDT)
Cc: Attilio Rao <attilio@freebsd.org>, alc@freebsd.org,
	Giovanni Trematerra <giovanni.trematerra@gmail.com>,
	Konstantin Belousov <kib@freebsd.org>, Alexander Kabaev <kan@freebsd.org>
Subject: Re: [RFC] Kernel shared variables
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jun 2012 16:07:30 -0000

On Friday, June 01, 2012 1:53:15 pm Giovanni Trematerra wrote:
> Hello,
> I'd like to discuss a way to provide a mechanism to share some read-only
> data between kernel and user space programs avoiding syscall overhead,
> implementing some them, such as gettimeofday(3) and time(3) as ordinary
> user space routine.
> 
> The patch at
> http://www.trematerra.net/patches/ksvar_experimental.patch

I realize this thread descended a bit, and I do still think that Konstantin's
patch is probably the right way forward for gettimeofday().  However, have you
thought at all about a per-process page?  There was another fork in this 
thread that dealt with per-process data such as getpid() (for which it does 
seem there are real-world uses).  I realize the KSVAR stuff might not easily 
be adjusted to working with a per-process page (though Jeff did do something 
interesting with having a template page defined by DPCPU that was then copied 
for each CPU).  It would also seem that for things like getpid(), getppid(), 
and getuid() it might be best to go the vdso route.  Is that something you 
would be interested in working on?
 
-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Thu Jun  7 17:28:51 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 635ED106566B;
	Thu,  7 Jun 2012 17:28:51 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id F0CF88FC0C;
	Thu,  7 Jun 2012 17:28:50 +0000 (UTC)
Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q57HSeLB054444;
	Thu, 7 Jun 2012 20:28:40 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id
	q57HSd0v030478; Thu, 7 Jun 2012 20:28:39 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q57HSd4F030477; 
	Thu, 7 Jun 2012 20:28:39 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Thu, 7 Jun 2012 20:28:39 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Message-ID: <20120607172839.GZ85127@deviant.kiev.zoral.com.ua>
References: <20120606165115.GQ85127@deviant.kiev.zoral.com.ua>
	<201206061423.53179.jhb@freebsd.org>
	<20120606205938.GS85127@deviant.kiev.zoral.com.ua>
	<201206070850.55751.jhb@freebsd.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="YBGzgpgHAney5ErF"
Content-Disposition: inline
In-Reply-To: <201206070850.55751.jhb@freebsd.org>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: freebsd-arch@freebsd.org
Subject: Re: Fast gettimeofday(2) and clock_gettime(2)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jun 2012 17:28:51 -0000


--YBGzgpgHAney5ErF
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Jun 07, 2012 at 08:50:55AM -0400, John Baldwin wrote:
> On Wednesday, June 06, 2012 4:59:38 pm Konstantin Belousov wrote:
> > On Wed, Jun 06, 2012 at 02:23:53PM -0400, John Baldwin wrote:
> > > On Wednesday, June 06, 2012 12:51:15 pm Konstantin Belousov wrote:
> > > > A positive result from the recent flame-bait on arch@ is the working
> > > > implementation of the fast gettimeofday(2) and clock_gettime(2). The
> > > > speedup I see is around 6-7x on the 2600K. I think the speedup could
> > > > be even bigger on the previous generation of CPUs, where lock
> > > > operations and syscall entry are costlier. A sample test runs of
> > > > tools/tools/syscall_timing are presented at the end of message.
> > > 
> > > In general this looks good but I see a few nits / races:
> > > 
> > > 1) You don't follow the model of clearing tk_current to 0 while you
> > >    are updating the structure that the in-kernel timecounter code
> > >    uses.  This also means you have to avoid using a tk_current of 0
> > >    and that userland has to keep spinning as long as tk_current is 0.
> > >    Without this I believe userland can read a partially updated
> > >    structure.
> > I changed the code to be much more similar to the kern_tc.c. I (re)added
> > the generation field, which is set to 0 upon kernel touching timehands.
> 
> Thank you.  BTW, I think we should use atomic_load_acq_int() on both accesses 
> to th_gen (and the in-kernel binuptime should do the same).  I realize this
> requires using rmb before the while condition in userland since we can't
> use atomic_load_acq_int() here.  I think it should also use 
> atomic_store_rel_int() for both stores to th_gen during the tc_windup()
> callback.
This is done. On the other hand, I removed the store_rel from the update
of tk_current, since it comes after the enabling store to th_gen, and the
order there does not matter.

I also did some restructuring of the userspace code, removing layers that
Bruce did not like. Now the top-level functions call binuptime() directly.
I also shortened the preliminary operations by caching the timekeep pointer.
Its double-initialization is safe.
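
One way to read "double-initialization is safe" is that every thread which finds the cache empty computes the same pointer, so a racing second store is harmless. The sketch below illustrates that reading only; it is not the code from the patch, and timekeep_stub, lookup_timekeep() and get_timekeep() are hypothetical names, with a static object standing in for the shared page.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the structure exported on the shared page. */
struct timekeep_stub {
	int	tk_ver;
};

static struct timekeep_stub shared_page_stub = { 1 };
static atomic_int lookup_calls;		/* counts slow-path lookups */

/* Hypothetical slow path: always yields the same address. */
struct timekeep_stub *
lookup_timekeep(void)
{
	atomic_fetch_add(&lookup_calls, 1);
	return (&shared_page_stub);
}

static _Atomic(struct timekeep_stub *) cached_tk;

/*
 * Return the cached pointer, filling the cache on first use.  Two
 * threads may both see NULL and both store, but they store the same
 * value, so no lock is needed around the initialization.
 */
struct timekeep_stub *
get_timekeep(void)
{
	struct timekeep_stub *tk;

	tk = atomic_load_explicit(&cached_tk, memory_order_acquire);
	if (tk == NULL) {
		tk = lookup_timekeep();
		atomic_store_explicit(&cached_tk, tk, memory_order_release);
	}
	return (tk);
}
```

After the first call, later calls skip the lookup entirely and pay only for one atomic load.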

Latest version is at
http://people.freebsd.org/~kib/misc/moronix.4.patch

I will probably move all shared page helpers from kern_exec.c to a separate
file, but this will happen after moronix is committed.

--YBGzgpgHAney5ErF
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAk/Q5McACgkQC3+MBN1Mb4goxQCg1CEB9/qDJ7WNNVdNleSpqiUS
kZwAniRrYMNQOjHycMeeoCOu4ixtChdl
=j52Z
-----END PGP SIGNATURE-----

--YBGzgpgHAney5ErF--

From owner-freebsd-arch@FreeBSD.ORG  Thu Jun  7 20:10:01 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 32A0F1065675;
	Thu,  7 Jun 2012 20:10:01 +0000 (UTC)
	(envelope-from alexander@leidinger.net)
Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de
	[217.11.53.44])
	by mx1.freebsd.org (Postfix) with ESMTP id A2FB78FC1A;
	Thu,  7 Jun 2012 20:09:59 +0000 (UTC)
Received: from outgoing.leidinger.net (p4FC4380C.dip.t-dialin.net
	[79.196.56.12])
	by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id 9B80B84473A;
	Thu,  7 Jun 2012 22:09:39 +0200 (CEST)
Received: from unknown (IO.Leidinger.net [192.168.1.12])
	by outgoing.leidinger.net (Postfix) with ESMTPS id BD8922B97;
	Thu,  7 Jun 2012 22:09:36 +0200 (CEST)
Date: Thu, 7 Jun 2012 22:09:33 +0200
From: Alexander Leidinger <Alexander@Leidinger.net>
To: Attilio Rao <attilio@freebsd.org>
Message-ID: <20120607220933.00003865@unknown>
In-Reply-To: <CAJ-FndAjcfd21xwYHPrSxgz32eHp2xTGRao1Kyqx4yBZTPD94A@mail.gmail.com>
References: <86bokyvtc2.fsf@ds4.des.no>
	<CAJ-FndAMsoB1RAyS-Pa1JCv7W0qsviRxtShZ3uk_Tpd+J_EBaQ@mail.gmail.com>
	<CAJ-FndAadYbqiWUTupXLEcRMkYYL50Ssehi8f8vv6YXvQzy4OA@mail.gmail.com>
	<CAJ-Vmo=F-=1UYcW0xBiNbjHa8CN7S=iDJ_bQDn4ESS3CumJf_A@mail.gmail.com>
	<CAJ-FndAjcfd21xwYHPrSxgz32eHp2xTGRao1Kyqx4yBZTPD94A@mail.gmail.com>
X-Mailer: Claws Mail 3.7.10cvs42 (GTK+ 2.16.6; i586-pc-mingw32msvc)
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-EBL-MailScanner-Information: Please contact the ISP for more information
X-EBL-MailScanner-ID: 9B80B84473A.A2700
X-EBL-MailScanner: Found to be clean
X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN,
	SpamAssassin (not cached, score=-0.733, required 6,
	autolearn=disabled, ALL_TRUSTED -1.00, AWL 0.28,
	T_RP_MATCHES_RCVD -0.01)
X-EBL-MailScanner-From: alexander@leidinger.net
X-EBL-MailScanner-Watermark: 1339704580.62523@qvMjeo86+nGn4Gz9SRgRPw
X-EBL-Spam-Status: No
Cc: =?ISO-8859-1?Q?grav?= <des@des.no>, Adrian Chadd <adrian@freebsd.org>,
	Dag-Erling, arch@freebsd.org
Subject: Re: KTR_SPAREx
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jun 2012 20:10:01 -0000

On Tue, 5 Jun 2012 21:14:02 +0100 Attilio Rao <attilio@freebsd.org>
wrote:

> 2012/6/5 Adrian Chadd <adrian@freebsd.org>:
> > Hi,
> >
> > I'm very tempted to make if_ath use KTR_DEV, but then have an extra
> > ath sysctl which does something like:
> >
> > if (sc->sc_ktr_enable)
> >    KTR();
> 
> But the actual problem is that your output will be overwhelmed by the
> clutter of all the other KTR_DEV consumers.
> 
> We very much need an much higher granularity on KTR classes and
> possibly a way to use it on-the-fly for kernel development and I think
> what I suggested earlier makes sense.

How much of the uncovered uses of KTR really need KTR (instead of
dtrace)? How many of them are time critical enough that dtrace is not
fast enough? How many of them need to run very early so that not enough
kernel infrastructure is available to run dtrace (can we run dtrace
scripts very early during boot (when enough kernel infrastructure is
available, before anything in userland starts) like in Solaris)?

Bye,
Alexander.

-- 
http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137

From owner-freebsd-arch@FreeBSD.ORG  Thu Jun  7 20:44:46 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 3FDAD106566C;
	Thu,  7 Jun 2012 20:44:46 +0000 (UTC)
	(envelope-from rysto32@gmail.com)
Received: from mail-ey0-f182.google.com (mail-ey0-f182.google.com
	[209.85.215.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 25EF48FC15;
	Thu,  7 Jun 2012 20:44:41 +0000 (UTC)
Received: by eaac13 with SMTP id c13so577717eaa.13
	for <multiple recipients>; Thu, 07 Jun 2012 13:44:41 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	bh=nf0hL9PoXJ3UNUyQuR6Ouy/4h9VUIfDg+RT6VvUX+ak=;
	b=KuAS+YugtXK0NtcynR3mtDRtq8oW2mvePEX7LmQQ2UNJmsnnT90nzlX7dbehEkV/+R
	dW8pjgs57+RkNRoS6nKXlRxfiiU3WsjBX3FNMZqW7Np2TATEV3z26RHolURVz1D7q00N
	MS1aijIkdCoWaBFSpYKG20ZSlzuIWzobdrC35Y+uOR+wEkTmeomOkhaRGcCsFXkBJJh4
	cZ0ot7YLdTkjBZ4JcjSnVpUO6pMuT1i0GQWQlfMy+LpohUw76xC3xqKwRt6B8FgfkREq
	P7aOvLS5jeA/Fqnyt07k/g/V126Ki2jV90WJAnFdQO4ll/6Ji6Thk06Q+gBLaL7Giytc
	AgXQ==
MIME-Version: 1.0
Received: by 10.14.95.207 with SMTP id p55mr2040788eef.40.1339101881046; Thu,
	07 Jun 2012 13:44:41 -0700 (PDT)
Received: by 10.180.146.131 with HTTP; Thu, 7 Jun 2012 13:44:40 -0700 (PDT)
In-Reply-To: <20120607220933.00003865@unknown>
References: <86bokyvtc2.fsf@ds4.des.no>
	<CAJ-FndAMsoB1RAyS-Pa1JCv7W0qsviRxtShZ3uk_Tpd+J_EBaQ@mail.gmail.com>
	<CAJ-FndAadYbqiWUTupXLEcRMkYYL50Ssehi8f8vv6YXvQzy4OA@mail.gmail.com>
	<CAJ-Vmo=F-=1UYcW0xBiNbjHa8CN7S=iDJ_bQDn4ESS3CumJf_A@mail.gmail.com>
	<CAJ-FndAjcfd21xwYHPrSxgz32eHp2xTGRao1Kyqx4yBZTPD94A@mail.gmail.com>
	<20120607220933.00003865@unknown>
Date: Thu, 7 Jun 2012 16:44:40 -0400
Message-ID: <CAFMmRNxUoXixSwewEQTrzP7k2oMDsxu8MtskTBDbhvj_ot46rQ@mail.gmail.com>
From: Ryan Stone <rysto32@gmail.com>
To: Alexander Leidinger <Alexander@leidinger.net>
Content-Type: text/plain; charset=ISO-8859-1
Cc: Attilio Rao <attilio@freebsd.org>, grav <des@des.no>,
	Adrian Chadd <adrian@freebsd.org>, arch@freebsd.org, Dag-Erling@freebsd.org
Subject: Re: KTR_SPAREx
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jun 2012 20:44:46 -0000

On Thu, Jun 7, 2012 at 4:09 PM, Alexander Leidinger
<Alexander@leidinger.net> wrote:
> How many of them need to run very early so that not enough
> kernel infrastructure is available to run dtrace (can we run dtrace
> scripts very early during boot (when enough kernel infrastructure is
> available, before anything in userland starts) like in Solaris)?

We don't currently have boot-time DTrace in FreeBSD.  We also don't
have post mortem DTrace (ie the equivalent of ktrdump -m).

However, I would suspect that most of the cases in the tree where
drivers have been checked in using KTR_SPARE could be replaced by
DTrace.

From owner-freebsd-arch@FreeBSD.ORG  Thu Jun  7 20:47:29 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 74E22106566B;
	Thu,  7 Jun 2012 20:47:29 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net
	[IPv6:2001:470:1f10:75::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 3BE3D8FC14;
	Thu,  7 Jun 2012 20:47:29 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id A2522B948;
	Thu,  7 Jun 2012 16:47:28 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-arch@freebsd.org
Date: Thu, 7 Jun 2012 16:42:41 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p13; KDE/4.5.5; amd64; ; )
References: <86bokyvtc2.fsf@ds4.des.no>
	<CAJ-FndAjcfd21xwYHPrSxgz32eHp2xTGRao1Kyqx4yBZTPD94A@mail.gmail.com>
	<20120607220933.00003865@unknown>
In-Reply-To: <20120607220933.00003865@unknown>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201206071642.41216.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
	(bigwig.baldwin.cx); Thu, 07 Jun 2012 16:47:28 -0400 (EDT)
Cc: Attilio Rao <attilio@freebsd.org>,
	Alexander Leidinger <Alexander@leidinger.net>,
	Adrian Chadd <adrian@freebsd.org>, Dag-Erling@freebsd.org,
	grav <des@des.no>
Subject: Re: KTR_SPAREx
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jun 2012 20:47:29 -0000

On Thursday, June 07, 2012 4:09:33 pm Alexander Leidinger wrote:
> On Tue, 5 Jun 2012 21:14:02 +0100 Attilio Rao <attilio@freebsd.org>
> wrote:
> 
> > 2012/6/5 Adrian Chadd <adrian@freebsd.org>:
> > > Hi,
> > >
> > > I'm very tempted to make if_ath use KTR_DEV, but then have an extra
> > > ath sysctl which does something like:
> > >
> > > if (sc->sc_ktr_enable)
> > >    KTR();
> > 
> > But the actual problem is that your output will be overwhelmed by the
> > clutter of all the other KTR_DEV consumers.
> > 
> > We very much need an much higher granularity on KTR classes and
> > possibly a way to use it on-the-fly for kernel development and I think
> > what I suggested earlier makes sense.
> 
> How much of the uncovered uses of KTR really need KTR (instead of
> dtrace)? How many of them are time critical enough that dtrace is not
> fast enough? How many of them need to run very early so that not enough
> kernel infrastructure is available to run dtrace (can we run dtrace
> scripts very early during boot (when enough kernel infrastructure is
> available, before anything in userland starts) like in Solaris)?

Can you run a dtrace script from ddb?  (Hint: you can run 'show ktr' from
DDB, and you can use ktrdump on a crash dump to get a timeline of events when 
doing post-mortem analysis.)

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Thu Jun  7 22:43:11 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 87CE1106564A;
	Thu,  7 Jun 2012 22:43:11 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id 04F768FC0C;
	Thu,  7 Jun 2012 22:43:09 +0000 (UTC)
Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q57Mh24b033391;
	Fri, 8 Jun 2012 01:43:02 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id
	q57Mh1XM031780; Fri, 8 Jun 2012 01:43:01 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q57Mh18n031779; 
	Fri, 8 Jun 2012 01:43:01 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Fri, 8 Jun 2012 01:43:01 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Peter Wemm <peter@wemm.org>
Message-ID: <20120607224301.GB85127@deviant.kiev.zoral.com.ua>
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<201206051008.29568.jhb@freebsd.org> <86haupvk4a.fsf@ds4.des.no>
	<201206051222.12627.jhb@freebsd.org>
	<20120605171446.GA28387@onelab2.iet.unipi.it>
	<20120606040931.F1050@besplex.bde.org> <864nqovoek.fsf@ds4.des.no>
	<20120607064951.C1106@besplex.bde.org> <86sje7sf31.fsf@ds4.des.no>
	<CAGE5yCp0VUwcPh1_L2uU=wmCh96pkrrpuZWNMNw6RuMnYPyXQw@mail.gmail.com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="ds9maZbwT7uk2FVi"
Content-Disposition: inline
In-Reply-To: <CAGE5yCp0VUwcPh1_L2uU=wmCh96pkrrpuZWNMNw6RuMnYPyXQw@mail.gmail.com>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: Gianni <gianni@freebsd.org>, Alan Cox <alc@rice.edu>,
	Alexander Kabaev <kan@freebsd.org>,
	Attilio Rao <attilio@freebsd.org>, freebsd-arch@freebsd.org,
	Dag-Erling Smørgrav <des@des.no>
Subject: Re: Fast vs slow syscalls (Re: Fwd: [RFC] Kernel shared variables)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jun 2012 22:43:11 -0000


--ds9maZbwT7uk2FVi
Content-Type: text/plain; charset=koi8-r
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Jun 07, 2012 at 03:30:54PM -0700, Peter Wemm wrote:
> On Thu, Jun 7, 2012 at 1:26 AM, Dag-Erling Smørgrav <des@des.no> wrote:
> > Bruce Evans <brde@optusnet.com.au> writes:
> >> Now 2.44 nsec/call makes sense, but you really should add some volatiles
> >> here to ensure that getpid() is not optimized away.
> >
> > As you can see from the disassembly I provided, it isn't.
> >
> >> So it loops OK, but we can't see what getpid() does.  It must not be
> >> doing much.
> >
> > Umm, yes, that's the whole point of this conversation.  Linux's getpid()
> > is not a syscall, but a library function that returns a constant from a
> > page shared by the kernel.
>
> It might be worth taking a peek at what they do before going too far
> down the rabbit hole.  They've had to deal with the whole ABI
> stability vs kernel layout thing already.
>
> As I recall, they literally embed a userland style .so shared object
> into the kernel and make it available to the user.  The dynamic linker
> "finds" it via elf auxinfo and inserts it into the symbol search
> order.
>
> That way, the shared page layout is kernel specific.  If they chose to
> provide getpid() or gettimeofday() or whatever, it's a matter of
> adjusting the shared page and inserting code into the .so file.  If
> the page changes, the code changes.
>
> Think of what we do with signal trampolines except in a way
> ld-elf.so.1 can pull it into user space and gdb "sees" it as a .so
> file with debug info.
>
> I think I remember that they did the shared page thing and then
> switched to providing a stub .so file.

Yes, this is the thing called VDSO in the thread discussion.

--ds9maZbwT7uk2FVi
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAk/RLnUACgkQC3+MBN1Mb4hULgCg6R/ekHO3tW9BYjjiMafdKXmR
gccAoLWFdYgh2qDFvMB7fGpWn1myusKE
=HShI
-----END PGP SIGNATURE-----

--ds9maZbwT7uk2FVi--

From owner-freebsd-arch@FreeBSD.ORG  Thu Jun  7 22:47:05 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 31CB21065670
	for <freebsd-arch@freebsd.org>; Thu,  7 Jun 2012 22:47:05 +0000 (UTC)
	(envelope-from peter@wemm.org)
Received: from mail-ob0-f182.google.com (mail-ob0-f182.google.com
	[209.85.214.182])
	by mx1.freebsd.org (Postfix) with ESMTP id DEBC58FC17
	for <freebsd-arch@freebsd.org>; Thu,  7 Jun 2012 22:47:04 +0000 (UTC)
Received: by obcni5 with SMTP id ni5so1952667obc.13
	for <freebsd-arch@freebsd.org>; Thu, 07 Jun 2012 15:47:04 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=wemm.org; s=google;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:content-transfer-encoding;
	bh=ZaB+F0tRHi5JZqeDAo3qPkprunOQYi/KY9XrXhcuGNM=;
	b=f6nzVzE8ee7r/Kudk434F2fagmnXcM4gFqa4a0jTyx5xEbapI/cNYUFrtGHR/DtRlZ
	irJDhqc4Erq1am8qqM87pfDtNx+S7jyOLfe/GQ28tgDNk7Asp/E8kYN03NjqftpTmtry
	ujakbrRMNC4/jyS2X8BV5FvCjrm8oSqNDE3c4=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=google.com; s=20120113;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:content-transfer-encoding:x-gm-message-state;
	bh=ZaB+F0tRHi5JZqeDAo3qPkprunOQYi/KY9XrXhcuGNM=;
	b=nTGRU5M/+SDl4/7n9eMcyR413UlgJJ2C6nAccF3XiKNT/02mNImnlwmt8SI5xQPQxY
	4UeAgjh2Fjr3/8XXuPJbw3S7eRJxj2PV+RY3Q6RO/M5GmX1D3wNXBorDylp2eurAwvbh
	EFAGc5Wzo3vPFNxnbULfSC7t6cJmT10Jnc5rbBSSqVxssr77R3N5hzTwnU4uAd3Labe6
	KaQ1z4P5jHDcItB+DUjVLHoAom1QQImfZ+PTXvr3SrwBRiWUxpKcYzy6gIyIRVQ9g5EZ
	5KxFR28mFymu35rF5bl77zD3VMVNaMY1rcx2+64boQvSVAnQpD/wAFxASB5Cg0Lpf2rJ
	aUag==
MIME-Version: 1.0
Received: by 10.60.172.195 with SMTP id be3mr3981812oec.48.1339109224093; Thu,
	07 Jun 2012 15:47:04 -0700 (PDT)
Received: by 10.182.115.35 with HTTP; Thu, 7 Jun 2012 15:47:04 -0700 (PDT)
In-Reply-To: <20120607172839.GZ85127@deviant.kiev.zoral.com.ua>
References: <20120606165115.GQ85127@deviant.kiev.zoral.com.ua>
	<201206061423.53179.jhb@freebsd.org>
	<20120606205938.GS85127@deviant.kiev.zoral.com.ua>
	<201206070850.55751.jhb@freebsd.org>
	<20120607172839.GZ85127@deviant.kiev.zoral.com.ua>
Date: Thu, 7 Jun 2012 15:47:04 -0700
Message-ID: <CAGE5yCrk8E5DikNNVQzEZ7bkj98nxQi+aWsLsi6d4jc8vLg2PA@mail.gmail.com>
From: Peter Wemm <peter@wemm.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-Gm-Message-State: ALoCoQnX4FaIIZpmicN4p3UdGvue4oUpi+fiZK8hhtRZJmoHXmqMrlFQI/lLhYvMzrdLsHWnqGxa
Cc: freebsd-arch@freebsd.org
Subject: Re: Fast gettimeofday(2) and clock_gettime(2)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jun 2012 22:47:05 -0000

On Thu, Jun 7, 2012 at 10:28 AM, Konstantin Belousov
<kostikbel@gmail.com> wrote:
> On Thu, Jun 07, 2012 at 08:50:55AM -0400, John Baldwin wrote:
>> On Wednesday, June 06, 2012 4:59:38 pm Konstantin Belousov wrote:
>> > On Wed, Jun 06, 2012 at 02:23:53PM -0400, John Baldwin wrote:
>> > > On Wednesday, June 06, 2012 12:51:15 pm Konstantin Belousov wrote:
>> > > > A positive result from the recent flame-bait on arch@ is the working
>> > > > implementation of the fast gettimeofday(2) and clock_gettime(2). The
>> > > > speedup I see is around 6-7x on the 2600K. I think the speedup could
>> > > > be even bigger on the previous generation of CPUs, where lock
>> > > > operations and syscall entry are costlier. Sample test runs of
>> > > > tools/tools/syscall_timing are presented at the end of the message.
>> > >
>> > > In general this looks good but I see a few nits / races:
>> > >
>> > > 1) You don't follow the model of clearing tk_current to 0 while you
>> > >    are updating the structure that the in-kernel timecounter code
>> > >    uses.  This also means you have to avoid using a tk_current of 0
>> > >    and that userland has to keep spinning as long as tk_current is 0.
>> > >    Without this I believe userland can read a partially updated
>> > >    structure.
>> > I changed the code to be much more similar to the kern_tc.c. I (re)added
>> > the generation field, which is set to 0 upon kernel touching timehands.
>>
>> Thank you.  BTW, I think we should use atomic_load_acq_int() on both accesses
>> to th_gen (and the in-kernel binuptime should do the same).  I realize this
>> requires using rmb before the while condition in userland since we can't
>> use atomic_load_acq_int() here.  I think it should also use
>> atomic_store_rel_int() for both stores to th_gen during the tc_windup()
>> callback.
> This is done. On the other hand, I removed a store_rel from updating
> tk_current, since it is after enabling store to th_gen, and the order
> there does not matter.
>
> I also did some restructuring of the userspace, removing layers that
> Bruce did not like. Now top-level functions directly call binuptime().
> I also shortened the preliminary operations by caching the timekeep pointer.
> Its double-initialization is safe.
>
> Latest version is at
> http://people.freebsd.org/~kib/misc/moronix.4.patch
>
> I will probably move all shared page helpers from kern_exec.c to a separate
> file, but this will happen after moronix is committed.

Stepping back for a moment.. why even have a shared page at all, in
common MI code?

The AMD64 kernel can simply make a page readable from within kernel
space since it's page level protected.

The i386 kernel needs the same treatment.  We can save one clock cycle
from address generation by switching to page protection for the kernel
and using a full 4GB %cs/%ds/etc.  With that fix we could do the same
there.  I've been meaning to "fix" this for about 8 years now.

There would have been no need to allocate "space" in userland for
things like signal trampolines because it could be executed directly
from a kernel page by unprivileged user code.

Things like allocating a shared page could be a MD backend decision
for architectures that don't have page level access control for where
the kernel lives.

Things like tc_fill_vdso_timehands() could go away if userland could
be allowed to directly read the kernel's version.  With a little
linker magic, the 'struct timehands' stuff could be marshaled into a
page and the auxinfo point to it.
-- 
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete
themselves upon execution." -- Robert Sewell

From owner-freebsd-arch@FreeBSD.ORG  Thu Jun  7 22:56:55 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 2DE631065680;
	Thu,  7 Jun 2012 22:56:55 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id A1FFA8FC17;
	Thu,  7 Jun 2012 22:56:54 +0000 (UTC)
Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q57MumjP036943;
	Fri, 8 Jun 2012 01:56:48 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id
	q57MumBX031859; Fri, 8 Jun 2012 01:56:48 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q57MumTS031858; 
	Fri, 8 Jun 2012 01:56:48 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Fri, 8 Jun 2012 01:56:48 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Peter Wemm <peter@wemm.org>
Message-ID: <20120607225648.GC85127@deviant.kiev.zoral.com.ua>
References: <20120606165115.GQ85127@deviant.kiev.zoral.com.ua>
	<201206061423.53179.jhb@freebsd.org>
	<20120606205938.GS85127@deviant.kiev.zoral.com.ua>
	<201206070850.55751.jhb@freebsd.org>
	<20120607172839.GZ85127@deviant.kiev.zoral.com.ua>
	<CAGE5yCrk8E5DikNNVQzEZ7bkj98nxQi+aWsLsi6d4jc8vLg2PA@mail.gmail.com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="N+dhEFW7Y2Uiel/w"
Content-Disposition: inline
In-Reply-To: <CAGE5yCrk8E5DikNNVQzEZ7bkj98nxQi+aWsLsi6d4jc8vLg2PA@mail.gmail.com>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: freebsd-arch@freebsd.org
Subject: Re: Fast gettimeofday(2) and clock_gettime(2)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jun 2012 22:56:55 -0000


--N+dhEFW7Y2Uiel/w
Content-Type: text/plain; charset=koi8-r
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Jun 07, 2012 at 03:47:04PM -0700, Peter Wemm wrote:
> On Thu, Jun 7, 2012 at 10:28 AM, Konstantin Belousov
> <kostikbel@gmail.com> wrote:
> > On Thu, Jun 07, 2012 at 08:50:55AM -0400, John Baldwin wrote:
> >> On Wednesday, June 06, 2012 4:59:38 pm Konstantin Belousov wrote:
> >> > On Wed, Jun 06, 2012 at 02:23:53PM -0400, John Baldwin wrote:
> >> > > On Wednesday, June 06, 2012 12:51:15 pm Konstantin Belousov wrote:
> >> > > > A positive result from the recent flame-bait on arch@ is the working
> >> > > > implementation of the fast gettimeofday(2) and clock_gettime(2). The
> >> > > > speedup I see is around 6-7x on the 2600K. I think the speedup could
> >> > > > be even bigger on the previous generation of CPUs, where lock
> >> > > > operations and syscall entry are costlier. Sample test runs of
> >> > > > tools/tools/syscall_timing are presented at the end of the message.
> >> > >
> >> > > In general this looks good but I see a few nits / races:
> >> > >
> >> > > 1) You don't follow the model of clearing tk_current to 0 while you
> >> > >    are updating the structure that the in-kernel timecounter code
> >> > >    uses.  This also means you have to avoid using a tk_current of 0
> >> > >    and that userland has to keep spinning as long as tk_current is 0.
> >> > >    Without this I believe userland can read a partially updated
> >> > >    structure.
> >> > I changed the code to be much more similar to the kern_tc.c. I (re)added
> >> > the generation field, which is set to 0 upon kernel touching timehands.
> >>
> >> Thank you.  BTW, I think we should use atomic_load_acq_int() on both accesses
> >> to th_gen (and the in-kernel binuptime should do the same).  I realize this
> >> requires using rmb before the while condition in userland since we can't
> >> use atomic_load_acq_int() here.  I think it should also use
> >> atomic_store_rel_int() for both stores to th_gen during the tc_windup()
> >> callback.
> > This is done. On the other hand, I removed a store_rel from updating
> > tk_current, since it is after enabling store to th_gen, and the order
> > there does not matter.
> >
> > I also did some restructuring of the userspace, removing layers that
> > Bruce did not like. Now top-level functions directly call binuptime().
> > I also shortened the preliminary operations by caching the timekeep pointer.
> > Its double-initialization is safe.
> >
> > Latest version is at
> > http://people.freebsd.org/~kib/misc/moronix.4.patch
> >
> > I will probably move all shared page helpers from kern_exec.c to a separate
> > file, but this will happen after moronix is committed.
>
> Stepping back for a moment.. why even have a shared page at all, in
> common MI code?
The decision to use a shared page is delegated to MD code, but MI code handles
most of the details, since there is not much difference if the shared page
is used.

>
> The AMD64 kernel can simply make a page readable from within kernel
> space since it's page level protected.
All arches which use the shared page use it this way now. See below.

>
> The i386 kernel needs the same treatment.  We can save one clock cycle
> from address generation by switching to page protection for the kernel
> and using a full 4GB %cs/%ds/etc.  With that fix we could do the same
> there.  I've been meaning to "fix" this for about 8 years now.
Sorry, I do not follow. Don't we already use 4GB segments on i386?

>
> There would have been no need to allocate "space" in userland for
> things like signal trampolines because it could be executed directly
> from a kernel page by unprivileged user code.
This is how it is done already. But the shared page is mapped at a
fixed location in usermode, which simplifies things, for debugging at
least.

>
> Things like allocating a shared page could be a MD backend decision
> for architectures that don't have page level access control for where
> the kernel lives.
This is exactly how it is done now. The per-ABI struct sysentvec has a flag
indicating whether the shared page is needed for the ABI, and where to map it.

>
> Things like tc_fill_vdso_timehands() could go away if userland could
> be allowed to directly read the kernel's version.  With a little
> linker magic, the 'struct timehands' stuff could be marshaled into a
> page and the auxinfo point to it.
I dislike the idea of directly exporting a kernel structure into
userland, since this makes it impossible to modify the kernel side of
things. IMO a rarely executed translation is not a problem, and I can
control the ABI. At least until I find time to implement VDSO, where
the problem of ABI stability for kernel->user transport will be solved
completely.
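
To illustrate the translation step, here is a minimal sketch of copying a
kernel-private record into a separate, fixed-layout record meant for
userland. Everything in it is hypothetical: the struct names, the fields,
the version constant and the helper are made up for illustration and are
not taken from the patch or from kern_tc.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical kernel-private record; free to change between releases. */
struct kern_private_th {
	uint64_t	scale;
	uint32_t	offset_count;
	uint32_t	counter_mask;
	void		*internal_ptr;	/* never exposed to userland */
};

/* Hypothetical exported record: fixed-width fields plus a version tag. */
struct export_th {
	uint32_t	ver;
	uint32_t	mask;
	uint64_t	scale;
	uint32_t	offset_count;
	uint32_t	pad;		/* explicit padding, always zero */
};

#define	EXPORT_TH_VER	1

/* Copy only the fields userland needs into the stable layout. */
static void
export_th_fill(struct export_th *out, const struct kern_private_th *in)
{
	out->ver = EXPORT_TH_VER;
	out->mask = in->counter_mask;
	out->scale = in->scale;
	out->offset_count = in->offset_count;
	out->pad = 0;
}
```

The point of the extra copy is that kern_private_th can gain, lose or
reorder fields without userland noticing, as long as the fill routine
keeps producing the same export_th layout for a given version number.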

--N+dhEFW7Y2Uiel/w
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAk/RMa8ACgkQC3+MBN1Mb4hAIwCgioJKGPnE7gfckztJYNCQJONj
PZYAn0rdxvVdcGmz7iM5SYF8R67ivu7G
=b1NG
-----END PGP SIGNATURE-----

--N+dhEFW7Y2Uiel/w--

From owner-freebsd-arch@FreeBSD.ORG  Thu Jun  7 22:30:55 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 54CDC106566C
	for <freebsd-arch@freebsd.org>; Thu,  7 Jun 2012 22:30:55 +0000 (UTC)
	(envelope-from peter@wemm.org)
Received: from mail-ob0-f182.google.com (mail-ob0-f182.google.com
	[209.85.214.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 01EA18FC14
	for <freebsd-arch@freebsd.org>; Thu,  7 Jun 2012 22:30:54 +0000 (UTC)
Received: by obcni5 with SMTP id ni5so1932921obc.13
	for <freebsd-arch@freebsd.org>; Thu, 07 Jun 2012 15:30:54 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=wemm.org; s=google;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:content-transfer-encoding;
	bh=i0JQgQZ9xYJMjfa7QQfX3eHxrZLrBlf9C2/DZE1+xNw=;
	b=Qx3F3jdux4RfHWpVEb+oJCyAwiTfn3EvPwaA+/PybhA/wZzAn7euLLjNkE/gHpLWgn
	luEGvpPtKkRKrw5bZISBSxlxLUQLb5tEI3CwXxUb4bRval0uw8gr35K1viseU3onNDj/
	lpfogLsqS6QrwaEHraXgzr8kxuMmDDUWIqSD4=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=google.com; s=20120113;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:content-transfer-encoding:x-gm-message-state;
	bh=i0JQgQZ9xYJMjfa7QQfX3eHxrZLrBlf9C2/DZE1+xNw=;
	b=B+AhmlLpCfJH+7ouyzEWQ00B8AwvzdY68PGkrbwksgncxsvovtsSs9v5BmZXg0SLyR
	DgXEPoi5JKppkIgulOft7PCNo6bKdEIgQAczxuzZ7pXr0F1LJ8BIPnCQqIwMDyBNO80V
	Bj6xgm2KfoI0WzfIAI1vZasEQHy/otmwLBWJpjYzjk5AVPRZ4/wtL0kz86ssPSnEyOzv
	RQr+QSDCjDqMYw7D3/Kagonhf4jl3nl5Y18jM8wGTcZf4FHu+w4Ffp1pQwyWzNwPyZnZ
	GOIzMqYfmXuLp+OuHXL6IMruOsv4Xp7Je8KQoM4mPhBf1Me8AooemSPjHgvUbofIfCRc
	7rVA==
MIME-Version: 1.0
Received: by 10.182.115.7 with SMTP id jk7mr3954542obb.9.1339108254334; Thu,
	07 Jun 2012 15:30:54 -0700 (PDT)
Received: by 10.182.115.35 with HTTP; Thu, 7 Jun 2012 15:30:54 -0700 (PDT)
In-Reply-To: <86sje7sf31.fsf@ds4.des.no>
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<201206051008.29568.jhb@freebsd.org> <86haupvk4a.fsf@ds4.des.no>
	<201206051222.12627.jhb@freebsd.org>
	<20120605171446.GA28387@onelab2.iet.unipi.it>
	<20120606040931.F1050@besplex.bde.org> <864nqovoek.fsf@ds4.des.no>
	<20120607064951.C1106@besplex.bde.org> <86sje7sf31.fsf@ds4.des.no>
Date: Thu, 7 Jun 2012 15:30:54 -0700
Message-ID: <CAGE5yCp0VUwcPh1_L2uU=wmCh96pkrrpuZWNMNw6RuMnYPyXQw@mail.gmail.com>
From: Peter Wemm <peter@wemm.org>
To: =?ISO-8859-1?Q?Dag=2DErling_Sm=F8rgrav?= <des@des.no>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-Gm-Message-State: ALoCoQn4quNIkLzAfusqu3DVMtJHj7w619QSrnyXc6rIPAMjx9uWJOnRHKIDGOstWgL6MoMmbBqS
X-Mailman-Approved-At: Thu, 07 Jun 2012 23:14:33 +0000
Cc: Gianni <gianni@freebsd.org>, Alan Cox <alc@rice.edu>,
	Alexander Kabaev <kan@freebsd.org>, Attilio Rao <attilio@freebsd.org>,
	Konstantin Belousov <kib@freebsd.org>, freebsd-arch@freebsd.org,
	Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Fast vs slow syscalls (Re: Fwd: [RFC] Kernel shared variables)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jun 2012 22:30:55 -0000

On Thu, Jun 7, 2012 at 1:26 AM, Dag-Erling Smørgrav <des@des.no> wrote:
> Bruce Evans <brde@optusnet.com.au> writes:
>> Now 2.44 nsec/call makes sense, but you really should add some volatiles
>> here to ensure that getpid() is not optimized away.
>
> As you can see from the disassembly I provided, it isn't.
>
>> So it loops OK, but we can't see what getpid() does.  It must not be
>> doing much.
>
> Umm, yes, that's the whole point of this conversation.  Linux's getpid()
> is not a syscall, but a library function that returns a constant from a
> page shared by the kernel.

It might be worth taking a peek at what they do before going too far
down the rabbit hole.  They've had to deal with the whole ABI
stability vs kernel layout thing already.

As I recall, they literally embed a userland style .so shared object
into the kernel and make it available to the user.  The dynamic linker
"finds" it via elf auxinfo and inserts it into the symbol search
order.

That way, the shared page layout is kernel specific.  If they chose to
provide getpid() or gettimeofday() or whatever, it's a matter of
adjusting the shared page and inserting code into the .so file.  If
the page changes, the code changes.

Think of what we do with signal trampolines except in a way
ld-elf.so.1 can pull it into user space and gdb "sees" it as a .so
file with debug info.

I think I remember that they did the shared page thing and then
switched to providing a stub .so file.
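
As a rough illustration of the auxinfo lookup on the Linux side, the sketch
below asks for the AT_SYSINFO_EHDR entry with glibc's getauxval() and checks
that the address it points at begins with the ELF magic. It is Linux-specific
and only an assumption about how one might poke at this from userland; the
function name is made up, and FreeBSD does not provide this auxv entry.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/*
 * Return 1 if the kernel advertised a vDSO image through the auxiliary
 * vector and that image starts with the ELF magic, 0 otherwise.
 */
static int
vdso_is_elf(void)
{
	const void *base;

	/* getauxval() returns 0 when the requested entry is absent. */
	base = (const void *)(uintptr_t)getauxval(AT_SYSINFO_EHDR);
	if (base == NULL)
		return (0);
	return (memcmp(base, ELFMAG, SELFMAG) == 0);
}
```

A dynamic linker would go on to parse the program headers and dynamic
symbols of that image; the sketch stops at confirming that it is an ELF
object at all.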

-- 
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete
themselves upon execution." -- Robert Sewell

From owner-freebsd-arch@FreeBSD.ORG  Fri Jun  8 07:48:23 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6BC4C106566B;
	Fri,  8 Jun 2012 07:48:23 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail16.syd.optusnet.com.au (mail16.syd.optusnet.com.au
	[211.29.132.197])
	by mx1.freebsd.org (Postfix) with ESMTP id 905258FC1B;
	Fri,  8 Jun 2012 07:48:22 +0000 (UTC)
Received: from c122-106-171-232.carlnfd1.nsw.optusnet.com.au
	(c122-106-171-232.carlnfd1.nsw.optusnet.com.au [122.106.171.232])
	by mail16.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q587mCaB005326
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Fri, 8 Jun 2012 17:48:13 +1000
Date: Fri, 8 Jun 2012 17:48:12 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Konstantin Belousov <kostikbel@gmail.com>
In-Reply-To: <20120607172839.GZ85127@deviant.kiev.zoral.com.ua>
Message-ID: <20120608155521.S1201@besplex.bde.org>
References: <20120606165115.GQ85127@deviant.kiev.zoral.com.ua>
	<201206061423.53179.jhb@freebsd.org>
	<20120606205938.GS85127@deviant.kiev.zoral.com.ua>
	<201206070850.55751.jhb@freebsd.org>
	<20120607172839.GZ85127@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-arch@freebsd.org
Subject: Re: Fast gettimeofday(2) and clock_gettime(2)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 08 Jun 2012 07:48:23 -0000

On Thu, 7 Jun 2012, Konstantin Belousov wrote:

> On Thu, Jun 07, 2012 at 08:50:55AM -0400, John Baldwin wrote:
>> On Wednesday, June 06, 2012 4:59:38 pm Konstantin Belousov wrote:
>>> On Wed, Jun 06, 2012 at 02:23:53PM -0400, John Baldwin wrote:
>>>> In general this looks good but I see a few nits / races:
>>>>
>>>> 1) You don't follow the model of clearing tk_current to 0 while you
>>>>    are updating the structure that the in-kernel timecounter code
>>>>    uses.  This also means you have to avoid using a tk_current of 0
>>>>    and that userland has to keep spinning as long as tk_current is 0.
>>>>    Without this I believe userland can read a partially updated
>>>>    structure.
>>> I changed the code to be much more similar to the kern_tc.c. I (re)added
>>> the generation field, which is set to 0 upon kernel touching timehands.
>>
>> Thank you.  BTW, I think we should use atomic_load_acq_int() on both accesses
>> to th_gen (and the in-kernel binuptime should do the same).  I realize this
>> requires using rmb before the while condition in userland since we can't
>> use atomic_load_acq_int() here.  I think it should also use
>> atomic_store_rel_int() for both stores to th_gen during the tc_windup()
>> callback.

The atomic_load_acq_int() (or rmb()) would completely defeat one of
the main points in the design of binuptime(), which was to be lock-free
so as to be efficient (the atomic_store_rel_int() is rarely done so
fixing it doesn't affect efficiency, especially on x86 after kib's
recent changes removed the serialization from it).  However, I now think
the acq part of the load is needed even on x86.  x86 allows loads out of
order, except in the case where the load is from the same address as a
previous store.  So no explicit memory barrier is needed (on x86) for
loads of th_generation to be ordered relative to stores to th_generation.
But read barriers seem to be needed for loads of the variables protected
by th_generation to be ordered relative to loads of th_generation.  An
acq barrier for th_generation works somewhat bogusly (on x86) by supplying
a barrier for the one variable that doesn't need it for ordering.

The correct fix seems to be to use time-domain locking even more: set the
timehands pointer to the previous generation instead of the current one.
Modulo other bugs, this gives >= 1 msec for the previous generation to
stabilize.  Speculative loads would have to be more than 1 msec in the
past to cause problems.  But they can't be, since the thread must have
been preempted for its speculative load to live that long, and the
preemption would/should have issued a barrier instruction.  Except when
the speculative load reaches a register before the preemption -- that case
is handled by the generation count: since the timehands being used must
be more than 1 generation behind for its th_generation to change, the
memory barrier instruction for the preemption ensures that the change to
th_generation is seen, so the new timehands is loaded.

Second thoughts about whether x86 needs the acq barrier: stores to all
the variables in tc_windup() are ordered by x86 memory semantics.  This
gives them a good ordering relative to the stores to th_generation, or
at least can do this.  A similar ordering is then forced for the loads
in binuptime() etc, since x86 memory semantics ensure that each load
occurs after the corresponding store to the same address.  Maybe this
is enough, or can be made to be enough with a more careful ordering of
the stores.  This is MD and hard to understand.
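
For reference, the generation-count protocol under discussion can be
sketched in portable C11 roughly as follows. This is only an illustration
under stated assumptions: the struct and function names are hypothetical
and are not the code in kern_tc.c or in the patch, and C11 fences and
acquire/release operations stand in for rmb(), atomic_load_acq_int() and
atomic_store_rel_int().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a timehands-like record. */
struct th_sketch {
	atomic_uint	gen;		/* 0 means "update in progress" */
	_Atomic uint64_t scale;
	_Atomic uint64_t offset;
};

/* Writer: mark busy, update the payload, publish a new nonzero generation. */
static void
th_sketch_update(struct th_sketch *th, uint64_t scale, uint64_t offset)
{
	unsigned int ogen;

	ogen = atomic_load_explicit(&th->gen, memory_order_relaxed);
	atomic_store_explicit(&th->gen, 0, memory_order_relaxed);
	/* Keep the payload stores from moving ahead of the gen = 0 store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&th->scale, scale, memory_order_relaxed);
	atomic_store_explicit(&th->offset, offset, memory_order_relaxed);
	if (++ogen == 0)
		ogen = 1;		/* never publish generation 0 */
	atomic_store_explicit(&th->gen, ogen, memory_order_release);
}

/* Reader: retry while an update is in progress or the generation moved. */
static void
th_sketch_read(struct th_sketch *th, uint64_t *scale, uint64_t *offset)
{
	unsigned int gen;

	do {
		gen = atomic_load_explicit(&th->gen, memory_order_acquire);
		*scale = atomic_load_explicit(&th->scale,
		    memory_order_relaxed);
		*offset = atomic_load_explicit(&th->offset,
		    memory_order_relaxed);
		/* Order the payload loads before the re-check of gen. */
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&th->gen, memory_order_relaxed));
}
```

The acquire load and the acquire fence are exactly the read-side costs
argued about above; whether they can be dropped on x86 is the MD question,
and the sketch does not try to answer it.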

> This is done. On the other hand, I removed a store_rel from updating
> tk_current, since it is after enabling store to th_gen, and the order
> there does not matter.

Sigh.  The extremeness of some locking pessimizations on an Athlon64 i386
UP are:

     rdtsc                                takes  6.5 cycles
     rdtsc; movl mem,%ecx                 takes  6.5 cycles
     xchgl mem,%ecx                       takes 32 cycles
     rdtsc; lfence; movl mem,%ecx         takes 34 cycles
     rdtsc; xchgl mem,%ecx                takes 38 cycles
     xchgl mem,%ecx; rdtsc                takes 40 cycles
     xchgl mem,%eax; rdtsc                takes 40 cycles
     rdtsc; xchgl mem,%eax                takes 44 cycles
     rdtsc; mfence; movl mem,%ecx         takes 52 cycles

So the software overheads are 5-8 times larger than the hardware overheads
for a TSC timecounter, even when we only lock a single load.  Later CPUs
have much slower rdtsc, taking 40+ cycles, so the software overheads are
relatively smaller, especially since they are mostly in parallel with
the slow rdtsc.  On core2 i386 SMP:

     rdtsc                                takes 65 cycles (yes, 10x slower)
     rdtsc; movl mem,%ecx                 takes 65 cycles
     xchgl mem,%ecx                       takes 25 cycles
     rdtsc; lfence; movl mem,%ecx         takes 73 cycles
     rdtsc; xchgl mem,%ecx                takes 74 cycles
     xchgl mem,%ecx; rdtsc                takes 74 cycles
     xchgl mem,%eax; rdtsc                takes 74 cycles
     rdtsc; xchgl mem,%eax                takes 74 cycles
     rdtsc; mfence; movl mem,%ecx         takes 69 cycles (yes, beats lfence)

Note that the get*() APIs have identical locking issues, so if you fix
them by adding memory barriers they will become slower than the current
non-get*() APIs are without locking, so their existence will be more
bogus than before (except with very slow timecounter hardware).

> I also did some restructuring of the userspace, removing layers that
> Bruce did not like. Now top-level functions directly call binuptime().
> I also shortened the preliminary operations by caching timekeep pointer.
> Its double-initialization is safe.
>
> Latest version is at
> http://people.freebsd.org/~kib/misc/moronix.4.patch

Thanks.  I didn't look at the patch.  To be happy with it, I would require:
- about 1/4 the size of the first version (6K) for at least the pure
   timecounter parts
- fix old kernel bugs:
   - figure out what needs to be done for time-domain locking
   - fix the bug reported by jhb, that times can go backwards due to old
     timehands having a slightly different frequency.
       (I tried to duplicate this in the kernel, but couldn't.  I used
       adjtime(2) with hacks to make it adjust the clock by +-0.5
       seconds/second.  A loop with "adjtime 1000; adjtime -1000" then
       gives huge swings in the frequency.  But clock_gettime() didn't
       show any negative differences.  I think the negative difference
       can't be larger in magnitude than ~100 nsec, and since the syscall
       takes longer than that even with the clock wrong by a factor of 2
       due to the hacked adjtime, it can't see negative differences.)
   - figure out what TSC-low is fixing and fix it properly.  rdtsc is
     a non-serializing instruction.  Thus it can probably appear to go
     backwards.  TSC-low normally discards a factor of 128 of the precision
     of the TSC.  At 2GHz, this reduces the precision from 0.5 nsec to
     64 nsec.  Most negative differences would become 0.  I wonder if
     TSC-low is "working" just by hiding most negative differences.
     But it can only hide most (about 127/128) of them.  E.g., if the
     out-of-order times are 64 nsec and 63.5 nsec, then after discarding
     the low 7 bits (a factor of 128), the negative difference expands
     from -0.5 nsec to -64 nsec.

     Note that the large syscall overhead prevents seeing any small
     negative time differences from userland in the same way as above.
     But without that overhead, either in the kernel or in moronix
     userland, small negative time differences might be visible,
     depending on how small they are and on how efficient the functions
     are.

     TSC-low also breaks seeing small positive differences.  This
     breakage is visible if it is not hidden by syscall overhead or
     inefficient functions.  For some uses, truncation of small positive
     differences to 0 is just as bad as negative differences -- you can't
     distinguish separate events using their timestamps.  Unfortunately,
     timecounters with low resolution have this problem unavoidably.  A
     TSC should at least be able to distinguish events that are separate
     at the cycle level, though since the x86 TSC is non-serializing it
     has a different type of fuzziness.  This fuzziness shouldn't be fixed
     by adding serialization instructions for it (though one for acq
     may do this accidentally), since that would make it much slower.
     rdtscp should rarely be used since it is serializing so it gives
     similar slowness.  Does it do any more than "cpuid; rdtsc"?
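
The TSC-low arithmetic in the list above can be checked with a few lines of
C.  This is only a worked example under stated assumptions: a 2 GHz TSC
(500 psec per cycle), a shift of 7 bits, and a hypothetical tsc_low() helper
that clears the low bits instead of shifting them out, so that both counters
stay in the same units.

```c
#include <assert.h>
#include <stdint.h>

#define	TSC_SHIFT	7	/* assumed: drop 7 bits = factor of 128 */
#define	PS_PER_CYCLE	500	/* assumed 2 GHz TSC: 0.5 nsec per cycle */

/* What a TSC-low style counter can still resolve: low bits cleared. */
static uint64_t
tsc_low(uint64_t tsc)
{
	return ((tsc >> TSC_SHIFT) << TSC_SHIFT);
}

/* Difference "second - first" between two cycle counts, in picoseconds. */
static int64_t
delta_ps(uint64_t first, uint64_t second)
{
	return (((int64_t)second - (int64_t)first) * PS_PER_CYCLE);
}

static void
tsc_low_example(void)
{
	/* Out-of-order reads at 64 nsec and 63.5 nsec: -0.5 nsec apart. */
	assert(delta_ps(128, 127) == -500);
	/* They straddle a 128-cycle boundary, so TSC-low shows -64 nsec. */
	assert(delta_ps(tsc_low(128), tsc_low(127)) == -64000);
	/* Most out-of-order pairs do not straddle one and are hidden. */
	assert(delta_ps(tsc_low(100), tsc_low(99)) == 0);
	/* The same truncation also hides small positive differences. */
	assert(delta_ps(tsc_low(99), tsc_low(100)) == 0);
}
```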

> I probably move all shared page helpers to separate file from kern_exec.c,
> but this will happen after moronix is committed.

It's still moronix?  Why would we want that? :-)

Bruce

From owner-freebsd-arch@FreeBSD.ORG  Fri Jun  8 08:03:52 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6235D106566B;
	Fri,  8 Jun 2012 08:03:52 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au
	[211.29.132.185])
	by mx1.freebsd.org (Postfix) with ESMTP id E7F178FC0C;
	Fri,  8 Jun 2012 08:03:51 +0000 (UTC)
Received: from c122-106-171-232.carlnfd1.nsw.optusnet.com.au
	(c122-106-171-232.carlnfd1.nsw.optusnet.com.au [122.106.171.232])
	by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q5883gvL010688
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Fri, 8 Jun 2012 18:03:44 +1000
Date: Fri, 8 Jun 2012 18:03:42 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Konstantin Belousov <kostikbel@gmail.com>
In-Reply-To: <20120607091243.GV85127@deviant.kiev.zoral.com.ua>
Message-ID: <20120608174919.S1594@besplex.bde.org>
References: <20120606165115.GQ85127@deviant.kiev.zoral.com.ua>
	<201206061423.53179.jhb@freebsd.org>
	<20120606205938.GS85127@deviant.kiev.zoral.com.ua>
	<20120607130029.K1962@besplex.bde.org>
	<20120607091243.GV85127@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-arch@freebsd.org
Subject: Re: Fast gettimeofday(2) and clock_gettime(2)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 08 Jun 2012 08:03:52 -0000

On Thu, 7 Jun 2012, Konstantin Belousov wrote:

> On Thu, Jun 07, 2012 at 01:00:34PM +1000, Bruce Evans wrote:
>>
>> tc_windup()'s close in succession are bugs, since they cycle the timehands
>> faster than they were designed to be.  We already have too many of these
>> bugs (where tc_setclock() calls tc_windup().  I didn't notice this
>> particular problem with it before).  Now I will point out that version
>> 2 of your patch adds more of these calls, apparently to get changes to
>> happen sooner.  But in sysctl_kern_timecounter_hardware(), such a call
>> was intentionally left out since it is not needed.  Note that tc_tick
>> prevents calls to tc_windup() more often than about once per msec if
>> hz > 1000.
> No, I did not add more tc_windup calls. I added a recalculation
> of the shared page content on the timecounter change, which is not
> the same as tc_windup() call. This is exactly to handle a disable
> of usermode rdtsc use when kernel timecounter hardware changes.

Oops.  I saw a parameter named tc_windup and didn't look too closely
at the event handler for this.  Please use a slightly different name.

Frequent updates of the shared page may cause the same too-fast cycling
as frequent calls to tc_windup().  Are event handlers rate-limited?
If not, then someone changing the timecounter hardware from a loop
in userland could cause similar problems to a settimeofday() loop.
Both are privileged operations so this is not a large problem, but it
is a stress test that should pass.

>>  [jhb wrote]
>>> There was apparently another issue with version 2. The bcopy() is not
>>> atomic, so potentially libc could read wrong tk_current. I redid
>>> the interface to write to the shared page to allow use of real atomics.
>>
>> Timecounter code is supposed to be lock-free except for some time-domain
>> locking.  I only see 1 problem with this: where tc_windup() writes the
>> generation count and other things without asking for these writes to
>> be ordered.  In most cases, the time-domain locking prevents problems.
> In fact, on x86 the ordering is strong enough that no barriers are needed,
> this is why the problem goes unnoticed so far.

Only the x86 write ordering is clearly strong enough (see another reply).

Bruce

From owner-freebsd-arch@FreeBSD.ORG  Fri Jun  8 08:39:33 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id D41A1106564A;
	Fri,  8 Jun 2012 08:39:33 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail07.syd.optusnet.com.au (mail07.syd.optusnet.com.au
	[211.29.132.188])
	by mx1.freebsd.org (Postfix) with ESMTP id 66C748FC0A;
	Fri,  8 Jun 2012 08:39:33 +0000 (UTC)
Received: from c122-106-171-232.carlnfd1.nsw.optusnet.com.au
	(c122-106-171-232.carlnfd1.nsw.optusnet.com.au [122.106.171.232])
	by mail07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q588dUPu023160
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Fri, 8 Jun 2012 18:39:31 +1000
Date: Fri, 8 Jun 2012 18:39:30 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: John Baldwin <jhb@freebsd.org>
In-Reply-To: <201206070810.08166.jhb@freebsd.org>
Message-ID: <20120608180723.S1641@besplex.bde.org>
References: <20120606165115.GQ85127@deviant.kiev.zoral.com.ua>
	<201206061423.53179.jhb@freebsd.org>
	<20120607084229.C1474@besplex.bde.org>
	<201206070810.08166.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Konstantin Belousov <kostikbel@gmail.com>, freebsd-arch@freebsd.org
Subject: Re: Fast gettimeofday(2) and clock_gettime(2)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 08 Jun 2012 08:39:33 -0000

On Thu, 7 Jun 2012, John Baldwin wrote:

> On Wednesday, June 06, 2012 9:35:49 pm Bruce Evans wrote:
>> On Wed, 6 Jun 2012, John Baldwin wrote:

>>> 1) You don't follow the model of clearing tk_current to 0 while you
>>>   are updating the structure that the in-kernel timecounter code
>>>   uses.  This also means you have to avoid using a tk_current of 0
>>>   and that userland has to keep spinning as long as tk_current is 0.
>>>   Without this I believe userland can read a partially updated
>>>   structure.
>>
>> I thought that too at first, but after looking at the patch decided
>> that it may be correct, but is too hard for me to understand.
>> Urk, we both missed that tk_current is an index into the timehands
>> array, so it cannot act as a generation count and it seems to be harder
>> to lock.
>
> Ugh, so it goes a long way to emulate the timehands array in userland.  As I
> mentioned previously, I consider the timehands array to be a bug.  However, I
> do think the generation count in the in-kernel timehands structure is useful
> and should be kept (and follow the same model of setting it to 0 before doing
> updates, then updating the structure, then setting the new generation).

Without the timehands array you will need slow atomic ops or maybe MD magic
to make them unnecessary.

I think 3 generations are necessary and sufficient: one pointed to by
timehands for normal use; another that used to be pointed to by
timehands and that remains valid for 1 more generation time after
timehands was switched away from it, and one
invalid/unready/being_updated one that will become the one pointed to
by timehands 1 generation after it was updated.  2 generations are
needed instead of 1 to allow updating one while the other remains
usable, and 3 generations are needed instead of 2 to ensure that the
one pointed to by timehands remains valid for a full generation time
(average 1.5 generation times) after any read of timehands.
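
A minimal sketch of that 3-generation rotation, assuming C11 atomics and
hypothetical names (struct slot, ring_windup(), ring_read()); it is not the
kern_tc.c code, and the payload is reduced to a single offset:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define	NSLOTS	3	/* current, previous, being-updated */

struct slot {
	atomic_uint		gen;	/* 0 means "being updated" */
	_Atomic uint64_t	offset;	/* stand-in for the real payload */
};

struct ring {
	struct slot		slots[NSLOTS];
	_Atomic(struct slot *)	current;	/* plays the role of timehands */
};

static void
ring_init(struct ring *r)
{
	for (int i = 0; i < NSLOTS; i++) {
		atomic_init(&r->slots[i].gen, i == 0 ? 1u : 0u);
		atomic_init(&r->slots[i].offset, 0);
	}
	atomic_init(&r->current, &r->slots[0]);
}

/* Writer: rewrite the oldest slot, then switch the pointer to it. */
static void
ring_windup(struct ring *r, uint64_t offset)
{
	struct slot *cur, *next;
	unsigned int ogen;

	cur = atomic_load_explicit(&r->current, memory_order_relaxed);
	next = &r->slots[(cur - r->slots + 1) % NSLOTS];
	ogen = atomic_load_explicit(&next->gen, memory_order_relaxed);
	atomic_store_explicit(&next->gen, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&next->offset, offset, memory_order_relaxed);
	if (++ogen == 0)		/* never publish generation 0 */
		ogen = 1;
	atomic_store_explicit(&next->gen, ogen, memory_order_release);
	/* The old current slot stays untouched until the next windup. */
	atomic_store_explicit(&r->current, next, memory_order_release);
}

/* Reader: a stale slot is detected by its generation having changed. */
static uint64_t
ring_read(struct ring *r)
{
	struct slot *th;
	unsigned int gen;
	uint64_t offset;

	do {
		th = atomic_load_explicit(&r->current, memory_order_acquire);
		gen = atomic_load_explicit(&th->gen, memory_order_acquire);
		offset = atomic_load_explicit(&th->offset,
		    memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 || gen != atomic_load_explicit(&th->gen,
	    memory_order_relaxed));
	return (offset);
}
```

With 3 slots, a reader that loaded the pointer just before a switch still
has one full generation time in which its slot is not written to.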

Bruce

From owner-freebsd-arch@FreeBSD.ORG  Fri Jun  8 09:16:24 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 9F329106574B;
	Fri,  8 Jun 2012 09:16:24 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail09.syd.optusnet.com.au (mail09.syd.optusnet.com.au
	[211.29.132.190])
	by mx1.freebsd.org (Postfix) with ESMTP id 2809B8FC17;
	Fri,  8 Jun 2012 09:16:23 +0000 (UTC)
Received: from c122-106-171-232.carlnfd1.nsw.optusnet.com.au
	(c122-106-171-232.carlnfd1.nsw.optusnet.com.au [122.106.171.232])
	by mail09.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q589G717025135
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Fri, 8 Jun 2012 19:16:09 +1000
Date: Fri, 8 Jun 2012 19:16:07 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Konstantin Belousov <kostikbel@gmail.com>
In-Reply-To: <20120607100401.GW85127@deviant.kiev.zoral.com.ua>
Message-ID: <20120608185204.T1708@besplex.bde.org>
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2+oYo+wwT4ipA@mail.gmail.com>
	<201206051008.29568.jhb@freebsd.org> <86haupvk4a.fsf@ds4.des.no>
	<201206051222.12627.jhb@freebsd.org>
	<20120605171446.GA28387@onelab2.iet.unipi.it>
	<20120606040931.F1050@besplex.bde.org> <864nqovoek.fsf@ds4.des.no>
	<20120607064951.C1106@besplex.bde.org> <86sje7sf31.fsf@ds4.des.no>
	<20120607100401.GW85127@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Dag-Erling Sm??rgrav <des@des.no>, freebsd-arch@freebsd.org
Subject: Re: Fast vs slow syscalls (Re: Fwd: [RFC] Kernel shared variables)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 08 Jun 2012 09:16:24 -0000

On Thu, 7 Jun 2012, Konstantin Belousov wrote:

> On Thu, Jun 07, 2012 at 10:26:10AM +0200, Dag-Erling Sm??rgrav wrote:
>> Bruce Evans <brde@optusnet.com.au> writes:
>>> Now 2.44 nsec/call makes sense, but you really should add some volatiles
>>> here to ensure that getpid() is not optimized away.
>>
>> As you can see from the disassembly I provided, it isn't.
>>
>>> SO it loops OK, but we can't see what getpid() does.  It must not be
>>> doing much.
>>
>> Umm, yes, that's the whole point of this conversation.  Linux's getpid()
>> is not a syscall, but a library function that returns a constant from a
>> page shared by the kernel.

Of course, but we're down to nearly single-cycle times, so the difference
between the library function using 1 or 2 instructions to load the value
may be significant.

>>> 5.4104 nsec/call for gettimeofday() is impossible if there is any
>>> rdtsc() hardware call or much layering.
>>
>> It's gettimeofday(0, 0), actually, so it doesn't need to read the clock.
>> If I pass a struct timeval as the first argument - so it *does* need to
>> read the clock - it's a little bit slower but still faster than an
>> actual system call.  Here's another run that demonstrates this - a
>> little bit slower than previous runs because I have other processes
>> running:
>>
>> getpid(): 10,000,000 iterations in 30,377 us
>> gettimeofday(0, 0): 10,000,000 iterations in 55,571 us
>> gettimeofday(&tv, 0): 10,000,000 iterations in 302,634 us
> So this timing seems to be approximately same by the order of magnitude
> as the times I get for the patch, around 25 vs. 30ns/per gettimeofday()
> call.

Not great.  I get 6.97 nsec for a slightly reduced version of FreeBSD's
1998 version of microtime(), which was written in i386 asm.  (This depends
on rdtsc taking only 6.5 cycles = 3.25 nsec on the test CPU (Athlon64)).
From rev.1.40 of microtime.s:

% #include <machine/asm.h>
% 
% ENTRY(microtime)
% 	movl	tsc_freq, %ecx
% 	testl	%ecx, %ecx
% 	je	i8254_microtime

This branch is predicted perfectly but costs 0.24 nsec (0.5 cycles).

% 	rdtsc
% 	subl	tsc_bias, %eax
% 	mull	tsc_multiplier
% 	movl	%edx, %eax
% 	addl	timeoff+4, %eax	/* usec += time.tv_sec */
% 	movl	timeoff, %edx	/* sec = time.tv_sec */

Similar to binuptime().  To convert from the old microtime.s, I just
converted the variable names from aout to elf (and supplied dummy
variables), and removed the locking instructions (which were
pushfl/cli/popfl).

% 
% 	cmpl	$1000000, %eax	/* usec valid? */
% 	jb	1f
% 	subl	$1000000, %eax	/* adjust usec */
% 	incl	%edx		/* bump sec */

Probably faster with bintimes (can be branch-free then (?)), but by
converting directly to the final format we avoid a scaling step.  The
branch in it is predicted too perfectly by my dummy variables.

% 1:
% 	movl	4(%esp), %ecx	/* load timeval pointer arg */
% 	movl	%edx, (%ecx)	/* tvp->tv_sec = sec */
% 	movl	%eax, 4(%ecx)	/* tvp->tv_usec = usec */
% 
% 	ret
% 
% i8254_microtime:
% 	ret			/* XXX garbage */
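
For readers who don't read i386 asm, here is a rough C rendering of the
TSC path above.  It is a sketch, not a translation that was ever in the
tree: the TSC value is passed in as a parameter instead of being read with
rdtsc, the globals become parameters, and struct tv_sketch is a
hypothetical stand-in for struct timeval.

```c
#include <assert.h>
#include <stdint.h>

struct tv_sketch {		/* stand-in for struct timeval */
	long	tv_sec;
	long	tv_usec;
};

static void
microtime_sketch(uint32_t tsc_lo, uint32_t tsc_bias, uint32_t tsc_multiplier,
    const struct tv_sketch *timeoff, struct tv_sketch *tvp)
{
	uint32_t delta;
	long sec, usec;

	assert(timeoff != 0 && tvp != 0);
	/* subl tsc_bias, %eax: 32-bit wrapping subtraction */
	delta = tsc_lo - tsc_bias;
	/* mull tsc_multiplier; movl %edx, %eax: high half of the product */
	usec = (long)(((uint64_t)delta * tsc_multiplier) >> 32);
	usec += timeoff->tv_usec;	/* addl timeoff+4, %eax */
	sec = timeoff->tv_sec;		/* movl timeoff, %edx */
	if (usec >= 1000000) {		/* cmpl/jb/subl/incl */
		usec -= 1000000;
		sec++;
	}
	tvp->tv_sec = sec;
	tvp->tv_usec = usec;
}

static void
microtime_example(void)
{
	struct tv_sketch off = { 100, 999500 }, tv;

	/* Multiplier 2^31 means 0.5 usec per count: 2000 counts = 1000 usec. */
	microtime_sketch(5000, 3000, 0x80000000u, &off, &tv);
	assert(tv.tv_sec == 101 && tv.tv_usec == 500);
}
```

The multiplier is a 0.32 fixed-point number of usec per TSC count, which is
why only the high 32 bits of the 64-bit product are kept.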

>
> Linux seems slower probably due to slower CPU ? Mine is 3.4Ghz, while
> des used 3.1Ghz for Linux box.

If it is a different CPU model, then the speed of rdtsc can vary a lot.

Bruce

From owner-freebsd-arch@FreeBSD.ORG  Fri Jun  8 11:29:01 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id A17321065674;
	Fri,  8 Jun 2012 11:29:01 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id 0DBA08FC12;
	Fri,  8 Jun 2012 11:29:00 +0000 (UTC)
Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q58BSoh7000719;
	Fri, 8 Jun 2012 14:28:50 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id
	q58BSotU035924; Fri, 8 Jun 2012 14:28:50 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q58BSolw035923; 
	Fri, 8 Jun 2012 14:28:50 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Fri, 8 Jun 2012 14:28:50 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <20120608112850.GE85127@deviant.kiev.zoral.com.ua>
References: <20120606165115.GQ85127@deviant.kiev.zoral.com.ua>
	<201206061423.53179.jhb@freebsd.org>
	<20120606205938.GS85127@deviant.kiev.zoral.com.ua>
	<201206070850.55751.jhb@freebsd.org>
	<20120607172839.GZ85127@deviant.kiev.zoral.com.ua>
	<20120608155521.S1201@besplex.bde.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="mu4SaHkdL1Az71rA"
Content-Disposition: inline
In-Reply-To: <20120608155521.S1201@besplex.bde.org>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: freebsd-arch@freebsd.org
Subject: Re: Fast gettimeofday(2) and clock_gettime(2)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 08 Jun 2012 11:29:01 -0000


--mu4SaHkdL1Az71rA
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Jun 08, 2012 at 05:48:12PM +1000, Bruce Evans wrote:
> On Thu, 7 Jun 2012, Konstantin Belousov wrote:
>
> >On Thu, Jun 07, 2012 at 08:50:55AM -0400, John Baldwin wrote:
> >>On Wednesday, June 06, 2012 4:59:38 pm Konstantin Belousov wrote:
> >>>On Wed, Jun 06, 2012 at 02:23:53PM -0400, John Baldwin wrote:
> >>>>In general this looks good but I see a few nits / races:
> >>>>
> >>>>1) You don't follow the model of clearing tk_current to 0 while you
> >>>>   are updating the structure that the in-kernel timecounter code
> >>>>   uses.  This also means you have to avoid using a tk_current of 0
> >>>>   and that userland has to keep spinning as long as tk_current is 0.
> >>>>   Without this I believe userland can read a partially updated
> >>>>   structure.
> >>>I changed the code to be much more similar to the kern_tc.c. I (re)added
> >>>the generation field, which is set to 0 upon kernel touching timehands.
> >>
> >>Thank you.  BTW, I think we should use atomic_load_acq_int() on both
> >>accesses
> >>to th_gen (and the in-kernel binuptime should do the same).  I realize
> >>this
> >>requires using rmb before the while condition in userland since we can't
> >>use atomic_load_acq_int() here.  I think it should also use
> >>atomic_store_rel_int() for both stores to th_gen during the tc_windup()
> >>callback.
>
> The atomic_load_acq_int() (or rmb()) would completely defeat one of
> the main points in the design of binuptime(), which was to be lock-free
> so as to be efficient (the atomic_store_rel_int() is rarely done so
> fixing it doesn't affect efficiency, especially on x86 after kib's
> recent changes removed the serialization from it).  However, I now think
> the acq part of the load is needed even on x86.  x86 allows loads out of
> order, except in the case where the load is from the same address of a
> previous store.  So no explicit memory barrier is needed (on x86) for
> loads of th_generation to be ordered relative to stores to th_generation.
> But read barriers seem to be needed for loads of the variables protected
> by th_generation to be ordered relative to loads of th_generation.  An
> acq barrier for th_generation works somewhat bogusly (on x86) by supplying
> a barrier for the one variable that doesn't need it for ordering.
load_acq is not a lock, it is serialization.

>
> The correct fix seems to be to use time-domain locking even more: set the
> timehands pointer to the previous generation instead of the current one.
> Modulo other bugs, this gives >= 1 msec for the previous generation to
> stabilize.  Speculative loads would have to be more than 1 msec in the
> past to cause problems.  But they can't be, since the thread must have
> been preempted for its speculative load to live that long, and the
> preemption would/should have issued a barrier instruction.  Except when
> the speculative load reaches a register before the preemption -- that case
> is handled by the generation count: since the timehands being used must
> be more than 1 generation behind for its th_generation to change, the
> memory barrier instruction for the preemption ensures that the change to
> th_generation is seen, so the new timehands is loaded.
>
> Second thoughts about whether x86 needs the acq barrier: stores to all
> the variables in tc_windup() are ordered by x86 memory semantics.  This
> gives them a good ordering relative to the stores to th_generation, or
> at least can do this.  A similar ordering is then forced for the loads
> in binuptime() etc, since x86 memory semantics ensure that each load
> occurs after the corresponding store to the same address.  Maybe this
> is enough, or can be made to be enough with a more careful ordering of
> the stores.  This is MD and hard to understand.
The ordering of loads relative to stores to the same address holds only
on the same core. On x86, loads cannot be reordered with other loads,
but potentially this could happen on other arches.

>
> >This is done. On the other hand, I removed a store_rel from updating
> >tk_current, since it is after enabling store to th_gen, and the order
> >there does not matter.
>
> Sigh.  The extremeness of some locking pessimizations on an Athlon64 i386
> UP are:
>
>     rdtsc                                takes  6.5 cycles
>     rdtsc; movl mem,%ecx                 takes  6.5 cycles
>     xchgl mem,%ecx                       takes 32 cycles
>     rdtsc; lfence; movl mem,%ecx         takes 34 cycles
>     rdtsc; xchgl mem,%ecx                takes 38 cycles
>     xchgl mem,%ecx; rdtsc                takes 40 cycles
>     xchgl mem,%eax; rdtsc                takes 40 cycles
>     rdtsc; xchgl mem,%eax                takes 44 cycles
>     rdtsc; mfence; movl mem,%ecx         takes 52 cycles
>
> So the software overheads are 5-8 times larger than the hardware overheads
> for a TSC timecounter, even when we only lock a single load.  Later CPUs
> have much slower rdtsc, taking 40+ cycles, so the software overheads are
> relatively smaller, especially since they are mostly in parallel with
> the slow rdtsc.  On core2 i386 SMP:
I suspect that what you measured for the fence overhead is actually the
time to retire the whole queue of read (and/or write) requests accumulated
so far in the pipeline, and not the overhead of a synchronous rdtsc read.

>
>     rdtsc                                takes 65 cycles (yes, 10x slower)
>     rdtsc; movl mem,%ecx                 takes 65 cycles
>     xchgl mem,%ecx                       takes 25 cycles
>     rdtsc; lfence; movl mem,%ecx         takes 73 cycles
>     rdtsc; xchgl mem,%ecx                takes 74 cycles
>     xchgl mem,%ecx; rdtsc                takes 74 cycles
>     xchgl mem,%eax; rdtsc                takes 74 cycles
>     rdtsc; xchgl mem,%eax                takes 74 cycles
>     rdtsc; mfence; movl mem,%ecx         takes 69 cycles (yes, beats lfence)
>
> Note that the get*() APIs have identical locking issues, so if you fix
> them by adding memory barriers they will become slower than the current
> non-get*() APIs are without locking, so their existence will be more
> bogus than before (except with very slow timecounter hardware).
>
> >I also did some restructuring of the userspace, removing layers that
> >Bruce did not like. Now top-level functions directly call binuptime().
> >I also shortened the preliminary operations by caching timekeep pointer.
> >Its double-initialization is safe.
> >
> >Latest version is at
> >http://people.freebsd.org/~kib/misc/moronix.4.patch
>
> Thanks.  I didn't look at the patch.  To be happy with it, I would require:
> - about 1/4 the size of the first version (6K) for at least the pure
>   timecounter parts
This might already have happened, since I removed the layering you did
not like from usermode.

> - fix old kernel bugs:
>   - figure out what needs to be done for time-domain locking
>   - fix the bug reported by jhb, that times can go backwards due to old
>     timehands having a slightly different frequency.
>       (I tried to duplicate this in the kernel, but couldn't.  I used
>       adjtime(2) with hacks to make it adjust the clock by +-0.5
>       seconds/second.  A loop with "adjtime 1000; adjtime -1000" then
>       gives huge swings in the frequency.  But clock_gettime() didn't
>       show any negative differences.  I think the negative difference
>       can't be larger in magnitude than ~100 nsec, and since the syscall
>       takes longer than that even with the clock wrong by a factor of 2
>       due to the hacked adjtime, it can't see negative differences.)
>   - figure out what TSC-low is fixing and fix it properly.  rdtsc is
>     a non-serializing instruction.  Thus it can probably appear to go
>     backwards.  TSC-low normally discards a factor of 128 of the precision
>     of the TSC.  At 2GHz, this reduces the precision from 0.5 nsec to
>     64 nsec.  Most negative differences would become 0.  I wonder if
>     TSC-low is "working" just by hiding most negative differences.
>     But it can only hide most (about 127/128) of them.  E.g., if the
>     out-of-order times are 64 nsec and 63.5 nsec, then after discarding
>     the low 7 bits (a factor of 128), the negative difference expands
>     from -0.5 nsec to -64 nsec.
The goal of the patch is only to move the code from the kernel into
userspace, trying not to change algorithms. The potential changes you
describe above should be done both in the kernel and in usermode after that.

>
>     Note that the large syscall overhead prevents seeing any small
>     negative time differences from userland in the same way as above.
>     But without that overhead, either in the kernel or in moronix
>     userland, small negative time differences might be visible,
>     depending on how small they are and on how efficient the functions
>     are.
So far I have not seen time going backward in a tight gettimeofday() loop.
This indeed is one of my main worries.

>
>     TSC-low also breaks seeing small positive differences.  This
>     breakage is visible if it is not hidden by syscall overhead or
>     inefficient functions.  For some uses, truncation of small positive
>     differences to 0 is just as bad as negative differences -- you can't
>     distinguish separate events using their timestamps.  Unfortunately,
>     timecounters with low resolution have this problem unavoidably.  A
>     TSC should at least be able to distinguish events that are separate
>     at the cycle level, though since the x86 TSC is non-serializing it
>     has a different type of fuzziness.  This fuzziness shouldn't be fixed
>     by adding serialization instructions for it (though one for acq
>     may do this accidentally), since that would make it much slower.
>     rdtscp should rarely be used since it is serializing so it gives
>     similar slowness.  Does it do any more than "cpuid; rdtsc"?
rdtscp allows us to atomically read the current package TSC counter and
obtain a reference to the current core identifier. If we produce 'skew
tables' to compensate for different initial TSC values and possible drift,
then we could use the TSC on a wider range of hardware, by adjusting
the value returned by rdtsc with a correction from the skew table.
Rdtscp is atomic in this respect.
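
A sketch of what I mean by the skew table, with hypothetical names; nothing
like this is in the patch. The (tsc, cpu) pair is passed in as a struct so
that the example does not depend on x86; on x86 with GCC or Clang the pair
could come from the __rdtscp() intrinsic in <x86intrin.h>, assuming the
kernel stores a core identifier in the auxiliary MSR that rdtscp returns.

```c
#include <assert.h>
#include <stdint.h>

/* One rdtscp-style reading: counter and core id obtained together. */
struct tsc_sample {
	uint64_t	tsc;
	uint32_t	cpu;
};

/*
 * Apply the per-core correction.  skew[i] is the signed number of
 * cycles to add to readings taken on core i so that all cores agree
 * with a reference core.  How the table is measured is out of scope.
 */
static uint64_t
tsc_apply_skew(struct tsc_sample s, const int64_t *skew, uint32_t ncpu)
{
	assert(s.cpu < ncpu);
	/* Unsigned addition of a negative skew wraps to a subtraction. */
	return (s.tsc + (uint64_t)skew[s.cpu]);
}

static void
tsc_skew_example(void)
{
	/* Core 1 runs 100 cycles ahead of core 0, core 2 is 250 behind. */
	static const int64_t skew[3] = { 0, -100, 250 };
	struct tsc_sample a = { 1000, 1 }, b = { 1000, 2 };

	assert(tsc_apply_skew(a, skew, 3) == 900);
	assert(tsc_apply_skew(b, skew, 3) == 1250);
}
```

Without the atomic pairing, the thread could migrate between reading the
counter and reading the core id, and the wrong table entry would be used.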

>
> >I probably move all shared page helpers to separate file from kern_exec.c,
> >but this will happen after moronix is committed.
>
> It's still moronix?  Why would we want that? :-)
>
> Bruce

--mu4SaHkdL1Az71rA
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAk/R4fIACgkQC3+MBN1Mb4hUfwCfWVsDpo5c5y89qYoQ8fjjnZcJ
NZEAoMdzFuDdVtdE4xlacrcpES1fJ5Zr
=94F2
-----END PGP SIGNATURE-----

--mu4SaHkdL1Az71rA--

From owner-freebsd-arch@FreeBSD.ORG  Fri Jun  8 12:43:42 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A003B1065688;
	Fri,  8 Jun 2012 12:43:42 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au
	[211.29.132.185])
	by mx1.freebsd.org (Postfix) with ESMTP id 1A36A8FC14;
	Fri,  8 Jun 2012 12:43:41 +0000 (UTC)
Received: from c122-106-171-232.carlnfd1.nsw.optusnet.com.au
	(c122-106-171-232.carlnfd1.nsw.optusnet.com.au [122.106.171.232])
	by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q58Chc8a011143
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Fri, 8 Jun 2012 22:43:39 +1000
Date: Fri, 8 Jun 2012 22:43:38 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Konstantin Belousov <kostikbel@gmail.com>
In-Reply-To: <20120608112850.GE85127@deviant.kiev.zoral.com.ua>
Message-ID: <20120608215043.Q2736@besplex.bde.org>
References: <20120606165115.GQ85127@deviant.kiev.zoral.com.ua>
	<201206061423.53179.jhb@freebsd.org>
	<20120606205938.GS85127@deviant.kiev.zoral.com.ua>
	<201206070850.55751.jhb@freebsd.org>
	<20120607172839.GZ85127@deviant.kiev.zoral.com.ua>
	<20120608155521.S1201@besplex.bde.org>
	<20120608112850.GE85127@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-arch@freebsd.org
Subject: Re: Fast gettimeofday(2) and clock_gettime(2)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 08 Jun 2012 12:43:42 -0000

On Fri, 8 Jun 2012, Konstantin Belousov wrote:

> On Fri, Jun 08, 2012 at 05:48:12PM +1000, Bruce Evans wrote:
>> On Thu, 7 Jun 2012, Konstantin Belousov wrote:
>>
>>> On Thu, Jun 07, 2012 at 08:50:55AM -0400, John Baldwin wrote:
>>>> 
>>>> Thank you.  BTW, I think we should use atomic_load_acq_int() on both
>>>> accesses
>>>> to th_gen (and the in-kernel binuptime should do the same).  I realize
>>>> ...
>>
>> The atomic_load_acq_int() (or rmb()) would completely defeat one of
>> the main points in the design of binuptime(), which was to be lock-free
>> ...
>> by th_generation to be ordered relative to loads of th_generation.  An
>> acq barrier for th_generation works somewhat bogusly (on x86) by supplying
>> a barrier for the one variable that doesn't need it for ordering.
> load_acq is not a lock, it is serialization.

By "lock-free", I meant "free of all types of locks and atomic ops,
including for example the x86 lock prefix which is not a lock but is
often used to implement locks via its serialization properties".  Then
I wrote "barrier", but noted that this is acting strangely by turning
th_generation into a locking gate that locks other variables.  I think
this depends on th_generation being loaded first (in program order) in
binuptime() etc.  It would be more natural to put the read barrier before
the first read of another variable.
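
To make the ordering question concrete, here is a minimal userland sketch
of a generation-count reader and writer in C11 atomics.  It shows the
variant with an acquire load on the generation, which is the variant
being debated here, not necessarily what binuptime() or the patch does.
The names th_sketch, th_update() and th_read() and the two payload fields
are hypothetical stand-ins, not the kernel's timehands layout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a timehands-like structure. */
struct th_sketch {
	atomic_uint		gen;	/* 0 while an update is in progress */
	_Atomic uint64_t	offset;	/* payload guarded by gen */
	_Atomic uint64_t	scale;	/* payload guarded by gen */
};

/* Single writer: invalidate, update the payload, publish a new generation. */
static void
th_update(struct th_sketch *th, uint64_t offset, uint64_t scale)
{
	unsigned int ogen;

	ogen = atomic_load_explicit(&th->gen, memory_order_relaxed);
	assert(ogen != 0);	/* no update may already be in progress */
	atomic_store_explicit(&th->gen, 0, memory_order_relaxed);
	/* Keep the gen = 0 store ahead of the payload stores. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&th->offset, offset, memory_order_relaxed);
	atomic_store_explicit(&th->scale, scale, memory_order_relaxed);
	if (++ogen == 0)
		ogen = 1;	/* skip 0 on wraparound */
	atomic_store_explicit(&th->gen, ogen, memory_order_release);
}

/* Reader: retry until the same non-zero generation brackets the loads. */
static uint64_t
th_read(struct th_sketch *th, uint64_t *scale)
{
	unsigned int gen;
	uint64_t offset;

	do {
		gen = atomic_load_explicit(&th->gen, memory_order_acquire);
		offset = atomic_load_explicit(&th->offset,
		    memory_order_relaxed);
		*scale = atomic_load_explicit(&th->scale,
		    memory_order_relaxed);
		/* Keep the payload loads ahead of the gen re-check. */
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&th->gen, memory_order_relaxed));
	return (offset);
}
```

On x86 the acquire load and both fences should compile to plain loads
and stores plus compiler barriers, so the cost being argued about shows
up mainly on other arches or when a locked instruction or mfence is used
instead.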

>> Second thoughts about whether x86 needs the acq barrier: stores to all
>> the variables in tc_windup() are ordered by x86 memory semantics.  This
>> gives them a good ordering relative to the stores to th_generation, or
>> at least can do this.  A similar ordering is then forced for the loads
>> in binuptime() etc, since x86 memory semantics ensure that each load
>> occurs after the corresponding store to the same address.  Maybe this
>> is enough, or can be made to be enough with a more careful ordering of
>> the stores.  This is MD and hard to understand.
> The ordering of loads reg. stores to the same address only happens
> on the same core.

So my first thoughts were better.

> On x86, loads cannot be reordered with other loads,
> but potentially this could happen on other arches.

I think you mean stores cannot be reordered with other stores.

>>> This is done. On the other hand, I removed a store_rel from updating
>>> tk_current, since it is after enabling store to th_gen, and the order
>>> there does not matter.
>>
>> Sigh.  The extremeness of some locking pessimizations on an Athlon64 i386
>> UP are:
>>
>>     rdtsc                                takes  6.5 cycles
>>     rdtsc; movl mem,%ecx                 takes  6.5 cycles
>>     xchgl mem,%ecx                       takes 32 cycles
>>     rdtsc; lfence; movl mem,%ecx         takes 34 cycles
>>     rdtsc; xchgl mem,%ecx                takes 38 cycles
>>     xchgl mem,%ecx; rdtsc                takes 40 cycles
>>     xchgl mem,%eax; rdtsc                takes 40 cycles
>>     rdtsc; xchgl mem,%eax                takes 44 cycles
>>     rdtsc; mfence; movl mem,%ecx         takes 52 cycles

All except the first 2 here are twice as high as they should be.

>> So the software overheads are 5-8 times larger than the hardware overheads
>> for a TSC timecounter, even when we only lock a single load.  Later CPUs

2.5-4 times.

>> have much slower rdtsc, taking 40+ cycles, so the software overheads are
>> relatively smaller, especially since they are mostly in parallel with
>> the slow rdtsc.  On core2 i386 SMP:
> I suspect that what you measured for fence overhead is actually a time
> to retire whole queue of read (and/or write) requests accumulated so far
> in the pipeline, and not the overhead of synchronous rdtsc read.

Yes, full serialization probably takes much longer.  I don't know of any
better serialization instruction than cpuid (if rdtscp is not available).
More times for Athlon64:

     rdtsc                                takes  6.5 cycles
     lfence; rdtsc                        takes 17 cycles
     rdtsc; lfence; movl mem,%ecx         takes 17 cycles (correction of above)
     cpuid; rdtsc                         takes 63 cycles

>>     rdtsc                                takes 65 cycles (yes, 10x slower)
>>     rdtsc; movl mem,%ecx                 takes 65 cycles
>>     xchgl mem,%ecx                       takes 25 cycles
>>     rdtsc; lfence; movl mem,%ecx         takes 73 cycles
>>     rdtsc; xchgl mem,%ecx                takes 74 cycles
>>     xchgl mem,%ecx; rdtsc                takes 74 cycles
>>     xchgl mem,%eax; rdtsc                takes 74 cycles
>>     rdtsc; xchgl mem,%eax                takes 74 cycles
>>     rdtsc; mfence; movl mem,%ecx         takes 69 cycles (yes, beats lfence)

These times (for core2) are correct.  Now with cpuid:

     rdtsc                                takes 65 cycles
     lfence; rdtsc                        takes 75 cycles
     rdtsc; lfence; movl mem,%ecx         takes 73 cycles (correct above)
     cpuid; rdtsc                         takes 298 cycles (gak!)

>> - fix old kernel bugs:
>> ...
> The goal of the patch is only to move the code from kernel into userspace,
> trying not to change algorithms. The potential changes you describe
> above should be done both in kernel and in usermode after that.

I think being more efficient might expose more races.  With syscalls,
small and negative time differences can't be seen since the syscall
takes longer.  With kernel calls, small and negative time differences
shouldn't happen since the kernel shouldn't be silly enough to spin
calling a timecounter function.

>>     Note that the large syscall overhead prevents seeing any small
>>     negative time differences from userland in the same way as above.
>>     But without that overhead, either in the kernel or in moronix
>>     userland, small negative time differences might be visible,
>>     depending on how small they are and on how efficient the functions
>>     are.
> So far I did not see time going backward in tight gettimeofday() loop.
> This indeed is one of my main worries.

Try taking out the shift.  I plan to try to get out of order loads using
cache misses.  Not sure how that would give out of order rdtsc's.
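
A minimal sketch of the kind of tight loop meant here, counting backward,
zero and forward steps between consecutive reads.  It uses clock_gettime()
with a caller-chosen clock rather than gettimeofday(), and the names
step_stats, ts_diff_ns() and count_steps() are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

struct step_stats {
	unsigned long	backward;	/* cur < prev */
	unsigned long	zero;		/* cur == prev */
	unsigned long	forward;	/* cur > prev */
};

/* Signed difference b - a in nanoseconds. */
static int64_t
ts_diff_ns(const struct timespec *a, const struct timespec *b)
{
	return ((int64_t)(b->tv_sec - a->tv_sec) * 1000000000LL +
	    (int64_t)(b->tv_nsec - a->tv_nsec));
}

/* Read the clock back to back and classify each step. */
static struct step_stats
count_steps(clockid_t clock, unsigned long iterations)
{
	struct step_stats st = { 0, 0, 0 };
	struct timespec prev, cur;
	unsigned long i;
	int64_t d;
	int rc;

	rc = clock_gettime(clock, &prev);
	assert(rc == 0);	/* sketch only: no real error handling */
	for (i = 0; i < iterations; i++) {
		rc = clock_gettime(clock, &cur);
		assert(rc == 0);
		d = ts_diff_ns(&prev, &cur);
		if (d < 0)
			st.backward++;
		else if (d == 0)
			st.zero++;
		else
			st.forward++;
		prev = cur;
	}
	(void)rc;
	return (st);
}
```

Calling count_steps(CLOCK_MONOTONIC, n) from several threads pinned to
different CPUs would be the more interesting test, since a single thread
mostly exercises one core's TSC.  The zero count is also worth watching,
given the point above about truncating small positive differences.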

>> ...
>>     different type of fuzziness.  This fuzziness shouldn't be fixed
>>     by adding serialization instructions for it (though one for acq
>>     may do this accidentally), since that would make it much slower.
>>     rdtscp should rarely be used since it is serializing so it gives
>>     similar slowness.  Does it do any more than "cpuid; rdtsc"?
> rdtscp allows to atomically get current package tsc counter and obtain
> some reference to current core identifier. If we produce 'skew tables'
> to compensate different tsc initial values and possible drift, then
> we could use tsc counter on wider range of hardware, by adjusting
> returned value from rdtsc by skew table repair addendum. Rdtscp is
> atomic in this respect.

rdtscp would be too slow if it is as slow as the above for cpuid; rdtsc.
But at least early phenom docs say that both rdtsc and rdtscp take 41+6
cycles.  I read a bit more of its documentation.  It seems to be exactly
rdtsc with the serialization and core number load, and without the
register clobbering and extra overhead of the cpuid instruction.  I
only noticed the other day, when someone fixed it, that the kernel
already has this skew adjustment in dtrace code.  The adjustment was
backwards...  The index to the skew table is curcpu.  There is a
sched_pin() in the initialization code, but none in the timer read
code, so I don't see how the latter can work right even if the adjustment
is forwards.  Unless the caller always does the sched_pin(), but that
would be slow and probably undocumented.
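
To illustrate why the atomicity matters, a sketch of a skew-table read
follows.  It is not the dtrace code.  It assumes skew[i] holds "TSC of
CPU i minus TSC of the reference CPU" in ticks, so the correction is a
subtraction, and it assumes the OS has loaded the CPU index into
IA32_TSC_AUX, which is not guaranteed.  tsc_unskew() and
read_unskewed_tsc() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Map a raw per-CPU TSC value onto the reference CPU's timeline.
 * Assumption: skew[cpu] = (TSC of cpu) - (TSC of reference CPU).
 */
static uint64_t
tsc_unskew(uint64_t raw, uint32_t cpu, const int64_t *skew, size_t ncpu)
{
	assert(cpu < ncpu);
	/* Unsigned wraparound makes this correct for negative skew too. */
	return (raw - (uint64_t)skew[cpu]);
}

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>

/*
 * rdtscp returns the counter and IA32_TSC_AUX from one instruction, so
 * the table index always belongs to the CPU that produced the counter.
 * Assumption: the OS stores the plain CPU index in IA32_TSC_AUX.
 */
static inline uint64_t
read_unskewed_tsc(const int64_t *skew, size_t ncpu)
{
	unsigned int aux;
	uint64_t raw;

	raw = __rdtscp(&aux);
	return (tsc_unskew(raw, aux, skew, ncpu));
}
#endif
```

With plain rdtsc plus a separate curcpu read, a migration between the two
reads pairs one CPU's counter with another CPU's skew entry, which is the
problem with the unpinned timer read described above.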

Bruce