From: Attilio Rao
To: Bruce Evans
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject: Re: svn commit: r225372 - head/sys/kern
Date: Tue, 4 Oct 2011 13:05:58 +0200
In-Reply-To: <20111004043232.K11186@besplex.bde.org>
References: <201109041307.p84D72GY092462@svn.freebsd.org> <20110905023251.C832@besplex.bde.org> <20111004043232.K11186@besplex.bde.org>
2011/10/3 Bruce Evans:
> On Mon, 26 Sep 2011, Attilio Rao wrote:
>
>> 2011/9/4 Bruce Evans:
>>>
>>> On Sun, 4 Sep 2011, Attilio Rao wrote:
>>>
>>>> Also please notice that intr enable/disable happens in the wrong way
>>>> as it is done via the MD (x86 specific likely) interface. This is
>>>> wrong for 2 reasons:
>>>
>>> No, intr_disable() is MI.  It is also used by witness.  disable_intr()
>>> is the corresponding x86 interface that you may be thinking of.  The MI
>>> interface intr_disable() was introduced to avoid the MD'ness of
>>> disable_intr().
>>
>> I was a bit surprised to verify that you are right, but spinlock_enter()
>> has one big difference compared to disable_intr(): it also explicitly
>> disables preemption via critical_enter(), which some codepaths can
>> trigger without even noticing it.
>> This means it is safer when the PREEMPTION option is on, and thus it
>> should be preferred to plain intr_disable(), in particular for
>> convoluted codepaths.
>
> I think this is another implementation detail which shouldn't be depended
> on.  Spinlocks may or may not need either interrupts disabled or a
> critical section to work.  Now I'm a little surprised to remember that
> they use a critical section.  This is to prevent context switching.  It
> is useful behaviour, but not strictly necessary.
>
> Since disabling interrupts also prevents context switching (except by
> buggy trap handlers, including NMI), it is safe to use hard interrupt
> disabling instead of critical_enter() to prevent context switching.
> This is safe because code that has interrupts disabled cannot wander off
> into other code that doesn't understand this and does context switching!
> (unless it is broken).  But for preventing context switching,
> critical_enter() is better now that it doesn't hard-disable interrupts
> internally.

This is not entirely correct; in fact, you may get preemption even with
interrupts disabled, by calling code that schedules threads.
This is why spinlock_enter() disables interrupts _and_ preemption
together.
Look for example at hardclock() and its internal magic (this is what I
meant, earlier, with "non convoluted codepaths").

>> Can you please explain more about the 'h/w interrupts not disabled' in
>> x86?
>> Are you speaking about NMIs? For those, the only way to effectively
>> mask them would be to reprogram the LAPIC entry, but I don't really
>> think we want that.
>
> This is in my version of x86.  mtx_lock_spin() is entirely in software
> (and deconvoluted -- no macros -- for at least the non-LOCK_DEBUG case):
>
> % void
> % mtx_lock_spin(struct mtx *mp)
> % {
> %       struct thread *td;
> %
> %       td = curthread;
> %       td->td_critnest++;
>
> The previous line is the entire critical_enter() manually inlined
> (except that the full critical_enter() has lots of instrumentation
> cruft).  -current has spinlock_enter() here.
>
> -current used to have critical_enter() here (actually in the macro
> corresponding to this) instead.  This depended on critical_enter() being
> pessimal and always doing a hard interrupt disable.  The hard interrupt
> disable is needed as an implementation detail for -current in the !SMP
> case, but for most other uses it is not needed.  The pessimization was
> reduced in -current by moving this hard interrupt disable from
> critical_enter() to the spinlock_enter() cases that need it (currently
> all?).  This optimized critical_enter() to just an increment of
> td_critnest, exactly the same as in my version except for
> instrumentation (mine has lots of messy timing stuff but -current has a
> single CTR4()).
> My version also optimizes away the hard interrupt disable.
>
> The resulting critical_enter() really should be an inline.  I only did
> this in the manual inlining above.  This handles most cases of interest.
> By un-inlining (un-macroizing) mtx_lock_spin(), but inlining
> critical_enter(), I get the same number of function calls but much
> smaller code, since it is the tiny critical_enter() function and not the
> big mtx_lock_spin() one that is inlined.

I'm not entirely sure I follow.
In -CURRENT, right now, mtx_lock_spin() just expands to
_mtx_lock_spin_flags(), which is not inlined.
So is the improvement here just having critical_enter() inlined? How
can this lead to smaller code?

> %       if (!_obtain_lock(mp, td)) {
> %               if (mp->mtx_lock == (uintptr_t)td)
> %                       (mp)->mtx_recurse++;
> %               else
> %                       _mtx_lock_spin(mp, 0, NULL, 0);
> %       }
>
> Essentially the same as in -current.
>
> % }
>
> The complications are mainly in critical_exit():
> - in my version, when td_critnest is decremented to 0, an MD function is
>   called to "unpend" any pending interrupts that have accumulated while
>   in the critical region.  Since mtx_lock_spin() doesn't hard-disable
>   interrupts, they may occur while a spinlock is held.  Fast interrupts
>   proceed.  Others are blocked until critical_exit() unpends them.
>   This includes software interrupts.  The only really complicated part
>   is letting fast interrupts proceed.  Fast interrupt handlers cannot
>   use or be blocked by any normal locking, since they don't respect
>   normal spinlocks.  So, for example, hardclock() cannot be a fast
>   interrupt handler.

I don't think this is a good idea. We rely heavily on interrupts being
disabled while spinlocks are held, precisely so that spinlocks can be
used by fast handlers without deadlocking against code running in
interrupt/kernel context.

>>>    (The interface here is slightly broken (non-MI).  It returns
>>>    register_t.  This assumes that the interrupt state can be
>>>    represented in a single register.  The type critical_t exists to
>>>    avoid the same bug in an old version of critical_enter().  Now this
>>>    type is just bogus.  critical_enter() no longer returns it.
>>>    Instead, spinlock_enter() uses a non-reentrant interface which
>>>    stores what used to be the return value of critical_enter() in a
>>>    per-thread MD data structure (md_saved_pil in the above).  Most or
>>>    all arches use register_t for this.  This leaves critical_t as pure
>>>    garbage -- the only remaining references to it are for its
>>>    definition.)
>>
>> I mostly agree. I think we should have an MD-specified type to replace
>> register_t for this (it could alias it, if that is suitable, but this
>> interface smells a lot like x86-centric).
>
> Really vax-centric.  spl "levels" are from vax or earlier CPUs.  x86
> doesn't really have levels (the AT PIC has masks and precedences.  The
> precedences correspond to levels but are rarely depended on or
> programmed specifically).  alpha and sparc seem to have levels much
> closer to vax.
>
> With only levels, even an 8-bit interface for the level is enough (255
> levels should be enough for anyone).  With masks, even a 64-bit
> interface for the mask might not be enough.
> When masks were mapped to levels for FreeBSD on i386, the largish set of
> possible mask values was mapped into < 8 standard levels (tty, net, bio,
> etc).  Related encoding of MD details as cookies would probably work
> well enough in general.

By "cookie" do you just mean a void * pointer?

Attilio

--
Peace can only be achieved by understanding - A. Einstein