From owner-svn-src-head@FreeBSD.ORG  Tue Oct  4 16:52:18 2011
Return-Path: <owner-svn-src-head@FreeBSD.ORG>
Delivered-To: svn-src-head@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B0E5B1065678;
	Tue,  4 Oct 2011 16:52:18 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail05.syd.optusnet.com.au (mail05.syd.optusnet.com.au
	[211.29.132.186])
	by mx1.freebsd.org (Postfix) with ESMTP id 2EA898FC20;
	Tue,  4 Oct 2011 16:52:17 +0000 (UTC)
Received: from c122-106-165-191.carlnfd1.nsw.optusnet.com.au
	(c122-106-165-191.carlnfd1.nsw.optusnet.com.au [122.106.165.191])
	by mail05.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	p94GqE14022513
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 5 Oct 2011 03:52:15 +1100
Date: Wed, 5 Oct 2011 03:52:14 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Attilio Rao <attilio@freebsd.org>
In-Reply-To: <CAJ-FndDU2yTXLuFi+6ARz-0ws1E=x3=PNnoFH=gdJ7QEzKAetA@mail.gmail.com>
Message-ID: <20111005023142.G8617@besplex.bde.org>
References: <201109041307.p84D72GY092462@svn.freebsd.org>
	<CAJ-FndDa=xmvrcn9CdgMvDZ_vG3pjUdFqLH=Q+VK+SWjvGFO9g@mail.gmail.com>
	<20110905023251.C832@besplex.bde.org>
	<CAJ-FndD+fWPW_XFAChG5UbOR3iAvU5i7Lo35=8J_1gdRWmGcLg@mail.gmail.com>
	<20111004043232.K11186@besplex.bde.org>
	<CAJ-FndDU2yTXLuFi+6ARz-0ws1E=x3=PNnoFH=gdJ7QEzKAetA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED; BOUNDARY="0-1554855478-1317747134=:8617"
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org,
	src-committers@freebsd.org, Bruce Evans <brde@optusnet.com.au>
Subject: Re: svn commit: r225372 - head/sys/kern
X-BeenThere: svn-src-head@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SVN commit messages for the src tree for head/-current
	<svn-src-head.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-head>,
	<mailto:svn-src-head-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-head>
List-Post: <mailto:svn-src-head@freebsd.org>
List-Help: <mailto:svn-src-head-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-head>,
	<mailto:svn-src-head-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 04 Oct 2011 16:52:18 -0000

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

--0-1554855478-1317747134=:8617
Content-Type: TEXT/PLAIN; charset=X-UNKNOWN; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE

On Tue, 4 Oct 2011, Attilio Rao wrote:

> 2011/10/3 Bruce Evans <brde@optusnet.com.au>:
>> On Mon, 26 Sep 2011, Attilio Rao wrote:
>>
>>> 2011/9/4 Bruce Evans <brde@optusnet.com.au>:
>>>>
>>>> On Sun, 4 Sep 2011, Attilio Rao wrote:
>>>>
>>>>> Also please notice that intr enable/disable happens in the wrong way
>>>>> as it is done via the MD (x86 specific likely) interface. This is
>>>>> wrong for 2 reasons:
>>>>
>>>> No, intr_disable() is MI. =C2=A0It is also used by witness. =C2=A0disa=
ble_intr()
>>>> is the corresponding x86 interface that you may be thinking of. =C2=A0=
The MI
>>>> interface intr_disable() was introduced to avoid the MD'ness of
>>>> intr_disable().
>>>
>>> I was a bit surprised to verify that you are right but
>>> spinlock_enter() has the big difference besides disable_intr() of also
>>> explicitly disabling preemption via critical_enter() which some
>>> codepath can trigger without even noticing it.
>>> This means it is more safer in presence of PREEMPTION option on and
>>> thus should be preferred to the normal intr_disable(), in particular
>>> for convoluted codepaths.
>>
>> I think this is another implementation detail which shouldn't be depende=
d
>> on. =C2=A0Spinlocks may or may not need either interrupts disabled or a =
critical
>> section to work. =C2=A0Now I'm a little surprised to remember that they =
use a
>> critical section. =C2=A0This is to prevent context switching. =C2=A0It i=
s useful
>> behaviour, but not strictly necessary.
>>
>> Since disabling interrupts also prevents context switching (excep by bug=
gy
>> trap handlers including NMI), it is safe to use hard interrupt disabling
>> instead of critical_enter() to prevent context switching. =C2=A0This is =
safe
>> because code that has interrupts disabled cannot wander off into other
>> code that doesn't understand this and does context switching! (unless it
>> is broken). =C2=A0But for preventing context switching, critical_enter()=
 is
>> better now that it doesn't hard-disable interrupts internally.
>
> This is not entirely correct, infact you may have preemption even with
> interrupts disabled by calling code that schedule threads. This is why
> spinlock_enter() disables interrupts _and_ preemption altogether.
> Look for example at hardclock() and its internal magic (this is what I
> meant, earlier, with "non convoluted codepaths").

That is a bug in -current.  As I said, only broken code can wander off
into other code that doesn't understand the caller's context.  This is
one of the things that prevents hardclock() being a non-broken fast
interrupt handler.  hardclock() wants to call scheduling code, but non-
broken fast interrupt handlers can't do that.

>> By un-inlining (un-macroizing) mtx_lock_spin(), but inlining
>> critical_enter(), I get the same number of function calls but much small=
er
>> code since it is the tiny critical_enter() function and not the big
>> mtx_lock_spin() one that is inlined.
>
> I'm not entirely sure I follow.
>
> In -CURRENT, right now mtx_lock_spin() just yields
> _mtx_lock_spin_flags() which is not inlined.

That is the debugging version.  <sys/mutex.h> is obfuscated as follows:
- mtx_lock_spin(m) is mtx_lock_spin_flags((m), 0)
- if LOCK_DEBUG > 0 or defined(MUTEX_NOINLINE)
     mtx_lock_spin_flags((m), 0) is _mtx_lock_spin_flags((m), curthread, (0=
),
 =09LOCK_FILE, LOCK_LINE)
     This gives the version that you described.
     if LOCK_DEBUG > 0
       LOCK_FILE is __FILE__ and LOCK_LINE is __LINE__
     else
       LOCK_FILE is NULL and LOCK_LINE is 0
   else
     mtx_lock_spin_flags((m), 0) is __mtx_lock_spin((m), curthread, (0),
 =09NULL, 0)
     This gives the version that I described.
     __mtx_lock_spin() is passed a dummy LOCK_FILE and LOCK_LINE although
     it doesn't use them in this case.  It is convoluted so that it can
     be used both in this case and in the non-inlined case, where it is
     expanded in _mtx_lock_spin_flags(), where it is passed __FILE__
     and __LINE__ iff LOCK_DEBUG > 0.  But this reuse is not so good since
     it gives a further obfuscations:
       __mtx_lock_spin() takes flags args but doesn't have `flags' in its
       name like some other mtx functions.
       In the non-inlined case:
         mtx_lock_spin(...) is _mtx_lock_spin_flags() as described above
 =09_mtx_lock_spin_flags(...) invokes __mtx_lock_spin(...) as desc. above
         __mtx_lock_spin(...) invokes _mtx_lock_spin(...)
 =09The macro obfuscations end at this point -- _mtx_lock_spin() is
 =09always a function.
         _mtx_lock_spin(), like __mtx_lock_spin(), takes flags args but
 =09doesn't have `flags' in its name.
   end if

> So the improvement here is just to have inlined critical_enter()? How
> this can lead to smaller code?

This should be obvious now.  Another detail is that my mtx_lock_spin()
takes only 1 arg (this is the normal API).  It doesn't need to support
passing (td, opt, file, line).  critical_enter() takes no args at all
(td =3D curthread is implicit for it, as it is for mtx_lock_spin()).  So
mtx_lock_spin(mp) calls expand to slightly more object code than
critical_enter() calls.  In -current for the inlined case, mtx_lock_spin()
expands to critical_enter() plus about 40 bytes (on i386) for the atomic
op on the mutex followed by the _mtx_lock_function call.

>> The complications are mainly in critical_exit():
>> - in my version, when td_critnest is decremented to 0, a MD function is
>> =C2=A0called to "unpend" any pending interrupts that have accumulated wh=
ile
>> =C2=A0in the critical region. =C2=A0Since mtx_lock_spin() doesn't hard-d=
isable
>> =C2=A0interrupts, they may occur when a spinlock is held. =C2=A0Fast int=
errupts
>> =C2=A0proceed. =C2=A0Others are blocked until critical_exit() unpends th=
em.
>> =C2=A0This includes software interrupts. =C2=A0The only really complicat=
ed part
>> =C2=A0is letting fast interrupts proceed. =C2=A0Fast interrupt handlers =
cannot
>> =C2=A0use or be blocked by any normal locking, since they don't respect
>> =C2=A0normal spinlocks. =C2=A0So for example, hardclock() cannot be a fa=
st interrupt
>> =C2=A0handler.
>
> I don't think this is a good idea.
> We hardly rely on interrupts disabling during spinlock helding in
> order to get them be used by fast handlers and then avoid deadlocks
> with code running in interrupt/kernel context.

I think you mean "We strongly rely on...".  -current certainly relies on
this.  IMO this is mostly a bug.  There is the implementation detail that
spinlocks hard-disable interrupts to avoid a deadlock problem in the UP
case.  Too much code has come to depend on this IMO.

>>> [register_t intr_disable() interface not being quite NMI]
>>>
>>> I mostly agree, I think we should have an MD specified type to replace
>>> register_t for this (it could alias it, if it is suitable, but this
>>> interface smells a lot like x86-centric).
>>
>> Really vax-centric. =C2=A0spl "levels" are from vax or earlier CPUs. =C2=
=A0x86
>> doesn't really have levels (the AT PIC has masks and precedences. =C2=A0=
The
>> precedences correspond to levels are but rarely depended on or programme=
d
>> specifically). =C2=A0alpha and sparc seem to have levels much closer to
>> vax.
>>
>> With only levels, even an 8-bit interface for the level is enough (255
>> levels should be enough for anyone). =C2=A0With masks, even a 64-bit int=
erface
>> for the mask might not be enough. =C2=A0When masks were mapped to levels
>> for FreeBSD on i386, the largish set of possible mask values was mapped
>> into < 8 standard levels (tty, net, bio, etc). =C2=A0Related encoding of
>> MD details as cookies would probably work well enough in general.
>
> For cookie you just mean a void * ptr?

Either a pointer, or an integer that indexes a table of pointers, or an
an integer that encodes all the info in the integer's bits, possibly
with an identity encoding.

Bruce
--0-1554855478-1317747134=:8617--