Date: Tue, 4 Oct 2011 13:05:58 +0200 From: Attilio Rao <attilio@freebsd.org> To: Bruce Evans <brde@optusnet.com.au> Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org Subject: Re: svn commit: r225372 - head/sys/kern Message-ID: <CAJ-FndDU2yTXLuFi%2B6ARz-0ws1E=x3=PNnoFH=gdJ7QEzKAetA@mail.gmail.com> In-Reply-To: <20111004043232.K11186@besplex.bde.org> References: <201109041307.p84D72GY092462@svn.freebsd.org> <CAJ-FndDa=xmvrcn9CdgMvDZ_vG3pjUdFqLH=Q%2BVK%2BSWjvGFO9g@mail.gmail.com> <20110905023251.C832@besplex.bde.org> <CAJ-FndD%2BfWPW_XFAChG5UbOR3iAvU5i7Lo35=8J_1gdRWmGcLg@mail.gmail.com> <20111004043232.K11186@besplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
2011/10/3 Bruce Evans <brde@optusnet.com.au>: > On Mon, 26 Sep 2011, Attilio Rao wrote: > >> 2011/9/4 Bruce Evans <brde@optusnet.com.au>: >>> >>> On Sun, 4 Sep 2011, Attilio Rao wrote: >>> >>>> Also please notice that intr enable/disable happens in the wrong way >>>> as it is done via the MD (x86 specific likely) interface. This is >>>> wrong for 2 reasons: >>> >>> No, intr_disable() is MI. =C2=A0It is also used by witness. =C2=A0disab= le_intr() >>> is the corresponding x86 interface that you may be thinking of. =C2=A0T= he MI >>> interface intr_disable() was introduced to avoid the MD'ness of >>> intr_disable(). >> >> I was a bit surprised to verify that you are right but >> spinlock_enter() has the big difference besides disable_intr() of also >> explicitly disabling preemption via critical_enter() which some >> codepath can trigger without even noticing it. >> This means it is more safer in presence of PREEMPTION option on and >> thus should be preferred to the normal intr_disable(), in particular >> for convoluted codepaths. > > I think this is another implementation detail which shouldn't be depended > on. =C2=A0Spinlocks may or may not need either interrupts disabled or a c= ritical > section to work. =C2=A0Now I'm a little surprised to remember that they u= se a > critical section. =C2=A0This is to prevent context switching. =C2=A0It is= useful > behaviour, but not strictly necessary. > > Since disabling interrupts also prevents context switching (excep by bugg= y > trap handlers including NMI), it is safe to use hard interrupt disabling > instead of critical_enter() to prevent context switching. =C2=A0This is s= afe > because code that has interrupts disabled cannot wander off into other > code that doesn't understand this and does context switching! (unless it > is broken). =C2=A0But for preventing context switching, critical_enter() = is > better now that it doesn't hard-disable interrupts internally. This is not entirely correct, infact you may have preemption even with interrupts disabled by calling code that schedule threads. This is why spinlock_enter() disables interrupts _and_ preemption altogether. Look for example at hardclock() and its internal magic (this is what I meant, earlier, with "non convoluted codepaths"). >> Can you please explain more about the 'h/w interrupts not disabled' in >> X86? >> Are you speaking about NMIs? For those the only way to effectively >> mask them would be to reprogram the LAPIC entry, but I don't really >> think we may want that. > > This is in my version of x86. =C2=A0mtx_lock_spin() is entirely in softwa= re > (and deconvoluted -- no macros -- for at least the non-LOCK_DEBUG case): > > % void > % mtx_lock_spin(struct mtx *mp) > % { > % =C2=A0 =C2=A0 =C2=A0 struct thread *td; > % % =C2=A0 =C2=A0 td =3D curthread; > % =C2=A0 =C2=A0 =C2=A0 td->td_critnest++; > > The previous line is the entire critical_enter() manually inlined > (except the full critical_enter() has lots of instrumentation cruft). > -current has spinlock_enter() here. > > -current used to have critical_enter() here (actually in the macro > correspoding to this) instead. =C2=A0This depended on critical_enter() be= ing > pessimal and always doing a hard interrupt disable. =C2=A0The hard interr= upt > disable is needed as an implementation detail for -current in the !SMP > case, but for most other uses it is not needed. =C2=A0The pessimization w= as > rediced in -current by moving this hard interrupt disable from > critical_enter() to the spinlock_enter() cases that need it (currently > all?). =C2=A0This optimized critical_enter() to just an increment of > td_critnest, exactly the same as in my version except for instrumentation > (mine has lots of messy timing stuff but -current has a single CTR4()). > My version also optimizes away the hard interrupt disable. > > The resulting critical_enter() really should be an inline. =C2=A0I only d= id > this in the manual inlining above. =C2=A0This handles most cases of inter= est. > By un-inlining (un-macroizing) mtx_lock_spin(), but inlining > critical_enter(), I get the same number of function calls but much smalle= r > code since it is the tiny critical_enter() function and not the big > mtx_lock_spin() one that is inlined. I'm not entirely sure I follow. In -CURRENT, right now mtx_lock_spin() just yields _mtx_lock_spin_flags() which is not inlined. So the improvement here is just to have inlined critical_enter()? How this can lead to smaller code? > % =C2=A0 =C2=A0 =C2=A0 if (!_obtain_lock(mp, td)) { > % =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (mp->mtx_lock =3D= =3D (uintptr_t)td) > % =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 (mp)->mtx_recurse++; > % =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 else > % =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 _mtx_lock_spin(mp, 0, NULL, 0); > % =C2=A0 =C2=A0 =C2=A0 } > > Essentially the same as in -current. > > % } > > The complications are mainly in critical_exit(): > - in my version, when td_critnest is decremented to 0, a MD function is > =C2=A0called to "unpend" any pending interrupts that have accumulated whi= le > =C2=A0in the critical region. =C2=A0Since mtx_lock_spin() doesn't hard-di= sable > =C2=A0interrupts, they may occur when a spinlock is held. =C2=A0Fast inte= rrupts > =C2=A0proceed. =C2=A0Others are blocked until critical_exit() unpends the= m. > =C2=A0This includes software interrupts. =C2=A0The only really complicate= d part > =C2=A0is letting fast interrupts proceed. =C2=A0Fast interrupt handlers c= annot > =C2=A0use or be blocked by any normal locking, since they don't respect > =C2=A0normal spinlocks. =C2=A0So for example, hardclock() cannot be a fas= t interrupt > =C2=A0handler. I don't think this is a good idea. We hardly rely on interrupts disabling during spinlock helding in order to get them be used by fast handlers and then avoid deadlocks with code running in interrupt/kernel context. >>> =C2=A0 =C2=A0(The interface here is slightly broken (non-MI). =C2=A0It = returns >>> register_t. >>> =C2=A0 =C2=A0This assumes that the interrupt state can be represented i= n a single >>> =C2=A0 =C2=A0register. =C2=A0The type critical_t exists to avoid the sa= me bug in an >>> =C2=A0 =C2=A0old version of critical_enter(). =C2=A0Now this type is ju= st bogus. >>> =C2=A0 =C2=A0critical_enter() no longer returns it. =C2=A0Instead, spin= lock_enter() uses >>> =C2=A0 =C2=A0a non-reentrant interface which stores what used to be the= return >>> value >>> =C2=A0 =C2=A0of critical_enter() in a per-thread MD data structure (md_= saved_pil >>> =C2=A0 =C2=A0in the above). =C2=A0Most or all arches use register_t for= this. =C2=A0This >>> =C2=A0 =C2=A0leaves critical_t as pure garbage -- the only remaining re= ferences to >>> =C2=A0 =C2=A0it are for its definition.) >> >> I mostly agree, I think we should have an MD specified type to replace >> register_t for this (it could alias it, if it is suitable, but this >> interface smells a lot like x86-centric). > > Really vax-centric. =C2=A0spl "levels" are from vax or earlier CPUs. =C2= =A0x86 > doesn't really have levels (the AT PIC has masks and precedences. =C2=A0T= he > precedences correspond to levels are but rarely depended on or programmed > specifically). =C2=A0alpha and sparc seem to have levels much closer to > vax. > > With only levels, even an 8-bit interface for the level is enough (255 > levels should be enough for anyone). =C2=A0With masks, even a 64-bit inte= rface > for the mask might not be enough. =C2=A0When masks were mapped to levels > for FreeBSD on i386, the largish set of possible mask values was mapped > into < 8 standard levels (tty, net, bio, etc). =C2=A0Related encoding of > MD details as cookies would probably work well enough in general. For cookie you just mean a void * ptr? Attilio --=20 Peace can only be achieved by understanding - A. Einstein
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJ-FndDU2yTXLuFi%2B6ARz-0ws1E=x3=PNnoFH=gdJ7QEzKAetA>