Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 29 May 2013 10:40:33 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Alfred Perlstein <bright@mu.org>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: please review, patch for lost camisr
Message-ID:  <20130529074033.GS3047@kib.kiev.ua>
In-Reply-To: <51A5ADF2.3070004@mu.org>
References:  <51A44B0C.8010908@ixsystems.com> <201305281204.14146.jhb@freebsd.org> <51A505A5.7030105@ixsystems.com> <201305281613.32414.jhb@freebsd.org> <51A514F5.9050405@ixsystems.com> <20130529050835.GP3047@kib.kiev.ua> <51A592BC.1050100@mu.org> <20130529071645.GR3047@kib.kiev.ua> <51A5ADF2.3070004@mu.org>

next in thread | previous in thread | raw e-mail | index | archive | help

--aN4sSdpAXbmVieFt
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, May 29, 2013 at 12:27:46AM -0700, Alfred Perlstein wrote:
> On 5/29/13 12:16 AM, Konstantin Belousov wrote:
> > On Tue, May 28, 2013 at 10:31:40PM -0700, Alfred Perlstein wrote:
> >> On 5/28/13 10:08 PM, Konstantin Belousov wrote:
> >>> On Tue, May 28, 2013 at 01:35:01PM -0700, Alfred Perlstein wrote:
> >>>> [[  moved to hackers@ from private mail. ]]
> >>>>
> >>>> On 5/28/13 1:13 PM, John Baldwin wrote:
> >>>>> On Tuesday, May 28, 2013 3:29:41 pm Alfred Perlstein wrote:
> >>>>>> On 5/28/13 9:04 AM, John Baldwin wrote:
> >>>>>>> On Tuesday, May 28, 2013 2:13:32 am Alfred Perlstein wrote:
> >>>>>>>> Hey folks,
> >>>>>>>>
> >>>>>>>> I had a talk with Nathan Whitehorn about the camisr issue.  The =
issue we
> >>>>>>>> are seeing we mostly know, but to summarize, we are losing the c=
amisr
> >>>>>>>> signal and the camisr is not being run.
> >>>>>>>>
> >>>>>>>> I gave him a summary of what we have been seeing and pointed him=
 to the
> >>>>>>>> code I am concerned about here:
> >>>>>>>> http://pastebin.com/tLKr7mCV  (this is inside of kern_intr.c).
> >>>>>>>>
> >>>>>>>> What I think that is happening is that the setting of it_need to=
 0
> >>>>>>>> inside of sys/kern/kern_intr.c:ithread_loop() is not being sched=
uled
> >>>>>>>> correctly and it is being delayed until AFTER the call to
> >>>>>>>> ithread_execute_handlers() right below the atomic_store_rel_int(=
).
> >>>>>>> This seems highly unlikely, to the extent that if this were true =
all our
> >>>>>>> locking primitives would be broken.  The store_rel is actually a =
release
> >>>>>>> barrier which acts like more a read/write fence.  No memory acces=
ses (read or
> >>>>>>> write) from before the release can be scheduled after the associa=
ted store,
> >>>>>>> either by the compiler or CPU.  That is what Konstantin is referr=
ing to in his
> >>>>>>> commit when he says "release" semantics".
> >>>>>> Yes, that makes sense, however does it specify that the writes *mu=
st*
> >>>>>> occur at that *point*?  If it only enforces ordering then we may h=
ave
> >>>>>> some issue, specifically because the setting of it to '1' inside of
> >>>>>> intr_event_schedule_thread has no barrier other than the acq seman=
tics
> >>>>>> of the thread lock.  I am wondering what is forcing out the '1' th=
ere.
> >>>>> Nothing ever forces writes.  You would have to invalidate the cache=
 to do that
> >>>>> and that is horribly expensive.  It is always only about ordering a=
nd knowing
> >>>>> that if you can complete another operation on the same "cookie" var=
iable with
> >>>>> acquire semantics that earlier writes will be visible.
> >>>> By cookie, you mean a specific memory address, basically a lock? Thi=
s is
> >>>> starting to reinforce my suspicions as the setting of it_need is done
> >>>> with release semantics, however the acq on the other CPU is done on =
the
> >>>> thread lock.  Maybe that is irrelevant.  We will find out shortly.
> >>>>
> >>>>>> See below as I think we have proof that this is somehow happening.
> >>>>> Having ih_need of 1 and it_need of 0 is certainly busted.  The simp=
lest fix
> >>>>> is probably to stop using atomics on it_need and just grab the thre=
ad lock
> >>>>> in the main ithread loop and hold it while checking and clearing it=
_need.
> >>>>>
> >>>> OK, we have some code that will either prove this, or perturb the me=
mory
> >>>> ordering enough to make the bug go away, or prove this assertion wro=
ng.
> >>>>
> >>>> We will update on our findings hopefully in the next few days.
> >>> IMO the read of it_need in the 'while (ithd->it_need)' should
> >>> have acquire semantic, otherwise the future reads in the
> >>> ithread_execute_handlers(), in particular, of the ih_need, could pass
> >>> the read of it_need and cause the situation you reported.  I do not
> >>> see any acquire barrier between a condition in the while() statement
> >>> and the read of ih_need in the execute_handlers().
> >>>
> >>> It is probably true that the issue you see was caused by r236456, in =
the
> >>> sense that implicitely locked xchgl instruction on x86 has a full bar=
rier
> >>> semantic.  As result, the store_rel() was actually an acquire too, ma=
king
> >>> this reordering impossible.  I argue that this is not a bug in r23645=
6,
> >>> but the issue in the kern_intr.c.
> >> If I remember the code correctly that would probably explain why we see
> >> it only on 9.1 system.
> >>> On the other hand, the John' suggestion to move the manipulations of
> >>> it_need under the lock is probably the best anyway.
> >>>
> >> I was wondering if it would be lower latency to maintain it_need,
> >> however to keep another variable it_needlocked under the thread lock.
> >> This would result in potential superfluous interrupts, however under
> >> load you would allow the ithread to loop without taking the thread lock
> >> some number of times.
> >>
> >> I am not really sure if this is really worth the optimization
> >> (especially since it can result in superfluous interrupts) however it
> >> may reduce latency and that might be important.
> >>
> >> Is there some people that I can pass the patch onto for help with
> >> performance once we confirm that this is the actual bug?   We can do
> >> internal testing, but I am worried about regressing performance of any
> >> form of IO for the kernel.
> >>
> >> I'll show the patch soon.
> >>
> >> Thank you for the information.  This is promising.
> > Well, if you and I are right, the minimal patch should be
> >
> > diff --git a/sys/kern/kern_intr.c b/sys/kern/kern_intr.c
> > index 8d63c9b..7c21015 100644
> > --- a/sys/kern/kern_intr.c
> > +++ b/sys/kern/kern_intr.c
> > @@ -1349,7 +1349,7 @@ ithread_loop(void *arg)
> >   		 * we are running, it will set it_need to note that we
> >   		 * should make another pass.
> >   		 */
> > -		while (ithd->it_need) {
> > +		while (atomic_load_acq_int(&ithd->it_need)) {
> >   			/*
> >   			 * This might need a full read and write barrier
> >   			 * to make sure that this write posts before any
> >
>=20
> OK we can try this.
>=20
> I've been pretty good at locking when using mutexes, but when we get=20
> into the atomic ops like this it gets a little tough for me to follow=20
> without extensive research.  I know that the signalling thread=20
> (swi_sched caller) does not use any atomic ops... is this OK?

The sequence of the actions there is
	atomic_store_rel_int(&ih->ih_need, 1);
later in intr_event_schedule_thread():
	it->it_need =3D 1;
	thread_lock(td);
	sched_add(td);
	thread_unlock(td);

There are two things to worry about, as I see:
1. The possibiblity of seeing it_need =3D=3D 1 but ih_need =3D=3D 0.  This =
could
   happen on some architectures which allow the write reordering, so
   it might make sense to move the store_rel from ih_need to it_need.
   Effectively, we would need to have rel in both places, since=20
   the call to intr_event_schedule_thread() is conditional.

   But the reordering is impossible on x86.

2. A possibility of scheduling the interrupt thread without CPU running
   it noticing the it_need =3D 1 write.  Since the thread lock is locked
   and released, the release barrier is applied, so the write should
   become visible before thread is scheduled on processor.

I would start with the only addition of load_acq() for now and see if
it helps.

--aN4sSdpAXbmVieFt
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.20 (FreeBSD)

iQIcBAEBAgAGBQJRpbDwAAoJEJDCuSvBvK1BCYEP/3rXtRdLZe2SGCBYhIMuSOJU
uYhNnd9rFRhgm6GMzSenbIMr/mEGfvsAFWkQMBZ0SQ1XHQjJ1oS90rVcutdMFdDl
mv8CxAG+dsebpd75e9RZXEcsVze6NmA2Iqv79JE9eCo8Cgw0Mpks+Jo2ys9w0D+E
CkOOTi2Ga6VuR0+jEMWuJRsgLXKzWxoKuIJzDdZfXQiTrh45s87EXExbF/QgYDXz
51Fcqd01BSLeDDnfrAozy2zxHkxnOq69pxhRr/SSoG/1VHJnxL5QhOT8fFVZ8hJq
KfcXj6dCROPP4nsn17EAmQuxYMJCewsi6Ub5573wpSw6JWtg4tgyi+GQfrSUXX4e
Czhj+fCsp+HPTstYLOAGm2Bwyzv31PBUAbPLNi5vaVQIlTXs3jl0IiCpkVgLrTbO
F6JjctxMO6giSc73UmEi1miQv2aSGRFSwBq4PpAqHo6cdCeSC7E11Jv4nnPO9TWo
Wlw7KlsKSTaZksBG4dhAe8y9is02SqjLOMRaFDwd3NwzTZTqZ60eQjxQDkf57qy2
MR2h3thrDwl/Q9oRXPSJd9MBhVY2mDidwHy3VTfTmpCxmCVBp7w0tffNpGRQ8p/w
PtW9VQhDtBFeaJ+Jw0AFM+j9+bkA//FlpnhjJKXmL1fxJiPjqbx0Qc74uR6H3Tw9
PoWTUACr3VWqaxmwaw9Y
=R2ZG
-----END PGP SIGNATURE-----

--aN4sSdpAXbmVieFt--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130529074033.GS3047>