Date: Wed, 29 May 2013 10:40:33 +0300 From: Konstantin Belousov <kostikbel@gmail.com> To: Alfred Perlstein <bright@mu.org> Cc: freebsd-hackers@freebsd.org Subject: Re: please review, patch for lost camisr Message-ID: <20130529074033.GS3047@kib.kiev.ua> In-Reply-To: <51A5ADF2.3070004@mu.org> References: <51A44B0C.8010908@ixsystems.com> <201305281204.14146.jhb@freebsd.org> <51A505A5.7030105@ixsystems.com> <201305281613.32414.jhb@freebsd.org> <51A514F5.9050405@ixsystems.com> <20130529050835.GP3047@kib.kiev.ua> <51A592BC.1050100@mu.org> <20130529071645.GR3047@kib.kiev.ua> <51A5ADF2.3070004@mu.org>
next in thread | previous in thread | raw e-mail | index | archive | help
--aN4sSdpAXbmVieFt Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, May 29, 2013 at 12:27:46AM -0700, Alfred Perlstein wrote: > On 5/29/13 12:16 AM, Konstantin Belousov wrote: > > On Tue, May 28, 2013 at 10:31:40PM -0700, Alfred Perlstein wrote: > >> On 5/28/13 10:08 PM, Konstantin Belousov wrote: > >>> On Tue, May 28, 2013 at 01:35:01PM -0700, Alfred Perlstein wrote: > >>>> [[ moved to hackers@ from private mail. ]] > >>>> > >>>> On 5/28/13 1:13 PM, John Baldwin wrote: > >>>>> On Tuesday, May 28, 2013 3:29:41 pm Alfred Perlstein wrote: > >>>>>> On 5/28/13 9:04 AM, John Baldwin wrote: > >>>>>>> On Tuesday, May 28, 2013 2:13:32 am Alfred Perlstein wrote: > >>>>>>>> Hey folks, > >>>>>>>> > >>>>>>>> I had a talk with Nathan Whitehorn about the camisr issue. The = issue we > >>>>>>>> are seeing we mostly know, but to summarize, we are losing the c= amisr > >>>>>>>> signal and the camisr is not being run. > >>>>>>>> > >>>>>>>> I gave him a summary of what we have been seeing and pointed him= to the > >>>>>>>> code I am concerned about here: > >>>>>>>> http://pastebin.com/tLKr7mCV (this is inside of kern_intr.c). > >>>>>>>> > >>>>>>>> What I think that is happening is that the setting of it_need to= 0 > >>>>>>>> inside of sys/kern/kern_intr.c:ithread_loop() is not being sched= uled > >>>>>>>> correctly and it is being delayed until AFTER the call to > >>>>>>>> ithread_execute_handlers() right below the atomic_store_rel_int(= ). > >>>>>>> This seems highly unlikely, to the extent that if this were true = all our > >>>>>>> locking primitives would be broken. The store_rel is actually a = release > >>>>>>> barrier which acts like more a read/write fence. No memory acces= ses (read or > >>>>>>> write) from before the release can be scheduled after the associa= ted store, > >>>>>>> either by the compiler or CPU. That is what Konstantin is referr= ing to in his > >>>>>>> commit when he says "release" semantics". > >>>>>> Yes, that makes sense, however does it specify that the writes *mu= st* > >>>>>> occur at that *point*? If it only enforces ordering then we may h= ave > >>>>>> some issue, specifically because the setting of it to '1' inside of > >>>>>> intr_event_schedule_thread has no barrier other than the acq seman= tics > >>>>>> of the thread lock. I am wondering what is forcing out the '1' th= ere. > >>>>> Nothing ever forces writes. You would have to invalidate the cache= to do that > >>>>> and that is horribly expensive. It is always only about ordering a= nd knowing > >>>>> that if you can complete another operation on the same "cookie" var= iable with > >>>>> acquire semantics that earlier writes will be visible. > >>>> By cookie, you mean a specific memory address, basically a lock? Thi= s is > >>>> starting to reinforce my suspicions as the setting of it_need is done > >>>> with release semantics, however the acq on the other CPU is done on = the > >>>> thread lock. Maybe that is irrelevant. We will find out shortly. > >>>> > >>>>>> See below as I think we have proof that this is somehow happening. > >>>>> Having ih_need of 1 and it_need of 0 is certainly busted. The simp= lest fix > >>>>> is probably to stop using atomics on it_need and just grab the thre= ad lock > >>>>> in the main ithread loop and hold it while checking and clearing it= _need. > >>>>> > >>>> OK, we have some code that will either prove this, or perturb the me= mory > >>>> ordering enough to make the bug go away, or prove this assertion wro= ng. > >>>> > >>>> We will update on our findings hopefully in the next few days. > >>> IMO the read of it_need in the 'while (ithd->it_need)' should > >>> have acquire semantic, otherwise the future reads in the > >>> ithread_execute_handlers(), in particular, of the ih_need, could pass > >>> the read of it_need and cause the situation you reported. I do not > >>> see any acquire barrier between a condition in the while() statement > >>> and the read of ih_need in the execute_handlers(). > >>> > >>> It is probably true that the issue you see was caused by r236456, in = the > >>> sense that implicitely locked xchgl instruction on x86 has a full bar= rier > >>> semantic. As result, the store_rel() was actually an acquire too, ma= king > >>> this reordering impossible. I argue that this is not a bug in r23645= 6, > >>> but the issue in the kern_intr.c. > >> If I remember the code correctly that would probably explain why we see > >> it only on 9.1 system. > >>> On the other hand, the John' suggestion to move the manipulations of > >>> it_need under the lock is probably the best anyway. > >>> > >> I was wondering if it would be lower latency to maintain it_need, > >> however to keep another variable it_needlocked under the thread lock. > >> This would result in potential superfluous interrupts, however under > >> load you would allow the ithread to loop without taking the thread lock > >> some number of times. > >> > >> I am not really sure if this is really worth the optimization > >> (especially since it can result in superfluous interrupts) however it > >> may reduce latency and that might be important. > >> > >> Is there some people that I can pass the patch onto for help with > >> performance once we confirm that this is the actual bug? We can do > >> internal testing, but I am worried about regressing performance of any > >> form of IO for the kernel. > >> > >> I'll show the patch soon. > >> > >> Thank you for the information. This is promising. > > Well, if you and I are right, the minimal patch should be > > > > diff --git a/sys/kern/kern_intr.c b/sys/kern/kern_intr.c > > index 8d63c9b..7c21015 100644 > > --- a/sys/kern/kern_intr.c > > +++ b/sys/kern/kern_intr.c > > @@ -1349,7 +1349,7 @@ ithread_loop(void *arg) > > * we are running, it will set it_need to note that we > > * should make another pass. > > */ > > - while (ithd->it_need) { > > + while (atomic_load_acq_int(&ithd->it_need)) { > > /* > > * This might need a full read and write barrier > > * to make sure that this write posts before any > > >=20 > OK we can try this. >=20 > I've been pretty good at locking when using mutexes, but when we get=20 > into the atomic ops like this it gets a little tough for me to follow=20 > without extensive research. I know that the signalling thread=20 > (swi_sched caller) does not use any atomic ops... is this OK? The sequence of the actions there is atomic_store_rel_int(&ih->ih_need, 1); later in intr_event_schedule_thread(): it->it_need =3D 1; thread_lock(td); sched_add(td); thread_unlock(td); There are two things to worry about, as I see: 1. The possibiblity of seeing it_need =3D=3D 1 but ih_need =3D=3D 0. This = could happen on some architectures which allow the write reordering, so it might make sense to move the store_rel from ih_need to it_need. Effectively, we would need to have rel in both places, since=20 the call to intr_event_schedule_thread() is conditional. But the reordering is impossible on x86. 2. A possibility of scheduling the interrupt thread without CPU running it noticing the it_need =3D 1 write. Since the thread lock is locked and released, the release barrier is applied, so the write should become visible before thread is scheduled on processor. I would start with the only addition of load_acq() for now and see if it helps. --aN4sSdpAXbmVieFt Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.20 (FreeBSD) iQIcBAEBAgAGBQJRpbDwAAoJEJDCuSvBvK1BCYEP/3rXtRdLZe2SGCBYhIMuSOJU uYhNnd9rFRhgm6GMzSenbIMr/mEGfvsAFWkQMBZ0SQ1XHQjJ1oS90rVcutdMFdDl mv8CxAG+dsebpd75e9RZXEcsVze6NmA2Iqv79JE9eCo8Cgw0Mpks+Jo2ys9w0D+E CkOOTi2Ga6VuR0+jEMWuJRsgLXKzWxoKuIJzDdZfXQiTrh45s87EXExbF/QgYDXz 51Fcqd01BSLeDDnfrAozy2zxHkxnOq69pxhRr/SSoG/1VHJnxL5QhOT8fFVZ8hJq KfcXj6dCROPP4nsn17EAmQuxYMJCewsi6Ub5573wpSw6JWtg4tgyi+GQfrSUXX4e Czhj+fCsp+HPTstYLOAGm2Bwyzv31PBUAbPLNi5vaVQIlTXs3jl0IiCpkVgLrTbO F6JjctxMO6giSc73UmEi1miQv2aSGRFSwBq4PpAqHo6cdCeSC7E11Jv4nnPO9TWo Wlw7KlsKSTaZksBG4dhAe8y9is02SqjLOMRaFDwd3NwzTZTqZ60eQjxQDkf57qy2 MR2h3thrDwl/Q9oRXPSJd9MBhVY2mDidwHy3VTfTmpCxmCVBp7w0tffNpGRQ8p/w PtW9VQhDtBFeaJ+Jw0AFM+j9+bkA//FlpnhjJKXmL1fxJiPjqbx0Qc74uR6H3Tw9 PoWTUACr3VWqaxmwaw9Y =R2ZG -----END PGP SIGNATURE----- --aN4sSdpAXbmVieFt--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130529074033.GS3047>