Date: Wed, 29 May 2013 10:16:45 +0300 From: Konstantin Belousov <kostikbel@gmail.com> To: Alfred Perlstein <bright@mu.org> Cc: freebsd-hackers@freebsd.org Subject: Re: please review, patch for lost camisr Message-ID: <20130529071645.GR3047@kib.kiev.ua> In-Reply-To: <51A592BC.1050100@mu.org> References: <51A44B0C.8010908@ixsystems.com> <201305281204.14146.jhb@freebsd.org> <51A505A5.7030105@ixsystems.com> <201305281613.32414.jhb@freebsd.org> <51A514F5.9050405@ixsystems.com> <20130529050835.GP3047@kib.kiev.ua> <51A592BC.1050100@mu.org>
next in thread | previous in thread | raw e-mail | index | archive | help
--Kp2K9UuanVOCcgkO Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, May 28, 2013 at 10:31:40PM -0700, Alfred Perlstein wrote: > On 5/28/13 10:08 PM, Konstantin Belousov wrote: > > On Tue, May 28, 2013 at 01:35:01PM -0700, Alfred Perlstein wrote: > >> [[ moved to hackers@ from private mail. ]] > >> > >> On 5/28/13 1:13 PM, John Baldwin wrote: > >>> On Tuesday, May 28, 2013 3:29:41 pm Alfred Perlstein wrote: > >>>> On 5/28/13 9:04 AM, John Baldwin wrote: > >>>>> On Tuesday, May 28, 2013 2:13:32 am Alfred Perlstein wrote: > >>>>>> Hey folks, > >>>>>> > >>>>>> I had a talk with Nathan Whitehorn about the camisr issue. The is= sue we > >>>>>> are seeing we mostly know, but to summarize, we are losing the cam= isr > >>>>>> signal and the camisr is not being run. > >>>>>> > >>>>>> I gave him a summary of what we have been seeing and pointed him t= o the > >>>>>> code I am concerned about here: > >>>>>> http://pastebin.com/tLKr7mCV (this is inside of kern_intr.c). > >>>>>> > >>>>>> What I think that is happening is that the setting of it_need to 0 > >>>>>> inside of sys/kern/kern_intr.c:ithread_loop() is not being schedul= ed > >>>>>> correctly and it is being delayed until AFTER the call to > >>>>>> ithread_execute_handlers() right below the atomic_store_rel_int(). > >>>>> This seems highly unlikely, to the extent that if this were true al= l our > >>>>> locking primitives would be broken. The store_rel is actually a re= lease > >>>>> barrier which acts like more a read/write fence. No memory accesse= s (read or > >>>>> write) from before the release can be scheduled after the associate= d store, > >>>>> either by the compiler or CPU. That is what Konstantin is referrin= g to in his > >>>>> commit when he says "release" semantics". > >>>> Yes, that makes sense, however does it specify that the writes *must* > >>>> occur at that *point*? If it only enforces ordering then we may have > >>>> some issue, specifically because the setting of it to '1' inside of > >>>> intr_event_schedule_thread has no barrier other than the acq semanti= cs > >>>> of the thread lock. I am wondering what is forcing out the '1' ther= e. > >>> Nothing ever forces writes. You would have to invalidate the cache t= o do that > >>> and that is horribly expensive. It is always only about ordering and= knowing > >>> that if you can complete another operation on the same "cookie" varia= ble with > >>> acquire semantics that earlier writes will be visible. > >> By cookie, you mean a specific memory address, basically a lock? This = is > >> starting to reinforce my suspicions as the setting of it_need is done > >> with release semantics, however the acq on the other CPU is done on the > >> thread lock. Maybe that is irrelevant. We will find out shortly. > >> > >>>> See below as I think we have proof that this is somehow happening. > >>> Having ih_need of 1 and it_need of 0 is certainly busted. The simple= st fix > >>> is probably to stop using atomics on it_need and just grab the thread= lock > >>> in the main ithread loop and hold it while checking and clearing it_n= eed. > >>> > >> OK, we have some code that will either prove this, or perturb the memo= ry > >> ordering enough to make the bug go away, or prove this assertion wrong. > >> > >> We will update on our findings hopefully in the next few days. > > IMO the read of it_need in the 'while (ithd->it_need)' should > > have acquire semantic, otherwise the future reads in the > > ithread_execute_handlers(), in particular, of the ih_need, could pass > > the read of it_need and cause the situation you reported. I do not > > see any acquire barrier between a condition in the while() statement > > and the read of ih_need in the execute_handlers(). > > > > It is probably true that the issue you see was caused by r236456, in the > > sense that implicitely locked xchgl instruction on x86 has a full barri= er > > semantic. As result, the store_rel() was actually an acquire too, maki= ng > > this reordering impossible. I argue that this is not a bug in r236456, > > but the issue in the kern_intr.c. > If I remember the code correctly that would probably explain why we see= =20 > it only on 9.1 system. > > > > On the other hand, the John' suggestion to move the manipulations of > > it_need under the lock is probably the best anyway. > > > I was wondering if it would be lower latency to maintain it_need,=20 > however to keep another variable it_needlocked under the thread lock. =20 > This would result in potential superfluous interrupts, however under=20 > load you would allow the ithread to loop without taking the thread lock= =20 > some number of times. >=20 > I am not really sure if this is really worth the optimization=20 > (especially since it can result in superfluous interrupts) however it=20 > may reduce latency and that might be important. >=20 > Is there some people that I can pass the patch onto for help with=20 > performance once we confirm that this is the actual bug? We can do=20 > internal testing, but I am worried about regressing performance of any=20 > form of IO for the kernel. >=20 > I'll show the patch soon. >=20 > Thank you for the information. This is promising. Well, if you and I are right, the minimal patch should be diff --git a/sys/kern/kern_intr.c b/sys/kern/kern_intr.c index 8d63c9b..7c21015 100644 --- a/sys/kern/kern_intr.c +++ b/sys/kern/kern_intr.c @@ -1349,7 +1349,7 @@ ithread_loop(void *arg) * we are running, it will set it_need to note that we * should make another pass. */ - while (ithd->it_need) { + while (atomic_load_acq_int(&ithd->it_need)) { /* * This might need a full read and write barrier * to make sure that this write posts before any --Kp2K9UuanVOCcgkO Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.20 (FreeBSD) iQIcBAEBAgAGBQJRpatcAAoJEJDCuSvBvK1BB84P/0yJ0u/G3Ddiw2Psk4UmIW1W oDFF3Z3qDW25qQxyH7ER/35EfayVdugBvR4BwNt//V//nNlEXcS56d+qAG1Fxgd/ cCV/QSxBjb7sLka93FX8Ku5wvwKbeGbunBOG8DSVDL34+MJy+tyriw9LudYmegVf 4oT/TjtsJDAiXfl+vS1EhEoQy49QCQwThc2CA5DalrydF3xJc5RJRitR8f9yfjf5 fhPPm24g0wHYbG2ctUGl1oK7xnJUVVQT0MlEfMoB4F/tamXIeS/hP09dLJuWQK3j wqzxD6KndKtLi5m/JSbpuxdE0nWpdewpLe66VEDW7bcxf98qZ/qzp3bjImYBE6dC SYgMtgTwZfBpV7LuyPlR7lQo2Rc0XKKdJwCtdF0MSSBy8kzLK+WQsiCLJjJth1y/ k2mm+SndZgNKXjsYjUITxm1YZx6obVNiyiwUwoRilwizkMpdFyyO9FvGRBywDGQz bQOjptIEvOzcUVHSFgSWon7+6xbaV7JEDe1ja8ApeLki3Cm699HZia4ZmlC5dNxP OOfHp0CY9lCIdK/ib3I6yPY2g3Q8RWJRtBbhkU2ChxXzB4WiuAyumgWjWmYhpX3t /PdWUXk3IGdmvp9w+vt8HzIqkl7RIGh3T3f3YKWypRhjB/00wmY1UKIGISzTUQc1 5jvmi4ZCGmTwuFhEbp5X =/X81 -----END PGP SIGNATURE----- --Kp2K9UuanVOCcgkO--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130529071645.GR3047>