From owner-freebsd-hackers@FreeBSD.ORG Wed May 29 07:27:48 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 1780470 for ; Wed, 29 May 2013 07:27:48 +0000 (UTC) (envelope-from bright@mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id EC161F3 for ; Wed, 29 May 2013 07:27:47 +0000 (UTC) Received: from Alfreds-MacBook-Pro-9.local (c-67-180-208-218.hsd1.ca.comcast.net [67.180.208.218]) by elvis.mu.org (Postfix) with ESMTPSA id 71CB41A3C1D for ; Wed, 29 May 2013 00:27:46 -0700 (PDT) Message-ID: <51A5ADF2.3070004@mu.org> Date: Wed, 29 May 2013 00:27:46 -0700 From: Alfred Perlstein User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:17.0) Gecko/20130509 Thunderbird/17.0.6 MIME-Version: 1.0 To: freebsd-hackers@freebsd.org Subject: Re: please review, patch for lost camisr References: <51A44B0C.8010908@ixsystems.com> <201305281204.14146.jhb@freebsd.org> <51A505A5.7030105@ixsystems.com> <201305281613.32414.jhb@freebsd.org> <51A514F5.9050405@ixsystems.com> <20130529050835.GP3047@kib.kiev.ua> <51A592BC.1050100@mu.org> <20130529071645.GR3047@kib.kiev.ua> In-Reply-To: <20130529071645.GR3047@kib.kiev.ua> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 May 2013 07:27:48 -0000 On 5/29/13 12:16 AM, Konstantin Belousov wrote: > On Tue, May 28, 2013 at 10:31:40PM -0700, Alfred Perlstein wrote: >> On 5/28/13 10:08 PM, Konstantin Belousov wrote: >>> On Tue, May 28, 2013 at 01:35:01PM -0700, Alfred Perlstein wrote: >>>> [[ moved to hackers@ from private mail. ]] >>>> >>>> On 5/28/13 1:13 PM, John Baldwin wrote: >>>>> On Tuesday, May 28, 2013 3:29:41 pm Alfred Perlstein wrote: >>>>>> On 5/28/13 9:04 AM, John Baldwin wrote: >>>>>>> On Tuesday, May 28, 2013 2:13:32 am Alfred Perlstein wrote: >>>>>>>> Hey folks, >>>>>>>> >>>>>>>> I had a talk with Nathan Whitehorn about the camisr issue. The issue we >>>>>>>> are seeing we mostly know, but to summarize, we are losing the camisr >>>>>>>> signal and the camisr is not being run. >>>>>>>> >>>>>>>> I gave him a summary of what we have been seeing and pointed him to the >>>>>>>> code I am concerned about here: >>>>>>>> http://pastebin.com/tLKr7mCV (this is inside of kern_intr.c). >>>>>>>> >>>>>>>> What I think that is happening is that the setting of it_need to 0 >>>>>>>> inside of sys/kern/kern_intr.c:ithread_loop() is not being scheduled >>>>>>>> correctly and it is being delayed until AFTER the call to >>>>>>>> ithread_execute_handlers() right below the atomic_store_rel_int(). >>>>>>> This seems highly unlikely, to the extent that if this were true all our >>>>>>> locking primitives would be broken. The store_rel is actually a release >>>>>>> barrier which acts like more a read/write fence. No memory accesses (read or >>>>>>> write) from before the release can be scheduled after the associated store, >>>>>>> either by the compiler or CPU. That is what Konstantin is referring to in his >>>>>>> commit when he says "release" semantics". >>>>>> Yes, that makes sense, however does it specify that the writes *must* >>>>>> occur at that *point*? If it only enforces ordering then we may have >>>>>> some issue, specifically because the setting of it to '1' inside of >>>>>> intr_event_schedule_thread has no barrier other than the acq semantics >>>>>> of the thread lock. I am wondering what is forcing out the '1' there. >>>>> Nothing ever forces writes. You would have to invalidate the cache to do that >>>>> and that is horribly expensive. It is always only about ordering and knowing >>>>> that if you can complete another operation on the same "cookie" variable with >>>>> acquire semantics that earlier writes will be visible. >>>> By cookie, you mean a specific memory address, basically a lock? This is >>>> starting to reinforce my suspicions as the setting of it_need is done >>>> with release semantics, however the acq on the other CPU is done on the >>>> thread lock. Maybe that is irrelevant. We will find out shortly. >>>> >>>>>> See below as I think we have proof that this is somehow happening. >>>>> Having ih_need of 1 and it_need of 0 is certainly busted. The simplest fix >>>>> is probably to stop using atomics on it_need and just grab the thread lock >>>>> in the main ithread loop and hold it while checking and clearing it_need. >>>>> >>>> OK, we have some code that will either prove this, or perturb the memory >>>> ordering enough to make the bug go away, or prove this assertion wrong. >>>> >>>> We will update on our findings hopefully in the next few days. >>> IMO the read of it_need in the 'while (ithd->it_need)' should >>> have acquire semantic, otherwise the future reads in the >>> ithread_execute_handlers(), in particular, of the ih_need, could pass >>> the read of it_need and cause the situation you reported. I do not >>> see any acquire barrier between a condition in the while() statement >>> and the read of ih_need in the execute_handlers(). >>> >>> It is probably true that the issue you see was caused by r236456, in the >>> sense that implicitely locked xchgl instruction on x86 has a full barrier >>> semantic. As result, the store_rel() was actually an acquire too, making >>> this reordering impossible. I argue that this is not a bug in r236456, >>> but the issue in the kern_intr.c. >> If I remember the code correctly that would probably explain why we see >> it only on 9.1 system. >>> On the other hand, the John' suggestion to move the manipulations of >>> it_need under the lock is probably the best anyway. >>> >> I was wondering if it would be lower latency to maintain it_need, >> however to keep another variable it_needlocked under the thread lock. >> This would result in potential superfluous interrupts, however under >> load you would allow the ithread to loop without taking the thread lock >> some number of times. >> >> I am not really sure if this is really worth the optimization >> (especially since it can result in superfluous interrupts) however it >> may reduce latency and that might be important. >> >> Is there some people that I can pass the patch onto for help with >> performance once we confirm that this is the actual bug? We can do >> internal testing, but I am worried about regressing performance of any >> form of IO for the kernel. >> >> I'll show the patch soon. >> >> Thank you for the information. This is promising. > Well, if you and I are right, the minimal patch should be > > diff --git a/sys/kern/kern_intr.c b/sys/kern/kern_intr.c > index 8d63c9b..7c21015 100644 > --- a/sys/kern/kern_intr.c > +++ b/sys/kern/kern_intr.c > @@ -1349,7 +1349,7 @@ ithread_loop(void *arg) > * we are running, it will set it_need to note that we > * should make another pass. > */ > - while (ithd->it_need) { > + while (atomic_load_acq_int(&ithd->it_need)) { > /* > * This might need a full read and write barrier > * to make sure that this write posts before any > OK we can try this. I've been pretty good at locking when using mutexes, but when we get into the atomic ops like this it gets a little tough for me to follow without extensive research. I know that the signalling thread (swi_sched caller) does not use any atomic ops... is this OK? -Alfred