Message-ID: <4D8A1EDB.50206@soe.ucsc.edu>
Date: Wed, 23 Mar 2011 09:24:59 -0700
From: Erich Weiler
To: Neil Schelly
Cc: freebsd-scsi@freebsd.org
References: <20169999.131828.1300819424872.JavaMail.root@mail.corp> <4D89459B.6090506@soe.ucsc.edu>
In-Reply-To: <4D89459B.6090506@soe.ucsc.edu>
Subject: Re: Serious Dell Sadness - H200, H700, and H800

Well, after letting it run all night, the patch appears to be working as
expected. Fantastic! I'm putting the machine into production so the
users can bang away at it in their own way; if there is any way of
crashing it that I did not find, they will find it. ;)

Neil, you mentioned that there may be a performance hit from the extra
read operation the patch executes. Does that mean that for every single
read or write operation there is an extra read operation, such that the
number of I/Os to the disk is multiplied by two?
Or is it only an extra read operation at the end of an interrupt or
something (forgive my ignorance, I'm not fully versed in how interrupts
affect the bus)? If the latter, would the performance hit be only about
1-2% in practice? If the former, would that mean a 50% performance hit?

On 03/22/11 17:58, Erich Weiler wrote:
> This is great news! I've patched my kernel (8.2-PRERELEASE) and am
> testing it now by running two concurrent looping iozone runs and also
> rsyncing 1TB of data to my two SAS chained MD1200s at the same time (via
> my Perc H800 controller). The disks are definitely busy but hanging in
> there, but then again it's only been an hour. If it's still going in
> the morning and I see no TIMEOUT messages in my logs I'll call it a win.
> I'll let you guys know how that works for me.
>
> Thanks Scott and Neil!
>
> If this is blessed by whoever blesses such things, can it be pushed into
> 8-STABLE?
>
> On 3/22/11 11:43 AM, Neil Schelly wrote:
>> We have reached some conclusion on this issue, and a positive one at
>> that. Big Credit here goes to Scott Long, who was able to help us
>> debug the issue with a patch to the driver that has completely
>> resolved the issue for us. He gave permission for me to
>> post/distribute this patch, and sees no reason it couldn't be made a
>> part of the MFI driver base. I've pasted it at the bottom of this
>> message.
>>
>> His explanation centers around out-of-band interrupt synchronization
>> on the PCI bus. Interrupts associated with the completion of I/O
>> operations from the card to the CPU are getting lost/ignored. By
>> issuing a dummy read operation (thus forcing a flush of data buffers),
>> this issue is largely averted. He strongly suspects that the
>> controller firmware is de-asserting an interrupt prematurely, so that
>> the OS never responds to the I/O operation and things just hang. Once
>> something like mfiutil is run, it reads from the device, unlocking the
>> bus, and things continue as normal.
>> The patch adds extraneous read
>> operations into the end of the interrupt handler, which keeps things
>> flowing more normally, albeit with a slight performance hit by having
>> the extra read operations.
>>
>> I am unsure if this completely eliminates the race condition, but it
>> will at least have to happen in a much smaller window of time with
>> this patch. We have been unable to reproduce the problem while
>> running this version. From the sound of his explanation, it's also
>> possible this problem doesn't exist except when accessing the card via
>> PCI semantics. If the device were operating in MSI mode (PCI
>> Express), where interrupt handling is significantly different, this
>> may not come up at all.
>>
>> Thanks again to Scott Long for the help. Here's the patch:
>>
>> Index: mfi.c
>> ===================================================================
>> RCS file: /usr/ncvs/src/sys/dev/mfi/mfi.c,v
>> retrieving revision 1.54
>> diff -u -r1.54 mfi.c
>> --- mfi.c 7 Dec 2009 21:24:07 -0000 1.54
>> +++ mfi.c 13 Mar 2011 04:12:35 -0000
>> @@ -928,6 +928,12 @@
>>      if (sc->mfi_check_clear_intr(sc))
>>          return;
>>
>> +    /*
>> +     * Do a dummy read to flush the interrupt ACK that we just performed,
>> +     * ensuring that everything is really, truly consistent.
>> +     */
>> +    (void)sc->mfi_read_fw_status(sc);
>> +
>>      pi = sc->mfi_comms->hw_pi;
>>      ci = sc->mfi_comms->hw_ci;
>>      mtx_lock(&sc->mfi_io_lock);
>>
>> --
>> Neil Schelly
>> Director of Uptime
>> Dynamic Network Services, Inc.
>> W: 603-296-1581
>> M: 508-410-4776
>> http://www.dyndns.com
>> _______________________________________________
>> freebsd-scsi@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
>> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"
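
For readers following the question above about whether the dummy read
doubles the disk I/O count, here is a rough sketch of the idea in the
quoted patch. It is a toy model, not the mfi(4) driver: the struct and
every toy_* name are hypothetical, and it assumes (as the name
mfi_read_fw_status suggests, but the thread does not spell out) that the
dummy read is a read of a controller register rather than a disk
operation. It also relies on the PCI ordering rule that a read from a
device cannot complete until earlier posted writes to it have landed. In
this model the extra read happens once per handler invocation, however
many commands that invocation completes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of a controller behind a bridge that posts writes. */
struct toy_ctrl {
    bool     intr_pending;  /* device-side "interrupt asserted" state */
    bool     ack_posted;    /* an ACK write is still buffered in the bridge */
    uint32_t fw_status;     /* value a status-register read would return */
    unsigned reg_reads;     /* register reads issued by the host */
    unsigned completions;   /* commands the host has completed */
};

/* Posted write: returns at once; the device has not seen the ACK yet. */
static void toy_ack_intr(struct toy_ctrl *c)
{
    c->ack_posted = true;
}

/* A read must wait for earlier posted writes, so it flushes the ACK. */
static uint32_t toy_read_fw_status(struct toy_ctrl *c)
{
    if (c->ack_posted) {
        c->intr_pending = false;  /* the ACK finally reaches the device */
        c->ack_posted = false;
    }
    c->reg_reads++;
    return c->fw_status;
}

/* One handler invocation: ACK, one dummy read, then drain the replies. */
static void toy_intr(struct toy_ctrl *c, unsigned ncompleted)
{
    toy_ack_intr(c);
    (void)toy_read_fw_status(c);  /* mirrors the dummy read in the patch */
    assert(!c->ack_posted);       /* the ACK is no longer in flight */
    c->completions += ncompleted;
}
```

Calling toy_intr(&c, 8) on a zero-initialised struct toy_ctrl completes
eight commands at the cost of one register read. How that maps to a real
percentage on an H800 depends on how many completions each interrupt
carries and on the latency of the register read, neither of which this
sketch measures.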