From owner-freebsd-scsi@FreeBSD.ORG Mon Mar 21 11:07:06 2011 Return-Path: Delivered-To: freebsd-scsi@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2A335106566B for ; Mon, 21 Mar 2011 11:07:06 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 1AE468FC0A for ; Mon, 21 Mar 2011 11:07:06 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p2LB75hC086107 for ; Mon, 21 Mar 2011 11:07:05 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p2LB75XW086105 for freebsd-scsi@FreeBSD.org; Mon, 21 Mar 2011 11:07:05 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 21 Mar 2011 11:07:05 GMT Message-Id: <201103211107.p2LB75XW086105@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-scsi@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-scsi@FreeBSD.org X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 21 Mar 2011 11:07:06 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker        Resp.      Description
--------------------------------------------------------------------------------
o kern/154432    scsi       [xpt] run_interrupt_driven_hooks: still waiting after
o kern/153361    scsi       [ciss] Smart Array 5300 boot/detect drive problem
o kern/152250    scsi       [ciss] [patch] Kernel panic when hw.ciss.expose_hidden
o kern/151564    scsi       [ciss] ciss(4) should increase CISS_MAX_LOGICAL to 10
o docs/151336    scsi       Missing documentation of scsi_ and ata_ functions in c
s kern/149927    scsi       [cam] hard drive not stopped before removing power dur
o kern/148083    scsi       [aac] Strange device reporting
o kern/147704    scsi       [mpt] sys/dev/mpt: new chip revision, partially unsupp
o kern/146287    scsi       [ciss] ciss(4) cannot see more than one SmartArray con
o kern/145768    scsi       [mpt] can't perform I/O on SAS based SAN disk in freeb
o kern/144648    scsi       [aac] Strange values of speed and bus width in dmesg
o kern/144301    scsi       [ciss] [hang] HP proliant server locks when using ciss
o kern/142351    scsi       [mpt] LSILogic driver performance problems
o kern/141934    scsi       [cam] [patch] add support for SEAGATE DAT Scopion 130
o kern/134488    scsi       [mpt] MPT SCSI driver probes max. 8 LUNs per device
o kern/132250    scsi       [ciss] ciss driver does not support more then 15 drive
o kern/132206    scsi       [mpt] system panics on boot when mirroring and 2nd dri
o kern/130621    scsi       [mpt] tranfer rate is inscrutable slow when use lsi213
o kern/129602    scsi       [ahd] ahd(4) gets confused and wedges SCSI bus
o kern/128452    scsi       [sa] [panic] Accessing SCSI tape drive randomly crashe
o kern/128245    scsi       [scsi] "inquiry data fails comparison at DV1 step" [re
o kern/127927    scsi       [isp] isp(4) target driver crashes kernel when set up
o kern/127717    scsi       [ata] [patch] [request] - support write cache toggling
o kern/124667    scsi       [amd] [panic] FreeBSD-7 kernel page faults at amd-scsi
o kern/123674    scsi       [ahc] ahc driver dumping
o kern/123520    scsi       [ahd] unable to boot from net while using ahd
o sparc/121676   scsi       [iscsi] iscontrol do not connect iscsi-target on sparc
o kern/120487    scsi       [sg] scsi_sg incompatible with scanners
o kern/120247    scsi       [mpt] FreeBSD 6.3 and LSI Logic 1030 = only 3.300MB/s
o kern/114597    scsi       [sym] System hangs at SCSI bus reset with dual HBAs
o kern/110847    scsi       [ahd] Tyan U320 onboard problem with more than 3 disks
o kern/99954     scsi       [ahc] reading from DVD failes on 6.x [regression]
f kern/94838     scsi       Kernel panic while mounting SD card with lock switch o
o kern/92798     scsi       [ahc] SCSI problem with timeouts
o kern/90282     scsi       [sym] SCSI bus resets cause loss of ch device
o kern/76178     scsi       [ahd] Problem with ahd and large SCSI Raid system
o kern/74627     scsi       [ahc] [hang] Adaptec 2940U2W Can't boot 5.3
s kern/61165     scsi       [panic] kernel page fault after calling cam_send_ccb
o kern/60641     scsi       [sym] Sporadic SCSI bus resets with 53C810 under load
o kern/60598     scsi       wire down of scsi devices conflicts with config
s kern/57398     scsi       [mly] Current fails to install on mly(4) based RAID di
o bin/57088      scsi       [cam] [patch] for a possible fd leak in libcam.c
o kern/52638     scsi       [panic] SCSI U320 on SMP server won't run faster than
o kern/44587     scsi       dev/dpt/dpt.h is missing defines required for DPT_HAND
f kern/40895     scsi       wierd kernel / device driver bug
o kern/39388     scsi       ncr/sym drivers fail with 53c810 and more than 256MB m
o kern/35234     scsi       World access to /dev/pass? (for scanner) requires acce

47 problems total.

From owner-freebsd-scsi@FreeBSD.ORG Mon Mar 21 22:20:34 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7C3EC106564A for ; Mon, 21 Mar 2011 22:20:34 +0000 (UTC) (envelope-from olavgg@gmail.com) Received: from mail-qw0-f54.google.com (mail-qw0-f54.google.com [209.85.216.54]) by mx1.freebsd.org (Postfix) with ESMTP id 320828FC14 for ; Mon, 21 Mar 2011 22:20:33 +0000 (UTC) Received: by qwc9 with SMTP id 9so4998105qwc.13 for ; Mon, 21 Mar 2011 15:20:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:date:message-id:subject:from:to :content-type; bh=Emfjdu5uMFL0seLLxzc2pVPRWfImJUUdsJ1kgQc3MwE=; b=uHPnP6nhvnc6SaAjAYnQzToPxVea4OXW6woML1qZg5ksWJ/reJUh8HKq+jP391GC5q S4nSottS2FqogCdp6NQkx92V/i0wVKkXH+l0IpPBhq+cJA7I6bfwUMFtiFG1dxK0wSO9 C397x2UhHo1xOUp2cpi0+nRsCvQDutNXpIfT0= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:content-type; b=mD9OM/jQkf7s2vGRgGEi2FysLZtl1q+1q4DMALpMbrgn9K4ESUU9k/rIUrDPMdIYZX S1WF/1vNc0O2sZrktMEtCjReFxQAIoEga+oEidtC2sWPfG6PSAUDEGpJ/osmmzAn5BBw FqPrR88FRtrtEwZZ8P10FtCVX/k2CLRuSjD4M= MIME-Version: 1.0 Received: by 10.229.78.32 with SMTP id i32mr3892668qck.5.1300744267644; Mon, 21 Mar 2011 14:51:07 -0700 (PDT) Received: by 10.229.130.131 with HTTP; Mon, 21 Mar 2011 14:51:07 -0700 (PDT) Date: Mon, 21 Mar 2011 22:51:07 +0100 Message-ID: From: Olav Gjerde To: freebsd-scsi@freebsd.org X-Mailman-Approved-At: Mon, 21 Mar 2011 22:23:42 +0000 Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: The iscsi initiator on one
machine doesn't work X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 21 Mar 2011 22:20:34 -0000

The target runs istgt on FreeBSD 8.2-STABLE. I see this in
/var/log/messages when the initiator connects:

Mar 21 22:40:29 zpool istgt[3440]: istgt_iscsi.c:4328:istgt_iscsi_send_nopin: ***ERROR*** before Full Feature
Mar 21 22:40:29 zpool istgt[3440]: istgt_iscsi.c:4817:worker: ***ERROR*** iscsi_send_nopin() failed

I get the following error message on the initiator's screen:

recvpdu: Socket is not connected
recvpdu failed

Why does the initiator fail? It's a new minimal installation of FreeBSD
8.2-RELEASE. I tried running the initiator on the same system as the
target, and that worked great.

From owner-freebsd-scsi@FreeBSD.ORG Tue Mar 22 18:43:46 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3FB21106564A for ; Tue, 22 Mar 2011 18:43:46 +0000 (UTC) (envelope-from nschelly@dyn.com) Received: from dynmail-01-mht.dyndns.com (dynmail-01-mht.dyndns.com [216.146.45.13]) by mx1.freebsd.org (Postfix) with ESMTP id 03BC68FC0A for ; Tue, 22 Mar 2011 18:43:45 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by dynmail-01-mht.dyndns.com (Postfix) with ESMTP id 473B9234042 for ; Tue, 22 Mar 2011 14:43:45 -0400 (EDT) X-Virus-Scanned: amavisd-new at dynmail-01-mht.dyndns.com Received: from dynmail-01-mht.dyndns.com ([127.0.0.1]) by localhost (dynmail-01-mht.dyndns.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id siwmGD1zRB9X for ; Tue, 22 Mar 2011 14:43:44 -0400 (EDT) Received: from mail.corp.dyndns.com (mail.corp.dyndns.com [216.146.45.14]) by dynmail-01-mht.dyndns.com (Postfix) with ESMTP id EC49123403F for ; Tue, 22 Mar 2011 14:43:44 -0400 (EDT) Date: Tue, 22 Mar 2011 14:43:44
-0400 (EDT) From: Neil Schelly To: freebsd-scsi@freebsd.org Message-ID: <20169999.131828.1300819424872.JavaMail.root@mail.corp> In-Reply-To: <28269840.97080.1299630735538.JavaMail.root@mail.corp> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.16.12.87] X-Mailer: Zimbra 6.0.7_GA_2473.UBUNTU8 (ZimbraWebClient - SAF3 (Linux)/6.0.7_GA_2473.UBUNTU8) Subject: Re: Serious Dell Sadness - H200, H700, and H800 X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 22 Mar 2011 18:43:46 -0000

We have reached some conclusion on this issue, and a positive one at that. Big credit here goes to Scott Long, who was able to help us debug the issue with a patch to the driver that has completely resolved the issue for us. He gave permission for me to post/distribute this patch, and sees no reason it couldn't be made a part of the MFI driver base. I've pasted it at the bottom of this message.

His explanation centers around out-of-band interrupt synchronization on the PCI bus. Interrupts associated with the completion of I/O operations from the card to the CPU are getting lost/ignored. By issuing a dummy read operation (thus forcing a flush of data buffers), this issue is largely averted. He strongly suspects that the controller firmware is de-asserting an interrupt prematurely, so that the OS never responds to the I/O operation and things just hang. Once something like mfiutil is run, it reads from the device, unlocking the bus, and things continue as normal. The patch adds extraneous read operations into the end of the interrupt handler, which keeps things flowing more normally, albeit with a slight performance hit by having the extra read operations.

I am unsure if this completely eliminates the race condition, but it will at least have to happen in a much smaller window of time with this patch. We have been unable to reproduce the problem while running this version. From the sound of his explanation, it's also possible this problem doesn't exist except when accessing the card via PCI semantics. If the device were operating in MSI mode (PCI Express), where interrupt handling is significantly different, this may not come up at all.

Thanks again to Scott Long for the help. Here's the patch:

Index: mfi.c
===================================================================
RCS file: /usr/ncvs/src/sys/dev/mfi/mfi.c,v
retrieving revision 1.54
diff -u -r1.54 mfi.c
--- mfi.c 7 Dec 2009 21:24:07 -0000 1.54
+++ mfi.c 13 Mar 2011 04:12:35 -0000
@@ -928,6 +928,12 @@
         if (sc->mfi_check_clear_intr(sc))
                 return;
 
+        /*
+         * Do a dummy read to flush the interrupt ACK that we just performed,
+         * ensuring that everything is really, truly consistent.
+         */
+        (void)sc->mfi_read_fw_status(sc);
+
         pi = sc->mfi_comms->hw_pi;
         ci = sc->mfi_comms->hw_ci;
         mtx_lock(&sc->mfi_io_lock);

--
Neil Schelly
Director of Uptime
Dynamic Network Services, Inc.
W: 603-296-1581 M: 508-410-4776 http://www.dyndns.com From owner-freebsd-scsi@FreeBSD.ORG Wed Mar 23 00:58:04 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BC388106564A for ; Wed, 23 Mar 2011 00:58:04 +0000 (UTC) (envelope-from weiler@soe.ucsc.edu) Received: from mail-01.cse.ucsc.edu (mail-01.cse.ucsc.edu [128.114.48.32]) by mx1.freebsd.org (Postfix) with ESMTP id A88348FC17 for ; Wed, 23 Mar 2011 00:58:04 +0000 (UTC) Received: from erich-weilers-macbook-pro.local (hgfw-01.soe.ucsc.edu [128.114.58.17]) by mail-01.cse.ucsc.edu (Postfix) with ESMTPSA id 1FBFD10083D4; Tue, 22 Mar 2011 17:58:04 -0700 (PDT) Message-ID: <4D89459B.6090506@soe.ucsc.edu> Date: Tue, 22 Mar 2011 17:58:03 -0700 From: Erich Weiler User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.15) Gecko/20110303 Thunderbird/3.1.9 MIME-Version: 1.0 To: Neil Schelly References: <20169999.131828.1300819424872.JavaMail.root@mail.corp> In-Reply-To: <20169999.131828.1300819424872.JavaMail.root@mail.corp> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-scsi@freebsd.org Subject: Re: Serious Dell Sadness - H200, H700, and H800 X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Mar 2011 00:58:04 -0000 This is great news! I've patched my kernel (8.2-PRERELEASE) and am testing it now by running two concurrent looping iozone runs and also rsyncing 1TB of data to my two SAS chained MD1200s at the same time (via my Perc H800 controller). The disks are definitely busy but hanging in there, but then again it's only been an hour. If it's still going in the morning and I see no TIMEOUT messages in my logs I'll call it a win. I'll let you guys know how that works for me. 
Thanks Scott and Neil! If this is blessed by whoever blesses such things, can it be pushed into 8-STABLE?
From owner-freebsd-scsi@FreeBSD.ORG Wed Mar 23 16:25:07 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8984B1065672 for ; Wed, 23 Mar 2011 16:25:07 +0000 (UTC) (envelope-from weiler@soe.ucsc.edu) Received: from mail-01.cse.ucsc.edu (mail-01.cse.ucsc.edu [128.114.48.32]) by mx1.freebsd.org (Postfix) with ESMTP id 58CD38FC13 for ; Wed, 23 Mar 2011 16:25:00 +0000 (UTC) Received: from wraith.cse.ucsc.edu (wraith.cse.ucsc.edu [128.114.56.35]) by mail-01.cse.ucsc.edu (Postfix) with ESMTPSA id E1E1B1009C09; Wed, 23 Mar 2011 09:24:59 -0700 (PDT) Message-ID: <4D8A1EDB.50206@soe.ucsc.edu> Date: Wed, 23 Mar 2011 09:24:59 -0700 From: Erich Weiler User-Agent: Thunderbird 2.0.0.24 (X11/20100318) MIME-Version: 1.0 To: Neil Schelly References: <20169999.131828.1300819424872.JavaMail.root@mail.corp> <4D89459B.6090506@soe.ucsc.edu> In-Reply-To: <4D89459B.6090506@soe.ucsc.edu> Content-Type:
text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-scsi@freebsd.org Subject: Re: Serious Dell Sadness - H200, H700, and H800 X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Mar 2011 16:25:07 -0000 Well after letting it run all night, the patch appears to be working as expected. Fantastic! I'm putting the machine into production just so the users can bang away at it in their own way, they'll find any way of crashing it, if it is possible, that I did not. ;) Neil, you mentioned that there may be a performance hit from the extra read operation the patch executes. Does that mean for every single read or write operation, there is an extra read operation? Such that the number of I/Os to the disk is multiplied by two? Or is it only an extra read operation at the end of an interrupt or something (forgive my ignorance, I'm not fully versed on how interrupts affect the bus)? If the latter, would the performance hit only be like 1-2% in practice? If the former, would that mean a 50% performance hit?
From owner-freebsd-scsi@FreeBSD.ORG Fri Mar 25 12:49:36 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6770D1065670 for ; Fri, 25 Mar 2011 12:49:36 +0000 (UTC) (envelope-from nschelly@dyn.com) Received: from dynmail-01-mht.dyndns.com (dynmail-01-mht.dyndns.com [216.146.45.13]) by mx1.freebsd.org (Postfix) with ESMTP id AE0D58FC17 for ; Fri, 25 Mar 2011 12:49:35 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by dynmail-01-mht.dyndns.com (Postfix) with ESMTP id 09F5F17CE004; Fri, 25 Mar 2011 08:49:35 -0400 (EDT) X-Virus-Scanned: amavisd-new at dynmail-01-mht.dyndns.com Received: from dynmail-01-mht.dyndns.com
([127.0.0.1]) by localhost (dynmail-01-mht.dyndns.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id pFcayHg4ReZv; Fri, 25 Mar 2011 08:49:34 -0400 (EDT) Received: from mail.corp.dyndns.com (mail.corp.dyndns.com [216.146.45.14]) by dynmail-01-mht.dyndns.com (Postfix) with ESMTP id B29AD23404D; Fri, 25 Mar 2011 08:49:34 -0400 (EDT) Date: Fri, 25 Mar 2011 08:49:34 -0400 (EDT) From: Neil Schelly To: Erich Weiler Message-ID: <8606490.140484.1301057374592.JavaMail.root@mail.corp> In-Reply-To: <4D8A1EDB.50206@soe.ucsc.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.16.12.87] X-Mailer: Zimbra 6.0.7_GA_2473.UBUNTU8 (ZimbraWebClient - SAF3 (Linux)/6.0.7_GA_2473.UBUNTU8) Cc: freebsd-scsi@freebsd.org Subject: Re: Serious Dell Sadness - H200, H700, and H800 X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 25 Mar 2011 12:49:36 -0000

> Neil, you mentioned that there may be a performance hit from the extra
> read operation the patch executes. Does that mean for every single read
> or write operation, there is an extra read operation? Such that the
> number of I/Os to the disk is multiplied by two? Or is it only an extra
> read operation at the end of an interrupt or something (forgive my
> ignorance, I'm not fully versed on how interrupts affect the bus)? If
> the latter, would the performance hit only be like 1-2% in practice? If
> the former, would that mean a 50% performance hit?

Scott's off-the-cuff estimate of the performance hit was 1-5%. Here's his description of what the patch actually accomplishes.

> What my patch does is to re-flush the bus at the end of the interrupt
> handler and check for any new command completions that have happened
> while the handler was running. This isn't a perfect solution,
> unfortunately. First, it adds cost through extra PCI bus reads needed
> for the flush. Second, and most importantly, it doesn't completely
> close the race; even after the recheck is complete, an
> interrupt+completion could be transmitted from the controller in
> between the driver doing that re-check and then returning to the OS.
> So a race could still exist, albeit a lot smaller than it was when no
> recheck was done. The only real way to close the race is to have
> interrupt latching work properly so that interrupts don't get lost.

Ultimately, it appears that the controller firmware's PCI emulation doesn't quite handle interrupt latching properly, causing lost interrupts. I suspect most implementations of this driver for other operating systems use MSI to request PCI Express semantics, and that the firmware has been more thoroughly tested with the edge-triggered interrupts there. While I don't doubt that this patch could go into the driver code and make it a better driver, it's worth mentioning that the "right" way to fix it may be to switch to the more robust and better-performing PCI Express semantics.

--
Neil Schelly
Director of Uptime
Dynamic Network Services, Inc.
W: 603-296-1581
M: 508-410-4776
http://www.dyndns.com