From: John Baldwin
To: freebsd-stable@freebsd.org
Cc: Jan Mikkelsen, Jeremy Chadwick, Vincent Hoffman
Date: Tue, 8 Nov 2011 14:50:34 -0500
Subject: Re: mfi timeouts

On Wednesday, November 02, 2011 5:47:38 pm Vincent Hoffman wrote:
> On 28/10/2011 04:14, Jan Mikkelsen wrote:
> > Hi,
> >
> > There is a patch linked to from this PR, which seems very similar:
> >
> > http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/140416
> >
> > http://lists.freebsd.org/pipermail/freebsd-scsi/2011-March/004839.html
> >
> > The problem is also consistent with running mfiutil clearing it.
> >
> > I'm about to deploy mfi controllers in a similar configuration, so I'd be very curious about whether the patch fixes the problem for you.
> The patch you linked to seems to have removed the stalls, although I
> have only had it running for a day. I'll post if it stalls again, though.
>
> I did manage to scrounge the use of a Dell R410 with an
> LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05),
> badged as a Dell PERC H700 Adapter,
>
> to test out the patch I originally found, but had the same issue as this post:
>
> http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063821.html
>
>
> I couldn't get the Dell to stall in the first place either, though, so it
> could be a specific firmware version that is the issue.
>
> Anyway, thanks for the pointers.

Hmm, did you try the patch I had posted in that earlier thread? It had two
changes in it: one was similar to the patch in the PR, and the second added
MSI-X support. I've since tweaked it so that MSI-X support is off by default
but can be enabled via loader.conf. Would you be willing to try the updated
patch at www.freebsd.org/~jhb/patches/mfi.patch?
> Vince
>
> > Regards,
> >
> > Jan Mikkelsen
> >
> >
> > On 28/10/2011, at 10:39 AM, Vincent Hoffman wrote:
> >
> >> On 28/10/2011 00:04, Jeremy Chadwick wrote:
> >>> On Thu, Oct 27, 2011 at 11:52:51PM +0100, Vincent Hoffman wrote:
> >>>> I've recently installed a new NAS at work which uses a rebranded LSI
> >>>> MegaRAID SAS:
> >>>> [root@banshee ~]# mfiutil show adapter
> >>>> mfi0 Adapter:
> >>>> Product Name: Supermicro SMC2108
> >>>> Serial Number:
> >>>> Firmware: 12.12.0-0047
> >>>> RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50
> >>>> Battery Backup: present
> >>>> NVRAM: 32K
> >>>> Onboard Memory: 512M
> >>>> Minimum Stripe: 8k
> >>>> Maximum Stripe: 1M
> >>>>
> >>>> I'm running 8-STABLE as of 2011-10-23 (for ZFS v28, as it's got 26 3TB drives).
> >>>>
> >>>> I'm seeing a lot of messages like:
> >>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 60 SECONDS
> >>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 90 SECONDS
> >>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 120 SECONDS
> >>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 150 SECONDS
> >>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 180 SECONDS
> >>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 210 SECONDS
> >>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 240 SECONDS
> >>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 271 SECONDS
> >>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 301 SECONDS
> >>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 331 SECONDS
> >>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 361 SECONDS
> >>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 391 SECONDS
> >>>> mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 55 SECONDS
> >>>> mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 85 SECONDS
> >>>>
> >>>> At that point I'm seeing I/O stall on the array connected to the mfi
> >>>> adapter. This can continue for 20 minutes or so, resuming randomly
> >>>> (or so it seems, although a little more on this later on).
> >>>>
> >>>> From pciconf -lv:
> >>>> mfi0@pci0:5:0:0: class=0x010400 card=0x070015d9 chip=0x00791000 rev=0x04 hdr=0x00
> >>>> vendor = 'LSI Logic (Was: Symbios Logic, NCR)'
> >>>> class = mass storage
> >>>> subclass = RAID
> >>>>
> >>>> From dmesg:
> >>>> mfi0: port 0xe000-0xe0ff mem 0xfbd9c000-0xfbd9ffff,0xfbdc0000-0xfbdfffff irq 32 at device 0.0 on pci5
> >>>> mfi0: Megaraid SAS driver Ver 3.00
> >>>> mfi0: 12330 (372962922s/0x0020/info) - Shutdown command received from host
> >>>> mfi0: 12331 (boot + 4s/0x0020/info) - Firmware initialization started (PCI ID 0079/1000/0700/15d9)
> >>>> mfi0: 12332 (boot + 4s/0x0020/info) - Firmware version 2.120.53-1235
> >>>> mfi0: 12333 (boot + 7s/0x0008/info) - Battery Present
> >>>> mfi0: 12334 (boot + 7s/0x0020/info) - Package version 12.12.0-0047
> >>>> mfi0: 12335 (boot + 7s/0x0020/info) - Board Revision
> >>>>
> >>>> I found this thread with a bit of googling, but it doesn't end too well:
> >>>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063821.html
> >>>> Was this ever taken further?
> >>>>
> >>>> One thing I have noticed is that the stall (and the timeout messages) seems
> >>>> to go away if I query the card using mfiutil. I currently have a cron
> >>>> job doing this every 2 minutes to see whether this is a coincidence or not.
> >>>>
> >>>>
> >>>> Any suggestions are welcome, and I'm happy to provide more info if I can,
> >>>> but I don't have a duplicate to do much debugging on. I'm happy to try
> >>>> patches, though.
> >>>>
> >>>> Is this worth filing a PR?
> >>> Can you please provide uname -a output? The version of FreeBSD you're
> >>> using matters greatly here.
> >>>
> >> Sure
> >> FreeBSD banshee.foobar.net 8.2-STABLE FreeBSD 8.2-STABLE #2: Wed Oct 26
> >> 16:14:09 BST 2011
> >> toor@banshee.foobar.net:/usr/obj/usr/src/sys/BANSHEE amd64
> >> [root@banshee /usr/src]# svn info
> >> Path: .
> >> Working Copy Root Path: /usr/src
> >> URL: http://svn.freebsd.org/base/stable/8
> >> Repository Root: http://svn.freebsd.org/base
> >> Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
> >> Revision: 226708
> >> Node Kind: directory
> >> Schedule: normal
> >> Last Changed Author: brueffer
> >> Last Changed Rev: 226671
> >> Last Changed Date: 2011-10-23 19:37:57 +0100 (Sun, 23 Oct 2011)
> >>
> >>
> >> It's looking like the mfiutil query stopping the stall is not a coincidence:
> >> the last 2 stalls have lasted less than the 2-minute interval I set the cron
> >> job to run at, much less than previously.
> >> The cron job is a simple /usr/sbin/mfiutil show volumes | grep -v OPTIMAL,
> >> so I at least get an email if the volume breaks ;)
> >> Oct 28 00:01:06 banshee mfi0: COMMAND 0xffffff8000b22d18 TIMEOUT AFTER 59 SECONDS
> >> Oct 28 00:01:36 banshee mfi0: COMMAND 0xffffff8000b22d18 TIMEOUT AFTER 89 SECONDS
> >> Oct 28 00:13:09 banshee mfi0: COMMAND 0xffffff8000b205c8 TIMEOUT AFTER 50 SECONDS
> >> Oct 28 00:13:39 banshee mfi0: COMMAND 0xffffff8000b205c8 TIMEOUT AFTER 80 SECONDS
> >>
> >> I'm guessing this must kick something on the card.
> >>
> >> Vince
> >>
> >> _______________________________________________
> >> freebsd-stable@freebsd.org mailing list
> >> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> >> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"

-- 
John Baldwin
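As a rough sketch of the monitoring job Vincent describes in the quoted thread (running mfiutil show volumes | grep -v OPTIMAL every 2 minutes and relying on cron to mail any output), an entry in the FreeBSD system crontab might look like the following. Using /etc/crontab (which has a user field) and running the job as root are assumptions, not details given in the thread.

```crontab
# Sketch of an /etc/crontab entry (assumption: system crontab, which has a "who" field).
# Every 2 minutes, print any line of mfiutil's volume listing that does not contain
# OPTIMAL. cron mails whatever the command prints (by default to the user it runs
# as, or to MAILTO if that is set), which is what turns this into an alert.
#minute hour mday month wday who  command
*/2     *    *    *     *    root /usr/sbin/mfiutil show volumes | grep -v OPTIMAL
```

One caveat worth checking: if mfiutil prints header lines as well as volume lines, grep -v passes those through too, so the job may send mail on every run rather than only when a volume is degraded.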