From: Jan Mikkelsen
Date: Fri, 28 Oct 2011 14:14:28 +1100
To: Vincent Hoffman
Cc: FreeBSD Stable Mailing List, Jeremy Chadwick
Subject: Re: mfi timeouts

Hi,

There is a patch linked to from this PR, which seems very similar:

http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/140416
http://lists.freebsd.org/pipermail/freebsd-scsi/2011-March/004839.html

The problem described there is also consistent with your observation that running mfiutil clears the stall.

I'm about to deploy mfi controllers in a similar configuration, so I'd be very curious about whether the patch fixes the problem for you.

Regards,

Jan Mikkelsen

On 28/10/2011, at 10:39 AM, Vincent Hoffman wrote:

> On 28/10/2011 00:04, Jeremy Chadwick wrote:
>> On Thu, Oct 27, 2011 at 11:52:51PM +0100, Vincent Hoffman wrote:
>>> I've recently installed a new NAS at work which uses a rebranded LSI
>>> megaraid sas
>>> [root@banshee ~]# mfiutil show adapter
>>> mfi0 Adapter:
>>>     Product Name: Supermicro SMC2108
>>>     Serial Number:
>>>     Firmware: 12.12.0-0047
>>>     RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50
>>>     Battery Backup: present
>>>     NVRAM: 32K
>>>     Onboard Memory: 512M
>>>     Minimum Stripe: 8k
>>>     Maximum Stripe: 1M
>>>
>>> I'm running 8-STABLE as of 2011-10-23 (for zfs v28, as it's got 26 3TB drives)
>>>
>>> I'm seeing a lot of messages like
>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 60 SECONDS
>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 90 SECONDS
>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 120 SECONDS
>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 150 SECONDS
>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 180 SECONDS
>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 210 SECONDS
>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 240 SECONDS
>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 271 SECONDS
>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 301 SECONDS
>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 331 SECONDS
>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 361 SECONDS
>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 391 SECONDS
>>> mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 55 SECONDS
>>> mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 85 SECONDS
>>>
>>> At which time I'm seeing IO stall on the array connected to the mfi
>>> adapter; this can continue for 20 minutes or so, resuming randomly
>>> (or so it seems, although a little more on this later on)
>>>
>>> From pciconf -lv
>>> mfi0@pci0:5:0:0: class=0x010400 card=0x070015d9 chip=0x00791000 rev=0x04 hdr=0x00
>>>     vendor = 'LSI Logic (Was: Symbios Logic, NCR)'
>>>     class = mass storage
>>>     subclass = RAID
>>>
>>> From dmesg
>>> mfi0: port 0xe000-0xe0ff mem 0xfbd9c000-0xfbd9ffff,0xfbdc0000-0xfbdfffff irq 32 at device 0.0 on pci5
>>> mfi0: Megaraid SAS driver Ver 3.00
>>> mfi0: 12330 (372962922s/0x0020/info) - Shutdown command received from host
>>> mfi0: 12331 (boot + 4s/0x0020/info) - Firmware initialization started (PCI ID 0079/1000/0700/15d9)
>>> mfi0: 12332 (boot + 4s/0x0020/info) - Firmware version 2.120.53-1235
>>> mfi0: 12333 (boot + 7s/0x0008/info) - Battery Present
>>> mfi0: 12334 (boot + 7s/0x0020/info) - Package version 12.12.0-0047
>>> mfi0: 12335 (boot + 7s/0x0020/info) - Board Revision
>>>
>>> I have found this thread from a bit of googling but it doesn't end too well.
>>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063821.html
>>> Was this ever taken further?
>>>
>>> One thing I have noticed is that the stall (and timeout messages) seem
>>> to go away if I query the card using mfiutil. I currently have a cron
>>> job doing this every 2 minutes to see if this has been coincidence or not.
>>>
>>>
>>> Any suggestions welcome and I'm happy to provide more info if I can, but
>>> I don't have a duplicate to do too much debugging on. I'm happy to try
>>> patches though.
>>>
>>> Is this worth filing a PR?
>> Can you please provide uname -a output? The version of FreeBSD you're
>> using matters greatly here.
>>
> Sure
> FreeBSD banshee.foobar.net 8.2-STABLE FreeBSD 8.2-STABLE #2: Wed Oct 26 16:14:09 BST 2011 toor@banshee.foobar.net:/usr/obj/usr/src/sys/BANSHEE amd64
> [root@banshee /usr/src]# svn info
> Path: .
> Working Copy Root Path: /usr/src
> URL: http://svn.freebsd.org/base/stable/8
> Repository Root: http://svn.freebsd.org/base
> Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
> Revision: 226708
> Node Kind: directory
> Schedule: normal
> Last Changed Author: brueffer
> Last Changed Rev: 226671
> Last Changed Date: 2011-10-23 19:37:57 +0100 (Sun, 23 Oct 2011)
>
>
> It's looking like the mfiutil query stopping the stall is not a coincidence:
> the last 2 stalls have lasted less than the 2-minute interval I set the cron
> job to run at, much less than previously.
> The cron job is a simple /usr/sbin/mfiutil show volumes | grep -v OPTIMAL
> So I at least get an email if the volume breaks ;)
> Oct 28 00:01:06 banshee mfi0: COMMAND 0xffffff8000b22d18 TIMEOUT AFTER 59 SECONDS
> Oct 28 00:01:36 banshee mfi0: COMMAND 0xffffff8000b22d18 TIMEOUT AFTER 89 SECONDS
> Oct 28 00:13:09 banshee mfi0: COMMAND 0xffffff8000b205c8 TIMEOUT AFTER 50 SECONDS
> Oct 28 00:13:39 banshee mfi0: COMMAND 0xffffff8000b205c8 TIMEOUT AFTER 80 SECONDS
>
> I'm guessing this must kick something on the card.
>
> Vince
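
For comparing stall lengths before and after the cron job or a patch, the timeout messages quoted above could be summarised with something like the following. This is only a rough sketch: the function name longest_stalls and the script name used below are made up for illustration, and the regular expression assumes the message format is exactly as it appears in the logs in this thread. It also assumes a command address is not reused by a later, unrelated stall; if it is, the two stalls are merged into one entry.

```python
import re
import sys
from collections import defaultdict

# Matches messages in the form quoted in this thread, e.g.
#   mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 60 SECONDS
# A syslog prefix (date, hostname) before "mfi0:" is tolerated.
TIMEOUT_RE = re.compile(
    r"(mfi\d+): COMMAND (0x[0-9a-fA-F]+) TIMEOUT AFTER\s+(\d+) SECONDS"
)


def longest_stalls(lines):
    """Map (adapter, command address) to the longest timeout reported, in seconds."""
    stalls = defaultdict(int)
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            key = (match.group(1), match.group(2))
            stalls[key] = max(stalls[key], int(match.group(3)))
    return dict(stalls)


if __name__ == "__main__":
    # Reads log text from standard input and prints one line per stalled command.
    for (adapter, command), seconds in sorted(longest_stalls(sys.stdin).items()):
        print(f"{adapter} {command}: stalled for at least {seconds}s")
```

Saved under a hypothetical name such as mfi_stalls.py, it could be run as `python3 mfi_stalls.py < /var/log/messages`, assuming the kernel messages are logged to that file on the machine in question.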