Date: Thu, 27 Oct 2011 16:04:52 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Vincent Hoffman <vince@unsane.co.uk>
Cc: FreeBSD Stable Mailing List <freebsd-stable@freebsd.org>
Subject: Re: mfi timeouts
Message-ID: <20111027230452.GA22060@icarus.home.lan>
In-Reply-To: <4EA9E0C3.5080306@unsane.co.uk>
References: <4EA9E0C3.5080306@unsane.co.uk>
On Thu, Oct 27, 2011 at 11:52:51PM +0100, Vincent Hoffman wrote:
> I've recently installed a new NAS at work which uses a rebranded LSI
> megaraid sas
> [root@banshee ~]# mfiutil show adapter
> mfi0 Adapter:
> Product Name: Supermicro SMC2108
> Serial Number:
> Firmware: 12.12.0-0047
> RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50
> Battery Backup: present
> NVRAM: 32K
> Onboard Memory: 512M
> Minimum Stripe: 8k
> Maximum Stripe: 1M
>
> I'm running 8-STABLE as of 2011-10-23 (for zfs v28 as is got 26 3Tb drives)
>
> I'm seeing a lot of messages like
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 60 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 90 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 120 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 150 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 180 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 210 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 240 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 271 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 301 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 331 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 361 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 391 SECONDS
> mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 55 SECONDS
> mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 85 SECONDS
>
> At which time I'm seeing IO stall on the array connected to the mfi
> adapter, this can continue for
> 20 minutes or so resuming randomly (or so it seems although a little
> more on this later on)
>
> From pciconf -lv
> mfi0@pci0:5:0:0: class=0x010400 card=0x070015d9 chip=0x00791000
> rev=0x04 hdr=0x00
> vendor = 'LSI Logic (Was: Symbios Logic, NCR)'
> class = mass storage
> subclass = RAID
>
> From dmesg
> mfi0: <LSI MegaSAS Gen2> port 0xe000-0xe0ff mem
> 0xfbd9c000-0xfbd9ffff,0xfbdc0000-0xfbdfffff irq 32 at device 0.0 on pci5
> mfi0: Megaraid SAS driver Ver 3.00
> mfi0: 12330 (372962922s/0x0020/info) - Shutdown command received from host
> mfi0: 12331 (boot + 4s/0x0020/info) - Firmware initialization started
> (PCI ID 0079/1000/0700/15d9)
> mfi0: 12332 (boot + 4s/0x0020/info) - Firmware version 2.120.53-1235
> mfi0: 12333 (boot + 7s/0x0008/info) - Battery Present
> mfi0: 12334 (boot + 7s/0x0020/info) - Package version 12.12.0-0047
> mfi0: 12335 (boot + 7s/0x0020/info) - Board Revision
>
> I have found this thread from a bit of googleing but it doesnt end too well.
> http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063821.html
> Was this ever taken further?
>
> One thing I have noticed is that the stall (and timeout messages) seem
> to go away if I query the card using mfiutil, I currently have a cron
> doing this every 2 minutes to see if this has been coincidence or not.
>
>
> Any suggestions welcome and i'm happy to provide more info if i can but
> I dont have a duplicate to do too much debugging on, I'm happy to try
> patches though.
>
> Is this worth filing a PR?

Can you please provide uname -a output?  The version of FreeBSD you're
using matters greatly here.

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
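
The repeated timeout lines quoted above can be grouped per command to see how long each stall lasted before filing a report. What follows is a minimal sketch, assuming the log lines keep the exact "mfi0: COMMAND <addr> TIMEOUT AFTER <n> SECONDS" shape shown in the quoted message. The names `TIMEOUT_RE` and `summarize_timeouts` and the short sample list are illustrative only; they are not part of the mfi driver or of mfiutil.

```python
import re
from collections import OrderedDict

# Assumed line shape, taken from the quoted log:
#   mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 60 SECONDS
TIMEOUT_RE = re.compile(
    r"(?P<dev>mfi\d+): COMMAND (?P<addr>0x[0-9a-fA-F]+) "
    r"TIMEOUT AFTER (?P<secs>\d+) SECONDS"
)


def summarize_timeouts(lines):
    """Map (device, command address) to (report count, longest timeout in seconds)."""
    summary = OrderedDict()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match is None:
            continue  # not a timeout report, e.g. a firmware info line
        key = (match.group("dev"), match.group("addr"))
        secs = int(match.group("secs"))
        count, longest = summary.get(key, (0, 0))
        summary[key] = (count + 1, max(longest, secs))
    return summary


# A few of the lines from the quoted message, used as sample input.
sample = [
    "mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 60 SECONDS",
    "mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 90 SECONDS",
    "mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 391 SECONDS",
    "mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 55 SECONDS",
    "mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 85 SECONDS",
    "mfi0: Megaraid SAS driver Ver 3.00",
]

for (dev, addr), (count, longest) in summarize_timeouts(sample).items():
    print(f"{dev} {addr}: {count} reports, longest {longest}s")
```

In practice the input could be the lines of /var/log/messages rather than a hard-coded list; the per-command maximum gives a lower bound on how long that command was outstanding.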