Date: Fri, 28 Oct 2011 00:39:33 +0100
From: Vincent Hoffman <vince@unsane.co.uk>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc: FreeBSD Stable Mailing List <freebsd-stable@freebsd.org>
Subject: Re: mfi timeouts
Message-ID: <4EA9EBB5.2090004@unsane.co.uk>
In-Reply-To: <20111027230452.GA22060@icarus.home.lan>
References: <4EA9E0C3.5080306@unsane.co.uk> <20111027230452.GA22060@icarus.home.lan>
On 28/10/2011 00:04, Jeremy Chadwick wrote:
> On Thu, Oct 27, 2011 at 11:52:51PM +0100, Vincent Hoffman wrote:
>> I've recently installed a new NAS at work which uses a rebranded LSI
>> MegaRAID SAS
>> [root@banshee ~]# mfiutil show adapter
>> mfi0 Adapter:
>>     Product Name: Supermicro SMC2108
>>     Serial Number:
>>     Firmware: 12.12.0-0047
>>     RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50
>>     Battery Backup: present
>>     NVRAM: 32K
>>     Onboard Memory: 512M
>>     Minimum Stripe: 8k
>>     Maximum Stripe: 1M
>>
>> I'm running 8-STABLE as of 2011-10-23 (for ZFS v28, as it's got 26 3TB
>> drives)
>>
>> I'm seeing a lot of messages like
>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 60 SECONDS
>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 90 SECONDS
>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 120 SECONDS
>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 150 SECONDS
>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 180 SECONDS
>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 210 SECONDS
>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 240 SECONDS
>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 271 SECONDS
>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 301 SECONDS
>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 331 SECONDS
>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 361 SECONDS
>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 391 SECONDS
>> mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 55 SECONDS
>> mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 85 SECONDS
>>
>> At which time I'm seeing I/O stall on the array connected to the mfi
>> adapter. This can continue for 20 minutes or so, resuming randomly (or
>> so it seems, although a little more on this later on).
>>
>> From pciconf -lv
>> mfi0@pci0:5:0:0: class=0x010400 card=0x070015d9 chip=0x00791000 rev=0x04 hdr=0x00
>>     vendor     = 'LSI Logic (Was: Symbios Logic, NCR)'
>>     class      = mass storage
>>     subclass   = RAID
>>
>> From dmesg
>> mfi0: <LSI MegaSAS Gen2> port 0xe000-0xe0ff mem 0xfbd9c000-0xfbd9ffff,0xfbdc0000-0xfbdfffff irq 32 at device 0.0 on pci5
>> mfi0: Megaraid SAS driver Ver 3.00
>> mfi0: 12330 (372962922s/0x0020/info) - Shutdown command received from host
>> mfi0: 12331 (boot + 4s/0x0020/info) - Firmware initialization started (PCI ID 0079/1000/0700/15d9)
>> mfi0: 12332 (boot + 4s/0x0020/info) - Firmware version 2.120.53-1235
>> mfi0: 12333 (boot + 7s/0x0008/info) - Battery Present
>> mfi0: 12334 (boot + 7s/0x0020/info) - Package version 12.12.0-0047
>> mfi0: 12335 (boot + 7s/0x0020/info) - Board Revision
>>
>> I have found this thread from a bit of googling but it doesn't end too
>> well.
>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063821.html
>> Was this ever taken further?
>>
>> One thing I have noticed is that the stall (and timeout messages) seem
>> to go away if I query the card using mfiutil. I currently have a cron
>> job doing this every 2 minutes to see if this has been coincidence or
>> not.
>>
>> Any suggestions welcome, and I'm happy to provide more info if I can,
>> but I don't have a duplicate to do too much debugging on. I'm happy to
>> try patches though.
>>
>> Is this worth filing a PR?
> Can you please provide uname -a output? The version of FreeBSD you're
> using matters greatly here.
>
Sure

FreeBSD banshee.foobar.net 8.2-STABLE FreeBSD 8.2-STABLE #2: Wed Oct 26 16:14:09 BST 2011 toor@banshee.foobar.net:/usr/obj/usr/src/sys/BANSHEE amd64

[root@banshee /usr/src]# svn info
Path: .
Working Copy Root Path: /usr/src
URL: http://svn.freebsd.org/base/stable/8
Repository Root: http://svn.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Revision: 226708
Node Kind: directory
Schedule: normal
Last Changed Author: brueffer
Last Changed Rev: 226671
Last Changed Date: 2011-10-23 19:37:57 +0100 (Sun, 23 Oct 2011)

It's looking like the mfiutil query stopping the stall is not a
coincidence: the last two stalls have lasted less than the 2-minute
interval I set the cron job to run at, much less than previously.
The cron job is a simple
/usr/sbin/mfiutil show volumes | grep -v OPTIMAL
so I at least get an email if the volume breaks ;)

Oct 28 00:01:06 banshee mfi0: COMMAND 0xffffff8000b22d18 TIMEOUT AFTER 59 SECONDS
Oct 28 00:01:36 banshee mfi0: COMMAND 0xffffff8000b22d18 TIMEOUT AFTER 89 SECONDS
Oct 28 00:13:09 banshee mfi0: COMMAND 0xffffff8000b205c8 TIMEOUT AFTER 50 SECONDS
Oct 28 00:13:39 banshee mfi0: COMMAND 0xffffff8000b205c8 TIMEOUT AFTER 80 SECONDS

I'm guessing this must kick something on the card.

Vince
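
For anyone wanting to compare stall lengths before and after the cron workaround, here is a minimal sketch of a log summarizer. It is not part of the original report: the function name summarize_mfi_timeouts is hypothetical, and it assumes the timeout lines look exactly like the ones quoted in this message. For each command address it prints the longest timeout reported, in order of first appearance.

```shell
# Hypothetical helper (not from the original message): read syslog or
# dmesg lines on stdin and, for each mfi command address, print the
# largest "TIMEOUT AFTER n SECONDS" value seen for that address.
summarize_mfi_timeouts() {
    awk '
        /mfi[0-9]+: COMMAND .* TIMEOUT AFTER [0-9]+ SECONDS/ {
            # Locate the address and the seconds by their keyword, so the
            # same code works with or without a syslog timestamp prefix.
            for (i = 1; i < NF; i++) {
                if ($i == "COMMAND") addr = $(i + 1)
                if ($i == "AFTER")   secs = $(i + 1) + 0
            }
            # Remember first-seen order so output is stable.
            if (!(addr in max)) order[++n] = addr
            if (secs > max[addr]) max[addr] = secs
        }
        END {
            for (j = 1; j <= n; j++) print order[j], max[order[j]]
        }
    '
}
```

One possible use, assuming kernel messages are logged to /var/log/messages (the path may differ on a given system), would be: grep mfi0 /var/log/messages | summarize_mfi_timeouts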