From owner-freebsd-stable@FreeBSD.ORG Thu Oct 27 23:04:55 2011
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 4FC291065670;
    Thu, 27 Oct 2011 23:04:55 +0000 (UTC)
    (envelope-from jdc@koitsu.dyndns.org)
Received: from qmta09.westchester.pa.mail.comcast.net
    (qmta09.westchester.pa.mail.comcast.net [76.96.62.96])
    by mx1.freebsd.org (Postfix) with ESMTP id 129D38FC0A;
    Thu, 27 Oct 2011 23:04:54 +0000 (UTC)
Received: from omta19.westchester.pa.mail.comcast.net ([76.96.62.98])
    by qmta09.westchester.pa.mail.comcast.net with comcast
    id pybd1h00627AodY59z4vTt; Thu, 27 Oct 2011 23:04:55 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
    by omta19.westchester.pa.mail.comcast.net with comcast
    id pz4u1h00a1t3BNj3fz4u1w; Thu, 27 Oct 2011 23:04:55 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
    id EA5EE102C19; Thu, 27 Oct 2011 16:04:52 -0700 (PDT)
Date: Thu, 27 Oct 2011 16:04:52 -0700
From: Jeremy Chadwick
To: Vincent Hoffman
Message-ID: <20111027230452.GA22060@icarus.home.lan>
References: <4EA9E0C3.5080306@unsane.co.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4EA9E0C3.5080306@unsane.co.uk>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: FreeBSD Stable Mailing List
Subject: Re: mfi timeouts
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Thu, 27 Oct 2011 23:04:55 -0000

On Thu, Oct 27, 2011 at 11:52:51PM +0100, Vincent Hoffman wrote:
> I've recently installed a new NAS at work which uses a rebranded LSI
> megaraid sas
> [root@banshee ~]# mfiutil show adapter
> mfi0 Adapter:
>     Product Name: Supermicro SMC2108
>    Serial Number:
>         Firmware: 12.12.0-0047
>      RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50
>   Battery Backup: present
>            NVRAM: 32K
>   Onboard Memory: 512M
>   Minimum Stripe: 8k
>   Maximum Stripe: 1M
>
> I'm running 8-STABLE as of 2011-10-23 (for zfs v28 as it's got 26 3Tb drives)
>
> I'm seeing a lot of messages like
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 60 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 90 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 120 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 150 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 180 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 210 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 240 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 271 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 301 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 331 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 361 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 391 SECONDS
> mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 55 SECONDS
> mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 85 SECONDS
>
> At which time I'm seeing IO stall on the array connected to the mfi
> adapter, this can continue for
> 20 minutes or so resuming randomly (or so it seems although a little
> more on this later on)
>
> From pciconf -lv
> mfi0@pci0:5:0:0: class=0x010400 card=0x070015d9 chip=0x00791000
> rev=0x04 hdr=0x00
>     vendor     = 'LSI Logic (Was: Symbios Logic, NCR)'
>     class      = mass storage
>     subclass   = RAID
>
> From dmesg
> mfi0: port 0xe000-0xe0ff mem
> 0xfbd9c000-0xfbd9ffff,0xfbdc0000-0xfbdfffff irq 32 at device 0.0 on pci5
> mfi0: Megaraid SAS driver Ver 3.00
> mfi0: 12330 (372962922s/0x0020/info) - Shutdown command received from host
> mfi0: 12331 (boot + 4s/0x0020/info) - Firmware initialization started
> (PCI ID 0079/1000/0700/15d9)
> mfi0: 12332 (boot + 4s/0x0020/info) - Firmware version 2.120.53-1235
> mfi0: 12333 (boot + 7s/0x0008/info) - Battery Present
> mfi0: 12334 (boot + 7s/0x0020/info) - Package version 12.12.0-0047
> mfi0: 12335 (boot + 7s/0x0020/info) - Board Revision
>
> I have found this thread from a bit of googling but it doesn't end too well.
> http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063821.html
> Was this ever taken further?
>
> One thing I have noticed is that the stall (and timeout messages) seem
> to go away if I query the card using mfiutil, I currently have a cron
> doing this every 2 minutes to see if this has been coincidence or not.
>
>
> Any suggestions welcome and I'm happy to provide more info if I can but
> I don't have a duplicate to do too much debugging on, I'm happy to try
> patches though.
>
> Is this worth filing a PR?

Can you please provide uname -a output?  The version of FreeBSD you're
using matters greatly here.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |