From: Vincent Hoffman
Date: Thu, 27 Oct 2011 23:52:51 +0100
To: FreeBSD Stable Mailing List
Subject: mfi timeouts

Hi,

I've recently installed a new NAS at work which uses a rebranded LSI MegaRAID SAS controller.

[root@banshee ~]# mfiutil show adapter
mfi0 Adapter:
    Product Name: Supermicro SMC2108
   Serial Number:
        Firmware: 12.12.0-0047
     RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50
  Battery Backup: present
           NVRAM: 32K
  Onboard Memory: 512M
  Minimum Stripe: 8k
  Maximum Stripe: 1M

I'm running 8-STABLE as of 2011-10-23 (for ZFS v28, as it's got 26 3TB drives).

I'm seeing a lot of messages like:

mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 60 SECONDS
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 90 SECONDS
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 120 SECONDS
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 150 SECONDS
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 180 SECONDS
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 210 SECONDS
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 240 SECONDS
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 271 SECONDS
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 301 SECONDS
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 331 SECONDS
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 361 SECONDS
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 391 SECONDS
mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 55 SECONDS
mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 85 SECONDS

While these are being logged, I/O stalls on the array connected to the mfi adapter. This can continue for 20 minutes or so before resuming, seemingly at random (although a little more on this below).

From pciconf -lv:

mfi0@pci0:5:0:0: class=0x010400 card=0x070015d9 chip=0x00791000 rev=0x04 hdr=0x00
    vendor     = 'LSI Logic (Was: Symbios Logic, NCR)'
    class      = mass storage
    subclass   = RAID

From dmesg:

mfi0: port 0xe000-0xe0ff mem 0xfbd9c000-0xfbd9ffff,0xfbdc0000-0xfbdfffff irq 32 at device 0.0 on pci5
mfi0: Megaraid SAS driver Ver 3.00
mfi0: 12330 (372962922s/0x0020/info) - Shutdown command received from host
mfi0: 12331 (boot + 4s/0x0020/info) - Firmware initialization started (PCI ID 0079/1000/0700/15d9)
mfi0: 12332 (boot + 4s/0x0020/info) - Firmware version 2.120.53-1235
mfi0: 12333 (boot + 7s/0x0008/info) - Battery Present
mfi0: 12334 (boot + 7s/0x0020/info) - Package version 12.12.0-0047
mfi0: 12335 (boot + 7s/0x0020/info) - Board Revision

I found this thread after a bit of googling, but it doesn't end too well:
http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063821.html

Was this ever taken further?

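To put rough numbers on the stalls described above, the timeout lines can be tallied per command address. This is only a minimal sketch in Python, assuming the driver messages look exactly like the ones quoted in this message; the function name summarize_timeouts and the SAMPLE excerpt are made up for illustration and are not part of any FreeBSD tool.

```python
import re

# Matches the timeout lines as quoted in this message, e.g.
# "mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 60 SECONDS".
TIMEOUT_RE = re.compile(
    r"(mfi\d+): COMMAND (0x[0-9a-fA-F]+) TIMEOUT AFTER (\d+) SECONDS"
)


def summarize_timeouts(text):
    """Map (adapter, command address) to (message count, longest wait in seconds)."""
    summary = {}
    # findall scans the whole text, so it also copes with log lines that
    # were joined onto a single line.
    for adapter, addr, secs in TIMEOUT_RE.findall(text):
        count, longest = summary.get((adapter, addr), (0, 0))
        summary[(adapter, addr)] = (count + 1, max(longest, int(secs)))
    return summary


# Hypothetical excerpt built from a few of the lines quoted above.
SAMPLE = """\
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 60 SECONDS
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 90 SECONDS
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 391 SECONDS
mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 55 SECONDS
mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 85 SECONDS
"""

if __name__ == "__main__":
    for (adapter, addr), (count, longest) in summarize_timeouts(SAMPLE).items():
        print(f"{adapter} {addr}: {count} messages, longest wait {longest}s")
```

Feeding it the contents of /var/log/messages instead of SAMPLE would show how many distinct commands got stuck and for how long each one waited.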
One thing I have noticed is that the stall (and the timeout messages) seems to go away if I query the card using mfiutil. I currently have a cron job doing this every 2 minutes to see whether this has been coincidence or not.

Any suggestions are welcome, and I'm happy to provide more info if I can. I don't have a duplicate machine to do much debugging on, but I'm happy to try patches.

Is this worth filing a PR?

Vince
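
For anyone wanting to test whether the periodic query really correlates with the stalls clearing, the cron job could log when each query ran and how long it took. The following is a sketch only, in Python. It assumes the query is "mfiutil show adapter" (the message does not say which subcommand the cron job uses), and the names poll_once and format_record are hypothetical.

```python
import subprocess
import time

# Assumption: the message only says the card is queried "using mfiutil";
# "show adapter" is borrowed from the command shown earlier in the message.
DEFAULT_CMD = ("mfiutil", "show", "adapter")


def poll_once(cmd=DEFAULT_CMD, timeout=30):
    """Run cmd once and return (start_epoch, elapsed_seconds, status)."""
    start = time.time()
    try:
        proc = subprocess.run(
            list(cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        status = f"exit={proc.returncode}"
    except subprocess.TimeoutExpired:
        # The query itself hung for longer than `timeout` seconds.
        status = "timed-out"
    except OSError as exc:
        # Command missing or not executable on this machine.
        status = f"not-run ({exc.__class__.__name__})"
    return start, time.time() - start, status


def format_record(start, elapsed, status):
    """Format one log line with a local timestamp for the start of the query."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(start))
    return f"{stamp} mfiutil-poll elapsed={elapsed:.2f}s {status}"


if __name__ == "__main__":
    print(format_record(*poll_once()))
```

A crontab entry along the lines of "*/2 * * * * /usr/local/bin/python3 /root/mfi_poll.py >> /var/log/mfi_poll.log" (both paths hypothetical) would run it every 2 minutes. Comparing those timestamps with the TIMEOUT messages in the system log should show whether stalls end right after a query or whether the timing is coincidental.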