Date: Tue, 8 Mar 2011 19:32:15 -0500 (EST)
From: Neil Schelly
To: freebsd-scsi@freebsd.org
Message-ID: <28269840.97080.1299630735538.JavaMail.root@mail.corp>
In-Reply-To: <4187606.97023.1299629795456.JavaMail.root@mail.corp>
Subject: Re: Serious Dell Sadness - H200, H700, and H800

We've got some more information about the mpt testing we've been doing here.
The setup we're testing is Dell PowerEdge r610 servers with PERC H800 SAS/RAID cards connected to MD1200 shelves full of 12 SAS drives. We've recreated the same problem on other configurations, including combinations of r510s, MD1220 shelves, PERC H700 cards, etc. We've also ruled out any single faulty piece of hardware by running identical configurations in mirrored setups on different physical machines. We've experienced this issue on FreeBSD 7.3, 8.1, and 8.2, with either RAID10 logical drives formatted with UFS or 6-disk JBOD configurations set up as a ZFS raidz volume. We've triggered the problem with both bonnie++ and iozone. All machines are running the latest firmware on the H700 and H800 cards.

The easiest way to reproduce the problem is with a ZFS configuration and `iozone -a`. We have a 6-disk raidz partition with a ZFS filesystem on it. We just run `iozone -a` from within that filesystem, and I'd say 3 out of 4 times, it will eventually pause. After 45-50 seconds of pausing, you'll start seeing output on the console and in /var/log/messages that looks something like:

    mfi0: COMMAND 0xffffff8000db5fe0 TIMEOUT AFTER 105 SECONDS

If we let it go for a few days, it may actually "finish" and recover, but it's essentially just stuck and not recovering. At this point the system is responsive and fully operational except for the dead controller. We cannot kill the iozone process that is hung on these IO operations, even with `kill -9`.

Like others have reported, we can run any of the mfiutil commands and the controller immediately begins to respond normally again. Usually the iozone test will then complete, but sometimes it will get stuck again later in the same run.

We compiled mfiutil with debugging symbols so we could run it under gdb and see exactly what was causing the controller to become responsive again. It's the ioctl() call that does it.
For example, `mfiutil show volumes` eventually gets to something like:

    mfi_dcmd_command (fd=7, opcode=50397184, buf=0x7fffffffe4a0, bufsize=1032, mbox=0x0, mboxlen=0, statusp=0x0) at /usr/src/usr.sbin/mfiutil/mfi_cmd.c:257

* fd=7 is /dev/mfi0, where the command will be sent with an ioctl command
* opcode=50397184 is the MFI_DCMD_LD_GET_LIST command

`mfiutil show battery` eventually gets to something like:

    mfi_dcmd_command (fd=7, opcode=84017152, buf=0x7fffffffea20, bufsize=48, mbox=0x0, mboxlen=0, statusp=0x7fffffffe9cf "") at /usr/src/usr.sbin/mfiutil/mfi_cmd.c:257

* fd=7 is /dev/mfi0, where the command will be sent with an ioctl command
* opcode=84017152 is the MFI_DCMD_BBU_GET_CAPACITY_INFO command

I wrote a small self-contained C program that can easily be modified to send any ioctl command you'd like to /dev/mfi0 (attached). Use it at your own risk if you'd like, but it's essentially just running an arbitrary command with ioctl, putting nothing into the memory range normally passed by the *buf pointer. I did try sending random opcodes, and that didn't work, so it does have to be an opcode that the firmware will at least recognize, but it doesn't seem to matter which one.

I'm not sure where else we should be looking for a fix. We can reliably reproduce the problem, analyze the system during the issue, and recover the system to a normal state. If there's anyone who can help us troubleshoot this, whether with information we can gather or with a remotely accessible login on an affected machine, we're open to ideas.

-- 
Neil Schelly
Director of Uptime
Dynamic Network Services, Inc.
W: 603-296-1581
M: 508-410-4776
http://www.dyndns.com