From: John Baldwin
To: freebsd-drivers@freebsd.org
Subject: Re: MFI Driver Behavior
Date: Mon, 14 Feb 2011 09:27:11 -0500
Message-Id: <201102140927.11654.jhb@freebsd.org>
In-Reply-To: <4D55BBDC.7000604@soe.ucsc.edu>

On Friday, February 11, 2011 5:44:44 pm Erich Weiler wrote:
> Hi All - I've
> posted on the forums but no one seems to have any ideas...
> I have an odd lock up issue with a Perc H800 controller under
> 8.2-PRERELEASE.
>
> We have a FreeBSD server running:
>
> Code:
>
> FreeBSD 8.2-PRERELEASE (GENERIC) #0: Thu Dec 16 14:59:46 PST 2010
>
> It's a Dell R610. It has two MD1200 disk arrays on it, SAS chained
> together. The controller that manages them is a Perc H800, with the
> latest firmware available.
>
> I have the disks exported JBOD from the controller. And, the disks are
> roped into a ZFS filesystem, which is exported via NFS to the local net.
>
> Everything works well most of the time, but every once in a while (like
> once every few days), the filesystem completely hangs and we see these
> errors on the console:
>
> mfi0: COMMAND 0xffffff80009b5870 TIMEOUT AFTER 61793 SECONDS
> mfi0: COMMAND 0xffffff80009b5870 TIMEOUT AFTER 61823 SECONDS
> mfi0: COMMAND 0xffffff80009b5870 TIMEOUT AFTER 61853 SECONDS
> mfi0: COMMAND 0xffffff80009b5870 TIMEOUT AFTER 61923 SECONDS
>
> (this is after the filesystem has been hung for a day)
>
> etc... When I start poking around with mfiutil, it shows everything is
> OK, the disks are all OK, the volumes are good, the event logs show no
> errors. The "Patrol" feature is disabled. The battery is fine.
>
> After running "mfiutil show volumes", the lockup magically frees itself.
> But, I don't want them to happen in the first place, and I certainly
> don't want to have to manually run a "mfiutil show volumes" or whatever
> to unlock it every time. Has anyone seen this before?
>
> I've actually tried another H800 controller we had on the shelf as well,
> just to rule out a hardware problem with the first one, but we see the
> same behavior on both controllers.
>
> "zpool status" also shows the disks as all OK, and a "zpool scrub" turns
> up no problems.
>
> Any insight much appreciated!! Since multiple controllers exhibit the
> same behavior, I was thinking it's falling more into a driver issue at
> this point.
> I hope I'm right! I emailed the author of the MFI driver
> for FreeBSD, but have not heard anything back, so I was hoping someone
> here would have an idea of where I could turn next.
>
> If even there was a way I could determine what the "0xffffff80009b5870"
> MFI command is, that would be a big help, so I would have a better idea
> of where to continue my investigations.

That value is just a pointer to the command structure in the device
driver for the command that timed out. It probably is not that useful.
The best person to ask about this is probably Scott Long
(scottl@FreeBSD.org).

The fact that 'show volumes' unsticks the controller sounds quite odd.
Are you using MSI? If so, have you tried disabling MSI?

-- 
John Baldwin