Message-ID: <4F192ADA.5020903@brockmann-consult.de>
Date: Fri, 20 Jan 2012 09:50:34 +0100
From: Peter Maloney
To: freebsd-fs@freebsd.org
Subject: Re: sanity check: is 9211-8i, on 8.3, with IT firmware still "the one"

John,

Various people have problems with mps and ZFS.

I am using 8-STABLE from October 2011, and on the 9211-8i HBA I am using 9 IT firmware.

In my case, it was the firmware on an SSD that caused problems:
Crucial M4-CT256M4SSD2 firmware 0001

It would fail at random. Trying to reproduce the failure with heavy IO didn't work, but I found that hot pulling does. Hot pulling the disk a few times while it is mounted causes the disk to never respond again until the machine is rebooted (causing SCSI timeouts). Running "gpart recover da##" or "camcontrol reset ..." on the disk after it is removed makes the kernel panic.

The mpslsi driver does not solve the problem with the CT256M4SSD2 and firmware 0001, but firmware 0009 seems to work.

Trying the 'lost disk' on another machine works, but FreeBSD needs to be rebooted, maybe so that some part of the hardware resets and forgets about the disk.

Sebulon has a similar problem with Samsung Spinpoint disks in this thread:
http://forums.freebsd.org/showthread.php?t=27128

And Beeblebrox, with different Samsung Spinpoint disks:
http://forums.freebsd.org/showthread.php?p=162201#post162201

And Jason Wolfe, with Seagate ST91000640SS disks (with mps):
http://osdir.com/ml/freebsd-scsi/2011-11/msg00006.html
(freebsd-fs list, with original post at 11/01/2011 07:13 PM CET)

But with mpslsi, he says, the problems go away. I tried reproducing his problem on my system (on my M4-CT256M4SSD2 0001 and my HDS5C3030ALA630), and was able to get a timeout similar to his with mpslsi (one time out of many tries), and it recovered gracefully, as he says his does. So based on that, I would say mpslsi is the safest choice. Perhaps the same problem on mps will cause a crash on any system with any disk, not just ST91000640SS disks.
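A rough sketch of such a reproduction attempt, assuming it is done as a loop of "smartctl -a" calls against the disk, followed by a check of syslog for the recovery message Jason Wolfe reported ("SMID ... finished recovery after aborting TaskMID ..."). The function names, the SMARTCTL override, the device name and the log path are placeholders for illustration and are not taken from any of the posts linked above.

```shell
#!/bin/sh
# Sketch only: helper names, device and log path are assumptions.

# smart_loop DEVICE COUNT
# Run "smartctl -a" COUNT times against DEVICE and print how many
# passes were made. Set SMARTCTL to substitute another command
# (for example for a dry run); it defaults to smartctl.
smart_loop() {
    dev=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        "${SMARTCTL:-smartctl}" -a "$dev" > /dev/null 2>&1
        i=$((i + 1))
    done
    echo "$i"
}

# classify_log FILE
# Print "pass" if FILE contains the recovery message, otherwise
# "unconfirmed" (no recovery seen, so nothing was proven either way).
classify_log() {
    if grep -q "finished recovery after aborting TaskMID" "$1"; then
        echo pass
    else
        echo unconfirmed
    fi
}

# Example use (hypothetical device and log file):
#   smart_loop /dev/da0 1000
#   classify_log /var/log/messages
```

The loop only generates the SMART queries; it is meant to run while the disk is also under normal IO load. An "unconfirmed" result only means the recovery message was not found in that file.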
I am using the following disks with no known problems:

Hitachi HUA723030ALA640 firmware MKAOA580
    (tested with mps and mpslsi, didn't test hot pull)
Seagate ST33000650NS firmware 0002
    (tested with mps and mpslsi, didn't test hot pull)
Hitachi HDS5C3030ALA630 firmware MEAOA580
    (tested mostly with mpslsi, and tested hot pull)
Crucial M4-CT256M4SSD2 firmware 0009
    (tested only with mpslsi; not fully tested yet, but passes the hot pull test; has a URE which it didn't have with firmware 0001)

The "hot pull test":
--------------
dd if=/dev/random of=/somewhere/on/the/disk bs=128k
pull disk
wait 1 second
put disk back in
wait 1 second
pull disk
wait 1 second
put disk back in
wait 1 second
hit ctrl+c on the dd command
wait for messages to stop on tty1 / syslog.
gpart show
zpool status
zpool online <pool> <disk>
zpool status

If gpart show does not seg fault, and zpool online causes the disk to resilver, then it is all good. (The bad SSD passes the test 40% of the time if only pulled once, and so far 0% of the time if pulled twice. One time out of all tests, the red lights blinked on all disks on the controller when the bad disk was pulled.)
--------------

So, I would say that with the right combination of hardware, you have a fine system. Just test your disks however you think works best. If you want to use mps, use the "smartctl -a" loop test to make sure it handles it. If you get no timeouts during the test, I would call the test indeterminate. A pass looks like what Jason Wolfe posted in the mailing list (linked above): "SMID ... finished recovery after aborting TaskMID ...".


Peter


On 01/20/2012 01:08 AM, John Kozubik wrote:
>
> We're about to invest heavily in a new ZFS infrastructure, and our
> plans are to:
>
> - wait for 8.3, with the updated 6gbps mps driver
>
> - Install and use LSI 9211-8i cards with newest "IT" firmware
>
> This appears to be the de facto standard for ZFS HBAs ...
>
> Is there any reason to consider other cards/vendors ?
>
> Are these indeed considered solid (provided I use the new mps in 8.3) ?
>
> Thanks.


--

--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@brockmann-consult.de
Internet: http://www.brockmann-consult.de
--------------------------------------------