From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 27 19:17:43 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 300B0106564A
	for <freebsd-fs@freebsd.org>; Tue, 27 Sep 2011 19:17:43 +0000 (UTC)
	(envelope-from dpd@bitgravity.com)
Received: from mail-vx0-f182.google.com (mail-vx0-f182.google.com
	[209.85.220.182])
	by mx1.freebsd.org (Postfix) with ESMTP id D17D68FC08
	for <freebsd-fs@freebsd.org>; Tue, 27 Sep 2011 19:17:42 +0000 (UTC)
Received: by vcbf13 with SMTP id f13so5595004vcb.13
	for <freebsd-fs@freebsd.org>; Tue, 27 Sep 2011 12:17:42 -0700 (PDT)
Received: by 10.68.55.100 with SMTP id r4mr38599801pbp.69.1317151061955;
	Tue, 27 Sep 2011 12:17:41 -0700 (PDT)
Received: from netops-234.sfo1.bitgravity.com (netops-234.sfo1.bitgravity.com.
	[209.131.110.234])
	by mx.google.com with ESMTPS id h5sm7555869pbf.4.2011.09.27.12.17.40
	(version=TLSv1/SSLv3 cipher=OTHER);
	Tue, 27 Sep 2011 12:17:40 -0700 (PDT)
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: text/plain; charset=us-ascii
From: David P Discher <dpd@bitgravity.com>
In-Reply-To: <4E7F61A2.5060908@platinum.linux.pl>
Date: Tue, 27 Sep 2011 12:17:38 -0700
Content-Transfer-Encoding: quoted-printable
Message-Id: <299DCA15-FD90-4238-9DD9-C1B8F94CC726@bitgravity.com>
References: <4E7F49A7.1020909@platinum.linux.pl>
	<20110925165946.GA42447@icarus.home.lan>
	<4E7F61A2.5060908@platinum.linux.pl>
To: Adam Nowacki <nowakpl@platinum.linux.pl>
X-Mailer: Apple Mail (2.1084)
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS and 3ware controller resets
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 27 Sep 2011 19:17:43 -0000

We use a lot of this exact 3ware controller (and firmware) with ZFS and
8.1-RELEASE.  Though I have seen controller resets, I have not seen this
exact error with ZFS and 3ware.  We do 2x RAID-1 and a 14-disk RAID 5, 50,
or 10, and the controller seems to survive disk failures in a RAID config
with ZFS.  However, sometimes we will hit the "calcru: runtime went
backwards" messages while the controller resets and the kernel tries to
figure things out.  Of course, this is likely service-impacting.

When multiple controller resets are detected, we have typically declared
the card bad and RMA'd or replaced it.  So far, our VAR has not
rejected replacing a card while it is under the standard 3-year warranty.

I would recommend replacing the controller.

HOWEVER - I have seen this ZFS behavior with a different controller/HBA
setup.  We have older Xyratex 5400-series 48-bay what-evers connected to
the FreeBSD host via Fibre Channel and an LSI 7404EP HBA (mpt).  Legacy
setups exported LUNs/arrays from the Xyratex as RAID-5, which were then
gstripe'd to form single volumes.  Setups upgraded to ZFS, of course,
do away with the gstripe.

With gstripe (and UFS2), when a Xyratex controller crashes and resets,
geom gets confused, produces read/write errors, and eventually panics.
In the ZFS world, these failures are almost silent: zpool never reports
an error (we're striping the LUNs in the zpool, no raidz or raidz2).
Eventually all the processes accessing the disks hang in D state, and the
machine grinds to a halt.
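
Since zpool stays quiet in this failure mode, one way to notice the hang
is to watch for processes stuck in D state.  A minimal sketch of that
check follows; the function names and the sample ps listing are made up
for illustration, and it assumes output in the shape produced by
"ps -axo pid,state,command".

```python
import subprocess


def read_ps():
    """Return the text of `ps -axo pid,state,command` (not run by the demo)."""
    result = subprocess.run(
        ["ps", "-axo", "pid,state,command"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def find_d_state(ps_output):
    """Return (pid, command) pairs for processes whose state starts with 'D'."""
    stuck = []
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # malformed or truncated row
        pid, state, command = parts
        if state.startswith("D"):
            stuck.append((int(pid), command))
    return stuck


# Hypothetical sample listing, not captured from a real host.
SAMPLE = """\
  PID STAT COMMAND
    1 ILs  /sbin/init --
  812 D    rsync -a /tank/src /tank/dst
  901 Ss   /usr/sbin/syslogd -s
 1044 D+   ls /tank
"""

if __name__ == "__main__":
    for pid, command in find_d_state(SAMPLE):
        print(pid, command)
```

On a live host you would call find_d_state(read_ps()) from cron or a
monitoring agent.  A process briefly in D state is normal, so it only
means something if the same PIDs stay there across several samples.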

The recommendation from the community was to use gmountver(8) from -head
and use those vdevs in the zpool.  We got it backported to 8.1.
However, there were some issues with geom tasting order and with which
vdevs get picked up by the zpool.  I have since abandoned this testing.
We were never able to get multipathing working under FreeBSD.


---
David P. Discher
dpd@bitgravity.com * AIM: bgDavidDPD
BITGRAVITY * http://www.bitgravity.com

On Sep 25, 2011, at 10:15 AM, Adam Nowacki wrote:

> On 2011-09-25 18:59, Jeremy Chadwick wrote:
>> On Sun, Sep 25, 2011 at 05:32:55PM +0200, Adam Nowacki wrote:
>>> I have a 20 disk storage system, every now and then a disk dies and
>>> causes 3ware controller to reset because of disk timeouts. This cuts
>>> out ZFS from all disks, even healthy ones and the system requires a
>>> hard reset.
>>> Two issues here:
>>> 1) Why does the controller have to reset? That's a completely insane
>>> way of dealing with a drive timeout.
>>> 2) ZFS not reopening the disk after controller reset.
>>>
>>> FreeBSD version: 8.1-RELEASE-p1
>>>
>>> /c0 Driver Version = 3.80.06.003
>>> /c0 Model = 9650SE-16ML
>>> /c0 Available Memory = 224MB
>>> /c0 Firmware Version = FE9X 4.10.00.007
>>> /c0 Bios Version = BE9X 4.08.00.002
>>> /c0 Boot Loader Version = BL9X 3.08.00.001

...

>
> I mean that not only the timed-out disk is affected but all disks
> that are on the controller. Every single one stops working for ZFS; you
> can see that in the zpool status output, where each disk reports read
> and write errors. zpool clear won't fix it; ZFS simply loses access to
> all disks on the controller while, for example, dd can read from each
> disk just fine. Also, on the same controller I have a disk with a UFS
> filesystem, mounted when the controller resets, and it survives the
> reset as if it didn't even happen. For ZFS the only fix is to hard
> reset the whole system.
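
For the case described above, where every disk on the controller shows
read and write errors in zpool status, a small parser can pull out the
affected devices.  This is only a sketch: the function name and the
sample text are made up for illustration, and it assumes the usual
NAME/STATE/READ/WRITE/CKSUM table with plain integer counters.

```python
def devices_with_errors(status_text):
    """Map device name -> (read, write, cksum) for rows with nonzero counters."""
    in_config = False
    bad = {}
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True  # the device table starts after this header
            continue
        if not in_config:
            continue
        if not fields:
            break  # a blank line ends the device table
        if len(fields) < 5:
            continue  # rows without counters, e.g. unavailable spares
        name, _state, read, write, cksum = fields[:5]
        try:
            counts = (int(read), int(write), int(cksum))
        except ValueError:
            continue  # assumption: skip abbreviated counters such as "1.2K"
        if any(counts):
            bad[name] = counts
    return bad


# Hypothetical sample, not taken from the system discussed in this thread.
SAMPLE = """\
  pool: tank
 state: DEGRADED
config:

\tNAME        STATE     READ WRITE CKSUM
\ttank        DEGRADED     0     0     0
\t  da0       ONLINE      12     7     0
\t  da1       ONLINE       0     0     0

errors: No known data errors
"""

if __name__ == "__main__":
    for name, counts in devices_with_errors(SAMPLE).items():
        print(name, counts)
```

If every device behind one controller shows up in the result at the same
time, that points at the controller or the path to it rather than at the
individual disks.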