Date:      Tue, 27 Sep 2011 12:17:38 -0700
From:      David P Discher <dpd@bitgravity.com>
To:        Adam Nowacki <nowakpl@platinum.linux.pl>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZFS and 3ware controller resets
Message-ID:  <299DCA15-FD90-4238-9DD9-C1B8F94CC726@bitgravity.com>
In-Reply-To: <4E7F61A2.5060908@platinum.linux.pl>
References:  <4E7F49A7.1020909@platinum.linux.pl> <20110925165946.GA42447@icarus.home.lan> <4E7F61A2.5060908@platinum.linux.pl>

We use a lot of this exact 3ware controller (and firmware) with ZFS and
8.1-RELEASE.  Though I have seen controller resets, I have not seen this
exact error with ZFS and 3ware. We run 2x RAID-1 plus a 14-disk RAID-5,
RAID-50, or RAID-10, and the controller seems to survive disk failures
in a RAID configuration with ZFS. However, we sometimes hit the
"calcru: runtime went backwards" message while the controller resets and
the kernel tries to figure things out.  Of course, this is likely
service-impacting.

When multiple controller resets are detected, we have typically declared
the card bad and RMA'd or replaced it.  So far, our VAR has not refused
to replace a card while it is within the standard 3-year warranty.

I would recommend replacing the controller.

HOWEVER - I have seen this ZFS behavior with a different controller/HBA
setup.  We have older Xyratex 5400-series 48-bay units connected to
the FreeBSD host via Fibre Channel and an LSI 7404EP HBA (mpt).  Legacy
setups exported LUNs/arrays from the Xyratex as RAID-5 and then
gstripe'd them to form single volumes.  Setups upgraded to ZFS, of
course, do away with gstripe.

With gstripe (and UFS2), when a Xyratex controller crashes and resets,
GEOM gets confused, produces read/write errors, and eventually panics.
In the ZFS world, these failures are almost silent: zpool never reports
an error (we're striping the LUNs in the zpool, no raidz or raidz2).
Eventually every process accessing the disks hangs in D state, and the
machine grinds to a halt.

The recommendation from the community was to use gmountver(8) from -head
and use its devices as the vdevs in the zpool.  We backported it to
8.1.  However, there were some issues with GEOM tasting order and with
which vdevs get picked up by the zpool.  I have since abandoned this
testing.  We were never able to get multipathing working under FreeBSD.


---
David P. Discher
dpd@bitgravity.com * AIM: bgDavidDPD
BITGRAVITY * http://www.bitgravity.com

On Sep 25, 2011, at 10:15 AM, Adam Nowacki wrote:

> On 2011-09-25 18:59, Jeremy Chadwick wrote:
>> On Sun, Sep 25, 2011 at 05:32:55PM +0200, Adam Nowacki wrote:
>>> I have a 20-disk storage system. Every now and then a disk dies and
>>> causes the 3ware controller to reset because of disk timeouts. This
>>> cuts ZFS off from all disks, even healthy ones, and the system
>>> requires a hard reset.
>>> Two issues here:
>>> 1) Why does the controller have to reset? That's a completely insane
>>> way of dealing with a drive timeout.
>>> 2) ZFS does not reopen the disks after a controller reset.
>>>
>>> FreeBSD version: 8.1-RELEASE-p1
>>>
>>> /c0 Driver Version = 3.80.06.003
>>> /c0 Model = 9650SE-16ML
>>> /c0 Available Memory = 224MB
>>> /c0 Firmware Version = FE9X 4.10.00.007
>>> /c0 Bios Version = BE9X 4.08.00.002
>>> /c0 Boot Loader Version = BL9X 3.08.00.001

...

>
> I mean that not only the timing-out disk is affected, but all disks
> on the controller. Every single one stops working for ZFS; you can see
> that in the zpool status output, where each disk reports read and
> write errors. zpool clear won't fix it: ZFS simply loses access to all
> disks on the controller, while, for example, dd can read from each disk
> just fine. Also, on the same controller I have a disk with a UFS
> filesystem that is mounted when the controller resets; it survives the
> reset as if nothing had happened. For ZFS, the only fix is to hard
> reset the whole system.
