Date:      Tue, 26 Jan 2010 08:59:27 -0800
From:      Chuck Swiger <cswiger@mac.com>
To:        Gerrit Kühn <gerrit@pmp.uni-hannover.de>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: ZFS "zpool replace" problems
Message-ID:  <5F20B2B6-D75C-4E27-9CC9-85C6E64D13BD@mac.com>
In-Reply-To: <20100126172503.927e1bb5.gerrit@pmp.uni-hannover.de>
References:  <20100126143021.GA47535@icarus.home.lan> <20100126160320.6ed67b92.gerrit@pmp.uni-hannover.de> <FA0BAC0D-35A7-4296-B52C-9D4D8A6CC609@mac.com> <20100126172503.927e1bb5.gerrit@pmp.uni-hannover.de>

Hi--

On Jan 26, 2010, at 8:25 AM, Gerrit Kühn wrote:
> CS> There's your problem-- the Silicon Image 3112/4 chips are remarkably
> CS> buggy and exhibit data corruption:
>
> Hm, sure?

I'm sure that the SII 3112 is buggy.
I am not sure that it is the primary or only cause of the problems you
describe.

[ ... ]
> I already thought about replacing the controller to get rid of the
> detach-problem. However, I cannot do this online and I really would prefer
> fixing the disk firmware problem first.
> I could remove the hotspare drive ad14 and use this slot for putting in a
> replacement disk. Is it possible to get ad18 out of zfs' replacing
> process? Maybe by detaching the disk from the pool?

I don't know enough about ZFS to provide specific advice for recovery
attempts (aside from the notion of restoring your data from a backup
instead).

As a general matter of maintaining RAID systems, however, the approach
to upgrading drive firmware on members of a RAID array should be to take
down the entire container and offline the drives, update one drive, test
it (via SMART self-test and read-only checksum comparison or similar),
and then proceed to update all of the drives (preferably doing the SMART
self-test for each, if time allows) before returning them to the RAID
container and onlining them.

Pulling individual drives from a RAID set while live and updating the
firmware one at a time is not an approach I would take-- running with
mixed firmware versions doesn't thrill me, and I know of multiple cases
where someone made a mistake reconnecting a drive with the wrong SCSI id
or something like that, taking out a second drive while the RAID was not
redundant, resulting in massive data corruption or even total loss of
the RAID contents.

Regards,
-- 
-Chuck



