From: Chuck Swiger
To: Gerrit Kühn
Cc: freebsd-stable@freebsd.org
Date: Tue, 26 Jan 2010 08:59:27 -0800
Subject: Re: ZFS "zpool replace" problems

Hi--

On Jan 26, 2010, at 8:25 AM, Gerrit Kühn wrote:
> CS> There's your problem-- the Silicon Image 3112/4 chips are remarkably
> CS> buggy and exhibit data corruption:
>
> Hm, sure?

I'm sure that the SII 3112 is buggy. I am not sure that it is the primary or only cause of the problems you describe.

[ ... ]
> I already thought about replacing the controller to get rid of the
> detach-problem. However, I cannot do this online and I really would prefer
> fixing the disk firmware problem first.
> I could remove the hotspare drive ad14 and use this slot for putting in a
> replacement disk. Is it possible to get ad18 out of zfs' replacing
> process? Maybe by detaching the disk from the pool?

I don't know enough about ZFS to provide specific advice for recovery attempts (aside from the notion of restoring your data from a backup instead).

As a general matter of maintaining RAID systems, however, the approach to upgrading drive firmware on members of a RAID array should be to take down the entire container and offline the drives, update one drive, test it (via SMART self-test and read-only checksum comparison or similar), and then proceed to update all of the drives (preferably doing the SMART self-test for each, if time allows) before returning them to the RAID container and onlining them.

Pulling individual drives from a RAID set while live and updating the firmware one at a time is not an approach I would take-- running with mixed firmware versions doesn't thrill me, and I know of multiple cases where someone made a mistake reconnecting a drive with the wrong SCSI id or something like that, taking out a second drive while the RAID was not redundant, resulting in massive data corruption or even total loss of the RAID contents.

Regards,
-- 
-Chuck
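
The "read-only checksum comparison" mentioned above might look something like the following. This is only a minimal sketch in Python, not a tested procedure: it assumes the data can be mounted read-only as an ordinary filesystem tree, the choice of SHA-256 is an assumption, and the function names build_manifest and compare_manifests are hypothetical names made up for illustration.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each regular file under root (as a relative path) to its SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            # Files are only ever opened for reading, in 1 MiB chunks.
            with path.open("rb") as handle:
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def compare_manifests(before, after):
    """Report paths that went missing, appeared, or changed content."""
    shared = before.keys() & after.keys()
    return {
        "missing": sorted(set(before) - set(after)),
        "added": sorted(set(after) - set(before)),
        "changed": sorted(p for p in shared if before[p] != after[p]),
    }
```

The idea would be to record one manifest before the firmware update and a second one afterwards, with the filesystem mounted read-only both times, and to treat any non-empty entry in the comparison as a reason to stop before touching the remaining drives.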