From owner-freebsd-stable@FreeBSD.ORG  Tue Jan 26 15:03:23 2010
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 667EB1065693
	for <freebsd-stable@freebsd.org>; Tue, 26 Jan 2010 15:03:23 +0000 (UTC)
	(envelope-from gerrit@pmp.uni-hannover.de)
Received: from mrelay1.uni-hannover.de (mrelay1.uni-hannover.de [130.75.2.106])
	by mx1.freebsd.org (Postfix) with ESMTP id EA1F48FC1A
	for <freebsd-stable@freebsd.org>; Tue, 26 Jan 2010 15:03:22 +0000 (UTC)
Received: from www.pmp.uni-hannover.de (www.pmp.uni-hannover.de [130.75.117.2])
	by mrelay1.uni-hannover.de (8.14.2/8.14.2) with ESMTP id o0QF3KA2018799;
	Tue, 26 Jan 2010 16:03:21 +0100
Received: from pmp.uni-hannover.de (arc.pmp.uni-hannover.de [130.75.117.1])
	by www.pmp.uni-hannover.de (Postfix) with SMTP
	id 3FBFA4F; Tue, 26 Jan 2010 16:03:20 +0100 (CET)
Date: Tue, 26 Jan 2010 16:03:20 +0100
From: Gerrit Kühn <gerrit@pmp.uni-hannover.de>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Message-Id: <20100126160320.6ed67b92.gerrit@pmp.uni-hannover.de>
In-Reply-To: <20100126143021.GA47535@icarus.home.lan>
References: <20100126143021.GA47535@icarus.home.lan>
Organization: Albert-Einstein-Institut (MPI für
	Gravitationsphysik & IGP Universität Hannover)
X-Mailer: Sylpheed 2.7.1 (GTK+ 2.18.4; i386-portbld-freebsd7.0)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@freebsd.org
Subject: Re: ZFS "zpool replace" problems

On Tue, 26 Jan 2010 06:30:21 -0800 Jeremy Chadwick
<freebsd@jdc.parodius.com> wrote about Re: ZFS "zpool replace" problems:

JC> I'm removing the In-Reply-To mail headers for this thread, as you've
JC> now hijacked it for a different purpose.  Please don't do this; start
JC> a new thread altogether.  :-)

Thanks. You're perfectly right, I should have done that.

JC> I'm not sure how the above is supposed to work (I haven't personally
JC> tried it), but:
JC> 
JC> 1) Why didn't you offline the ad10 disk first?
JC>    zpool offline tank ad10

Well, probably because I assumed that ZFS would simply handle the
situation. I just wanted to replace drive A with drive B, so this seemed
quite straightforward to me.
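For reference, the order Jeremy suggests (offline the old disk first, then replace it) can be written down as a minimal dry-run sketch in Python. The helper names `replace_plan` and `run` are hypothetical, and using ad18 as the replacement device is an assumption based on the drive that was attached. By default the sketch only prints the zpool commands; nothing is executed unless `dry_run=False`.

```python
import shlex
import subprocess


def replace_plan(pool, old_dev, new_dev):
    """Return zpool commands that offline old_dev before replacing it.

    Hypothetical helper: names are passed in, nothing is probed.
    """
    return [
        ["zpool", "offline", pool, old_dev],           # stop ZFS using the old disk
        ["zpool", "replace", pool, old_dev, new_dev],  # resilver onto the new disk
        ["zpool", "status", pool],                     # check resilver progress
    ]


def run(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print(" ".join(shlex.quote(c) for c in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Assumed names: pool "tank", failing disk ad10, new disk ad18.
    run(replace_plan("tank", "ad10", "ad18"))
```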

JC> 2) How did you attach ad18?  Did you tell the system about it using
JC>    atacontrol?  If so, what commands did you use?

Yes. The drives did not appear automatically (verified with "atacontrol
list"). I first tried "atacontrol reinit ata9", but that did not help, so
I did "atacontrol detach ata9" followed by "atacontrol attach ata9". After
that the drive was there: it showed up in "atacontrol list", and the
device node appeared as well.
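The two attempts can be summarized as a small dry-run sketch in Python. The helper names are hypothetical. The atacontrol(8) subcommands (reinit, detach, attach, list) and the channel ata9 are the ones mentioned in this message. The sketch only prints commands unless `dry_run=False`.

```python
import shlex
import subprocess


def rescan_plan(channel, full_reattach=False):
    """Return atacontrol commands to make a newly inserted disk visible.

    Hypothetical helper: "reinit" is the lighter first attempt, and
    detach/attach is the heavier fallback that worked for ata9 here.
    Detaching a channel drops every device on it, so only do this on a
    channel that carries no pool members.
    """
    if full_reattach:
        plan = [
            ["atacontrol", "detach", channel],
            ["atacontrol", "attach", channel],
        ]
    else:
        plan = [["atacontrol", "reinit", channel]]
    plan.append(["atacontrol", "list"])  # check that the disk now shows up
    return plan


def run(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print(" ".join(shlex.quote(c) for c in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(rescan_plan("ata9"))                      # first attempt
    run(rescan_plan("ata9", full_reattach=True))  # fallback that worked
```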

JC> 3) Can you please provide uname -a output, as well as relevant dmesg
JC>    output to show what kind of SATA controller you have, what's
JC>    attached to what, etc.?

Of course. The boot-time dmesg output is no longer available, so I am
using pciconf -vl and atacontrol list instead:

ATA channel 0:
    Master:      no device present
    Slave:  acd0 <Optiarc DVD RW AD-7540A/1.01> ATA/ATAPI revision 0
ATA channel 1:
    Master:      no device present
    Slave:       no device present
ATA channel 2:
    Master:  ad4 <ST380815AS/3.AAC> SATA revision 2.x
    Slave:       no device present
ATA channel 3:
    Master:  ad6 <ST380815AS/3.AAC> SATA revision 2.x
    Slave:       no device present
ATA channel 4:
    Master:  ad8 <WDC WD1000FYPS-01ZKB0/02.01B01> SATA revision 2.x
    Slave:       no device present
ATA channel 5:
    Master: ad10 <WDC WD1000FYPS-01ZKB0/02.01B01> SATA revision 2.x
    Slave:       no device present
ATA channel 6:
    Master: ad12 <WDC WD1000FYPS-01ZKB0/02.01B01> SATA revision 2.x
    Slave:       no device present
ATA channel 7:
    Master: ad14 <WDC WD1000FYPS-01ZKB0/02.01B01> SATA revision 2.x
    Slave:       no device present
ATA channel 8:
    Master:      no device present
    Slave:       no device present
ATA channel 9:
    Master:      no device present
    Slave:       no device present


FreeBSD mclane.rt.aei.uni-hannover.de 7.2-STABLE FreeBSD 7.2-STABLE #0:
Mon Sep  7 11:01:56 CEST 2009
root@mclane.rt.aei.uni-hannover.de:/usr/obj/usr/src/sys/MCLANE.72  amd64

The first six drives (ad4 through ad14) are connected to the onboard
controllers (Supermicro dual-Opteron board with an NVIDIA MCP55 chipset):

atapci1@pci0:0:5:0:     class=0x010485 card=0x161115d9 chip=0x037f10de rev=0xa3 hdr=0x00
    vendor     = 'Nvidia Corp'
    device     = 'MCP55 SATA/RAID Controller (MCP55S)'
    class      = mass storage
    subclass   = RAID
atapci2@pci0:0:5:1:     class=0x010485 card=0x161115d9 chip=0x037f10de rev=0xa3 hdr=0x00
    vendor     = 'Nvidia Corp'
    device     = 'MCP55 SATA/RAID Controller (MCP55S)'
    class      = mass storage
    subclass   = RAID
atapci3@pci0:0:5:2:     class=0x010485 card=0x161115d9 chip=0x037f10de rev=0xa3 hdr=0x00
    vendor     = 'Nvidia Corp'
    device     = 'MCP55 SATA/RAID Controller (MCP55S)'
    class      = mass storage
    subclass   = RAID

The other two (ad16 and ad18) are connected to an additional PCI card. The
chassis has 8 slots, and the last two were intended only for situations
like the one I am in now:

atapci4@pci0:3:6:0:     class=0x010401 card=0x02409005 chip=0x02401095 rev=0x02 hdr=0x00
    vendor     = 'Silicon Image Inc (Was: CMD Technology Inc)'
    device     = 'SATA/Raid controller(2XSATA150) (SIL3112)'
    class      = mass storage
    subclass   = RAID
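As a cross-check of which disk sits on which channel: the atacontrol listing above is consistent with static ATA device numbering. This is an assumption inferred from ad4 on ata2 through ad14 on ata7, under which the master on channel N is ad(2N) and the slave is ad(2N+1). A small Python sketch of that mapping follows; `channel_for` is a hypothetical helper.

```python
import re


def channel_for(disk):
    """Map an adN name to (channel, position), assuming static ATA numbering.

    Assumption: unit = 2 * channel for the master and 2 * channel + 1 for
    the slave, which matches ad4..ad14 on ata2..ata7 in the listing above.
    """
    m = re.fullmatch(r"ad(\d+)", disk)
    if m is None:
        raise ValueError(f"not an adN device name: {disk!r}")
    channel, is_slave = divmod(int(m.group(1)), 2)
    return f"ata{channel}", "Slave" if is_slave else "Master"


if __name__ == "__main__":
    for disk in ("ad4", "ad14", "ad16", "ad18"):
        print(disk, *channel_for(disk))
```

Under that assumption, ad16 and ad18 would be the masters on ata8 and ata9. Those are the two channels showing no device in the listing, which fits the SIL3112 being a two-port card and ata9 being the channel re-attached for ad18.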

Meanwhile I removed the ad18 drive again and tried a different drive
instead, but ZFS listed that one as "UNAVAIL" with corrupted data.
Presumably ZFS had already labeled the first disk as the resilvering
target and is now looking for exactly that disk. I then put the disk that
caused the problem above back in. Resilvering started again, but very soon
the drive was detached again, leaving me in the same situation described
above.

Any help is greatly appreciated.


cu
  Gerrit