From: Attila Nagy <bra@fsn.hu>
Date: Mon, 10 Jan 2011 18:23:44 +0100
To: Pawel Jakub Dawidek
Cc: freebsd-fs@FreeBSD.org, freebsd-stable@FreeBSD.org, Martin Matuska
Subject: Re: New ZFSv28 patchset for 8-STABLE

On 01/10/2011 10:02 AM, Pawel Jakub Dawidek wrote:
> On Sun, Jan 09, 2011 at 12:49:27PM +0100, Attila Nagy wrote:
>> No, it's not related. One of the disks in the RAIDZ2 pool went bad:
>> (da4:arcmsr0:0:4:0): READ(6). CDB: 8 0 2 10 10 0
>> (da4:arcmsr0:0:4:0): CAM status: SCSI Status Error
>> (da4:arcmsr0:0:4:0): SCSI status: Check Condition
>> (da4:arcmsr0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
>> and it seems it froze the whole zpool. Removing the disk by hand solved
>> the problem.
>> I've seen this previously on other machines with ciss.
>> I wonder why ZFS didn't throw it out of the pool.
> Such hangs happen when I/O never returns. ZFS doesn't time out I/O
> requests on its own; that is the driver's responsibility. It is still
> strange that the driver didn't pass the I/O error up to ZFS. It might
> also be a ZFS bug, but I don't think so.

Indeed, it may be a controller/driver bug. The release notes for the newly
released (last December) firmware mention a similar problem. I've upgraded;
we'll see whether it helps the next time a drive goes awry.

I've only seen these errors in dmesg, not in zpool status, where everything
was clean (all zeroes).

BTW, I've swapped those bad drives (da4, which reported the above errors,
and da16, which didn't report anything to the OS; it was just plain bad
according to the controller firmware, and after its removal I could offline
da4, so it seems to be the real cause, see my previous e-mail). I then ran
zpool replace for da4 first, but after a few seconds of activity all I/O on
all disks ceased. After waiting some minutes it was still the same, so I
rebooted. Then I noticed that a scrub was running, so I stopped it. After
that, the zpool replace for da4 went fine and it started to resilver the
disk (roughly the sequence sketched below).
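For reference, the recovery sequence was roughly the following; this is only
a minimal sketch, and "tank" stands in for the actual pool name on this box:

    # stop the scrub that resumed after the reboot
    zpool scrub -s tank
    # start the replacement of the first bad disk; resilvering begins
    zpool replace tank da4
    # watch the resilver progress
    zpool status -v tank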
But another zpool replace (for da16) causes the same hang: a few seconds of
I/O, then nothing, and it stays stuck there. Has anybody tried replacing two
drives simultaneously with the ZFS v28 patch? (This pool is a stripe of two
raidz2 vdevs, and da4 and da16 are in different raidz2s.)
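Next time it wedges I plan to capture some state before rebooting; a minimal
sketch with stock FreeBSD tools, again assuming the pool is called "tank":

    # does zpool still think the second replace is in progress?
    zpool status -v tank
    # per-provider I/O statistics: check whether the disks are really idle
    gstat
    # kernel stacks of the stuck zpool process, to see where it sleeps
    procstat -kk $(pgrep zpool)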