From owner-freebsd-current@FreeBSD.ORG Fri Oct 12 02:38:18 2007
Message-ID: <470EDE0E.8070800@acm.poly.edu>
Date: Thu, 11 Oct 2007 22:38:06 -0400
From: Boris Kochergin <spawk@acm.poly.edu>
To: freebsd-current@freebsd.org
Subject: ZFS raidz1 redundancy
List-Id: Discussions about the use of FreeBSD-current

Hi. I'm running an i386 -CURRENT built on October 2nd. I have a raidz1 pool consisting of seven 400-GiB PATA disks. ad4 and ad5 are part of the pool.
This afternoon, the following happened:

Oct 11 19:05:27 exodus kernel: ad4: timeout waiting to issue command
Oct 11 19:05:27 exodus kernel: ad4: error issuing READ_DMA command
Oct 11 19:05:27 exodus root: ZFS: vdev I/O failure, zpool=home path=/dev/ad4 offset=70362711040 size=21504 error=5

The machine proceeded to panic after that, and when it rebooted, the following happened after a while:

Oct 11 19:11:40 exodus kernel: ad5: detached
Oct 11 19:11:40 exodus kernel: ad4: TIMEOUT - READ_DMA retrying (1 retry left) LBA=32

It crashed again half an hour after that, and when it came back up, ad4 was no longer detected by the ATA controller. The output of "zpool status" is as follows:

  pool: home
 state: FAULTED
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        home        FAULTED      6     0     0  corrupted data
          raidz1    DEGRADED     6     0     0
            ad4     UNAVAIL      0     0     0  cannot open
            ad5     ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad11    ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad9     ONLINE       0     0     0
            ad6     ONLINE       0     0     0

Is it possible that the data on ad5, in the midst of the failure of ad4, has become inconsistent with the other members of the pool, and that I need to bring ad4 online (I'm fairly sure that it's a motherboard- or power-related issue and that the drive is OK) to be able to access the data on the pool?

-Boris
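[Archive note: for readers landing on this thread, below is a minimal sketch of the recovery sequence that the "action" line in the status output points at. It assumes the hardware problem is fixed and ad4 reappears under the same device node, /dev/ad4; device names and the pool name "home" come from the post above.]

```shell
# Assumption: ad4 is physically back and detected by the ATA controller again.
# Bring the missing vdev back into the pool:
zpool online home ad4

# ZFS resilvers ad4 from the surviving raidz1 members; watch progress with:
zpool status home

# Once the resilver completes, verify checksums across the whole pool:
zpool scrub home
```

Because raidz1 tolerates only a single missing member, the pool stays FAULTED while ad4 is absent; onlining it (rather than replacing the disk) is the right first step if, as the poster suspects, the drive itself is healthy.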