From owner-freebsd-stable@FreeBSD.ORG  Sat Jan 26 18:32:06 2008
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8C85316A41A
	for <freebsd-stable@freebsd.org>; Sat, 26 Jan 2008 18:32:06 +0000 (UTC)
	(envelope-from joe@skyrush.com)
Received: from shadow.wildlava.net (shadow.wildlava.net [67.40.138.81])
	by mx1.freebsd.org (Postfix) with ESMTP id 4E33313C467
	for <freebsd-stable@freebsd.org>; Sat, 26 Jan 2008 18:32:06 +0000 (UTC)
	(envelope-from joe@skyrush.com)
Received: from [10.1.2.160] (pawnee.wildlava.net [67.40.138.85])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by shadow.wildlava.net (Postfix) with ESMTP id 2E2978F441
	for <freebsd-stable@freebsd.org>; Sat, 26 Jan 2008 11:32:05 -0700 (MST)
Message-ID: <479B7C60.7000800@skyrush.com>
Date: Sat, 26 Jan 2008 11:30:56 -0700
From: Joe Peterson <joe@skyrush.com>
User-Agent: Thunderbird 2.0.0.9 (Windows/20071031)
MIME-Version: 1.0
To: freebsd-stable@freebsd.org
References: <479A0731.6020405@skyrush.com>
	<20080125162940.GA38494@eos.sc1.parodius.com>
	<479A3764.6050800@skyrush.com>
	<3803988D-8D18-4E89-92EA-19BF62FD2395@mac.com>
	<479A4CB0.5080206@skyrush.com>
	<20080126003845.GA52183@eos.sc1.parodius.com>
	<479A86E5.5060806@skyrush.com>
	<20080126012124.GA53400@eos.sc1.parodius.com>
In-Reply-To: <20080126012124.GA53400@eos.sc1.parodius.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: "ad0: TIMEOUT - WRITE_DMA" type errors with 7.0-RC1
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
X-List-Received-Date: Sat, 26 Jan 2008 18:32:06 -0000

I ran a ZFS scrub, which finished yesterday, and no new errors were logged to
/var/log/messages while it ran.  However, the scrub found something
interesting:


crater# zpool status -v
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed with 1 errors on Fri Jan 25 12:52:32 2008
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       1     3     2
          ad0s1d    ONLINE       1     3     2

errors: Permanent errors have been detected in the following files:


        /home/joe/music/jukebox/christmas/Esquivel/Merry_XMas_from_the_SpaceAge_Bachelor_Pad/07-Snowfall.mp3


Note that I have not touched this file since copying it to this drive.

So it seems one file failed its checksum during the scrub.  As expected, I now
get errors when trying to read this file, probably ZFS reporting the checksum
failure.  When I logged in tonight, two more WRITE_DMA48 TIMEOUT/FAILURE disk
messages showed up in /var/log/messages; that might be a coincidence (they
appeared just as I was typing my password).
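
To see whether these messages cluster around particular events, such as
logins, it can help to tally them.  Here is a minimal Python sketch.  It
assumes the "adN: TIMEOUT - COMMAND" / "adN: FAILURE - COMMAND" shape seen in
this thread's subject line.  count_ata_errors is a hypothetical helper.  Real
log lines carry extra fields, which the regex simply ignores.

```python
import re
from collections import Counter

# Assumed shape, taken from this thread's subject line: "ad0: TIMEOUT - WRITE_DMA".
# Anything before or after that fragment (timestamp, retry notes, LBA) is ignored.
ATA_ERR = re.compile(r"\b(ad\d+): (TIMEOUT|FAILURE) - (\w+)")


def count_ata_errors(lines):
    """Hypothetical helper: count (device, kind, command) triples in log lines."""
    counts = Counter()
    for line in lines:
        m = ATA_ERR.search(line)
        if m:
            counts[m.groups()] += 1
    return counts
```

For example, `with open("/var/log/messages") as f: print(count_ata_errors(f))`
prints one count per device/kind/command combination.  Reading that file may
require root.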

Also, smartctl still reports PASSED; however, this is interesting:

195 Hardware_ECC_Recovered  0x001a   061   046   000    Old_age   Always       -       9070

The number is much *smaller* now!  It was "6" a few minutes before this... a
wrap-around?  Hmm, at this point I'm really not sure what is going on.
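
If the raw value were a plain fixed-width counter that wrapped, the true
increase between two readings would be their difference taken modulo the
counter width.  The ATA SMART attribute table reserves 6 bytes (48 bits) for
each raw value.  However, vendors do not all use that field as a simple
counter.  So treat this Python sketch as an illustration under that
assumption.  counter_delta is a hypothetical helper.

```python
def counter_delta(old, new, bits=48):
    """Hypothetical helper: increase of an unsigned `bits`-wide counter.

    Assumes the counter only counts up and wrapped at most once between the
    two readings.  A reset would also show up as a decrease and cannot be
    told apart from a wrap using the two values alone.
    """
    if not (0 <= old < 1 << bits and 0 <= new < 1 << bits):
        raise ValueError("readings must fit in the counter width")
    # Python's % returns a non-negative result for a positive modulus,
    # so a decrease (new < old) becomes new + 2**bits - old.
    return (new - old) % (1 << bits)
```

For instance, a 4-bit counter that goes from 10 to 5 has advanced by 11
(10 -> 15, wrap to 0, then up to 5).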

So I have started a SeaTools (Seagate's disk diagnostic) "long test" of the
drive; the short test has already passed.  The results should be interesting.
If it finds nothing wrong, I will start to wonder whether I am hitting ZFS bugs
that just happen to look like drive problems.  I already did a long read of the
disk contents under Linux and got no messages about anything wrong.
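
A long read like that can also be scripted so that it records which regions,
if any, fail to read instead of stopping at the first error.  Below is a
minimal Python sketch for Unix-like systems.  scan_readable is a hypothetical
helper, and the 1 MiB chunk size is an arbitrary choice.  It assumes reading
the raw device (for example /dev/ad0 on FreeBSD) is permitted.  It also
assumes the chunk size is a multiple of the sector size, which raw devices
usually require.

```python
import os


def scan_readable(path, chunk=1 << 20):
    """Hypothetical helper: read `path` end to end.

    Returns (bytes_read, failed_offsets).  A chunk that raises OSError is
    recorded and skipped, so one bad region does not end the scan.
    """
    failed = []
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # device or file size in bytes
        offset = 0
        while offset < size:
            want = min(chunk, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                failed.append(offset)
                offset += want
                continue
            if not data:
                break  # unexpected EOF; stop rather than loop forever
            total += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return total, failed
```

Note that a clean result only means the drive returned every sector without
reporting an error.  Silently wrong data would still pass this scan, and that
is the kind of corruption ZFS checksums are meant to catch.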

If there is any debugging output I can turn on to help determine whether this
is software-related, let me know the magic keywords to use.  :)

							-Joe