From owner-freebsd-questions@FreeBSD.ORG  Tue May 18 15:49:32 2010
Return-Path: <owner-freebsd-questions@FreeBSD.ORG>
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8292C106564A
	for <freebsd-questions@freebsd.org>;
	Tue, 18 May 2010 15:49:32 +0000 (UTC)
	(envelope-from freebsd-questions@m.gmane.org)
Received: from lo.gmane.org (lo.gmane.org [80.91.229.12])
	by mx1.freebsd.org (Postfix) with ESMTP id 0E7C18FC32
	for <freebsd-questions@freebsd.org>;
	Tue, 18 May 2010 15:49:31 +0000 (UTC)
Received: from list by lo.gmane.org with local (Exim 4.69)
	(envelope-from <freebsd-questions@m.gmane.org>) id 1OEP2v-0002Dw-HI
	for freebsd-questions@freebsd.org; Tue, 18 May 2010 17:49:29 +0200
Received: from pool-70-21-10-109.res.east.verizon.net ([70.21.10.109])
	by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-questions@freebsd.org>; Tue, 18 May 2010 17:49:29 +0200
Received: from nightrecon by pool-70-21-10-109.res.east.verizon.net with local
	(Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00
	for <freebsd-questions@freebsd.org>; Tue, 18 May 2010 17:49:29 +0200
X-Injected-Via-Gmane: http://gmane.org/
To: freebsd-questions@freebsd.org
From: Michael Powell <nightrecon@hotmail.com>
Followup-To: gmane.os.freebsd.questions
Date: Tue, 18 May 2010 11:49:02 -0400
Lines: 69
Message-ID: <hsuctq$14o$1@dough.gmane.org>
References: <f889b374b170eba47ff5f7d530a9c878.squirrel@email.polands.org>
	<4BF2AA7F.9090503@infracaninophile.co.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7Bit
X-Complaints-To: usenet@dough.gmane.org
X-Gmane-NNTP-Posting-Host: pool-70-21-10-109.res.east.verizon.net
Subject: Re: Interpretting 3Ware error messages
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: User questions <freebsd-questions.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
	<mailto:freebsd-questions-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-questions>
List-Post: <mailto:freebsd-questions@freebsd.org>
List-Help: <mailto:freebsd-questions-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
	<mailto:freebsd-questions-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 May 2010 15:49:32 -0000

Matthew Seaman wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 18/05/2010 15:43:25, Doug Poland wrote:
>> Hello,
>> 
>> I have a 7.2-R i386 system running a 3ware 9500S-4LP SATA 150
>> controller with 4 SATA drives.  I recently started seeing the
>> following in my logs....
>> 
>> smartd[906]: Device: /dev/twa0 [3ware_disk_00], 1 Currently unreadable
>> (pending) sectors
>> smartd[906]: Device: /dev/twa0 [3ware_disk_00], 1 Offline
                                                     ^^^^^^^
>> uncorrectable sectors
   ^^^^^^^^^^^^^^^^^^^^^
I think this error usually means there are sectors pending remap: the drive
could not read them, but it will not remap them or mark them bad until the
next write to them. In unused space you can easily clear these by writing
over them with dd; however, you don't want to be doing that anywhere near
active data.
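
As for which disk it is: as far as I know, smartd labels a drive on port N
of a 3ware card as 3ware_disk_NN, so 3ware_disk_00 would be port 0 (p0,
listed as u0-1-0 in the unit table below). Here is a minimal Python sketch
that pulls the port and counter out of such log lines, assuming that naming
convention holds. The function name and the regular expression are my own,
not part of smartd.

```python
import re

# Assumed smartd message shape, based on the log lines quoted above.
SMARTD_LINE = re.compile(
    r"Device: (?P<dev>\S+) \[3ware_disk_(?P<port>\d+)\], "
    r"(?P<count>\d+) (?P<what>.+?) sectors"
)


def parse_smartd_line(line):
    """Return (device, port, count, counter name), or None if no match."""
    m = SMARTD_LINE.search(line)
    if m is None:
        return None
    return (m.group("dev"), int(m.group("port")),
            int(m.group("count")), m.group("what"))


if __name__ == "__main__":
    log = [
        "smartd[906]: Device: /dev/twa0 [3ware_disk_00], "
        "1 Currently unreadable (pending) sectors",
        "smartd[906]: Device: /dev/twa0 [3ware_disk_00], "
        "1 Offline uncorrectable sectors",
    ]
    for entry in log:
        print(parse_smartd_line(entry))
```

If the port really is 0, then 'smartctl -a -d 3ware,0 /dev/twa0' should show
the full SMART data for that one drive.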
 
>> Using the tw_cli program, I can examine the disk subsystem, but I do
>> not see any issues with an underlying drive.
>> 
>> Unit     UnitType  Status         %RCmpl  %V/I/M  Port  Stripe  Size(GB)
>> ------------------------------------------------------------------------
>> u0       RAID-10   OK             -       -       -     64K     298.002
>> u0-0     RAID-1    OK             -       -       -     -       -
>> u0-0-0   DISK      OK             -       -       p2    -       149.001
>> u0-0-1   DISK      OK             -       -       p3    -       149.001
>> u0-1     RAID-1    OK             -       -       -     -       -
>> u0-1-0   DISK      OK             -       -       p0    -       149.001
>> u0-1-1   DISK      OK             -       -       p1    -       149.001
>> 
>> 
>> I suspect a disk problem, but cannot identify the individual disk or
>> the nature of the problem.  Can anyone shed some light on this?
>> 
> Look at the SMART data for the disk(s) -- my guess is that you're seeing
> sectors failing and being re-mapped by the drive firmware.  If this is
> happening to any significant extent the disk may well be reaching the
> end of its usable life: happily you would seem to have been alerted to
> that in time to do something about it without needing to run around in a
> blind panic.

If the remap area is not yet full, these should still get remapped at the
next write. If it is full, replace the drive.
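
To judge which case applies, the raw values of SMART attributes 5
(Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198
(Offline_Uncorrectable) are the ones I would watch. Here is a rough Python
sketch that reads them from 'smartctl -A' style output. The sample text is
made up for illustration and is not from the original poster's drive, and
the helper name is my own.

```python
def read_raw_values(smartctl_output, wanted=(5, 197, 198)):
    """Map attribute ID -> (name, raw value) for the wanted SMART attributes."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        # Only keep rows whose raw value is a plain integer.
        if attr_id in wanted and fields[9].isdigit():
            found[attr_id] = (fields[1], int(fields[9]))
    return found


# Illustrative sample only; real values come from the drive itself.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
"""

if __name__ == "__main__":
    for attr_id, (name, raw) in sorted(read_raw_values(SAMPLE).items()):
        print(attr_id, name, raw)
```

A pending count that drops back to zero after a write, with the reallocated
count staying low, is the benign case. A reallocated count that keeps
climbing suggests the drive is on its way out.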
 
> There's a background task you can set up on 3ware controllers that will
> attempt to access all sectors of a disk specifically to bring to light
> problems like this, which otherwise could go unnoticed for a long time
> and lead to silent data corruption.

Many controllers refer to this as 'disk scrub' or 'disk verify'. If the
remap zone still has space available, a scrub should remap the affected
sectors and clear this counter.

Periodic scrubbing can find and fix 'silent data corruption': data sectors
that have failed between the last write and the next read. When such
failures are spread across multiple drives, you won't know about them until
one drive goes bad, you pull and replace it, and then find that the array
will not rebuild. I scrub my arrays every Friday night.

-Mike