From owner-freebsd-stable@FreeBSD.ORG  Thu Feb 16 11:59:07 2012
In-Reply-To: <20120216044842.282B16B9@server.theusgroup.com>
References: <20120214091909.GP2010@equilibrium.bsdes.net>
	<20120214100513.GA94501@icarus.home.lan>
	<20120214135435.GQ2010@equilibrium.bsdes.net>
	<20120214141601.GA98986@icarus.home.lan>
	<20120215181757.GX2010@equilibrium.bsdes.net>
	<20120215191931.GA30747@icarus.home.lan>
	<20120216044842.282B16B9@server.theusgroup.com>
Date: Thu, 16 Feb 2012 12:29:43 +0100
Message-ID: <CAK9wqRoZhcY7wjjiMD63Kjigfi2Y2NedB6w7TngQV6+318njHg@mail.gmail.com>
From: Oscar Prieto <oscarmpp@googlemail.com>
To: John <john@theusgroup.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: Victor Balada Diaz <victor@bsdes.net>, stable@freebsd.org,
	Jeremy Chadwick <freebsd@jdc.parodius.com>
Subject: Re: problems with AHCI on FreeBSD 8.2

Yesterday I backed up the sensitive data on the pool and decided
to just break stuff on purpose ;)

I used dd to write over the sector that smartctl marked as faulty and
then ran a smartctl short test. I repeated the process several times
until smartctl reported no errors at all on ada3.
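
A minimal sketch of that overwrite-and-retest loop, assuming 512-byte
logical sectors and a made-up LBA (the real sector number is not given
here). The helper names are hypothetical, and the commands are only built
and printed, never executed, because writing to the wrong offset destroys
data:

```python
def dd_overwrite_command(device, lba, sector_size=512):
    """Build (but do not run) a dd command that zeroes a single sector."""
    if lba < 0:
        raise ValueError("LBA must be non-negative")
    return [
        "dd",
        "if=/dev/zero",
        f"of={device}",
        f"bs={sector_size}",  # one block == one logical sector (assumed 512)
        "count=1",            # write exactly one block
        f"seek={lba}",        # skip this many blocks on the output device
    ]


def short_test_command(device):
    """Build the smartctl invocation that starts a short self-test."""
    return ["smartctl", "-t", "short", device]


# Hypothetical LBA, for illustration only.
print(" ".join(dd_overwrite_command("/dev/ada3", 123456)))
print(" ".join(short_test_command("/dev/ada3")))
```

The LBA that smartctl reports is relative to the whole disk, so the
sketch targets /dev/ada3 rather than the ada3p1 partition.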

After that I left the pool running a scrub, and it seems to have repaired
the integrity of the pool:
------
[root@zaibach ~]# zpool status
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scan: scrub repaired 398K in 10h39m with 0 errors on Thu Feb 16 09:15:59 2012
config:

	NAME        STATE     READ WRITE CKSUM
	tank        ONLINE       0     0     0
	  raidz1-0  ONLINE       0     0     0
	    ada2p1  ONLINE       0     0     0
	    ada1p1  ONLINE       0     0     0
	    ada3p1  ONLINE       0     0    11
	    ada0p1  ONLINE       0     0     0
-----
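
For anyone who wants to watch these counters from a script, here is a
rough sketch that pulls the devices with non-zero READ/WRITE/CKSUM counts
out of `zpool status` text. The function name is hypothetical, the sample
is the config table above, and rows whose counters are not plain integers
are simply skipped:

```python
SAMPLE = """\
config:

	NAME        STATE     READ WRITE CKSUM
	tank        ONLINE       0     0     0
	  raidz1-0  ONLINE       0     0     0
	    ada2p1  ONLINE       0     0     0
	    ada1p1  ONLINE       0     0     0
	    ada3p1  ONLINE       0     0    11
	    ada0p1  ONLINE       0     0     0
"""


def devices_with_errors(status_text):
    """Return {name: (read, write, cksum)} for rows with a non-zero counter."""
    flagged = {}
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True  # the device rows start after this header
            continue
        if not in_config or len(fields) < 5:
            continue
        name = fields[0]
        try:
            read, write, cksum = (int(value) for value in fields[2:5])
        except ValueError:
            continue  # counter is not a plain integer; skipped in this sketch
        if read or write or cksum:
            flagged[name] = (read, write, cksum)
    return flagged


print(devices_with_errors(SAMPLE))
```

On the table above this flags only ada3p1, with its 11 checksum errors.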

Funnily enough, I then got an AHCI timeout on another drive, /dev/ada2:
-----
Feb 16 04:08:23 zaibach kernel: ahcich2: Timeout on slot 15 port 0
Feb 16 04:08:23 zaibach kernel: ahcich2: is 00000000 cs 00040000 ss 00078000 rs 00078000 tfd c0 serr 00000000 cmd 0004d217
-------
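
The hex values in that line are easier to read as slot numbers. A small
sketch, assuming cs, ss and rs are bitmasks with one bit per command slot
(my reading of the driver output, not something confirmed in this thread);
the function names are hypothetical:

```python
import re

LINE = ("ahcich2: is 00000000 cs 00040000 ss 00078000 rs 00078000 "
        "tfd c0 serr 00000000 cmd 0004d217")


def slots(mask):
    """Return the bit positions (slot numbers) that are set in mask."""
    return [bit for bit in range(32) if (mask >> bit) & 1]


def decode_ahci_timeout(line):
    """Extract the name/hex pairs and expand the slot bitmasks."""
    pairs = re.findall(r"\b(is|cs|ss|rs|tfd|serr|cmd) ([0-9a-f]+)", line)
    fields = {name: int(value, 16) for name, value in pairs}
    # Only cs, ss and rs are treated as per-slot bitmasks in this sketch.
    return {name: slots(fields[name]) for name in ("cs", "ss", "rs")}


print(decode_ahci_timeout(LINE))
```

Read that way, slot 15 (the one that timed out) is set in ss and rs along
with slots 16 to 18, while cs has only slot 18 set.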

At least a short smartctl test on /dev/ada2 doesn't seem to complain
this time.

On Thu, Feb 16, 2012 at 5:48 AM, John <john@theusgroup.com> wrote:
> Jeremy Chadwick wrote:
>>
>> CRC errors ...
>>
>>I have no real advice for tracking this kind of problem down.  The most
>>common response is "replace cables", which isn't necessarily the root
>>cause.  I have no advice or tips on how to track down interference
>>issues, or how to truly examine a disk PCB or controller PCB for the
>>latter item.  "Flaky traces" on a PCB could cause this sort of thing.
>>Folks in the EE field would know more about these issues; I am not an EE
>>person.
>>
>>Since the attribute increased on both drives simultaneously (I have to
>>assume simultaneously?), it's more likely that the problem is not with
>>SATA cables or the drives but the controller on the motherboard.  I'd
>>recommend replacing the motherboard.  I make no guarantees this will fix
>>anything however, but it is the "common point" for both of your drives.
>
> This EE agrees with your advice. I would add if replacing the motherboard fails
> to fix the problem, then replace the power supply. Even with extremely high
> end test equipment, you likely would never be able to see the failure occur
> for at least two reasons; the most likely failure mode is inside a single IC,
> and adding probes would alter the environment enough to change the failure
> mode.
>
> John Theus
> TheUs Group
> TheUsGroup.com