From: Oscar Prieto
To: John
Cc: Victor Balada Diaz, stable@freebsd.org, Jeremy Chadwick
Date: Thu, 16 Feb 2012 12:29:43 +0100
Subject: Re: problems with AHCI on FreeBSD 8.2

Yesterday I backed up the sensitive data on the pool and decided to just
break stuff on purpose ;)

I used dd to write over the sector that smartctl had marked as faulty, then
ran a smartctl short test. I repeated the process several times until
smartctl gave no errors at all on ada3.

After that I left the pool doing a scrub, and it seemed to repair the
integrity of the pool:

------
[root@zaibach ~]# zpool status
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
  scan: scrub repaired 398K in 10h39m with 0 errors on Thu Feb 16 09:15:59 2012
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada2p1  ONLINE       0     0     0
            ada1p1  ONLINE       0     0     0
            ada3p1  ONLINE       0     0    11
            ada0p1  ONLINE       0     0     0
-----

But funnily enough, I got an AHCI timeout on another drive, /dev/ada2:

-----
Feb 16 04:08:23 zaibach kernel: ahcich2: Timeout on slot 15 port 0
Feb 16 04:08:23 zaibach kernel: ahcich2: is 00000000 cs 00040000 ss 00078000 rs 00078000 tfd c0 serr 00000000 cmd 0004d217
-------

At least a short smartctl test on /dev/ada2 doesn't seem to complain this time.

On Thu, Feb 16, 2012 at 5:48 AM, John wrote:
> Jeremy Chadwick wrote:
>>
>> CRC errors ...
>>
>>I have no real advice for tracking this kind of problem down.  The most
>>common response is "replace cables", which isn't necessarily the root
>>cause.  I have no advice or tips on how to track down interference
>>issues, or how to truly examine a disk PCB or controller PCB for the
>>latter item.  "Flaky traces" on a PCB could cause this sort of thing.
>>Folks in the EE field would know more about these issues; I am not an EE
>>person.
>>
>>Since the attribute increased on both drives simultaneously (I have to
>>assume simultaneously?), it's more likely that the problem is not with
>>SATA cables or the drives but the controller on the motherboard.  I'd
>>recommend replacing the motherboard.  I make no guarantees this will fix
>>anything however, but it is the "common point" for both of your drives.
>
> This EE agrees with your advice. I would add if replacing the motherboard fails
> to fix the problem, then replace the power supply. Even with extremely high
> end test equipment, you likely would never be able to see the failure occur
> for at least two reasons; the most likely failure mode is inside a single IC,
> and adding probes would alter the environment enough to change the failure
> mode.
>
> John Theus
> TheUs Group
> TheUsGroup.com
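
For anyone following along, the device table in the zpool status output
above can be scanned for non-zero counters with a few lines of Python. This
is only a sketch and not something from the thread: the name
devices_with_errors and the regular expression are assumptions made for
illustration, and SAMPLE is simply the device table copied from the status
output. Rows whose counters are not plain integers are skipped.

```python
import re

# Device table copied from the `zpool status` output quoted in this message.
SAMPLE = """\
NAME        STATE     READ WRITE CKSUM
tank        ONLINE       0     0     0
  raidz1-0  ONLINE       0     0     0
    ada2p1  ONLINE       0     0     0
    ada1p1  ONLINE       0     0     0
    ada3p1  ONLINE       0     0    11
    ada0p1  ONLINE       0     0     0
"""

# name, state, then three integer counters (READ, WRITE, CKSUM).
ROW = re.compile(r"^\s*(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def devices_with_errors(status_text):
    """Return {name: (read, write, cksum)} for rows with any non-zero counter."""
    flagged = {}
    for line in status_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header row, or text that is not a device row
        name, _state, read, write, cksum = match.groups()
        counts = (int(read), int(write), int(cksum))
        if any(counts):
            flagged[name] = counts
    return flagged


if __name__ == "__main__":
    for name, (read, write, cksum) in devices_with_errors(SAMPLE).items():
        print(f"{name}: read={read} write={write} cksum={cksum}")
```

On a live system the same function could be given the captured text of
zpool status instead of SAMPLE.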