Date: Wed, 15 Feb 2012 11:19:31 -0800
From: Jeremy Chadwick
To: Victor Balada Diaz
Cc: stable@FreeBSD.org
Subject: Re: problems with AHCI on FreeBSD 8.2
Message-ID: <20120215191931.GA30747@icarus.home.lan>
In-Reply-To: <20120215181757.GX2010@equilibrium.bsdes.net>

On Wed, Feb 15, 2012 at 07:17:57PM +0100, Victor Balada Diaz wrote:
> On Tue, Feb 14, 2012 at 06:16:01AM -0800, Jeremy Chadwick wrote:
> > Thanks. Both of your drives look fine overall, sort of. I'll outline my
> > points of concern, and ask for some more info:
> >
> > * ada0 has 28 CRC errors, while ada1 has 2. These drives have been in
> > use for 4688 hours and 4583 hours (respectively), which is roughly 6
> > months for each drive. CRC errors usually result in transparent
> > retransmits, but these can sometimes cause I/O delays (especially if the
> > CRC errors are repeated).
> >
> > If the timeout messages recur in the future, please run the commands I
> > gave you above once more and provide the output. I can then compare the
> > old output to the new and see if there is anything of interest.
>
> I've made it fail again. You can see the smartctl -a output. CRC errors
> are increasing, but I'm not sure what that really means. Is the HD broken?
> Both? At the same time?

CRC errors indicate one of the following, in no particular order:

* Physical cabling problems (the possible reasons are too many to list)
* Dirty/dusty SATA connectors (cables, drive, or host controller)
* Electrical interference (badly shielded cables, etc.)
* Physical electronic/electrical problems (disk PCB, host controller PCB, etc.)

The important thing to remember about CRC errors (SMART attribute 199,
UDMA_CRC_Error_Count) is that they indicate a hardware-level problem on the
link between the host controller and the controller chip on the drive. They
do not indicate problems with the drive's cache (those are tracked in
attribute 184), and they do not indicate software-level problems (e.g.
driver bugs).

I have no real advice for tracking this kind of problem down. The most
common response is "replace the cables", but the cabling isn't necessarily
the root cause. Nor do I know how to track down interference issues, or how
to properly examine a disk PCB or controller PCB for the last item. "Flaky
traces" on a PCB could cause this sort of thing. Folks in the EE field would
know more about these issues; I am not an EE person.
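The old-versus-new comparison of SMART output described in the quoted text can be scripted. What follows is a minimal Python sketch, not a tool used in this thread. It assumes the standard ATA attribute table printed by `smartctl -A` (RAW_VALUE in the 10th column, FLAG as a hex value in the 3rd), and it skips raw values that are not plain integers. The function names `parse_attributes` and `increased`, and the sample snapshots, are hypothetical. Only the value 28 comes from the figures quoted above; 31 is invented for illustration.

```python
"""Compare two `smartctl -A` snapshots and report attributes whose raw value grew.

Sketch only: assumes the usual ATA attribute table layout
(ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE).
"""


def parse_attributes(text: str) -> dict[int, tuple[str, int]]:
    """Map attribute ID -> (name, raw value) for each parseable table row."""
    attrs: dict[int, tuple[str, int]] = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have a hex FLAG column.
        if len(fields) < 10 or not fields[0].isdigit() or not fields[2].startswith("0x"):
            continue
        raw = fields[9]
        if not raw.isdigit():
            continue  # assumption: skip vendor-specific raw formats we can't compare
        attrs[int(fields[0])] = (fields[1], int(raw))
    return attrs


def increased(old_text: str, new_text: str) -> dict[int, tuple[str, int, int]]:
    """Return {ID: (name, old_raw, new_raw)} for attributes whose raw value went up."""
    old = parse_attributes(old_text)
    new = parse_attributes(new_text)
    return {
        attr_id: (name, old[attr_id][1], raw)
        for attr_id, (name, raw) in new.items()
        if attr_id in old and raw > old[attr_id][1]
    }


# Hypothetical snapshots for illustration only: 28 matches the ada0 CRC count
# quoted above; 31 is an invented "later" value, not real output from this thread.
SAMPLE_OLD = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       28
"""
SAMPLE_NEW = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       31
"""

if __name__ == "__main__":
    for attr_id, (name, before, after) in sorted(increased(SAMPLE_OLD, SAMPLE_NEW).items()):
        print(f"{attr_id} {name}: {before} -> {after}")  # prints "199 UDMA_CRC_Error_Count: 28 -> 31"
```

Some raw counters, such as Power_On_Hours, grow on their own, so not every increase is a problem. A rising UDMA_CRC_Error_Count between two snapshots is the kind of change discussed here.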
Since the attribute increased on both drives at (I assume) the same time,
the problem is more likely to be the SATA controller on the motherboard than
the cables or the drives. I'd recommend replacing the motherboard. I make no
guarantees this will fix anything, but the controller is the "common point"
for both of your drives.

There really isn't anything else I can do going forward. This is pretty much
where the buck stops for me, and it illustrates why each and every
problem/issue has to be handled individually.

-- 
| Jeremy Chadwick                                   jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |