From owner-freebsd-questions@FreeBSD.ORG  Tue Jun 15 18:53:09 2010
Return-Path: <owner-freebsd-questions@FreeBSD.ORG>
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 62A021065676
	for <freebsd-questions@freebsd.org>;
	Tue, 15 Jun 2010 18:53:09 +0000 (UTC)
	(envelope-from freebsd-questions@m.gmane.org)
Received: from lo.gmane.org (lo.gmane.org [80.91.229.12])
	by mx1.freebsd.org (Postfix) with ESMTP id E27EA8FC1F
	for <freebsd-questions@freebsd.org>;
	Tue, 15 Jun 2010 18:53:08 +0000 (UTC)
Received: from list by lo.gmane.org with local (Exim 4.69)
	(envelope-from <freebsd-questions@m.gmane.org>) id 1OObFy-00080Y-Uk
	for freebsd-questions@freebsd.org; Tue, 15 Jun 2010 20:53:06 +0200
Received: from pool-71-166-153-11.washdc.east.verizon.net ([71.166.153.11])
	by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-questions@freebsd.org>; Tue, 15 Jun 2010 20:53:06 +0200
Received: from nightrecon by pool-71-166-153-11.washdc.east.verizon.net with
	local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00
	for <freebsd-questions@freebsd.org>; Tue, 15 Jun 2010 20:53:06 +0200
X-Injected-Via-Gmane: http://gmane.org/
To: freebsd-questions@freebsd.org
From: Michael Powell <nightrecon@hotmail.com>
Followup-To: gmane.os.freebsd.questions
Date: Tue, 15 Jun 2010 14:54:50 -0400
Lines: 56
Message-ID: <hv8i6b$ttj$1@dough.gmane.org>
References: <1609626746.429.1276527962347.JavaMail.root@spitfire.phantombsd.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7Bit
X-Complaints-To: usenet@dough.gmane.org
X-Gmane-NNTP-Posting-Host: pool-71-166-153-11.washdc.east.verizon.net
Subject: Re: SATA time outs
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: User questions <freebsd-questions.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
	<mailto:freebsd-questions-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-questions>
List-Post: <mailto:freebsd-questions@freebsd.org>
List-Help: <mailto:freebsd-questions-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
	<mailto:freebsd-questions-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 15 Jun 2010 18:53:09 -0000

Casey Scott wrote:

> Since upgrading to 8.0 RELEASE, I continually get these errors:
> 
> ...
> Jun 11 15:24:08 xxxx kernel: ad6: 953869MB <Seagate ST31000340AS SD1A> at ata3-master SATA150
> Jun 11 15:24:08 xxxx kernel: (probe6:ahc0:0:6:0): TEST UNIT READY. CDB: 0 0 0 0 0 0
> Jun 11 15:24:08 xxxx kernel: (probe6:ahc0:0:6:0): CAM Status: SCSI Status Error
> Jun 11 15:24:08 xxxx kernel: (probe6:ahc0:0:6:0): SCSI Status: Check Condition
> Jun 11 15:24:08 xxxx kernel: (probe6:ahc0:0:6:0): UNIT ATTENTION asc:29,2
> Jun 11 15:24:08 xxxx kernel: (probe6:ahc0:0:6:0): SCSI bus reset occurred
> Jun 11 15:24:08 xxxx kernel: (probe6:ahc0:0:6:0): Retrying Command (per Sense Data)
> ...
> 
> 
> I've tried 3 different drives w/ 2 different disk controllers. Anything I
> use as the second drive generates this message on boot, and will
> eventually fail with timeout errors after a couple hours.  The other drive
> on the system, ad4, never displays these symptoms. This isn't new
> hardware, and worked flawlessly until now.
> 
> Any suggestions? Has a bug been introduced into the ata driver?
> 

These drives are known to be failing in large numbers, with various forms of 
defective firmware. The worst is the so-called "self-bricking" feature. Try 
some other kind of drive rather than just replacing with more of the same. 
A firmware flash might help in cases other than the "self-bricking" 
scenario; once that happens, the drive is done.

Also, I'm very leery of putting "Green" drives in any kind of server 
environment. They spend way too much time parking heads and spinning down. 
Another thing to watch for is using desktop drives with RAID controllers. 
Enterprise drives have a very short timeout period designed to keep them 
from being dropped by the RAID controller:

http://wdc.custhelp.com/cgi-bin/wdc.cfg/php/enduser/std_adp.php?p_faqid=1397

If it is a slightly older motherboard/BIOS, look and see if these are set to 
"1" in sysctl -a and maybe try toggling them in loader.conf like the following:

hw.pci.enable_msi="0"
hw.pci.enable_msix="0"
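
A rough sketch of that check, in case it helps. It assumes you paste in the
output of "sysctl hw.pci.enable_msi hw.pci.enable_msix" (lines of the form
"name: value"); the function name and the sample text are made up for
illustration and are not from any real machine.

```python
def loader_conf_overrides(sysctl_output,
                          tunables=("hw.pci.enable_msi", "hw.pci.enable_msix")):
    """Return loader.conf lines that would turn off tunables currently set to 1.

    sysctl_output is text in the "name: value" form that sysctl prints.
    """
    current = {}
    for line in sysctl_output.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # ignore lines without a "name: value" shape
            current[name.strip()] = value.strip()
    # Only suggest an override where the tunable is enabled right now.
    return ['%s="0"' % t for t in tunables if current.get(t) == "1"]


# Hypothetical sample input, not captured from a real system.
sample = "hw.pci.enable_msi: 1\nhw.pci.enable_msix: 0\n"
for line in loader_conf_overrides(sample):
    print(line)
```

Anything it prints would go into loader.conf and needs a reboot to take
effect, since these are loader tunables.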

Run vmstat -i and look for a really outlandish interrupt storm. It can be 
hard to tell, as disk controllers are usually pretty busy here. Newer 
equipment is supposed to be able to operate in a shared interrupt 
environment. You can try to manually sort things out so that the IRQs for 
the controller aren't shared.
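
If eyeballing the vmstat -i output gets tedious, something like the
following might help. It is only a sketch: it assumes the usual three-column
layout (interrupt source, total, rate), the rate limit is an arbitrary number
you would have to pick for your own hardware, and the sample text is invented
for illustration.

```python
def parse_vmstat_i(text):
    """Parse `vmstat -i` style output into (source, total, rate) tuples."""
    rows = []
    for line in text.splitlines():
        # The source name may contain spaces, so split off the last two columns.
        parts = line.rsplit(None, 2)
        if len(parts) != 3:
            continue
        name, total, rate = parts
        if not (total.isdigit() and rate.isdigit()):
            continue  # skips the "interrupt total rate" header line
        rows.append((name.strip(), int(total), int(rate)))
    return rows


def busy_sources(rows, rate_limit):
    """Return rows at or above rate_limit, ignoring the summary 'Total' row."""
    return [r for r in rows if r[0] != "Total" and r[2] >= rate_limit]


# Invented sample, not taken from a real system.
SAMPLE = """\
interrupt                          total       rate
irq1: atkbd0                         310          0
irq19: atapci1                   9120000       4100
cpu0: timer                      4400000       1999
Total                           13520310       6099
"""

for name, total, rate in busy_sources(parse_vmstat_i(SAMPLE), rate_limit=3000):
    print(name, total, rate)
```

A high rate on the disk controller's line is not proof of a storm by itself,
since a busy controller legitimately interrupts a lot; compare it against
what the box is actually doing at the time.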

As far as the ATA driver code, if you have recently changed from 7.x to 8.x 
that might be worth considering. If there has been a regression I'm sure a 
PR would be in order. Just a few random thoughts off the top of my head. But 
me, the first thing I'd do is dump the Seagates.

-Mike