From owner-freebsd-questions@FreeBSD.ORG Tue Jun 15 18:53:09 2010
From: Michael Powell
To: freebsd-questions@freebsd.org
Subject: Re: SATA time outs
Date: Tue, 15 Jun 2010 14:54:50 -0400
Followup-To: gmane.os.freebsd.questions
References: <1609626746.429.1276527962347.JavaMail.root@spitfire.phantombsd.org>
List-Id: User questions

Casey Scott wrote:

> Since upgrading to 8.0 RELEASE, I continually get these errors:
>
> ...
> Jun 11 15:24:08 xxxx kernel: ad6: 953869MB at ata3-master SATA150
> Jun 11 15:24:08 xxxx kernel: (probe6:ahc0:0:6:0): TEST UNIT READY. CDB: 0 0 0 0 0 0
> Jun 11 15:24:08 xxxx kernel: (probe6:ahc0:0:6:0): CAM Status: SCSI Status Error
> Jun 11 15:24:08 xxxx kernel: (probe6:ahc0:0:6:0): SCSI Status: Check Condition
> Jun 11 15:24:08 xxxx kernel: (probe6:ahc0:0:6:0): UNIT ATTENTION asc:29,2
> Jun 11 15:24:08 xxxx kernel: (probe6:ahc0:0:6:0): SCSI bus reset occurred
> Jun 11 15:24:08 xxxx kernel: (probe6:ahc0:0:6:0): Retrying Command (per Sense Data)
> ...
>
> I've tried 3 different drives w/ 2 different disk controllers. Anything I
> use as the second drive generates this message on boot, and will
> eventually fail with timeout errors after a couple hours. The other drive
> on the system, ad4, never displays these symptoms. This isn't new
> hardware, and worked flawlessly until now.
>
> Any suggestions? Has a bug been introduced into the ata driver?
>

These drives are known to be failing in large numbers, with various forms
of defective firmware. The worst is the so-called "self-bricking" feature.
Try some other kind of drive rather than just replacing with more of the
same. Possibly a firmware flash might help in cases other than the
"self-bricking" scenario, as once that happens they're done.

Also, I'm very leery of putting "Green" drives in any kind of server
environment. They spend way too much time parking heads and spinning down.
Another thing to watch for is using desktop drives with RAID controllers.
Enterprise drives have a very short timeout period designed to keep them
from being dropped by the RAID controller:

http://wdc.custhelp.com/cgi-bin/wdc.cfg/php/enduser/std_adp.php?p_faqid=1397

If it is a slightly older motherboard/BIOS, look and see if
hw.pci.enable_msi and hw.pci.enable_msix are set to "1" in sysctl -a, and
maybe try turning them off in loader.conf like the following:

hw.pci.enable_msi="0"
hw.pci.enable_msix="0"

Then run vmstat -i and look for a really outlandish interrupt storm.
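One way to make the vmstat -i check less of an eyeball exercise is to filter the output for sources with an unusually high rate. The following is only a rough sketch: the function names, the sample text, the threshold of 1000 interrupts per second, and the choice to skip per-CPU timer lines are all assumptions made for illustration, not values from the original report.

```python
def parse_vmstat_i(text):
    """Parse `vmstat -i`-style text into (name, total, rate) tuples."""
    rows = []
    for line in text.splitlines():
        # The last two whitespace-separated fields are total and rate;
        # everything before them is the interrupt source name.
        parts = line.rsplit(None, 2)
        if len(parts) != 3:
            continue
        name, total, rate = parts
        if not (total.isdigit() and rate.isdigit()):
            continue  # skips the "interrupt total rate" header line
        name = name.strip()
        if name == "Total":
            continue  # skip the summary line
        rows.append((name, int(total), int(rate)))
    return rows


def suspicious(rows, rate_limit=1000, ignore_prefixes=("cpu",)):
    """Return rows at or above rate_limit (an assumed threshold).

    Per-CPU timer lines are ignored because they are expected to be busy.
    """
    return [
        row for row in rows
        if row[2] >= rate_limit and not row[0].startswith(ignore_prefixes)
    ]


# Hypothetical sample output, made up for this example.
SAMPLE = """\
interrupt                          total       rate
irq1: atkbd0                          18          0
irq14: ata0                        52431          4
irq19: atapci1                  98123456       8123
cpu0: timer                     24000000       1999
Total                          122175905      10126
"""

if __name__ == "__main__":
    for name, total, rate in suspicious(parse_vmstat_i(SAMPLE)):
        print(f"{name}: {rate} interrupts/s ({total} total)")
```

On a real system the text would come from running vmstat -i rather than from the sample string, and the threshold would need adjusting to what is normal for that machine.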
A storm can be hard to spot, as disk controllers are usually pretty busy
in that output. Newer equipment is supposed to be able to operate in a
shared interrupt environment. You can try to manually sort things out so
that the IRQs for the controller aren't shared.

As far as the ATA driver code goes, if you have recently changed from 7.x
to 8.x that might be worth considering. If there has been a regression I'm
sure a PR would be in order.

Just a few random thoughts off the top of my head. But me, the first thing
I'd do is dump the Seagates.

-Mike