Date: Tue, 15 Jun 2010 14:54:50 -0400
From: Michael Powell <nightrecon@hotmail.com>
To: freebsd-questions@freebsd.org
Subject: Re: SATA time outs
Message-ID: <hv8i6b$ttj$1@dough.gmane.org>
References: <1609626746.429.1276527962347.JavaMail.root@spitfire.phantombsd.org>

Casey Scott wrote:
> Since upgrading to 8.0 RELEASE, I continually get these errors:
>
> ...
> Jun 11 15:24:08 xxxx kernel: ad6: 953869MB <Seagate ST31000340AS SD1A> at ata3-master SATA150
> Jun 11 15:24:08 xxxx kernel: (probe6:ahc0:0:6:0): TEST UNIT READY. CDB: 0 0 0 0 0 0
> Jun 11 15:24:08 xxxx kernel: (probe6:ahc0:0:6:0): CAM Status: SCSI Status Error
> Jun 11 15:24:08 xxxx kernel: (probe6:ahc0:0:6:0): SCSI Status: Check Condition
> Jun 11 15:24:08 xxxx kernel: (probe6:ahc0:0:6:0): UNIT ATTENTION asc:29,2
> Jun 11 15:24:08 xxxx kernel: (probe6:ahc0:0:6:0): SCSI bus reset occurred
> Jun 11 15:24:08 xxxx kernel: (probe6:ahc0:0:6:0): Retrying Command (per Sense Data)
> ...
>
> I've tried 3 different drives w/ 2 different disk controllers. Anything I
> use as the second drive generates this message on boot, and will
> eventually fail with timeout errors after a couple hours. The other drive
> on the system, ad4, never displays these symptoms. This isn't new
> hardware, and worked flawlessly until now.
>
> Any suggestions? Has a bug been introduced into the ata driver?
>

These drives are known to be failing in large numbers, with various forms of defective firmware. The worst is the so-called "self-bricking" feature. Try a different kind of drive rather than just replacing with more of the same. A firmware flash might help in cases other than the "self-bricking" scenario; once that happens, the drive is done.

Also, I'm very leery of putting "Green" drives in any kind of server environment. They spend way too much time parking heads and spinning down.

Another thing to watch for is using desktop drives with RAID controllers. Enterprise drives have a very short error-recovery timeout, designed to keep them from being dropped by the RAID controller; desktop drives can take much longer to give up on a bad sector:

http://wdc.custhelp.com/cgi-bin/wdc.cfg/php/enduser/std_adp.php?p_faqid=1397

If it is a slightly older motherboard/BIOS, look and see whether these are set to "1" in the output of sysctl -a, and maybe try toggling them in loader.conf like the following:

    hw.pci.enable_msi="0"
    hw.pci.enable_msix="0"

Run vmstat -i and look for a really outlandish interrupt storm. It can be hard to tell, as disk controllers are usually pretty busy here. Newer equipment is supposed to be able to operate in a shared interrupt environment, but you can try to sort things out manually so that the IRQs for the controller aren't shared.

As far as the ATA driver code goes, if you have recently changed from 7.x to 8.x, that might be worth considering. If there has been a regression, I'm sure a PR would be in order.

Just a few random thoughts off the top of my head. But me, the first thing I'd do is dump the Seagates.

-Mike