Date:      Tue, 20 Sep 2005 08:26:07 GMT
From:      Thede Loder <thede@loder.com>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   i386/86364: ATA woes, SATA controller: failed writes, FS corruption, and system hang under heavy loads
Message-ID:  <200509200826.j8K8Q7jO014354@www.freebsd.org>
Resent-Message-ID: <200509200830.j8K8UI0v096696@freefall.freebsd.org>


>Number:         86364
>Category:       i386
>Synopsis:       ATA woes, SATA controller: failed writes, FS corruption, and system hang under heavy loads
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-i386
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Sep 20 08:30:18 GMT 2005
>Closed-Date:
>Last-Modified:
>Originator:     Thede Loder
>Release:        FreeBSD-6.0BETA5
>Organization:
Paritive, Inc. 
>Environment:
FreeBSD davros.loder.com 6.0-BETA5 FreeBSD 6.0-BETA5 #21: Sun Sep 18 22:34:20 PDT 2005    root@davros.loder.com:/usr/src/sys/i386/compile/DAVROS  i386

>Description:
      Hi all.  A little ATA trouble.  I've been running an
NFS client-driven stress test on an NFS-exported file system.
It seems just fine unless the FS is hosted on a drive attached to
a PCI SATA controller, in this case a Promise SATAII150 TX2plus.
After a short period under the stress test (as little as a few
seconds, as long as a minute or two), the exported drive simply
hangs, eventually causing writes to time out on the NFS client.
The drive device path /dev/ad4 remains visible in /dev, but calls
to access the drive do not seem to return.  'umount'ing the
filesystem on the hung drive freezes all ATA devices and hangs the
system (I am not overclocked).
A hard reboot is required to bring things back to normal.  I'm not
sure whether data is being lost, but fsck always finds FS errors,
and a soft reboot is not possible, with the console reporting
failed buffer writes.
I have repeated the stress test using a filesystem on ATA100 drives
hosted by the mainboard's VIA 8235 without any problems, so it seems
to be specific to the PCI Promise controller and its drives.
The drive itself is a Western Digital (WDC WD2500JD-50GBB0 02.05D02).
The motherboard is a KT3 Ultra 2 with an AMD 1800+ on it.

I'm happy to dig into it further and provide more specifics, 
but need some experienced advice as to where to instrument.  

>How-To-Repeat:
     Export, via NFS, a filesystem that is on a drive hosted by the
SATA controller.  Stress the filesystem (I used an import of mp3 files
using iTunes).  After a minute or two (repeatably), the kernel outputs
"ad4: FAILURE - SETFEATURES SET TRANSFER MODE timed out", and the drive
becomes unresponsive, halting the NFS activity.  A subsequent 'umount'
of the file system hangs all ATA devices on the system, preventing
login or logout.  If "reboot" is issued before the 'umount', the reboot
process starts but hangs while flushing buffers.
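
For anyone trying to reproduce this without iTunes, a sustained write
load can be approximated with a small script run on the NFS client.
The following is only a sketch under stated assumptions: the function
name stress_writes, the file naming, and the default sizes are all
hypothetical, and it has not been confirmed that this workload triggers
the same failure as the mp3 import described above.

```python
import os


def stress_writes(target_dir, file_count=50, file_size=4 * 1024 * 1024,
                  chunk_size=64 * 1024):
    """Write file_count files of file_size bytes each under target_dir.

    Hypothetical stand-in for the iTunes mp3 import: many multi-megabyte
    files written back to back, each flushed to the server with fsync.
    Returns the total number of bytes written.
    """
    os.makedirs(target_dir, exist_ok=True)
    chunk = os.urandom(chunk_size)  # random payload, reused for every write
    total = 0
    for i in range(file_count):
        path = os.path.join(target_dir, "stress_%05d.bin" % i)
        with open(path, "wb") as f:
            remaining = file_size
            while remaining > 0:
                n = min(remaining, len(chunk))
                f.write(chunk[:n])
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # push the data out instead of caching it
        total += file_size
    return total
```

Calling stress_writes() with the path of the NFS mount on the client
(for example a directory under wherever the export is mounted) and
watching the server console for the "ad4: FAILURE" message would be one
way to check whether the hang depends on the specific workload.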
>Fix:
      
>Release-Note:
>Audit-Trail:
>Unformatted:


