From: Rich Wales
Date: Mon, 02 Oct 2006 22:54:48 -0700
To: freebsd-hardware@freebsd.org
Subject: Re: SATA-hdd or SATA-controller trouble.

"Anton" wrote:

>> Aug 21 18:46:27 nrr kernel: ad4: TIMEOUT - READ_DMA retrying (2 retries left) LBA=344654303
>> Aug 21 18:46:32 nrr kernel: ad4: FAILURE - ATA_IDENTIFY timed out
>> Aug 21 18:46:37 nrr kernel: ad4: FAILURE - ATA_IDENTIFY timed out
>> Aug 21 18:46:37 nrr kernel: ad4: WARNING - removed from configuration
>> Aug 21 18:46:37 nrr kernel: ata2-master: FAILURE - READ_DMA timed out

"Veronica" replied:

> I have had similar messages when my ATA cable was damaged. So I suggest
> replacing your cable.

I've been seeing problems similar to Anton's, with brand-new SATA cables
that are definitely not damaged.  (Note that Anton was talking about a
SATA disk, which uses a completely different kind of data cable from
older parallel ATA drives.)

Veronica continued:

> Also you might want to check the temperature of the disk using the
> "smartmontools" utility from freebsd-ports. Harddrives should always
> be kept very cool < 40 degrees if possible. A higher risk of data loss
> and/or lower lifespan could be the result of a higher temperature.
> Smartmontools can also run self-tests (short or long ones) to check for
> problems with your drive.

Although Anton could be having hardware problems due to overheating or
some other drive flakiness, many people have reported timeout problems
with SATA drives on Promise controllers under heavy I/O load, for quite
some time now, and I would be surprised if all of those cases were due
to overheating.

I'm currently running "dd if=/dev/adXXX of=/dev/null bs=64k conv=noerror"
on each of my two Seagate 300GB SATA drives simultaneously (with "adXXX"
replaced by the real device name of each drive).
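
For anyone who wants to reproduce the same load, here is a minimal
Python sketch of the test described above.  It assumes the drives show
up as /dev/ad4 and /dev/ad6; those device names are hypothetical, so
substitute your own.  It starts one dd per drive at the same time (so
both drives are busy simultaneously), waits for every dd to finish, and
returns each exit status.  The helper names dd_read_command and
read_all_concurrently are made up for this illustration.

```python
import subprocess


def dd_read_command(device: str, block_size: str = "64k") -> list[str]:
    """Build the argv for a full sequential read of `device`.

    The data goes to /dev/null; conv=noerror tells dd to keep going
    past read errors instead of stopping at the first one.
    """
    return ["dd", f"if={device}", "of=/dev/null", f"bs={block_size}", "conv=noerror"]


def read_all_concurrently(devices: list[str]) -> dict[str, int]:
    """Start one dd per device at once, then wait for all of them.

    Returns a mapping from device path to dd's exit status.
    """
    # Launch every dd before waiting on any, so the reads overlap.
    procs = {dev: subprocess.Popen(dd_read_command(dev)) for dev in devices}
    return {dev: proc.wait() for dev, proc in procs.items()}
```

Usage (as root, with your real device names):
read_all_concurrently(["/dev/ad4", "/dev/ad6"]).  Any timeouts will
show up in the kernel log (/var/log/messages or dmesg) while the reads
are running, in the same form as the messages quoted above.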

I've got the case open, with a large external fan blowing air onto the
drives.  Running "smartctl -a /dev/adXXX" on each drive shows a
temperature of around 35C.  Earlier self-tests on both drives finished
successfully.  Nevertheless, both drives are reporting a bunch of
timeout errors.

Something is messed up -- maybe in the Promise controller, maybe in the
FreeBSD driver, or (I'll admit for the sake of completeness) maybe in
the drives or elsewhere in the system.  And as I said, lots of people on
the net have reported this problem, but no one (so far) has confessed
to having a clue as to what is causing it or how to fix it.

I'm running 6.1-RELEASE-p9 on an old 800-MHz Athlon (the original
"Slot A" CPU type) on a DTK VAM-0070 motherboard.  Other people, though,
have reported this problem with much newer hardware.

Rich Wales
Palo Alto, CA, USA
richw@richw.org
http://www.richw.org