From owner-freebsd-stable  Sat Aug 31  3:15:37 2002
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.FreeBSD.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 8ECE337B400
	for <freebsd-stable@freebsd.org>; Sat, 31 Aug 2002 03:15:34 -0700 (PDT)
Received: from mail.allcaps.org (h-66-166-142-198.SNDACAGL.covad.net [66.166.142.198])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 1E38643E6A
	for <freebsd-stable@freebsd.org>; Sat, 31 Aug 2002 03:15:34 -0700 (PDT)
	(envelope-from bsder@mail.allcaps.org)
Received: by mail.allcaps.org (Postfix, from userid 501)
	id 55AEA154EF; Sat, 31 Aug 2002 03:15:33 -0700 (PDT)
Received: from localhost (localhost [127.0.0.1])
	by mail.allcaps.org (Postfix) with ESMTP
	id 4BBC4154EE; Sat, 31 Aug 2002 03:15:33 -0700 (PDT)
Date: Sat, 31 Aug 2002 03:15:33 -0700 (PDT)
From: "Andrew P. Lentvorski" <bsder@mail.allcaps.org>
To: BugsGrief@bugsgrief.net
Cc: freebsd-stable@freebsd.org
Subject: Re: ata problem(s)
In-Reply-To: <200208310930.g7V9UcK57531@ogyo.bugsgrief.net>
Message-ID: <20020831025133.O55708-100000@mail.allcaps.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-stable.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-stable>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-stable>
X-Loop: FreeBSD.ORG

You have symptoms similar to what I had.  After digging on the web and
wading through IBM tech support, my hypothesis was that these symptoms
were caused by a thermal recalibration cycle.  However, since this bug was
intermittent, I never managed to get conclusive proof.

This thermal recalibration seems to be the reason IBM specified that
certain drives were only expected to be powered on about 10 hours per day
(the drive performs a thermal recalibration at power-up and then doesn't
need to do another one for a while).  I had three 40GB IBM disks that
showed exactly the same symptoms.  Typically the disk makes quite a racket
(clicks and clunks) just before FreeBSD throws the errors.

If my hypothesis is correct, there really isn't any way around this,
unless someone can suggest a kernel variable tweak that either allows more
time for disk writes to complete or increases the number of retries before
the ATA bus is reset.  Either change would give the drive time to finish
recalibrating and then complete the transaction.

If the drives are still under warranty, harass IBM.  Unfortunately, the
drive is going to show up as good under their drive tester software
(thermal recalibration is very intermittent), so they'll give you grief.
I managed to get them to swap all of my drives for new ones, but I just
use those as spares now.

I dumped my IBM drives from my RAID box and replaced them with another
brand (Maxtor, I think).  I haven't had any problems since.

-a

