From owner-freebsd-current@FreeBSD.ORG  Fri Nov  7 20:36:23 2003
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id CE5D116A4CE; Fri,  7 Nov 2003 20:36:23 -0800 (PST)
Received: from mail.allcaps.org (mail.allcaps.org [206.251.247.157])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id CF5A443FCB; Fri,  7 Nov 2003 20:36:22 -0800 (PST)
	(envelope-from bsder@allcaps.org)
Received: from mail.allcaps.org (localhost [127.0.0.1])
	by mail.allcaps.org (Postfix) with ESMTP
	id E081CD844C; Fri,  7 Nov 2003 20:36:28 -0800 (PST)
Received: from localhost (bsder@localhost)hA84aSmk000547;
	Fri, 7 Nov 2003 20:36:28 -0800 (PST)
X-Authentication-Warning: mail.allcaps.org: bsder owned process doing -bs
Date: Fri, 7 Nov 2003 20:36:28 -0800 (PST)
From: "Andrew P. Lentvorski, Jr." <bsder@allcaps.org>
To: John Baldwin <jhb@FreeBSD.org>
In-Reply-To: <XFMail.20031107140654.jhb@FreeBSD.org>
Message-ID: <20031107202526.S532@mail.allcaps.org>
References: <XFMail.20031107140654.jhb@FreeBSD.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: sos@FreeBSD.org
cc: re@FreeBSD.org
cc: current@FreeBSD.org
cc: Kris Kennaway <kris@obsecurity.org>
Subject: RE: Too many uncorrectable read errors with atang
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 08 Nov 2003 04:36:23 -0000

On Fri, 7 Nov 2003, John Baldwin wrote:

> On 07-Nov-2003 Kris Kennaway wrote:
> > So far this has happened (well, the panic above was new) on 5 separate
> > machines that were all working on older -current.  Now, these are all
> > IBM DeathStar drives, but previously I was only experiencing ata
> > errors every month or two, and they were correctable for another month
> > or two by /dev/zero'ing the drive.

IBM Deathstars have an annoying tendency to perform thermal
recalibration cycles that delay returning data for anywhere from 30 to
90 seconds until the calibration finishes.  Unfortunately, these delays
seem to show up as uncorrectable errors.  It's a real pain with RAID
cards, as the RAID card will take the drive offline when it could
instead have retried the read.

If you can, try to reduce the temperature of the drives.  This generally
helped my Deathstars before I got rid of them all.

Also, given the touchiness of PRML detectors, it is entirely possible that
the drive is interpreting increased read errors caused by the solar flares
as a need to thermally recalibrate more often.

Short of tossing the drives, the fix would be for ATAng, like Windows, to
be more aggressive about retrying even uncorrectable errors for up to a
minute or so before giving up.
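A minimal sketch of that retry policy, written in Python for brevity:
keep retrying a failed read until a time budget of about a minute runs
out, then report the error.  All names here (read_with_retry,
UncorrectableReadError, RecalibratingDrive) and the 1-second backoff are
hypothetical illustrations, not ATAng's actual code or API.  The clock
and drive are simulated so the example runs instantly instead of
really waiting out a recalibration.

```python
class UncorrectableReadError(Exception):
    """Hypothetical error a low-level read raises when the drive fails."""


def read_with_retry(read_fn, lba, clock, sleep, budget_s=60.0, backoff_s=1.0):
    """Retry read_fn(lba) until it succeeds or the time budget is spent.

    Returns (data, attempts).  Re-raises the last error once one more
    backoff would push past the deadline.
    """
    deadline = clock() + budget_s
    attempts = 0
    while True:
        attempts += 1
        try:
            return read_fn(lba), attempts
        except UncorrectableReadError:
            # Give up only when waiting again would exceed the budget.
            if clock() + backoff_s > deadline:
                raise
            sleep(backoff_s)


class FakeClock:
    """Simulated clock: sleep() just advances the current time."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


class RecalibratingDrive:
    """Hypothetical drive that fails every read until recalibration ends."""

    def __init__(self, clock, busy_for_s):
        self.clock = clock
        self.busy_until = busy_for_s

    def read(self, lba):
        if self.clock.time() < self.busy_until:
            raise UncorrectableReadError(f"LBA {lba}: drive busy recalibrating")
        return b"\x00" * 512


# A drive stuck recalibrating for 45 seconds: the read eventually succeeds.
clock = FakeClock()
drive = RecalibratingDrive(clock, busy_for_s=45)
data, attempts = read_with_retry(drive.read, 1234, clock.time, clock.sleep)
print(attempts)  # 46
```

Passing the clock and sleep functions in keeps the policy testable; in
real use you would pass time.monotonic and time.sleep.  A drive that
stays busy past the budget (say 90 seconds) still gets its error
reported after roughly a minute, rather than being dropped after the
first failure.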

-a