From owner-freebsd-current@FreeBSD.ORG  Fri Nov  7 20:48:00 2003
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 017BD16A4CE; Fri,  7 Nov 2003 20:48:00 -0800 (PST)
Received: from obsecurity.dyndns.org
	(adsl-63-207-60-234.dsl.lsan03.pacbell.net [63.207.60.234])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id D2B0043FDD; Fri,  7 Nov 2003 20:47:58 -0800 (PST)
	(envelope-from kris@obsecurity.org)
Received: by obsecurity.dyndns.org (Postfix, from userid 1000)
	id 10EC566D80; Fri,  7 Nov 2003 20:47:58 -0800 (PST)
Date: Fri, 7 Nov 2003 20:47:57 -0800
From: Kris Kennaway <kris@obsecurity.org>
To: "Andrew P. Lentvorski, Jr." <bsder@allcaps.org>
Message-ID: <20031108044757.GA1387@xor.obsecurity.org>
References: <XFMail.20031107140654.jhb@FreeBSD.org>
	<20031107202526.S532@mail.allcaps.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="MGYHOYXEY6WxJCY8"
Content-Disposition: inline
In-Reply-To: <20031107202526.S532@mail.allcaps.org>
User-Agent: Mutt/1.4.1i
cc: Kris Kennaway <kris@obsecurity.org>
cc: re@FreeBSD.org
cc: current@FreeBSD.org
cc: John Baldwin <jhb@FreeBSD.org>
cc: sos@FreeBSD.org
Subject: Re: Too many uncorrectable read errors with atang
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 08 Nov 2003 04:48:00 -0000


--MGYHOYXEY6WxJCY8
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Nov 07, 2003 at 08:36:28PM -0800, Andrew P. Lentvorski, Jr. wrote:
> On Fri, 7 Nov 2003, John Baldwin wrote:
>
> > On 07-Nov-2003 Kris Kennaway wrote:
> > > So far this has happened (well, the panic above was new) on 5 separate
> > > machines that were all working on older -current.  Now, these are all
> > > IBM DeathStar drives, but previously I was only experiencing ata
> > > errors every month or two, and they were correctable for another month
> > > or two by /dev/zero'ing the drive.
>
> IBM Deathstars have an annoying tendency to perform thermal
> recalibration cycles that delay returning data by somewhere between
> 30 and 90 seconds until the calibration finishes.  Unfortunately,
> these delays seem to show up as uncorrectable errors.  It's a real pain
> with RAID cards, as the RAID array will take the drive offline when it
> could have retried the read.
>
> If you can, try to reduce the temperature of the drives.  This generally
> helped my Deathstars before I got rid of them all.
>
> Also, given the touchiness of PRML detectors, it is entirely possible that
> the drive is interpreting the increased read errors caused by the solar
> flares as a need to thermally recalibrate more often.
>
> Short of tossing the drives, ATAng would have to be more aggressive, as
> Windows is, about retrying even uncorrectable errors for up to a minute or
> so before giving up.

Thanks, that's interesting; perhaps there's something sos can do here.
Unfortunately the drives in question are in Yahoo's datacenter, so I
do not have physical access.

Kris

--MGYHOYXEY6WxJCY8
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (FreeBSD)

iD8DBQE/rHV9Wry0BWjoQKURAt93AJ4zHrIyHAK/dFX5qZN/sF99CCUosQCfZ5u8
3ACuVW8aTBA+RXZ8EbVLbyM=
=Sh0v
-----END PGP SIGNATURE-----

--MGYHOYXEY6WxJCY8--