From owner-freebsd-current@FreeBSD.ORG Fri Nov 7 20:48:00 2003
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
        by hub.freebsd.org (Postfix) with ESMTP
        id 017BD16A4CE; Fri, 7 Nov 2003 20:48:00 -0800 (PST)
Received: from obsecurity.dyndns.org
        (adsl-63-207-60-234.dsl.lsan03.pacbell.net [63.207.60.234])
        by mx1.FreeBSD.org (Postfix) with ESMTP
        id D2B0043FDD; Fri, 7 Nov 2003 20:47:58 -0800 (PST)
        (envelope-from kris@obsecurity.org)
Received: by obsecurity.dyndns.org (Postfix, from userid 1000)
        id 10EC566D80; Fri, 7 Nov 2003 20:47:58 -0800 (PST)
Date: Fri, 7 Nov 2003 20:47:57 -0800
From: Kris Kennaway
To: "Andrew P. Lentvorski, Jr."
Message-ID: <20031108044757.GA1387@xor.obsecurity.org>
References: <20031107202526.S532@mail.allcaps.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
        protocol="application/pgp-signature"; boundary="MGYHOYXEY6WxJCY8"
Content-Disposition: inline
In-Reply-To: <20031107202526.S532@mail.allcaps.org>
User-Agent: Mutt/1.4.1i
cc: Kris Kennaway
cc: re@FreeBSD.org
cc: current@FreeBSD.org
cc: John Baldwin
cc: sos@FreeBSD.org
Subject: Re: Too many uncorrectable read errors with atang
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
X-List-Received-Date: Sat, 08 Nov 2003 04:48:00 -0000

--MGYHOYXEY6WxJCY8
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Nov 07, 2003 at 08:36:28PM -0800, Andrew P. Lentvorski, Jr. wrote:
> On Fri, 7 Nov 2003, John Baldwin wrote:
>
> > On 07-Nov-2003 Kris Kennaway wrote:
> > > So far this has happened (well, the panic above was new) on 5 separate
> > > machines that were all working on older -current.
> > > Now, these are all
> > > IBM DeathStar drives, but previously I was only experiencing ata
> > > errors every month or two, and they were correctable for another month
> > > or two by /dev/zero'ing the drive.
>
> IBM Deathstar's have this annoying tendency to perform thermal
> recalibration cycles that cause them to delay returning data for somewhere
> between 30-90 seconds until the calibration finishes. Unfortunately,
> these seem to show up as uncorrectable errors. It's a true pain with RAID
> cards as the RAID array will take the drive offline when it could retry
> the data.
>
> If you can, try to reduce the temperature of the drives. This generally
> helped my Deathstars before I got rid of them all.
>
> Also, given the touchiness of PRML detectors, it is entirely possible that
> the drive is reading increased errors due to the solar flares as a need to
> thermally recalibrate more often.
>
> Other than tossing the drives, ATAng, like Windows, would have to be more
> aggressive about retrying even uncorrectable errors for up to a minute or
> so before giving up.

Thanks..that's interesting, perhaps there's something sos can do here.
Unfortunately the drives in question are in Yahoo's datacenter, so I
do not have physical access.

Kris

--MGYHOYXEY6WxJCY8
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (FreeBSD)

iD8DBQE/rHV9Wry0BWjoQKURAt93AJ4zHrIyHAK/dFX5qZN/sF99CCUosQCfZ5u8
3ACuVW8aTBA+RXZ8EbVLbyM=
=Sh0v
-----END PGP SIGNATURE-----

--MGYHOYXEY6WxJCY8--