From: "Andrew P. Lentvorski, Jr."
To: John Baldwin
Cc: sos@FreeBSD.org, re@FreeBSD.org, current@FreeBSD.org, Kris Kennaway
Date: Fri, 7 Nov 2003 20:36:28 -0800 (PST)
Subject: RE: Too many uncorrectable read errors with atang
Message-ID: <20031107202526.S532@mail.allcaps.org>

On Fri, 7 Nov 2003, John Baldwin wrote:

> On 07-Nov-2003 Kris Kennaway wrote:
> > So far this has happened (well, the panic above was new) on 5 separate
> > machines that were all working on older -current. Now, these are all
> > IBM DeathStar drives, but previously I was only experiencing ata
> > errors every month or two, and they were correctable for another month
> > or two by /dev/zero'ing the drive.

IBM Deathstars have an annoying tendency to perform thermal recalibration cycles that delay returning data for somewhere between 30 and 90 seconds, until the recalibration finishes. Unfortunately, these delays seem to show up as uncorrectable errors. It's a true pain with RAID cards, as the RAID array will take the drive offline when it could simply retry the read.

If you can, try to reduce the temperature of the drives. This generally helped my Deathstars before I got rid of them all. Also, given the touchiness of PRML detectors, it is entirely possible that the drive is interpreting the increased read errors caused by the solar flares as a need to thermally recalibrate more often.

Short of tossing the drives, ATAng would, like Windows, have to retry even uncorrectable errors more aggressively, for up to a minute or so, before giving up.

-a