From: John
To: David Bear
Cc: freebsd-questions@freebsd.org
Date: Thu, 20 Jan 2005 20:18:56 -0600
Subject: Re: hard drive errors

On Thu, Jan 20, 2005 at 05:21:13PM -0700, David Bear wrote:
> I am receiving the following errors on my hard drive. This appears to
> affect some file in /var/log. My question is twofold. 1) shouldn't UFS
> notice this sector as being unusable and mark it off-limits? 2) if
> not, is there a way to mark it so manually?
>
> ad0s1g: hard error reading fsbn 19674311 of 6765124-6765135 (ad0s1 bn 19674311; cn 1618 tn 16 sn 41) status=59 error=40
> ad0s1g: hard error reading fsbn 6765124 (ad0s1 bn 6765124; cn 556 tn 74 sn 58) status=59 error=40
> ad0s1h: hard error reading fsbn 88412159 of 35809248-35809251 (ad0s1 bn 88412159; cn 7271 tn 64 sn 38) status=59 error=40
> ad0s1h: hard error reading fsbn 35809251 (ad0s1 bn 35809251; cn 2945 tn 15 sn 51) status=59 error=40
> ad0s1g: hard error reading fsbn 19674303 of 6765120-6765133 (ad0s1 bn 19674303; cn 1618 tn 16 sn 33) status=59 error=40
> ad0s1g: hard error reading fsbn 6765124 (ad0s1 bn 6765124; cn 556 tn 74 sn 58) status=59 error=40

Modern disk drives do a lot to manage errors, but some things can still
happen that they cannot protect against; this is part of the reason
various RAID schemes are used.

If the drive hits a recoverable (soft) error, it can reconstruct the
data even though the sector was damaged, and having reconstructed the
data, it can remap the sector. A hard error means that, by the time the
problem was noticed, the data were already unrecoverable. The drive
can't simply remap the sector somewhere else, because the data are
already gone! If it were to map it somewhere else, what would it put
there? It doesn't know, and neither do I.

You really, really need to back up your data somewhere. You may already
have lost data that are valuable to you, but that's no reason to lose
more.

After that, go into the BIOS and do a surface scan of the drive. That
will cause it to remap all the sectors that are unrecoverable. Then
recreate the affected filesystem (with newfs) and restore your data. If
the drive is basically a good drive, you should be fine again. If the
drive is failing, more hard (and soft) errors will pop up, and your data
are at greater risk.

Fortunately, you say the errors seem to be in /var/log. Maybe recreating
the /var filesystem and losing some log files won't really cause you
any pain.
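
As for reading those messages: each one gives a block number on the
slice (ad0s1 bn) and the same block split into cylinder, track, and
sector (cn, tn, sn). Below is a minimal Python sketch of that split. It
assumes 63 sectors per track and 193 tracks per cylinder, with the
sector counted from zero; those values are not printed anywhere in the
log, they are inferred from it because they reproduce every cn/tn/sn
triple shown. The helper name decode_block is hypothetical, just for
illustration.

```python
# Geometry ASSUMED from the quoted log lines, not read from the disk label.
SECTORS_PER_TRACK = 63
TRACKS_PER_CYLINDER = 193
SECTORS_PER_CYLINDER = SECTORS_PER_TRACK * TRACKS_PER_CYLINDER  # 12159


def decode_block(bn: int) -> tuple[int, int, int]:
    """Split a slice-relative block number into (cylinder, track, sector).

    Hypothetical helper; the sector comes out zero-based, which is what
    the logged sn values imply.
    """
    cn, rem = divmod(bn, SECTORS_PER_CYLINDER)
    tn, sn = divmod(rem, SECTORS_PER_TRACK)
    return cn, tn, sn


# (bn, cn, tn, sn) copied from the error messages in the quoted log.
LOGGED = [
    (19674311, 1618, 16, 41),
    (6765124, 556, 74, 58),
    (88412159, 7271, 64, 38),
    (35809251, 2945, 15, 51),
    (19674303, 1618, 16, 33),
]

if __name__ == "__main__":
    for bn, cn, tn, sn in LOGGED:
        got = decode_block(bn)
        status = "ok" if got == (cn, tn, sn) else "MISMATCH"
        print(f"bn {bn}: cn {got[0]} tn {got[1]} sn {got[2]} ({status})")
```

Run as-is, it reports "ok" for every logged block. Note that the log
shows failures on ad0s1h as well as ad0s1g, so it is worth checking what
is mounted there (df will tell you) and confirming that it holds nothing
more valuable than logs.
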
I hope that is the case.

There used to be filesystem-level code to manage bad sectors. This was
bad, because when you did unit (whole-disk) copies, which are rarely
done anymore, you'd still hit the bad spots. Defect management was then
pushed down into the driver (bad144 disk defect management), and then
down into the drives themselves. NONE of those methods can protect you
from the sudden and seemingly spontaneous loss of data!

If you move your system, or it is subject to shock and vibration, and
the heads go bouncing across the surface, data may be lost. Sometimes I
swear cosmic rays just blast out some bits (well, it SEEMS like it),
and, ultimately, thermodynamics cannot be beaten: any image, magnetic or
otherwise, fades with time. The signal-to-noise ratio of the heads and
electronics also changes over the life of the product, and tiny flecks
can come off, be deposited on, or be moved around the disk surface. All
of this can cause data problems.

Though almost no one does it, back up your data. Back up your data.
Back up your data. Like the old real-estate joke that the three most
important features of a property are location, location, and location,
the three most important steps in preserving and protecting data (short
of hardware RAID protection and remote and local subsystem-based
replication) are backup, backup, and backup.

I actually have an arrangement with a friend of mine: every night the
most important data on my system are rolled up into a tarball, and an
expect script FTPs it to one of his servers. A little kludgy, but it
works as poor-man's remote data replication.

-- 
John Lind
john@starfire.MN.ORG