From: Wes Morgan
To: Miroslav Lachman <000.fbsd@quip.cz>
Cc: Eugeny N Dzhurinsky, freebsd-current@freebsd.org
Date: Mon, 8 Mar 2010 05:46:57 -0600 (CST)
Subject: Re: A tool for remapping bad sectors in CURRENT?
In-Reply-To: <4B94DDC8.5080008@quip.cz>
References: <20100308102918.GA5485@localhost> <4B94DDC8.5080008@quip.cz>

On Mon, 8 Mar 2010, Miroslav Lachman wrote:

> Eugeny N Dzhurinsky wrote:
> > Hello, all!
> >
> > Recently I've started to see the following logs in messages:
> >
> > Mar 8 12:00:24 localhost smartd[795]: Device: /dev/ad4, 2 Currently unreadable (pending) sectors
> > Mar 8 12:00:24 localhost smartd[795]: Device: /dev/ad4, 2 Offline uncorrectable sectors
> >
> > smartctl did really show that something is wrong with my HDD, but still no
> > remaps - just read errors.
> >
> > SMART Self-test log structure revision number 1
> > Num  Test_Description   Status                         Remaining  LifeTime(hours)  LBA_of_first_error
> > # 1  Extended offline   Completed: read failure        60%        1198             222342559
> > # 2  Extended offline   Completed: read failure        60%        1187             222342557
> > # 3  Extended offline   Completed: read failure        60%        1180             222342559
> > # 4  Short offline      Completed without error        00%        1178             -
> > # 5  Extended offline   Aborted by host                90%        1178             -
> >
> > and
> >
> > ATTRIBUTE_NAME         FLAG    VALUE  WORST  THRESH  TYPE      UPDATED  WHEN_FAILED  RAW_VALUE
> > ...
> > Reallocated_Sector_Ct  0x0033  100    100    036     Pre-fail  Always   -            0
> > ...
> >
> > Now can I find out which file owns the LBAs 222342557 and 222342559 ? How do I
> > force remapping of these sectors? I assume that I have to write something
> > directly to the sectors?
>
> We have this problem from time to time on bunch of machines. As we are using
> gmirror, the easiest way is to force re-synchronization (rewrite) of the whole
> drive. The problem is when there are Pending unreadable sectors on both drives
> - it ends up with read error and some file(s) are corrupted, but there is no
> easy way (on FreeBSD) to find what file.

*cough* zfs *cough*

I believe this kind of silent corruption is precisely what zfs was designed
to prevent. Even though you do have a mirror, how do you know which copy is
the correct one? If one drive re-allocates the sector silently, what is the
recovery method? If gmirror synchronizes, how do you make sure that the
*good* copy is the one synchronized?
You'll notice it eventually if you see it in a garbled file, but how does
the filesystem handle it?

> I tried it in the past with fsdb / findblk, but it does not work as I expect
> or I do not fully understand the needed calculations with slices + partitions
> offsets / LBAs and right meaning of the term "block". It seems there are
> several meaning in different contexts.
>
> It would be nice if somebody with enough FS / GEOM knowledge can write some
> HowTo or shell script to do the calculations and operations to find file
> containing bad sector(s) and put it in FAQ, Handbook, or Wiki.
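
For the offset arithmetic asked about above, here is a rough sketch in
Python, not a tested procedure. It only does the subtraction and division:
take the absolute LBA that smartctl reports, subtract the start of the
slice (and of the partition inside it) to get a sector number relative to
the filesystem, then divide by the number of sectors per fragment. The
slice start of 63 sectors, the 512-byte sector size and the 2048-byte
fragment size are assumptions for illustration only; the real values would
have to be read from the partition table, the disklabel and dumpfs output
on the machine in question. The function names are made up for this
example.

```python
def relative_sector(lba, *start_offsets):
    """Turn an absolute disk LBA into a sector number relative to the
    start of the filesystem's partition.

    start_offsets are the starting sectors to subtract, e.g. the slice
    start and, if the label stores it relative to the slice, the
    partition offset. All values are in sectors.
    """
    rel = lba - sum(start_offsets)
    if rel < 0:
        raise ValueError("LBA lies before the start of the partition")
    return rel


def fragment_of(rel_sector, sector_size=512, frag_size=2048):
    """Return (fragment number, sector index inside that fragment).

    sector_size and frag_size are in bytes; the defaults are assumed
    values, not something read from a real filesystem.
    """
    if frag_size % sector_size != 0:
        raise ValueError("fragment size must be a multiple of sector size")
    sectors_per_frag = frag_size // sector_size
    return divmod(rel_sector, sectors_per_frag)


if __name__ == "__main__":
    SLICE_START = 63  # hypothetical; take the real value from the partition table
    for lba in (222342557, 222342559):
        rel = relative_sector(lba, SLICE_START)
        frag, index = fragment_of(rel)
        print(lba, "-> relative sector", rel, "-> fragment", frag, "sector", index)
```

Which of the two numbers a given tool wants is exactly the ambiguity about
"block" mentioned above. As far as I recall, fsdb(8) describes findblk as
taking offsets from the start of the partition rather than absolute disk
block numbers, so the relative sector is probably the value to try first,
but that should be checked against the manual page on the system in use.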