Date: Wed, 13 Jun 2007 21:08:48 -0700
From: Jeremy Chadwick <koitsu@FreeBSD.org>
To: Matthew Hagerty <matthew@digitalstratum.com>
Cc: freebsd-hackers@freebsd.org
Subject: Re: Disk block or sector to file mapping?
Message-ID: <20070614040848.GA30741@eos.sc1.parodius.com>
In-Reply-To: <4670B27B.6060606@digitalstratum.com>
References: <4670B27B.6060606@digitalstratum.com>

On Wed, Jun 13, 2007 at 11:14:03PM -0400, Matthew Hagerty wrote:
> Greetings,
>
> I have a drive that failed and fsck and dump both report the failed sector
> or block (the term seems to be used interchangeably at times), but how can I
> find out what file(s) were using that block?  I have a file-based backup and
> I could possibly replace the bad files if I know which ones were affected by
> the bad blocks.

There's apparently a way to work out what block on the disk is used by
a specific inode using some math and numerous parameters taken from the
drive, the filesystem, and so on.  It might be covered in the URL I've
included below (written for Linux though, not BSD), so I'd peek there.

Anyway, I'd do the following:

* Run the disk manufacturer's native disk analysis utility.  Many of
  them will do some extra magic (particularly for PATA/SATA disks; with
  SCSI there's no magic, you can do it yourself by manipulating the
  grown defect list) to try and work around a full bad block/remapped
  sector list.  Besides, when RMA'ing the disk, the manufacturer will
  usually ask if you've run their analysis tool and what the result was.

* You might be able to use smartctl (ports/sysutils/smartmontools) to
  run a selective LBA test (smartctl -t select,X-Y /dev/adN, where X and
  Y are the starting and ending LBAs to check).  Not all drives support
  this though.  If select isn't permitted, you can try -t long, which
  should work on most disks but scans the entire disk (takes a long
  time).  Then you can use smartctl -a /dev/adN to see whether the last
  test you ran was successful or hit an error, and hopefully what LBA
  the error is at.  This document might also come in handy:

  http://smartmontools.sourceforge.net/badblockhowto.html

* There's also ports/sysutils/drivecheckd, which I've never used, but
  it looks like it might provide more detailed info.

* The purpose of doing any of the above is to try and get the drive to
  mark the block in question as bad, so that it is no longer accessed.
  It may have already done that when the OS reported an issue[1].  That
  should (hopefully) cause fsck to notice inconsistencies in filesystem
  data, and give you a filename that used the aforementioned block,
  telling you the file is inaccessible or should be moved to lost+found
  and so on.  (I'm sure someone will correct me on the last part :) )

* Now try fsck -f on each unmounted filesystem and see if any errors
  come up, with filenames referenced.

Realistically, what we need on FreeBSD is a tool similar to Solaris's
format(8) "analyze" command, which does a raw disk scan (r, r/w, and a
couple of other operations).  For those not familiar with it, I'll
include a sample session of a disk being analysed at the bottom of this
email.

Sorry if this is too verbose, but I quite often deal with disks going
bad during my day job.

[1] - If the OS is seeing bad blocks on a PATA/SATA disk, it usually
means that the internal remapping table is full: there were other bad
blocks on the disk which it has silently remapped for you to avoid
pain -- and the space for those remapped blocks has been exhausted.
Sometimes you can work around this as mentioned, but most of the time
you can't, and you're stuck simply replacing the disk entirely.  Bad
blocks have a tendency to spread too...

-- 
| Jeremy Chadwick                                 jdc at parodius.com |
| Parodius Networking                        http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP: 4BD6C0CB |

bash# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <DEFAULT cyl 4464 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,2543@2/pci8086,1460@1d/pci9005,ffff@4/sd@0,0
Specify disk (enter its number): 0
selecting c0t0d0
[disk formatted]
Warning: Current Disk has mounted partitions.

FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        fdisk      - run the fdisk program
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> analyze

ANALYZE MENU:
        read     - read only test   (doesn't harm SunOS)
        refresh  - read then write  (doesn't harm data)
        test     - pattern testing  (doesn't harm data)
        write    - write then read      (corrupts data)
        compare  - write, read, compare (corrupts data)
        purge    - write, read, write   (corrupts data)
        verify   - write entire disk, then verify (corrupts data)
        print    - display data buffer
        setup    - set analysis parameters
        config   - show analysis parameters
        !<cmd>   - execute <cmd>, then return
        quit
analyze> setup
Analyze entire disk[yes]?
Loop continuously[no]?
Enter number of passes[2]:
Repair defective blocks[yes]?
Stop after first error[no]? yes
Use random bit patterns[no]? yes
Enter number of blocks per transfer[126, 0/2/0]:
Verify media after formatting[yes]?
Enable extended messages[no]?
Restore defect list[yes]?
Restore disk label[yes]?

analyze> read
Ready to analyze (won't harm SunOS). This takes a long time,
but is interruptable with CTRL-C. Continue? y

        pass 0
^C 17/59/0

Total of 0 defective blocks repaired.

analyze>
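
On the "some math" mentioned near the top of this message: once smartctl
has reported a failing LBA, the first step in mapping it to a file is to
turn that absolute LBA into a block number relative to the start of the
filesystem.  What follows is only a rough sketch of that arithmetic in
Python, not a tested recovery procedure.  The function and parameter
names are made up for illustration, the 512-byte sector size is an
assumption, and the partition start and filesystem block (or fragment)
size have to be read from your own disk label and filesystem.  Going
from the resulting block number to an inode and filename is
filesystem-specific and not covered here.

```python
def lba_to_fs_block(lba, partition_start_lba, fs_block_size, sector_size=512):
    """Translate an absolute disk LBA into a filesystem-relative block.

    Returns (block_index, byte_offset_within_block).

    All arguments are hypothetical inputs you must supply yourself:
      lba                  failing sector as reported by the drive
      partition_start_lba  first sector of the partition holding the fs
      fs_block_size        filesystem block (or fragment) size in bytes
      sector_size          bytes per sector (512 is assumed here)
    """
    if lba < partition_start_lba:
        raise ValueError("LBA lies before the start of the partition")
    if fs_block_size <= 0 or fs_block_size % sector_size != 0:
        raise ValueError("block size must be a positive multiple of sector size")

    # Byte offset of the bad sector, measured from the partition start.
    byte_offset = (lba - partition_start_lba) * sector_size

    # Integer division gives the block; the remainder is the position in it.
    return divmod(byte_offset, fs_block_size)


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    block, offset = lba_to_fs_block(
        lba=1000066, partition_start_lba=63, fs_block_size=2048
    )
    print(block, offset)  # 250000 1536
```

Double-check the partition start against the disk label before trusting
the result; an off-by-one-partition mistake here points you at the wrong
filesystem entirely.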