Date:      Wed, 13 Jun 2007 21:08:48 -0700
From:      Jeremy Chadwick <koitsu@FreeBSD.org>
To:        Matthew Hagerty <matthew@digitalstratum.com>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: Disk block or sector to file mapping?
Message-ID:  <20070614040848.GA30741@eos.sc1.parodius.com>
In-Reply-To: <4670B27B.6060606@digitalstratum.com>
References:  <4670B27B.6060606@digitalstratum.com>

On Wed, Jun 13, 2007 at 11:14:03PM -0400, Matthew Hagerty wrote:
>  Greetings,
> 
>  I have a drive that failed and fsck and dump both report the failed sector 
>  or block (the term seems to be used interchangeably at times), but how can I 
>  find out what file(s) were using that block?  I have a file-based backup and 
>  I could possibly replace the bad files if I know which ones were affected by 
>  the bad blocks.

There's apparently a way to work out what block on the disk is used
by a specific inode using some math and numerous parameters taken
from the drive, filesystems, and other such things.  It might be
mentioned in the URL I've included below (for Linux though, not BSD),
so I'd peek there.
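
The "some math" above can be sketched roughly as follows. This is a minimal
illustration of the kind of conversion the Linux-oriented document describes
(absolute LBA to a block number relative to the start of a filesystem), not a
FreeBSD-specific recipe. The function name and all of the numbers are
hypothetical; the partition start, sector size, and filesystem block size must
come from your own disk label and filesystem, and on UFS you should check
whether the block or the fragment size is the unit your tool expects.

```python
def lba_to_fs_block(lba, partition_start, fs_block_size, sector_size=512):
    """Map an absolute disk LBA to a block number relative to a filesystem.

    lba             -- absolute sector number reported as bad
    partition_start -- first sector of the partition holding the filesystem
    fs_block_size   -- filesystem block size in bytes (assumed known)
    sector_size     -- bytes per sector (512 is an assumption)
    """
    if lba < partition_start:
        raise ValueError("LBA lies before the start of the partition")
    # Byte offset into the partition, divided down to whole filesystem blocks.
    return (lba - partition_start) * sector_size // fs_block_size


# Illustrative values only: bad sector 23421417 in a partition that starts
# at sector 5269320, on a filesystem with 4096-byte blocks.
block = lba_to_fs_block(23421417, 5269320, 4096)
```

The resulting block number is what you would then feed to a filesystem
debugger to look up the owning inode, and from the inode, the file name.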

Anyway, I'd do the following:

* Run the disk manufacturer's native disk analysis utility.  Many of
them will do some extra magic (particularly for PATA/SATA disks; with
SCSI there's no magic, you can do it yourself by manipulating the grown
defect list) to try and work around a full bad block/remapped sector
list.  Besides, when RMA'ing the disk, the manufacturer will usually ask
if you've run their analysis tool and what the result was.

* You might be able to use smartctl (ports/sysutils/smartmontools) to
run a selective LBA test (smartctl -t select,X-Y /dev/adN, where X-Y are
starting and ending LBAs to do checks on).  Not all drives support this
though.  If select isn't permitted, you can try -t long which should
work on most disks, but scans the entire disk (takes a long time).  Then
you can use smartctl -a /dev/adN to see whether the last test you ran was
successful or hit an error and, hopefully, which LBA the error is at.
This document might also come in handy:

  http://smartmontools.sourceforge.net/badblockhowto.html

* There's also ports/sysutils/drivecheckd, which I've never used, but it
looks like it might provide more detailed info.

* The purpose of doing any of the above is to try and get the drive to
mark the block in question as bad, so that it is no longer accessed.  It
may have already done that when the OS reported an issue[1].  That
should (hopefully) cause fsck to notice inconsistencies in filesystem
data, and give you a filename that used the aforementioned block,
telling you the file is inaccessible or should be moved to lost+found and
so on.  (I'm sure someone will correct me on the last part :) )

* Now try fsck -f on each unmounted filesystem and see if any errors
come up, with filenames referenced.
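
If you end up checking self-test results on more than one disk, pulling the
failing LBA out of the smartctl output by script saves some squinting. The
following is a rough sketch that assumes the usual ATA self-test log layout,
where each entry starts with "#" and the last column is LBA_of_first_error
("-" when the test passed). The sample text is made up for illustration, not
output from a real drive, and the function name is hypothetical.

```python
SAMPLE_LOG = """\
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%       217         0x016561e9
# 2  Short offline       Completed without error       00%       200         -
"""


def first_error_lbas(log_text):
    """Return the LBA_of_first_error values found in a self-test log."""
    lbas = []
    for line in log_text.splitlines():
        # Log entries begin with "#"; headers and blank lines do not.
        if not line.startswith("#"):
            continue
        last_field = line.split()[-1]
        if last_field == "-":
            continue  # test completed without a recorded error
        try:
            # Base 0 accepts both hex ("0x...") and plain decimal LBAs.
            lbas.append(int(last_field, 0))
        except ValueError:
            continue  # unexpected column contents; skip rather than guess
    return lbas


bad_lbas = first_error_lbas(SAMPLE_LOG)
```

In practice you would feed it the text captured from smartctl -a /dev/adN
instead of SAMPLE_LOG, then use the LBAs as the X-Y range for a selective test.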

Realistically, what we need on FreeBSD is a tool similar to Solaris's
format(8) "analyze" command, which does a raw disk scan (r, r/w, and a
couple of other operations).  For those not familiar with it, I'll include
a sample session of a disk being analysed at the bottom of this email.

Sorry if this is too verbose, but I quite often deal with disks going
bad during my day job.

[1] - If the OS is seeing bad blocks on a PATA/SATA disk, usually it means
that the internal remapping table is full, which means that there were
other bad blocks on the disk which it has silently remapped for you to
avoid pain -- and space for those blocks has been exhausted.  Sometimes
you can work around this as mentioned, but most of the time you can't,
and you're stuck simply replacing the disk entirely.  Bad blocks have a
tendency to spread too...

-- 
| Jeremy Chadwick                                    jdc at parodius.com |
| Parodius Networking                           http://www.parodius.com/ |
| UNIX Systems Administrator                      Mountain View, CA, USA |
| Making life hard for others since 1977.                  PGP: 4BD6C0CB |

bash# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <DEFAULT cyl 4464 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,2543@2/pci8086,1460@1d/pci9005,ffff@4/sd@0,0
Specify disk (enter its number): 0
selecting c0t0d0
[disk formatted]
Warning: Current Disk has mounted partitions.

FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        fdisk      - run the fdisk program
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> analyze

ANALYZE MENU:
        read     - read only test   (doesn't harm SunOS)
        refresh  - read then write  (doesn't harm data)
        test     - pattern testing  (doesn't harm data)
        write    - write then read      (corrupts data)
        compare  - write, read, compare (corrupts data)
        purge    - write, read, write   (corrupts data)
        verify   - write entire disk, then verify (corrupts data)
        print    - display data buffer
        setup    - set analysis parameters
        config   - show analysis parameters
        !<cmd>   - execute <cmd> , then return
        quit

analyze> setup
Analyze entire disk[yes]?
Loop continuously[no]?
Enter number of passes[2]:
Repair defective blocks[yes]?
Stop after first error[no]? yes
Use random bit patterns[no]? yes
Enter number of blocks per transfer[126, 0/2/0]:
Verify media after formatting[yes]?
Enable extended messages[no]?
Restore defect list[yes]?
Restore disk label[yes]?

analyze> read
Ready to analyze (won't harm SunOS). This takes a long time,
but is interruptable with CTRL-C. Continue? y

        pass 0
^C 17/59/0
Total of 0 defective blocks repaired.
analyze>



