Date:      Wed, 3 Oct 2012 01:20:14 -0400
From:      Alan Gerber <unlateral@gmail.com>
To:        freebsd-fs@freebsd.org
Subject:   Please help: trying to determine how to resurrect a ZFS pool
Message-ID:  <CABvG6ETbrsGDkf2m5Dm+YfD2rEvc2DwTgMOZG2PBi_i-UH9xTw@mail.gmail.com>

All,

Apologies if I've sent this to the wrong list.

Earlier this evening, a kernel panic took down a machine that has been
running a ZFS pool stably since the feature first became available back
in the 7.x days.  The system is currently running 8.3.
I'm hoping for a pointer that will help me recover this pool, or at
least some of the data from it.  I'd certainly like to hear something
other than "your pool is hosed!" ;)

Anyway, once the system came back online after the panic, ZFS showed
that it had lost a number of devices:

hss01fs# zpool status
  pool: storage
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
  scan: none requested
config:

        NAME                      STATE     READ WRITE CKSUM
        storage                   UNAVAIL      0     0     0
          raidz1-0                ONLINE       0     0     0
            ad18                  ONLINE       0     0     0
            ad14                  ONLINE       0     0     0
            ad16                  ONLINE       0     0     0
          raidz1-2                UNAVAIL      0     0     0
            13538029832220131655  UNAVAIL      0     0     0  was /dev/da4
            7801765878003193608   UNAVAIL      0     0     0  was /dev/da6
            8205912151490430094   UNAVAIL      0     0     0  was /dev/da5
          raidz1-3                DEGRADED     0     0     0
            da0                   ONLINE       0     0     0
            da1                   ONLINE       0     0     0
            9503593162443292907   UNAVAIL      0     0     0  was /dev/da2

As you can see, the big problem is the loss of the raidz1-2 vdev.  The
catch is that all of the missing devices are in fact present on the
system and fully operational.  I've tried moving the drives to different
physical slots, reseating them, running zpool import -F, and everything
else I can think of to make the four missing devices show up again (I'll
sketch the commands I tried below, after the label dump).  Inspecting
the labels on the various devices shows what I would expect to see:

hss01fs# zdb -l /dev/da4
--------------------------------------------
LABEL 0
--------------------------------------------
    version: 28
    name: 'storage'
    state: 0
    txg: 14350975
    pool_guid: 14645280560957485120
    hostid: 666199208
    hostname: 'hss01fs'
    top_guid: 11177432030203081903
    guid: 17379190273116326394
    hole_array[0]: 1
    vdev_children: 4
    vdev_tree:
        type: 'raidz'
        id: 2
        guid: 11177432030203081903
        nparity: 1
        metaslab_array: 4097
        metaslab_shift: 32
        ashift: 9
        asize: 750163329024
        is_log: 0
        create_txg: 11918593
        children[0]:
            type: 'disk'
            id: 0
            guid: 4427378272884026385
            path: '/dev/da7'
            phys_path: '/dev/da7'
            whole_disk: 1
            DTL: 4104
            create_txg: 11918593
        children[1]:
            type: 'disk'
            id: 1
            guid: 17379190273116326394
            path: '/dev/da6'
            phys_path: '/dev/da6'
            whole_disk: 1
            DTL: 4107
            create_txg: 11918593
        children[2]:
            type: 'disk'
            id: 2
            guid: 6091017181957750886
            path: '/dev/da3'
            phys_path: '/dev/da3'
            whole_disk: 1
            DTL: 4101
            create_txg: 11918593
[labels 1-3 with identical output values snipped]
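
As promised above, this is roughly the sequence of import attempts I went
through before turning to zdb.  I'm typing it from memory, so treat the
exact flags and ordering as approximate:

hss01fs# zpool export -f storage
hss01fs# zpool import -d /dev          # rescan /dev for ZFS labels
hss01fs# zpool import -f storage
hss01fs# zpool import -f -F storage    # recovery-mode import

None of these made any difference; the same four devices still come up
UNAVAIL.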

If I look at one of the operational drives that ZFS recognizes, such
as /dev/ad18, I see the same transaction group value present.
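
For example, a quick check of just the txg lines (assuming I'm reading the
labels correctly):

hss01fs# zdb -l /dev/ad18 | grep 'txg:'
hss01fs# zdb -l /dev/da4 | grep 'txg:'

The top-level txg: value comes back as 14350975 on both, so the labels at
least look consistent with each other.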

I've done enough digging to realize that at this point the problem is
likely that the GUID entries for each disk are not matching up with
what ZFS is expecting from the disk.  But I'm not sure what to do
about it.  If one of you fine folks could please point me in the
direction of recovering this pool, I'd greatly appreciate it!
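
In case it helps anyone spot what I'm missing, this is roughly how I've
been lining up the GUIDs in the on-disk labels against the numeric names
in the zpool status output above (run from a Bourne shell; the /dev/da?
glob just happens to cover the drives in question on this box):

# dump the guid/path lines from every label on the da disks
for d in /dev/da?; do
    echo "== $d =="
    zdb -l "$d" | grep -E 'guid|path'
done

(The long numbers on the UNAVAIL lines above appear to be GUIDs, as far as
I can tell, so this seemed like the obvious thing to compare.)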

--
Alan Gerber


