Date: Wed, 3 Oct 2012 01:20:14 -0400
From: Alan Gerber <unlateral@gmail.com>
To: freebsd-fs@freebsd.org
Subject: Please help: trying to determine how to resurrect a ZFS pool
Message-ID: <CABvG6ETbrsGDkf2m5Dm+YfD2rEvc2DwTgMOZG2PBi_i-UH9xTw@mail.gmail.com>
All,

Apologies if I've sent this to the wrong list.

I had a kernel panic take down a machine earlier this evening that has been
running a ZFS pool stably since the feature first became available back in
the 7.x days. Today, that system is running 8.3. I'm hoping for a pointer
that will help me recover this pool, or at least some of the data from it.
I'd certainly like to hear something other than "your pool is hosed!" ;)

Anyway, once the system came back online after the panic, ZFS showed that it
had lost a number of devices:

hss01fs# zpool status
  pool: storage
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
  scan: none requested
config:

        NAME                      STATE     READ WRITE CKSUM
        storage                   UNAVAIL      0     0     0
          raidz1-0                ONLINE       0     0     0
            ad18                  ONLINE       0     0     0
            ad14                  ONLINE       0     0     0
            ad16                  ONLINE       0     0     0
          raidz1-2                UNAVAIL      0     0     0
            13538029832220131655  UNAVAIL      0     0     0  was /dev/da4
            7801765878003193608   UNAVAIL      0     0     0  was /dev/da6
            8205912151490430094   UNAVAIL      0     0     0  was /dev/da5
          raidz1-3                DEGRADED     0     0     0
            da0                   ONLINE       0     0     0
            da1                   ONLINE       0     0     0
            9503593162443292907   UNAVAIL      0     0     0  was /dev/da2

As you can see, the big problem is the loss of the raidz1-2 vdev. The catch
is that all of the missing devices are in fact present on the system and
fully operational. I've tried moving these devices to different physical
drive slots, reseating the drives, zpool import -F, and everything else I
can think of to make the four missing devices show up again.

Inspecting the labels on the various devices shows what I would expect to
see:

hss01fs# zdb -l /dev/da4
--------------------------------------------
LABEL 0
--------------------------------------------
    version: 28
    name: 'storage'
    state: 0
    txg: 14350975
    pool_guid: 14645280560957485120
    hostid: 666199208
    hostname: 'hss01fs'
    top_guid: 11177432030203081903
    guid: 17379190273116326394
    hole_array[0]: 1
    vdev_children: 4
    vdev_tree:
        type: 'raidz'
        id: 2
        guid: 11177432030203081903
        nparity: 1
        metaslab_array: 4097
        metaslab_shift: 32
        ashift: 9
        asize: 750163329024
        is_log: 0
        create_txg: 11918593
        children[0]:
            type: 'disk'
            id: 0
            guid: 4427378272884026385
            path: '/dev/da7'
            phys_path: '/dev/da7'
            whole_disk: 1
            DTL: 4104
            create_txg: 11918593
        children[1]:
            type: 'disk'
            id: 1
            guid: 17379190273116326394
            path: '/dev/da6'
            phys_path: '/dev/da6'
            whole_disk: 1
            DTL: 4107
            create_txg: 11918593
        children[2]:
            type: 'disk'
            id: 2
            guid: 6091017181957750886
            path: '/dev/da3'
            phys_path: '/dev/da3'
            whole_disk: 1
            DTL: 4101
            create_txg: 11918593

[labels 1-3 with identical output values snipped]

If I look at one of the operational drives that ZFS recognizes, such as
/dev/ad18, I see the same transaction group value present.

I've done enough digging to realize that at this point the problem is likely
that the GUID entries for each disk are not matching up with what ZFS is
expecting from the disk. But I'm not sure what to do about it.

If one of you fine folks could please point me in the direction of
recovering this pool, I'd greatly appreciate it!

--
Alan Gerber
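P.S. For the record, here is the sort of cross-check I've been doing and the
path-independent re-import I'm wondering about; I have NOT run the
export/import sequence yet, since I'm not sure how safe it is (or whether the
export will even succeed) with the pool in this state, so please treat it as
a sketch and correct me if it's a bad idea:

Compare the 'guid:' lines in each device's label against the numeric vdev
IDs that zpool status prints for the UNAVAIL entries:

hss01fs# zdb -l /dev/da4 | grep guid

Then let ZFS rediscover the member disks by their on-disk labels rather than
by the cached device paths, falling back to the force (-f) and recovery (-F)
flags only if a plain import still comes up short:

hss01fs# zpool export storage
hss01fs# zpool import -d /dev storage
hss01fs# zpool import -d /dev -fF storage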