Date:      Tue, 10 Apr 2012 14:34:00 +0300
From:      Andriy Gapon <avg@FreeBSD.org>
To:        Rumen Telbizov <telbizov@gmail.com>
Cc:        freebsd-stable@FreeBSD.org
Subject:   Re: ZFS: can't read MOS
Message-ID:  <4F841AA8.3030602@FreeBSD.org>
In-Reply-To: <CAENR%2B_X6gb5TB01i3FTfq_zD=RyFUGfLAWwA56SNm6Gqf_49iw@mail.gmail.com>
References:  <CAENR%2B_X6gb5TB01i3FTfq_zD=RyFUGfLAWwA56SNm6Gqf_49iw@mail.gmail.com>

on 09/04/2012 21:50 Rumen Telbizov said the following:
> Hello everyone,
> 
> I have a FreeBSD 8.2-STABLE (Aug 30, 2011) box running ZFS that I am having
> issues with and could use some help.
> 
> In a nutshell, this machine had been running fine for about a year and a
> half, but after a recent power outage (complete colo blackout) I can't boot
> off the ZFS pool any more. Here's the error I get (screenshot attached as
> well):
> 
> ZFS: i/o error - all block copies unavailable
> ZFS: can't read MOS
> ZFS: unexpected object set type 0
> ZFS: unexpected object set type 0
> 
> FreeBSD/x86 boot
> Default: zroot:/boot/kernel/kernel
> boot: ZFS: unexpected object set type 0
> 
> I've been searching the net high and low for an actual solution but all the
> threads end up nowhere.
> I hope I can get some clue here. Thanks in advance.

Not sure if the following could be of any help to you, but the
${SRC}/tools/tools/zfsboottest utility can help diagnose and debug such
issues from userland (without requiring a reboot).

See also a small nitpick below.

> Here's the relevant hardware configuration of this machine (it serves as a
> backup box).
> 
>    - SuperMicro 4U + another 4U totalling 48 x 2TB disks
>    - Hardware RAID: LSI 9261-8i driving both shelves and presenting a single
>      mfid0 device to the OS
>    - Hardware RAID 60: six RAID 6 groups of 8 disks each
>    - ZFS with gptzfsboot installed on the "single" mfid0 device. The
>      partition table is:
> 
> [root@mfsbsd /zroot/etc]# gpart show -l
> =>          34  140554616765  mfid0  GPT  (65T)
>             34           128      1  (null)  (64k)
>            162      33554432      2  swap  (16G)
>       33554594  140521062205      3  zroot  (65T)
> 
> 
> 
>    - boot device is: vfs.root.mountfrom="zfs:zroot" (as per loader.conf)
>    - zpool status is:
> 
> [root@mfsbsd /zroot/etc]# zpool status
>   pool: zroot
>  state: ONLINE
>  scan: scrub canceled on Mon Apr  9 09:48:14 2012
> config:
> 
> NAME        STATE     READ WRITE CKSUM
> zroot       ONLINE       0     0     0
>   mfid0p3   ONLINE       0     0     0
> 
> errors: No known data errors
> 
> 
> 
>    - zpool get all:
> 
> [root@mfsbsd /zroot/etc]# zpool get all zroot
> NAME   PROPERTY       VALUE       SOURCE
> zroot  size           65T         -
> zroot  capacity       36%         -
> zroot  altroot        -           default
> zroot  health         ONLINE      -
> zroot  guid           3339338746696340707  default
> zroot  version        28          default
> zroot  bootfs         zroot       local
> zroot  delegation     on          default
> zroot  autoreplace    off         default
> zroot  cachefile      -           default
> zroot  failmode       wait        default
> zroot  listsnapshots  on          local
> zroot  autoexpand     off         default
> zroot  dedupditto     0           default
> zroot  dedupratio     1.00x       -
> zroot  free           41.2T       -
> zroot  allocated      23.8T       -
> zroot  readonly       off         -
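
For what it's worth, the sizes above are consistent with each other. Below is
a back-of-the-envelope check as a plain sh sketch; it assumes 512-byte sectors
and treats each "2TB" disk as exactly 2 x 10^12 bytes (real drives are a bit
smaller), and the helper function names are made up for illustration:

```sh
# Back-of-the-envelope check of the gpart(8) and zpool(8) numbers above.
# Assumptions: 512-byte sectors on mfid0, and "2TB" disks of exactly
# 2 * 10^12 bytes. Integer arithmetic truncates, so results are approximate.

SECTOR=512
GIB=$((1024 * 1024 * 1024))
TIB=$((GIB * 1024))

# Hypothetical helpers: convert a sector count to whole GiB / TiB.
sectors_to_gib() { echo $(( $1 * SECTOR / GIB )); }
sectors_to_tib() { echo $(( $1 * SECTOR / TIB )); }

# Usable bytes of RAID 60: every RAID 6 group gives up two disks to parity.
# $1 = number of groups, $2 = disks per group, $3 = bytes per disk
raid60_bytes() { echo $(( $1 * ($2 - 2) * $3 )); }

echo "swap  (mfid0p2): $(sectors_to_gib 33554432) GiB"
echo "zroot (mfid0p3): $(sectors_to_tib 140521062205) TiB"
echo "RAID 60 usable : $(( $(raid60_bytes 6 8 2000000000000) / TIB )) TiB"

# zpool "capacity" is allocated / size; values kept in tenths of a TiB.
alloc=238
free=412
echo "pool size      : $(( (alloc + free) / 10 )).$(( (alloc + free) % 10 )) TiB"
echo "capacity       : $(( alloc * 100 / (alloc + free) ))%"
```

Run under sh(1), this prints 16 GiB for swap, 65 TiB for both the zroot
partition and the RAID 60 volume (36 data disks after parity), a 65.0 TiB
pool and 36% capacity, in line with the gpart and zpool output quoted above.
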
> 
> 
> Here's what happened chronologically:
> 
>    - Savvis Toronto blacked out completely for 31 minutes
>    - After power was restored, this machine came up with the above error
>    - I managed to PXE boot into mfsbsd, import the pool and access the
>      actual data/snapshots - no problem
>    - Shortly after, on another reboot, the hardware RAID controller
>      complained that it had lost its configuration; it now saw only half
>      of the disks as "foreign good" and the rest as "foreign bad". The BIOS
>      didn't see any boot device.
>    - Spent some time on the phone with LSI and managed to restore the
>      hardware RAID by basically removing any and all configuration, setting
>      the disks to "unconfigured good" and recreating the array exactly the
>      way I created it in the beginning, BUT with the important exception
>      that I did NOT initialize the array.
>    - After this I was back to square one: I could see all the data without
>      any loss (via mfsbsd) but could not boot off the volume any more.
>    - The first thing I tried was to restore the boot loader, without any
>      luck:
>       gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 mfid0p1
>    - Then, out of desperation, I took zfsboot, zfsloader and gptzfsboot
>      from 9.0-RELEASE, replaced them in /boot and reinitialized again - no
>      luck
>    - Currently running zdb -ccv zroot to check for any corruption. I am
>      afraid this will take forever since I have *23.8T* of used space. No
>      errors yet.
>    - One thing I did notice is that zdb zroot returned the metaslab
>      information line by line, very slowly (10-15 seconds per line). I
>      don't know if it's related.
>    - Another thing I tried (I saw it in a thread), without any difference
>      whatsoever, was:
> 
> # cd src/sys/boot/i386/zfsboot
> # make clean; make cleandir
> # make obj ; make depend ; make
> # cd i386/loader

You probably wanted to do this in i386/zfsloader.

> # make install
> # cd /usr/src/sys/boot/i386/zfsboot
> # make install
> # sysctl kern.geom.debugflags=16
> # dd if=/boot/zfsboot of=/dev/da0 count=1
> # dd if=/boot/zfsboot of=/dev/da0 skip=1 seek=1024
> # reboot
> 
> 
> At this point I am contemplating how to evacuate all the data from there
> or, better yet, put in a USB flash drive to boot from.
> I can provide further details/execute commands if needed. Any help would
> be appreciated.
> 



-- 
Andriy Gapon


