Date: Sun, 06 May 2012 08:38:18 -0400
From: "Simon" <simon@optinet.com>
To: "Artem Belevich" <art@freebsd.org>
Cc: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Re: ZFS Kernel Panics with 32 and 64 bit versions of 8.3 and 9.0
Message-ID: <20120506123826.412881065672@hub.freebsd.org>
In-Reply-To: <CAFqOu6gz+Fd-NvPivMz3nfeGCYz0a563yNBOpmsAyHZS_TQybQ@mail.gmail.com>
Are you suggesting that if a disk sector goes bad or memory corrupts a few
blocks of data, the entire zpool can go bust? Can the same thing occur with a
RAID-Z pool? I thought ZFS was designed to overcome these issues to begin
with. Is this not the case?

-Simon

On Sat, 5 May 2012 23:11:01 -0700, Artem Belevich wrote:
>I believe I've run into this issue two or three times. In all cases the
>culprit was memory corruption. If I were to guess, the corruption damaged
>critical data *before* ZFS calculated the checksum and wrote it to disk.
>Once that happened, the kernel would panic every time the pool was used.
>Crashes could happen as early as the zpool import, or as late as a few days
>of uptime or the next scheduled scrub. I even tried importing and scrubbing
>the pool on OpenSolaris without much success -- while Solaris didn't crash
>outright, it failed to import the pool with an internal assertion.
>
>On Sat, May 5, 2012 at 7:13 PM, Michael Richards <hackish@gmail.com> wrote:
>> Originally I had an 8.1 server set up on a 32-bit kernel. The OS is on a
>> UFS filesystem and (it's a mail server) the business part of the
>> operation is on ZFS.
>>
>> One day it crashed with an odd kernel panic. I assumed it was a memory
>> issue, so I had more RAM installed. I tried to get a PAE kernel working
>> to use this extra RAM, but it was crashing every few hours.
>>
>> Suspecting a hardware issue, all the hardware was replaced.
>
>Bad memory could indeed do that.
>
>> I had some difficulty trying to figure out how to mount my old ZFS
>> partition but eventually did so.
>...
>> zpool import -f -R /altroot 10433152746165646153 olddata
>> panics the kernel. Similar panic as seen in all the other kernel
>> versions. Gives a bit more info about things I've tried. Whatever it is
>> seems to affect a wide variety of kernels.
>
>The kernel is just the messenger here. The root cause is that while ZFS
>does go an extra mile or two to ensure data consistency, there's only so
>much it can do if RAM is bad. Once that kind of problem has happened, it
>may leave the pool in a state that ZFS cannot deal with out of the box.
>
>Not everything may be lost, though.
>
>First of all -- make a copy of your pool, if that's feasible. The
>probability of screwing it up even more is rather high.
>
>ZFS internally keeps a large number of uberblocks. Each uberblock is a
>sort of periodic checkpoint of the pool state, written after ZFS commits
>the next transaction group (every 10-40 seconds depending on the
>vfs.zfs.txg.timeout sysctl, and more often if there is a lot of ongoing
>write activity).
>
>Basically, you need to destroy the most recent uberblock to manually roll
>back your ZFS pool. Hopefully you'll only need to nuke the few most recent
>ones to restore the pool to a point before the corruption ruined it.
>
>Note that ZFS keeps multiple copies of the uberblocks. You will need to
>nuke *all* instances of the most recent uberblock in order to roll the
>pool state backwards.
>
>The Solaris Internals site now seems to have a script to do that (I wish I
>had known about it back when I needed it):
>http://www.solarisinternals.com/wiki/index.php/ZFS_forensics_scrollback_script
>
>Good luck!
>
>--Artem
>_______________________________________________
>freebsd-fs@freebsd.org mailing list
>http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
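For what it's worth, here is a minimal sketch of the "make a copy of your
pool first" step Artem recommends. The device path /dev/da1 and the backup
location are placeholders, not details from this thread; adjust them for the
actual pool layout:

  # image the raw member device so any recovery attempt can be undone
  dd if=/dev/da1 of=/backup/olddata-da1.img bs=1m conv=noerror,sync

  # or clone it onto a spare disk of at least the same size
  dd if=/dev/da1 of=/dev/da2 bs=1m conv=noerror,sync

Rewriting uberblocks is destructive, so working from an image (or at least
keeping one to fall back on) is what makes the rollback described above
reasonably safe.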
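And a sketch of how one might inspect the pool and try a transaction rewind
before resorting to the solarisinternals script. The device path is again a
placeholder, and the -F/-n recovery options to zpool import are an assumption
about the v28 ZFS shipped with 8.3/9.0 rather than something tested in this
thread:

  # how often a new transaction group (and uberblock) is normally committed
  sysctl vfs.zfs.txg.timeout

  # dump the vdev labels from a member device; newer zdb can add -u to
  # show the uberblock arrays as well
  zdb -l /dev/da1p3

  # dry run of the built-in rewind recovery: report what would be
  # discarded without touching the pool
  zpool import -f -F -n -R /altroot 10433152746165646153 olddata

  # if the dry run looks sane, attempt the actual rewind
  zpool import -f -F -R /altroot 10433152746165646153 olddata

If -F cannot rewind far enough, manually nuking the newest uberblocks (the
script linked above) is the heavier hammer; either way, run it against the
copy, not the only remaining instance of the data.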