From: "Simon" <simon@optinet.com>
To: "Artem Belevich"
Cc: "freebsd-fs@freebsd.org"
Date: Sun, 06 May 2012 08:38:18 -0400
Subject: Re: ZFS Kernel Panics with 32 and 64 bit versions of 8.3 and 9.0

Are you suggesting that if a disk sector goes bad or memory corrupts a
few blocks of data, the entire zpool can go bust? Can the same thing
happen with a raidz pool? I thought ZFS was designed to overcome exactly
these issues. Is this not the case?

-Simon

On Sat, 5 May 2012 23:11:01 -0700, Artem Belevich wrote:

>I believe I've run into this issue two or three times. In all cases the
>culprit was memory corruption. If I were to guess, the corruption
>damaged critical data *before* ZFS calculated the checksum and wrote it
>to disk. Once that happened, the kernel would panic every time the pool
>was used. Crashes could come as early as the zpool import or as late as
>a few days into uptime or the next scheduled scrub. I even tried
>importing/scrubbing the pool on OpenSolaris without much success --
>while Solaris didn't crash outright, it failed to import the pool with
>an internal assertion failure.
>
>On Sat, May 5, 2012 at 7:13 PM, Michael Richards wrote:
>> Originally I had an 8.1 server set up with a 32-bit kernel. The OS is
>> on a UFS filesystem and, since it's a mail server, the business part
>> of the operation is on ZFS.
>>
>> One day it crashed with an odd kernel panic. I assumed it was a memory
>> issue, so I had more RAM installed. I tried to get a PAE kernel
>> working to use the extra RAM, but it was crashing every few hours.
>>
>> Suspecting a hardware issue, all the hardware was replaced.
>
>Bad memory could indeed do that.
>
>> I had some difficulty figuring out how to mount my old ZFS partition,
>> but eventually did so.
>...
>> zpool import -f -R /altroot 10433152746165646153 olddata
>> panics the kernel -- a panic similar to the ones seen with all the
>> other kernel versions. Gives a bit more info about things I've tried.
>> Whatever it is seems to affect a wide variety of kernels.
>
>The kernel is just the messenger here. The root cause is that while ZFS
>does go an extra mile or two to ensure data consistency, there is only
>so much it can do if the RAM is bad. Once that kind of problem has
>happened, it may leave the pool in a state that ZFS cannot deal with
>out of the box.
>Not everything may be lost, though.
>
>First of all -- make a copy of your pool if that's feasible. The
>probability of screwing it up even further is rather high.
>
>ZFS internally keeps a large number of uberblocks. Each uberblock is
>essentially a periodic checkpoint of the pool state, written after ZFS
>commits the next transaction group (every 10-40 seconds, depending on
>the vfs.zfs.txg.timeout sysctl, and more often if there is a lot of
>ongoing write activity). Basically, you need to destroy the most recent
>uberblock to manually roll back your ZFS pool. Hopefully you will only
>need to nuke a few of the most recent ones to restore the pool to a
>point before the corruption ruined it.
>
>Note that ZFS keeps multiple copies of each uberblock. You will need to
>nuke *all* instances of the most recent uberblock in order to roll the
>pool state backwards.
>
>The Solaris Internals site now seems to have a script that does exactly
>that (I wish I had known about it back when I needed it):
>http://www.solarisinternals.com/wiki/index.php/ZFS_forensics_scrollback_script
>
>Good luck!
>
>--Artem
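
For illustration, the "out of the box" options worth exhausting before
any manual uberblock surgery might look like the sketch below. It
assumes the v28-era zpool(8) that ships with FreeBSD 8.3/9.0 and reuses
the pool GUID from Michael's command; check the flags against the
installed man page before relying on them.

  # Read-only import: nothing is written, so it should not be able to
  # make the on-disk state any worse than it already is.
  zpool import -f -o readonly=on -R /altroot 10433152746165646153 olddata

  # Dry run of ZFS's built-in recovery mode: -F discards the last few
  # transaction groups on import, and -n only reports whether doing so
  # would make the pool importable, without modifying anything on disk.
  zpool import -f -F -n 10433152746165646153

Neither attempt is guaranteed to get past the panic if the damaged data
is older than the rewind window, but both are cheap to try before
hand-editing labels.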
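
Likewise, the "copy the pool first, then look at the uberblocks" steps
might look roughly like the following. The device names are purely
hypothetical (a single-disk pool on ada1 being cloned onto a spare disk
ada2), and the zdb(8) flags should be double-checked against the
installed version.

  # Raw copy of the pool device onto a spare disk of at least the same
  # size, so any uberblock experiment can be repeated from a clean image.
  dd if=/dev/ada1 of=/dev/ada2 bs=1m conv=noerror,sync

  # Dump the vdev labels on the device along with the uberblocks they
  # contain; the txg and timestamp fields show which uberblocks are the
  # newest, i.e. the candidates for being nuked during a rollback.
  zdb -lu /dev/ada1

The script linked above then does the actual work: as Artem describes,
every copy of the newest uberblock(s) has to be invalidated so that the
next import falls back to an older, hopefully pre-corruption,
transaction group.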