From: "Simon" <simon@optinet.com>
To: "Artem Belevich"
Cc: "freebsd-fs@freebsd.org"
Date: Sun, 06 May 2012 08:38:18 -0400
Subject: Re: ZFS Kernel Panics with 32 and 64 bit versions of 8.3 and 9.0

Are you suggesting that if a disk sector goes bad or memory corrupts a
few blocks of data, the entire zpool can go bust? Can the same thing
happen with a raidz pool? I thought ZFS was designed to overcome exactly
these issues. Is this not the case?

-Simon

On Sat, 5 May 2012 23:11:01 -0700, Artem Belevich wrote:

>I believe I've run into this issue two or three times. In all cases the
>culprit was memory corruption. If I were to guess, the corruption
>damaged critical data *before* ZFS calculated the checksum and wrote it
>to disk. Once that happened, the kernel would panic every time the pool
>was used. Crashes could come as early as the zpool import or as late as
>a few days into uptime or the next scheduled scrub. I even tried
>importing/scrubbing the pool on OpenSolaris without much success --
>while Solaris didn't crash outright, it failed to import the pool with
>an internal assertion failure.
>
>On Sat, May 5, 2012 at 7:13 PM, Michael Richards wrote:
>> Originally I had an 8.1 server set up with a 32-bit kernel. The OS is
>> on a UFS filesystem and, since it's a mail server, the business part
>> of the operation is on ZFS.
>>
>> One day it crashed with an odd kernel panic. I assumed it was a memory
>> issue, so I had more RAM installed. I tried to get a PAE kernel
>> working to use the extra RAM, but it was crashing every few hours.
>>
>> Suspecting a hardware issue, all the hardware was replaced.
>
>Bad memory could indeed do that.
>
>> I had some difficulty figuring out how to mount my old ZFS partition,
>> but eventually did so.
>...
>> zpool import -f -R /altroot 10433152746165646153 olddata
>> panics the kernel -- a panic similar to the ones seen with all the
>> other kernel versions. Gives a bit more info about things I've tried.
>> Whatever it is seems to affect a wide variety of kernels.
>
>The kernel is just the messenger here. The root cause is that while ZFS
>does go an extra mile or two to ensure data consistency, there is only
>so much it can do if the RAM is bad. Once that kind of problem has
>happened, it may leave the pool in a state that ZFS cannot deal with
>out of the box.
>Not everything may be lost, though.
>
>First of all -- make a copy of your pool if that's feasible. The
>probability of screwing it up even further is rather high.
>
>ZFS internally keeps a large number of uberblocks. Each uberblock is
>essentially a periodic checkpoint of the pool state, written after ZFS
>commits the next transaction group (every 10-40 seconds, depending on
>the vfs.zfs.txg.timeout sysctl, and more often if there is a lot of
>ongoing write activity). Basically, you need to destroy the most recent
>uberblock to manually roll back your ZFS pool. Hopefully you will only
>need to nuke a few of the most recent ones to restore the pool to a
>point before the corruption ruined it.
>
>Note that ZFS keeps multiple copies of each uberblock. You will need to
>nuke *all* instances of the most recent uberblock in order to roll the
>pool state backwards.
>
>The Solaris Internals site now seems to have a script that does exactly
>that (I wish I had known about it back when I needed it):
>http://www.solarisinternals.com/wiki/index.php/ZFS_forensics_scrollback_script
>
>Good luck!
>
>--Artem
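
For illustration, the "out of the box" options worth exhausting before
any manual uberblock surgery might look like the sketch below. It
assumes the v28-era zpool(8) that ships with FreeBSD 8.3/9.0 and reuses
the pool GUID from Michael's command; check the flags against the
installed man page before relying on them.

  # Read-only import: nothing is written, so it should not be able to
  # make the on-disk state any worse than it already is.
  zpool import -f -o readonly=on -R /altroot 10433152746165646153 olddata

  # Dry run of ZFS's built-in recovery mode: -F discards the last few
  # transaction groups on import, and -n only reports whether doing so
  # would make the pool importable, without modifying anything on disk.
  zpool import -f -F -n 10433152746165646153

Neither attempt is guaranteed to get past the panic if the damaged data
is older than the rewind window, but both are cheap to try before
hand-editing labels.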
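
Likewise, the "copy the pool first, then look at the uberblocks" steps
might look roughly like the following. The device names are purely
hypothetical (a single-disk pool on ada1 being cloned onto a spare disk
ada2), and the zdb(8) flags should be double-checked against the
installed version.

  # Raw copy of the pool device onto a spare disk of at least the same
  # size, so any uberblock experiment can be repeated from a clean image.
  dd if=/dev/ada1 of=/dev/ada2 bs=1m conv=noerror,sync

  # Dump the vdev labels on the device along with the uberblocks they
  # contain; the txg and timestamp fields show which uberblocks are the
  # newest, i.e. the candidates for being nuked during a rollback.
  zdb -lu /dev/ada1

The script linked above then does the actual work: as Artem describes,
every copy of the newest uberblock(s) has to be invalidated so that the
next import falls back to an older, hopefully pre-corruption,
transaction group.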