Date: Tue, 16 Nov 2010 02:01:58 -0500 (EST)
From: Terry Kennedy <TERRY@tmk.com>
To: Michael DeMan <freebsd@deman.com>
Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: ZFS panic after replacing log device
Message-ID: <01NUB3IOMZJW00BNN4@tmk.com>
In-Reply-To: "Your message dated Mon, 15 Nov 2010 22:55:11 -0800" <E7621997-3485-43A2-A2EE-A11574054FF6@deman.com>
References: <01NUB1F8POL000BNN4@tmk.com>

> I am no ZFS kernel-code dude or anything, but it is well known that losing
> the ZIL can corrupt things pretty bad with ZFS.

First, thanks for writing back!

I agree that this could be the problem. As I mentioned in my original post, I followed the steps recommended by "zpool status" - clearing the device and then doing a replace. The fix may be as simple as testing for whether the device in question is a log device and if so, erroring out with "You can't do that".

Also note that multiple scrubs pass with no errors detected - it is only writes that trigger the panic. It looks like something isn't being cleaned up in the clear / replace path. I would save a crash dump for people to look at, but unfortunately the last time a crash dump actually worked for me (on dozens of systems) was back in the FreeBSD 6.2 days.

There wasn't any data corruption (the filesystem was not being written at the time the log device failed) - I have my own checksum files written by the sysutils/cfv port, and the data all matches.

> All in all, if I was in your situation I would give a whirl at installing
> OpenSolaris and going from there, being sure not to upgrade the pool version
> past what is supported by FreeBSD and going from there.

I have the data on another server (see my prior "snapshots are not backups" discussion on freebsd-stable if interested). So, fortunately, this is not a case of data recovery.

> Unfortunately we all find ourselves in a bit of a pickle with ZFS right
> now with the Oracle acquisition of Sun. For myself, I would stick with
> deploying on FreeBSD but I think its going to be FBSD 9.1 before its going
> to be truly ready for production.

The problem with hardware on the leading edge is that the software often needs time to catch up. In this particular case, the ZFS pool is 32TB. I can't begin to imagine how long a UFS fsck would take on such a partition, even if it were possible to create one. It was bad enough on the previous generation of my servers (2TB UFS partitions).

        Terry Kennedy             http://www.tmk.com
        terry@tmk.com             New York, NY USA