Date:      Wed, 14 Oct 2009 08:21:07 +0200
From:      Pawel Jakub Dawidek <pjd@FreeBSD.org>
To:        Alex Trull <alextzfs@googlemail.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: zraid2 loses a single disk and becomes difficult to recover
Message-ID:  <20091014062107.GB1696@garage.freebsd.pl>
In-Reply-To: <4d98b5320910121249q36c68b8vf63ec27cf4bb94c9@mail.gmail.com>
References:  <4d98b5320910110741w794c154cs22b527485c1938da@mail.gmail.com> <4d98b5320910110927o62f8f588r9acdeb40a19587ea@mail.gmail.com> <4d98b5320910121249q36c68b8vf63ec27cf4bb94c9@mail.gmail.com>


On Mon, Oct 12, 2009 at 08:49:37PM +0100, Alex Trull wrote:
> I managed to cleanly recover all critical data by cloning the most recent
> snapshots of all my filesystems (which worked even for those filesystems
> that had disappeared from 'zfs list') - and moving back to ufs2
>
> The 'live' filesystems since the snapshots had pretty much gone corrupt.
>
> An interesting note is that even if I promoted those clones, if the system
> was rebooted the contents of the snapshots became gobbledygooked (invalid
> byte sequence errors on numerous files).
>
> As it stands I managed to recover 100% of the data, so I'm out of the woods.

I'm glad to hear that.

> How does a dual-parity array lose its mind when only one disk is lost ?
> Might it have been related to the old TXGid I found on ad16 and ad17 ?

Yes, definitely. For some reason ZFS didn't update the txg on those two
disks, so at that point you were running without parity. The problem is
that ZFS didn't start a resilver automatically and also didn't report
this situation properly. I think I have seen this in the past. Running
'zpool scrub' on this pool will trigger a resilver. There must be a bug.
I tried to reproduce it by modifying the code not to update the txg on
one of the components. There are three places where this can happen on a
system crash/power failure and I tried all of them - no luck, ZFS was
able to recover properly.
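
One way to spot this condition is to compare the txg recorded in each
component's vdev label. The sketch below assumes the labels are read
with 'zdb -l' and that the label txg shows up as a 'txg=N' or 'txg: N'
line (the exact layout differs between ZFS versions). label_txgs is a
hypothetical helper name; ad16 and ad17 are the disks from this thread.

```sh
# label_txgs: hypothetical helper. Reads 'zdb -l' output on stdin and
# prints the distinct label txg values, ignoring keys like create_txg.
# Assumes lines of the form 'txg=N' or 'txg: N'.
label_txgs() {
    awk '/^[ \t]*txg[=:]/ {
        sub(/^[ \t]*txg[=:][ \t]*/, "")
        print
    }' | sort -un
}

# Example (not run here): a component whose txg lags behind the others
# was not updated together with the rest of the pool.
#   for dev in /dev/ad16 /dev/ad17; do
#       printf '%s: ' "$dev"
#       zdb -l "$dev" | label_txgs | tr '\n' ' '
#       echo
#   done
```

A single component normally reports one value across its labels, and
all components of a healthy pool should agree on it.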

It would be a good idea to run 'zpool scrub' on a regular basis, even if
only to see whether it triggers a resilver (it can be stopped after a
few minutes with 'zpool scrub -s'). Of course it is advised to run a
full scrub from time to time.
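
A minimal sketch of such a short check, suitable for running from cron.
The function name scrub_probe, the ZPOOL override and the 300-second
default are assumptions made for illustration, as is the 'resilver in
progress' wording matched in the 'zpool status' output, which may
differ between ZFS versions.

```sh
#!/bin/sh
# ZPOOL can be overridden for a dry run, e.g. ZPOOL=echo (assumption,
# not a zpool feature).
ZPOOL=${ZPOOL:-zpool}

# scrub_probe POOL [SECONDS]: hypothetical helper.
# Start a scrub, wait, and check whether it turned into a resilver.
# If it did, leave it running; otherwise stop the scrub again.
scrub_probe() {
    pool=$1
    wait_secs=${2:-300}

    "$ZPOOL" scrub "$pool" || return 1
    sleep "$wait_secs"

    if "$ZPOOL" status "$pool" | grep -q 'resilver in progress'; then
        echo "resilver triggered on $pool; leaving it running"
        return 0
    fi

    # No resilver was needed: cancel the scrub started above.
    "$ZPOOL" scrub -s "$pool"
}
```

Calling 'scrub_probe tank' would probe a pool named tank (a
hypothetical pool name) for five minutes. This does not replace a full
scrub; it only shows whether the pool wants to resilver.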

Do you have this pool around still?

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!



