Date:      Sun, 9 Aug 2009 13:14:32 +0200
From:      Jason Edwards <sub.mesa@gmail.com>
To:        freebsd-fs@freebsd.org
Subject:   ZFS corruption on 8-CURRENT
Message-ID:  <883b2dc50908090414o71bc5fc2q5aef64c2b5da653e@mail.gmail.com>

Hi guys,

I'm investigating a weird corruption issue. After I filled up my 8-disk
RAID-Z pool with data and used it for a few weeks, zpool status started to
show me this:


# zpool status sub
  pool: sub
 state: UNAVAIL
status: One or more devices could not be used because the label is missing
        or invalid.  There are insufficient replicas for the pool to continue
        functioning.
action: Destroy and re-create the pool from a backup source.
   see: http://www.sun.com/msg/ZFS-8000-5E
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        sub         UNAVAIL      0     0     0  insufficient replicas
          raidz1    UNAVAIL      0     0     0  insufficient replicas
            ad14a   FAULTED      0     0     0  corrupted data
            ad8a    ONLINE       0     0     0
            ad10a   ONLINE       0     0     0
            ad10a   FAULTED      0     0     0  corrupted data
            ad18a   FAULTED      0     0     0  corrupted data
            ad12a   FAULTED      0     0     0  corrupted data
            ad16a   FAULTED      0     0     0  corrupted data
            ad8a    FAULTED      0     0     0  corrupted data
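
The config section of a listing like the one above can be checked
mechanically for device names that appear more than once. The following is a
minimal Python sketch, not a ZFS tool: the config lines are pasted in as a
string, and the helper names parse_config and duplicated_names are made up
for illustration.

```python
from collections import Counter

# Config lines copied from the first `zpool status sub` listing.
CONFIG = """\
sub         UNAVAIL      0     0     0  insufficient replicas
  raidz1    UNAVAIL      0     0     0  insufficient replicas
    ad14a   FAULTED      0     0     0  corrupted data
    ad8a    ONLINE       0     0     0
    ad10a   ONLINE       0     0     0
    ad10a   FAULTED      0     0     0  corrupted data
    ad18a   FAULTED      0     0     0  corrupted data
    ad12a   FAULTED      0     0     0  corrupted data
    ad16a   FAULTED      0     0     0  corrupted data
    ad8a    FAULTED      0     0     0  corrupted data
"""


def parse_config(text):
    """Return (name, state) pairs, one per non-empty config line."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            rows.append((fields[0], fields[1]))
    return rows


def duplicated_names(rows):
    """Return the sorted names that occur on more than one line."""
    counts = Counter(name for name, _ in rows)
    return sorted(name for name, n in counts.items() if n > 1)


print(duplicated_names(parse_config(CONFIG)))
```

On this listing the sketch prints ['ad10a', 'ad8a']: ad8a is also listed
twice, once ONLINE and once FAULTED.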


Oops? What happened here? Besides the "corrupted data" entries, ad10a is
listed twice, once ONLINE and once FAULTED.
After rebooting, the output looks a little cleaner, but now it reports a
problem with the ZIL:


# zpool status sub
  pool: sub
 state: FAULTED
status: An intent log record could not be read.
        Waiting for adminstrator intervention to fix the faulted pool.
action: Either restore the affected device(s) and run 'zpool online',
        or ignore the intent log records by running 'zpool clear'.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        sub         FAULTED      0     0     0  bad intent log
          raidz1    ONLINE       0     0     0
            ad14a   ONLINE       0     0     0
            ad4a    ONLINE       0     0     0
            ad6a    ONLINE       0     0     0
            ad10a   ONLINE       0     0     0
            ad18a   ONLINE       6     0     0
            ad12a   ONLINE       0     0     0
            ad16a   ONLINE       0     0     0
            ad8a    ONLINE       0     0     0
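
Comparing the device names of the two listings slot by slot shows which
entries changed across the reboot. This is a minimal Python sketch that
assumes the raidz1 children appear in the same order in both outputs, which
the listings themselves do not confirm. The two lists are copied by hand from
the listings, and the function names are hypothetical.

```python
# Child device names of the raidz1 vdev, in listing order.
before_reboot = ["ad14a", "ad8a", "ad10a", "ad10a", "ad18a", "ad12a", "ad16a", "ad8a"]
after_reboot = ["ad14a", "ad4a", "ad6a", "ad10a", "ad18a", "ad12a", "ad16a", "ad8a"]


def changed_slots(before, after):
    """Return (position, old_name, new_name) for each slot whose name differs."""
    return [(i, b, a) for i, (b, a) in enumerate(zip(before, after)) if b != a]


def names_missing_from(before, after):
    """Return the sorted names present in `after` but absent from `before`."""
    return sorted(set(after) - set(before))


print(changed_slots(before_reboot, after_reboot))
print(names_missing_from(before_reboot, after_reboot))
```

With these lists, the slots at positions 1 and 2 (counting from 0) change
from ad8a and ad10a to ad4a and ad6a, and ad4a and ad6a do not appear in the
first listing at all.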


Additionally, I got some read errors on ad18. But since this is a RAID-Z, I
guess one disk alone cannot corrupt or fail the entire array.
Before I take any action that might be destructive: does anybody have a clue
what happened here and how I can prevent it in the future?
The box is a quad-core X4 9350e with 6GB RAM, running 8-CURRENT as of July
21st 2009 (after 8.0-BETA2). It worked correctly before I upgraded CURRENT
to a newer date. Maybe some bug slipped in?
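
The reasoning about RAID-Z redundancy can be written down as a small check: a
raidz1 vdev carries single parity, so it can lose one child and still have
enough replicas. This is a rough Python sketch, assuming every child not
reported ONLINE is unusable (a simplification); the names are hypothetical
and the states are copied from the two listings above.

```python
# Children a RAID-Z vdev can lose: raidz1 is single parity, raidz2 double.
PARITY = {"raidz1": 1, "raidz2": 2}


def has_enough_replicas(vdev_type, child_states):
    """True if the count of non-ONLINE children does not exceed the parity level."""
    # Assumption: any state other than ONLINE means the child cannot be read.
    unusable = sum(1 for state in child_states if state != "ONLINE")
    return unusable <= PARITY[vdev_type]


# Child states of the raidz1 vdev, in listing order.
before_reboot = ["FAULTED", "ONLINE", "ONLINE", "FAULTED",
                 "FAULTED", "FAULTED", "FAULTED", "FAULTED"]
after_reboot = ["ONLINE"] * 8

print(has_enough_replicas("raidz1", before_reboot))  # six children are FAULTED
print(has_enough_replicas("raidz1", after_reboot))   # all eight are ONLINE
```

This agrees with the listings: six FAULTED children is far more than raidz1
can absorb, hence "insufficient replicas", while after the reboot the raidz1
vdev is ONLINE and the pool is reported FAULTED only because of the intent
log.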

Kind regards,
sub


