Date:      Mon, 7 Jan 2008 10:44:13 +0800
From:      "Tz-Huan Huang" <tzhuan@csie.org>
To:        "Brooks Davis" <brooks@freebsd.org>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZFS i/o errors - which disk is the problem?
Message-ID:  <6a7033710801061844m59f8c62dvdd3eea80f6c239c1@mail.gmail.com>
In-Reply-To: <20080103171825.GA28361@lor.one-eyed-alien.net>
References:  <477B16BB.8070104@freebsd.org> <20080102070146.GH49874@cicely12.cicely.de> <477B8440.1020501@freebsd.org> <200801031750.31035.peter.schuller@infidyne.com> <477D16EE.6070804@freebsd.org> <20080103171825.GA28361@lor.one-eyed-alien.net>

2008/1/4, Brooks Davis <brooks@freebsd.org>:
>
> We've definitely seen cases where hardware changes fixed ZFS checksum errors.
> In one case, a firmware upgrade on the RAID controller fixed it.  In another
> case, we'd been connecting to an external array with a SCSI card that didn't
> have a PCI bracket and the errors went away when the replacement one arrived
> and was installed.  The fact that there were significant errors caught by ZFS
> was quite disturbing since we wouldn't have found them with UFS.

Hi,

We have an NFS server using ZFS with a similar problem.
The box runs i386 7.0-PRERELEASE with 3 GB of RAM:

# uname -a
FreeBSD cml3 7.0-PRERELEASE FreeBSD 7.0-PRERELEASE #2:
Sat Jan  5 14:42:41 CST 2008 root@cml3:/usr/obj/usr/src/sys/CML2  i386

The ZFS pool now contains three RAID arrays:

2007-11-20.11:49:17 zpool create pool /dev/label/proware263
2007-11-20.11:53:31 zfs create pool/project
... (zfs create other filesystems) ...
2007-11-20.11:54:32 zfs set atime=off pool
2007-12-08.22:59:15 zpool add pool /dev/da0
2008-01-05.21:20:03 zpool add pool /dev/label/proware262
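
Note that the pool is a plain stripe of the three devices, with no
mirror or raidz redundancy, so ZFS can detect checksum errors but has
no second copy to repair them from.  In hindsight a self-healing
layout would have looked something like the sketch below, where da1
is a hypothetical second disk:

# zpool create pool mirror /dev/label/proware263 /dev/label/proware262
# zpool add pool mirror /dev/da0 /dev/da1

or, without extra disks, keeping two copies of the file data on the
same devices (assuming this pool version supports the copies
property):

# zfs set copies=2 pool/project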

After a power loss yesterday, zpool status shows:

# zpool status -v
  pool: pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed with 231 errors on Mon Jan  7 08:05:35 2008
config:

        NAME                STATE     READ WRITE CKSUM
        pool                ONLINE       0     0   516
          label/proware263  ONLINE       0     0   231
          da0               ONLINE       0     0   285
          label/proware262  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /system/database/mysql/flickr_geo/flickr_raw_tag.MYI
        pool/project:<0x0>
        pool/home/master/96:<0xbf36>

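The pool-level CKSUM count (516) is simply the sum of the per-device
counts (231 on label/proware263 plus 285 on da0); label/proware262,
added only two days ago, shows no errors.  If we understand the
pool/project:<0x0> entry correctly, ZFS prints dataset:<object
number> when it cannot resolve a damaged object to a file name, so
<0x0> suggests the corruption is in the dataset's own metadata rather
than in a regular file.  Assuming zdb accepts an object number after
the dataset name, the damaged object could be inspected with
something like:

# zdb -dddd pool/project 0
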
The main problem is that we cannot mount pool/project any more:

# zfs mount pool/project
cannot mount 'pool/project': Input/output error
# grep ZFS /var/log/messages
Jan  7 10:08:35 cml3 root: ZFS: zpool I/O failure, zpool=pool error=86
(repeat many times)
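
To avoid any further writes to the pool while we investigate, would a
temporary read-only mount be a sane thing to try, something like:

# zfs mount -o ro pool/project

This is just a guess; if the dataset metadata itself is damaged,
presumably the mount will fail with the same I/O error.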

There is a lot of data in pool/project, roughly 3.24 TB. zdb shows:

# zdb pool
...
Dataset pool/project [ZPL], ID 33, cr_txg 57, 3.24T, 22267231 objects
...

(zdb is still running now; we can provide the output if it would help.)
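
If the dataset can be opened at all, would snapshotting it and
streaming it off with zfs send be a reasonable way to salvage what is
still readable?  A rough sketch, with /backup/project.zfs as a
hypothetical destination:

# zfs snapshot pool/project@rescue
# zfs send pool/project@rescue > /backup/project.zfs

We are not sure whether zfs send would get past the corrupted blocks,
though.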

Is there any way to recover any data from pool/project?
Thank you very much.

Sincerely,
Tz-Huan


