Date:      Thu, 19 Jul 2012 18:05:32 +0100
From:      Dr Joe Karthauser <joe@tao.org.uk>
To:        James Snow <snow@teardrop.org>
Cc:        "freebsd-stable@freebsd.org" <freebsd-stable@freebsd.org>
Subject:   Re: Checksum errors across ZFS array
Message-ID:  <002D6A20-D2A4-4909-B2EA-3DB562326050@tao.org.uk>
In-Reply-To: <20120719152909.GL32960@teardrop.org>
References:  <20120719152909.GL32960@teardrop.org>

Hi James,

It's almost definitely a memory problem. I'd change it ASAP if I were you.

I lost about 70MB from my ZFS pool for this very reason just a few weeks ago.
Luckily I had enough snapshots from before the rot set in to recover most of
what I lost.

Joe

-- 
Dr Joe Karthauser

On 19 Jul 2012, at 16:29, James Snow <snow@teardrop.org> wrote:

> I have a ZFS server on which I've seen periodic checksum errors on
> almost every drive. While scrubbing the pool last night, it began to
> report unrecoverable data errors on a single file.
>
> I compared an md5 of the supposedly corrupted file to an md5 of the
> original copy, stored on different media. They were the same, suggesting
> no corruption.
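For reference, the comparison described above (hashing the suspect file and a known-good copy, then comparing digests) can be sketched with Python's standard 'hashlib' module. This is an illustrative helper, not something from the original message; the function names and the 1 MiB chunk size are arbitrary choices.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a large file never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a: str, path_b: str) -> bool:
    """True if both files produce the same MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

Called as same_contents('/tank/some/file', '/backup/some/file') with hypothetical paths, it returns True when the digests match. FreeBSD's base-system md5(1) command produces the same digests from the shell.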
>
> A large file was being written to the pool while the scrub was in
> progress, and the entire array became unresponsive. The OS was still up,
> but 'zpool status' showed the scrub progress stuck at the same spot,
> with the throughput rate falling. 'shutdown -r now' stalled. Eventually
> I hard power cycled the system.
>
> Now, attempting to read the file that ZFS reports errors on yields
> "Input/output error." The scrub completed, with the following result:
>
>        NAME         STATE     READ WRITE CKSUM
>        tank         ONLINE       0     0     7
>          mirror-0   ONLINE       0     0     0
>            aacd0p1  ONLINE       0     0     0
>            aacd4p1  ONLINE       0     0     1
>          mirror-1   ONLINE       0     0     0
>            aacd1p1  ONLINE       0     0     0
>            aacd5p1  ONLINE       0     0     0
>          mirror-2   ONLINE       0     0    14
>            aacd2p1  ONLINE       0     0    14
>            aacd6p1  ONLINE       0     0    14
>          mirror-3   ONLINE       0     0     0
>            aacd3p1  ONLINE       0     0     0
>            aacd7p1  ONLINE       0     0     0
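When checksum errors are spread over many disks, it can help to pull the counters out of the 'zpool status' table programmatically rather than by eye. The Python sketch below is a hypothetical helper, not a ZFS tool, and it assumes the layout shown above: a NAME/STATE/READ/WRITE/CKSUM header, two spaces of indentation per nesting level, and plain integer counters.

```python
from __future__ import annotations

from typing import NamedTuple


class VdevRow(NamedTuple):
    name: str
    depth: int  # nesting level: 0 = pool, 1 = top-level vdev, 2 = disk in a mirror
    state: str
    read: int
    write: int
    cksum: int


def parse_config(text: str) -> list[VdevRow]:
    """Parse the NAME/STATE/READ/WRITE/CKSUM table of 'zpool status' output.

    Assumes plain integer counters and two-space indentation per level,
    as in the output quoted above.
    """
    lines = text.splitlines()
    # Locate the header row; anything before it (pool:, state:, ...) is skipped.
    start = next((i for i, l in enumerate(lines) if l.split()[:1] == ["NAME"]), None)
    if start is None:
        raise ValueError("no NAME/STATE/READ/WRITE/CKSUM header found")
    base = len(lines[start]) - len(lines[start].lstrip())
    rows: list[VdevRow] = []
    for line in lines[start + 1:]:
        if not line.strip():
            break  # the table ends at the first blank line
        fields = line.split()
        if len(fields) < 5:
            continue  # e.g. section labels or spares without counters
        indent = len(line) - len(line.lstrip()) - base
        name, state, read, write, cksum = fields[:5]
        rows.append(VdevRow(name, indent // 2, state, int(read), int(write), int(cksum)))
    return rows


def leaves_with_cksum_errors(rows: list[VdevRow]) -> list[str]:
    """Names of leaf devices (rows with no deeper child) with a nonzero CKSUM count."""
    result = []
    for i, row in enumerate(rows):
        is_leaf = i + 1 == len(rows) or rows[i + 1].depth <= row.depth
        if is_leaf and row.cksum > 0:
            result.append(row.name)
    return result
```

Run on the table quoted above (with the leading '>' quote markers stripped), leaves_with_cksum_errors(parse_config(text)) should return ['aacd4p1', 'aacd2p1', 'aacd6p1']. Pool and mirror rows are left out because they are not physical disks.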
>
> The system configuration is as follows:
>
> Controller:  Adaptec 2805
> Motherboard: Supermicro X8STE
> Drive Cage:  2x Supermicro CSE-M35T-1
> Memory:      2x Kingston 12GB ECC (KVR1066D3E7SK3/12G)
> PSU:         Nexus RX-7000
> OS:          9.0-RELEASE-p3
> ZFS:         ZFS filesystem version 5, ZFS storage pool version 28
>
> The Adaptec card has 2 ports, each of which uses a 4-port fan-out cable.
> The cables are routed as shown:
>
>      /--- aacd0 (ST1000DM003-9YN1 CC4D)
>     / /-- aacd1 (ST1000DM003-9YN1 CC4D)
> p1-----
>     \ \-- aacd2 (WDC WD1001FALS-0 05.0)
>      \--- aacd3 (WDC WD1001FALS-0 05.0)
>
>      /--- aacd4 (ST1000DM003-9YN1 CC4D)
>     / /-- aacd5 (ST1000DM003-9YN1 CC4D)
> p2-----
>     \ \-- aacd6 (WDC WD1002FAEX-0 05.0)
>      \--- aacd7 (WDC WD1002FAEX-0 05.0)
>
> You can see that each ZFS mirror is made up of one drive from each drive
> carrier, on separate ports and on separate cables.
>
> Since I have seen periodic checksum errors on almost every drive, and the
> only components common to all of them are the Adaptec controller and the
> motherboard, I suspect the controller. (Or the motherboard, but I'm
> starting with the
> controller since it's much simpler to swap out.)
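The elimination in this paragraph (which components do all the erroring drives share?) can be made mechanical. The sketch below is a hypothetical helper, not something from the thread: it transcribes the cabling diagram and the final scrub's CKSUM counts into Python dictionaries. The attribute names ('port', 'model', 'controller', 'host') and the lumping of motherboard, RAM and PSU into a single 'host' label are assumptions for illustration, and only the one scrub result quoted above is used, not the earlier periodic errors.

```python
from __future__ import annotations

from collections import defaultdict

# Cabling transcribed from the diagram above (partition suffix "p1" dropped):
# device -> (controller port / fan-out cable, drive model as shown).
_DRIVES = {
    "aacd0": ("p1", "ST1000DM003-9YN1"),
    "aacd1": ("p1", "ST1000DM003-9YN1"),
    "aacd2": ("p1", "WD1001FALS-0"),
    "aacd3": ("p1", "WD1001FALS-0"),
    "aacd4": ("p2", "ST1000DM003-9YN1"),
    "aacd5": ("p2", "ST1000DM003-9YN1"),
    "aacd6": ("p2", "WD1002FAEX-0"),
    "aacd7": ("p2", "WD1002FAEX-0"),
}

# Hypothetical labels for components that every drive shares.
SHARED = {"controller": "Adaptec 2805", "host": "X8STE board, RAM, PSU"}

TOPOLOGY = {
    dev: {"port": port, "model": model, **SHARED}
    for dev, (port, model) in _DRIVES.items()
}

# CKSUM counts per disk from the final scrub quoted above.
SCRUB_CKSUM = {"aacd4": 1, "aacd2": 14, "aacd6": 14}


def tally(errors: dict[str, int],
          topology: dict[str, dict[str, str]]) -> dict[tuple[str, str], int]:
    """Total checksum errors attributed to each (attribute, value) pair."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    for dev, count in errors.items():
        for attr, value in topology[dev].items():
            totals[(attr, value)] += count
    return dict(totals)


def common_to_all(errors: dict[str, int],
                  topology: dict[str, dict[str, str]]) -> set[tuple[str, str]]:
    """(attribute, value) pairs shared by every device with a nonzero count."""
    bad = [dev for dev, count in errors.items() if count > 0]
    if not bad:
        return set()
    common = set(topology[bad[0]].items())
    for dev in bad[1:]:
        common &= set(topology[dev].items())
    return common
```

With these counts, common_to_all(SCRUB_CKSUM, TOPOLOGY) keeps only the 'controller' and 'host' entries, because the erroring disks sit on both ports and span all three drive models. That is consistent with suspecting the controller, but this kind of elimination cannot separate the controller from anything else every drive shares, such as the motherboard or RAM.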
>
> Could it be something else? What else should I be looking at? Any input
> greatly appreciated.
>
> -Snow