Date:      Mon, 9 Nov 2015 11:54:39 -0800
From:      Sean Chittenden <sean@chittenden.org>
To:        Tim Gustafson <tjg@ucsc.edu>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZFS RAID 0+1 Throwing Checksum Errors
Message-ID:  <sig.0755373dd9.AF7A1800-6AA1-4B6D-9446-F51572CC64D9@chittenden.org>
In-Reply-To: <CAPyBAS7oYvp6vvzetcGmrXy0_Qn0fXBN_d510w41CguDZCzMxw@mail.gmail.com>
References:  <CAPyBAS7oYvp6vvzetcGmrXy0_Qn0fXBN_d510w41CguDZCzMxw@mail.gmail.com>

Tim, I've run into this a dozen or so times on servers where their power
is "dirty" (i.e. home or small offices with small servers that use ZFS).
If you plug the box into a UPS to condition the line you may find that
the checksum errors go away.  It's pretty amazing to see and happens
with both SSD and spinning rust.  It's not always the case, but it's a
common enough environmental problem.  Report back if you try this and it
solves your problem.
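
To make the before/after comparison concrete, one option is to record the
per-device CKSUM column after each weekly scrub and see whether the counts
drop once the UPS is in place.  The following is only a minimal sketch: the
sample text and its counts are invented for illustration (they are not
Tim's real numbers), the function names are hypothetical, and it assumes
the counters are printed as plain integers.

```python
import re

# Sample text in the layout of the `zpool status` config table.
# The counts are made up for illustration, not taken from the real pool.
SAMPLE_STATUS = """\
NAME            STATE     READ WRITE CKSUM
tank            ONLINE       0     0     0
  mirror-0      ONLINE       0     0     0
    gpt/zfs0    ONLINE       0     0     3
    gpt/zfs1    ONLINE       0     0     7
  mirror-1      ONLINE       0     0     0
    gpt/zfs2    ONLINE       0     0     1
    gpt/zfs3    ONLINE       0     0     0
"""

# A device row is: name, state, then three integer counters.
# The header row does not match because READ/WRITE/CKSUM are not digits.
ROW = re.compile(r"^\s*(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def cksum_counts(status_text):
    """Return {device_name: cksum_count} for every row of the table."""
    counts = {}
    for line in status_text.splitlines():
        match = ROW.match(line)
        if match:
            name, _state, _read, _write, cksum = match.groups()
            counts[name] = int(cksum)
    return counts


def nonzero(counts):
    """Keep only the devices that reported at least one checksum error."""
    return {name: n for name, n in counts.items() if n > 0}


if __name__ == "__main__":
    for name, n in nonzero(cksum_counts(SAMPLE_STATUS)).items():
        print(f"{name}: {n} checksum errors")
```

On a live box the text would come from running "zpool status tank" after
the scrub finishes and before clearing the counters, rather than from the
sample string.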

-sc



--
Sean Chittenden
sean@chittenden.org

> On Nov 9, 2015, at 11:08, Tim Gustafson <tjg@ucsc.edu> wrote:
>
> I have a FreeBSD 10.1 server configured as root-on-zfs with the
> following pool configuration:
>
> NAME            STATE     READ WRITE CKSUM
> tank            ONLINE       0     0     0
>   mirror-0      ONLINE       0     0     0
>     gpt/zfs0    ONLINE       0     0     0
>     gpt/zfs1    ONLINE       0     0     0
>   mirror-1      ONLINE       0     0     0
>     gpt/zfs2    ONLINE       0     0     0
>     gpt/zfs3    ONLINE       0     0     0
>
> The disks are each 1TB Samsung 850EVO SSDs connected via an mrsas Dell
> Perc raid controller configured in "RAID Disabled" mode.
>
> I run a "zpool scrub" every weekend and every weekend the scrub finds
> a handful (usually between 1 and 10) checksum errors per disk.  The
> scrub fixes the checksum errors, and I clear the counters and
> everything seems fine.  As far as I know, I do not have any corrupt or
> missing data.
>
> The server is a fairly busy web and database server, handling about 5
> million hits per day.
>
> I'm wondering if the problem is that the scrub is calculating the
> checksum for the data on gpt/zfs0, and while that's happening, some
> data is updated by Apache or MySQL, and then checksum for the data on
> gpt/zfs1 is calculated, which now doesn't match, and therefore the
> scrub is reporting an error.  Is that possible?
>
> If that's not it, could this be a bug?  Or should I be worried about
> my SSDs?  What additional data would be helpful for me to share to
> diagnose this?
>
> -- 
>
> Tim Gustafson
> Technical Lead, Baskin School of Engineering
> tjg@ucsc.edu
> 831-459-5354
> Baskin Engineering, Room 313A
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"



