Date:      Tue, 03 Sep 2019 15:36:51 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 239801] mfi errors causing zfs checksum errors
Message-ID:  <bug-239801-227-o0DfBYouV5@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-239801-227@https.bugs.freebsd.org/bugzilla/>
References:  <bug-239801-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239801

--- Comment #3 from marco@tols.org ---
Hi, here is my story in a bit more detail:

We have 2 identical Dell R730xd systems, each with 12 6TB drives in a raidz2
pool.  Each system also has 2 SSDs, which serve as ZIL and cache devices.

Both ran 11.1-RELEASE, which gave no problems.  The systems were then on
11.2-RELEASE, also without any problems.  In between the upgrades I also ran
"zpool upgrade" where available.

Then I upgraded to 11.3-RELEASE and, trusting that all would be as flawless
as in the first 2 years, I gave it little attention other than keeping an eye
on it from the monitoring host.  Unfortunately we only monitor the pool state
reported by zpool status, which stayed ONLINE throughout the entire process.

I left it running for quite a while, until at some point I wanted to show the
pool to someone and found that both systems had a few hundred thousand
checksum errors on each of the 12 drives in the pool.  There were none on the
SSDs that make up the ZIL and cache.
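
Because the pool state stayed ONLINE while the per-device CKSUM counters grew,
a monitoring check could read those counters as well.  Below is a minimal
Python sketch of that idea, not what we actually run.  The function names
(parse_count, checksum_errors) are made up for illustration, and it assumes
the usual "NAME STATE READ WRITE CKSUM" table layout of zpool status, with
large counters abbreviated using K/M/G/T suffixes (treated here as powers of
1024, which is an assumption; the result is approximate either way).

```python
import re

# Assumed suffix multipliers for abbreviated counters such as "123K".
_SUFFIX = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
_COUNT = re.compile(r"^(\d+(?:\.\d+)?)([KMGT]?)$")


def parse_count(text):
    """Convert a counter such as '0', '17' or '1.5K' to an integer."""
    match = _COUNT.match(text)
    if match is None:
        raise ValueError("not a counter: %r" % text)
    return int(float(match.group(1)) * _SUFFIX[match.group(2)])


def checksum_errors(status_text):
    """Return {device_name: cksum_count} for rows with a nonzero CKSUM."""
    errors = {}
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_config = True      # device table starts after this header
            continue
        if fields and fields[0] == "errors:":
            in_config = False     # device table for this pool has ended
            continue
        if not in_config or len(fields) < 5:
            continue              # e.g. the bare "logs" / "cache" rows
        try:
            cksum = parse_count(fields[4])
        except ValueError:
            continue              # row without numeric counters
        if cksum > 0:
            errors[fields[0]] = cksum
    return errors
```

Feeding this the text printed by "zpool status" and alerting whenever the
returned dictionary is non-empty would have flagged the problem even though
the state column still said ONLINE.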

One of the systems was running 11.3-RELEASE-p1, and the other was running
11.3-RELEASE-p3.

My path to fixing this was:
- Google for the issue, and find out that mrsas was a potential fix
- Switch the system on p3 from mfi to mrsas and reboot
- Do a zpool scrub to autoheal all broken sectors, which ended up fixing
almost 100GB of data
- Do a zpool clear to clear the counters
- Do another zpool scrub to see if the counters remain 0, which they did
- Do the same on the other system, which was on p1, and bring it to p3 in the
process

Hope this helps,

Marco van Tol

--
You are receiving this mail because:
You are the assignee for the bug.