From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 239801] mfi errors causing zfs checksum errors
Date: Tue, 03 Sep 2019 15:36:51 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 11.3-RELEASE
X-Bugzilla-Keywords: regression
X-Bugzilla-Severity: Affects Only Me
X-Bugzilla-Status: Open

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239801

--- Comment #3 from marco@tols.org ---
Hi,

Here is my somewhat more detailed story:

We have 2 identical Dell R730xd systems, each with 12 6TB drives in a raidz2 pool. Each system also has 2 SSDs that are attached to the pool as ZIL and cache devices.

Both systems ran 11.1-RELEASE without any problems, and later 11.2-RELEASE, also without any problems. Between the upgrades I also ran "zpool upgrade" where available.
Then I upgraded to 11.3-RELEASE. Trusting that everything would be as flawless as in the first 2 years, I did not pay it much attention other than keeping an eye on it from the monitoring host. Unfortunately we only monitor the zpool status, which stayed ONLINE throughout the entire process. I left it running for quite a while, until at some point I wanted to show the pool to someone and found that both systems had a few hundred thousand checksum errors on each of the 12 drives in the pool. There were none on the SSDs that make up the ZIL and cache.

One of the systems was running 11.3-RELEASE-p1 and the other 11.3-RELEASE-p3.

My path to fixing this was:

- Google the issue and find out that mrsas was a potential fix
- Switch the system on p3 from mfi to mrsas and reboot
- Run a zpool scrub to self-heal all damaged blocks, which ended up repairing almost 100GB of data
- Run zpool clear to reset the error counters
- Run another zpool scrub to check whether the counters stay at 0, which they did
- Do the same on the other system, which was on p1, and bring it to p3 in the process

Hope this helps,
Marco van Tol
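The monitoring gap described above (the pool state stayed ONLINE while the per-drive checksum counters climbed) suggests checking the READ/WRITE/CKSUM columns of `zpool status`, not just the state. Below is a minimal Python sketch that assumes the human-readable table layout of `zpool status`: a `NAME STATE READ WRITE CKSUM` header row, with a blank line ending the table. The function name `devices_with_errors` is hypothetical. The sample text is made up for illustration, including the pool name `tank` and the device names. It is not output from the reporter's systems.

```python
import re

# An error counter as printed by zpool status: a plain integer or an
# abbreviated (rounded) form such as "105K" or "1.2M".
_COUNTER = re.compile(r"^\d+(?:\.\d+)?[KMGTPE]?$")


def devices_with_errors(status_text):
    """Return (name, read, write, cksum) for each config row with a nonzero counter.

    Counters are kept as the strings zpool printed, because abbreviated
    values like "105K" are rounded and cannot be recovered exactly.
    """
    flagged = []
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True  # header row of the config table
            continue
        if not in_table:
            continue
        if not fields:
            break  # a blank line ends the config table
        if len(fields) < 5 or not all(_COUNTER.match(f) for f in fields[2:5]):
            continue  # section labels like "logs"/"cache", or rows without counters
        name, counters = fields[0], fields[2:5]
        if any(c != "0" for c in counters):
            flagged.append((name, *counters))
    return flagged


# Made-up excerpt in the zpool status layout (hypothetical pool and devices).
sample = """\
  pool: tank
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            mfid0   ONLINE       0     0  105K
            mfid1   ONLINE       0     0     0
        logs
          ada0p1    ONLINE       0     0     0

errors: No known data errors
"""

print(devices_with_errors(sample))  # [('mfid0', '0', '0', '105K')]
```

A monitoring check built this way would have alerted on `mfid0` even though every row still reports ONLINE. After a `zpool clear`, the same check should return an empty list.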