From nobody Fri Jan 19 20:17:03 2024
Date: Fri, 19 Jan 2024 20:17:03 GMT
Message-Id: <202401192017.40JKH32F000090@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org,
	dev-commits-src-branches@FreeBSD.org
From: Alan Somers <asomers@FreeBSD.org>
Subject: git: e2ce586899ff - stable/13 - zfsd: fault disks that generate too many I/O delay events
List-Id: Commit messages for all branches of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all
X-Git-Committer: asomers
X-Git-Repository: src
X-Git-Refname: refs/heads/stable/13
X-Git-Reftype: branch
X-Git-Commit: e2ce586899ff94d8e3baf10de614dfe61d06a5e0
Auto-Submitted: auto-generated

The branch stable/13 has been updated by asomers:

URL: https://cgit.FreeBSD.org/src/commit/?id=e2ce586899ff94d8e3baf10de614dfe61d06a5e0

commit e2ce586899ff94d8e3baf10de614dfe61d06a5e0
Author:     Alan Somers <asomers@FreeBSD.org>
AuthorDate: 2023-07-12 20:46:27 +0000
Commit:     Alan Somers <asomers@FreeBSD.org>
CommitDate: 2024-01-19 20:16:48 +0000

    zfsd: fault disks that generate too many I/O delay events

    If ZFS reports that a disk had at least 8 I/O operations over 60s that
    were each delayed by at least 30s (implying a queue depth > 4 or I/O
    aggregation), fault that disk.  Disks that respond this slowly can
    degrade the entire system's performance.

    Sponsored by:	Axcient
    Reviewed by:	delphij
    Differential Revision:	https://reviews.freebsd.org/D42825

    (cherry picked from commit d565784a7ebaa59e26febdcfd4a60329786ea5f5)
---
 cddl/usr.sbin/zfsd/case_file.cc                    | 20 ++++-
 cddl/usr.sbin/zfsd/case_file.h                     |  6 +-
 cddl/usr.sbin/zfsd/tests/zfsd_unittest.cc          | 66 +++++++++++++++
 tests/sys/cddl/zfs/tests/zfsd/Makefile             |  1 +
 .../sys/cddl/zfs/tests/zfsd/zfsd_fault_002_pos.ksh | 99 ++++++++++++++++++++++
 tests/sys/cddl/zfs/tests/zfsd/zfsd_test.sh         | 32 +++++++
 6 files changed, 220 insertions(+), 4 deletions(-)

diff --git a/cddl/usr.sbin/zfsd/case_file.cc b/cddl/usr.sbin/zfsd/case_file.cc
index ad3bb0ffc538..f6b6a0228a50 100644
--- a/cddl/usr.sbin/zfsd/case_file.cc
+++ b/cddl/usr.sbin/zfsd/case_file.cc
@@ -442,7 +442,8 @@ CaseFile::ReEvaluate(const ZfsEvent &event)
 		consumed = true;
 	}
 	else if (event.Value("class") == "ereport.fs.zfs.io" ||
-	    event.Value("class") == "ereport.fs.zfs.checksum") {
+	    event.Value("class") == "ereport.fs.zfs.checksum" ||
+	    event.Value("class") == "ereport.fs.zfs.delay") {
 
 		m_tentativeEvents.push_front(event.DeepCopy());
 		RegisterCallout(event);
@@ -1143,6 +1144,13 @@ IsIOEvent(const Event* const event)
 	return ("ereport.fs.zfs.io" == event->Value("type"));
 }
 
+/* Does the argument event refer to an IO delay? */
+static bool
+IsDelayEvent(const Event* const event)
+{
+	return ("ereport.fs.zfs.delay" == event->Value("type"));
+}
+
 bool
 CaseFile::ShouldDegrade() const
 {
@@ -1153,8 +1161,14 @@ CaseFile::ShouldDegrade() const
 bool
 CaseFile::ShouldFault() const
 {
-	return (std::count_if(m_events.begin(), m_events.end(),
-	    IsIOEvent) > ZFS_DEGRADE_IO_COUNT);
+	bool should_fault_for_io, should_fault_for_delay;
+
+	should_fault_for_io = std::count_if(m_events.begin(), m_events.end(),
+	    IsIOEvent) > ZFS_DEGRADE_IO_COUNT;
+	should_fault_for_delay = std::count_if(m_events.begin(), m_events.end(),
+	    IsDelayEvent) > ZFS_FAULT_DELAY_COUNT;
+
+	return (should_fault_for_io || should_fault_for_delay);
 }
 
 nvlist_t *
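For readers skimming the patch, the heart of the change is that ShouldFault()
now ORs two counters over the same event list.  Below is a minimal standalone
sketch of that decision, not zfsd code: the two threshold names come from the
patch, while the plain string list and main() are purely illustrative.  The
60-second window from the commit message is not visible here; it is enforced
by zfsd's callout machinery (note RegisterCallout() above), which retires
events as they age out.

    #include <algorithm>
    #include <iostream>
    #include <list>
    #include <string>

    enum {
        ZFS_DEGRADE_IO_COUNT = 50,  /* existing I/O-error threshold */
        ZFS_FAULT_DELAY_COUNT = 8,  /* new delay-event threshold */
    };

    /* Sketch of the new policy: fault if either count exceeds its limit. */
    static bool
    ShouldFault(const std::list<std::string> &events)
    {
        long io_events = std::count(events.begin(), events.end(),
            std::string("ereport.fs.zfs.io"));
        long delay_events = std::count(events.begin(), events.end(),
            std::string("ereport.fs.zfs.delay"));

        return (io_events > ZFS_DEGRADE_IO_COUNT ||
            delay_events > ZFS_FAULT_DELAY_COUNT);
    }

    int main()
    {
        /* Nine delay events: one more than ZFS_FAULT_DELAY_COUNT. */
        std::list<std::string> events(9, "ereport.fs.zfs.delay");
        std::cout << (ShouldFault(events) ? "fault" : "healthy") << '\n';
        return (0);
    }

Note that a disk can fault on only 9 delay events while needing 51 hard I/O
errors: a delay event already implies a 30-second stall, so far fewer of them
justify removing the disk.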
diff --git a/cddl/usr.sbin/zfsd/case_file.h b/cddl/usr.sbin/zfsd/case_file.h
index 48eb33350a71..00d66ed76a67 100644
--- a/cddl/usr.sbin/zfsd/case_file.h
+++ b/cddl/usr.sbin/zfsd/case_file.h
@@ -224,7 +224,11 @@ protected:
 		 * to transition a vdev from healthy to degraded
 		 * status.
 		 */
-		ZFS_DEGRADE_IO_COUNT = 50
+		ZFS_DEGRADE_IO_COUNT = 50,
+		/**
+		 * The number of delay errors on a vdev required to fault it
+		 */
+		ZFS_FAULT_DELAY_COUNT = 8,
 	};
 
 	static CalloutFunc_t OnGracePeriodEnded;

diff --git a/cddl/usr.sbin/zfsd/tests/zfsd_unittest.cc b/cddl/usr.sbin/zfsd/tests/zfsd_unittest.cc
index caeb077a3de8..d76abb54c9ed 100644
--- a/cddl/usr.sbin/zfsd/tests/zfsd_unittest.cc
+++ b/cddl/usr.sbin/zfsd/tests/zfsd_unittest.cc
@@ -519,6 +519,72 @@ TEST_F(CaseFileTest, PoolDestroy)
 	EXPECT_TRUE(m_caseFile->ReEvaluate(*zfs_event));
 }
 
+/*
+ * A Vdev with a very large number of Delay errors should fault.
+ * For performance reasons, RefreshVdevState should be called at most once.
+ */
+TEST_F(CaseFileTest, VeryManyDelayErrors)
+{
+	EXPECT_CALL(*m_caseFile, RefreshVdevState())
+	    .Times(::testing::AtMost(1))
+	    .WillRepeatedly(::testing::Return(true));
+
+	for(int i=0; i<100; i++) {
+		stringstream evStringStream;
+		evStringStream <<
+		"!system=ZFS "
+		"class=ereport.fs.zfs.delay "
+		"ena=12091638756982918145 "
+		"parent_guid=13237004955564865395 "
+		"parent_type=raidz "
+		"pool=testpool.4415 "
+		"pool_context=0 "
+		"pool_failmode=wait "
+		"pool_guid=456 "
+		"pool_state=0 "
+		"subsystem=ZFS "
+		"time=";
+		evStringStream << i << "0000000000000000 ";
+		evStringStream << "timestamp=" << i << " ";
+		evStringStream <<
+		"type=ereport.fs.zfs.delay "
+		"vdev_ashift=12 "
+		"vdev_cksum_errors=0 "
+		"vdev_complete_ts=948336226469 "
+		"vdev_delays=77 "
+		"vdev_delta_ts=123998485899 "
+		"vdev_guid=123 "
+		"vdev_path=/dev/da400 "
+		"vdev_read_errors=0 "
+		"vdev_spare_guids= "
+		"vdev_type=disk "
+		"vdev_write_errors=0 "
+		"zio_blkid=622 "
+		"zio_delay=31000041101 "
+		"zio_delta=123998485899 "
+		"zio_err=0 "
+		"zio_flags=1572992 "
+		"zio_level=-2 "
+		"zio_object=0 "
+		"zio_objset=37 "
+		"zio_offset=25598976 "
+		"zio_pipeline=48234496 "
+		"zio_priority=3 "
+		"zio_size=1024 "
+		"zio_stage=33554432 "
+		"zio_timestamp=824337740570 ";
+		Event *event(Event::CreateEvent(*m_eventFactory,
+		    evStringStream.str()));
+		ZfsEvent *zfs_event = static_cast<ZfsEvent*>(event);
+		EXPECT_TRUE(m_caseFile->ReEvaluate(*zfs_event));
+		delete event;
+	}
+
+	m_caseFile->SpliceEvents();
+	EXPECT_FALSE(m_caseFile->ShouldDegrade());
+	EXPECT_TRUE(m_caseFile->ShouldFault());
+}
+
 /*
  * A Vdev with a very large number of IO errors should fault
  * For performance reasons, RefreshVdevState should be called at most once
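The unit test drives CaseFile::ReEvaluate() with each event flattened into a
single space-separated key=value string, which is why the whitespace fixes
above (pool_state and zio_size) matter: adjacent C++ string literals
concatenate, so a missing trailing space silently fuses two fields into one.
Here is a rough sketch of how such a string decomposes into fields, for
illustration only; zfsd's Event class has its own parser and this is not it.

    #include <iostream>
    #include <map>
    #include <sstream>
    #include <string>

    /* Split "key=value key=value ..." on whitespace, then on first '='. */
    static std::map<std::string, std::string>
    ParseEventString(const std::string &ev)
    {
        std::map<std::string, std::string> fields;
        std::istringstream iss(ev);
        std::string token;

        while (iss >> token) {
            std::string::size_type eq = token.find('=');
            if (eq != std::string::npos)
                fields[token.substr(0, eq)] = token.substr(eq + 1);
        }
        return (fields);
    }

    int main()
    {
        /* An abridged version of the event string the test synthesizes. */
        std::map<std::string, std::string> f = ParseEventString(
            "!system=ZFS class=ereport.fs.zfs.delay vdev_guid=123 "
            "zio_delay=31000041101");

        std::cout << f["class"] << " delayed " << f["zio_delay"] << " ns\n";
        return (0);
    }

Under this kind of whitespace-delimited parsing, a fused field such as
"pool_state=0subsystem=ZFS" would be read as one bogus value, which is the
failure mode the space fixes avoid.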
diff --git a/tests/sys/cddl/zfs/tests/zfsd/Makefile b/tests/sys/cddl/zfs/tests/zfsd/Makefile
index c328cafecb6c..e2b709dc25c0 100644
--- a/tests/sys/cddl/zfs/tests/zfsd/Makefile
+++ b/tests/sys/cddl/zfs/tests/zfsd/Makefile
@@ -21,6 +21,7 @@ ${PACKAGE}FILES+=	zfsd_autoreplace_003_pos.ksh
 ${PACKAGE}FILES+=	zfsd_degrade_001_pos.ksh
 ${PACKAGE}FILES+=	zfsd_degrade_002_pos.ksh
 ${PACKAGE}FILES+=	zfsd_fault_001_pos.ksh
+${PACKAGE}FILES+=	zfsd_fault_002_pos.ksh
 ${PACKAGE}FILES+=	zfsd_hotspare_001_pos.ksh
 ${PACKAGE}FILES+=	zfsd_hotspare_002_pos.ksh
 ${PACKAGE}FILES+=	zfsd_hotspare_003_pos.ksh

diff --git a/tests/sys/cddl/zfs/tests/zfsd/zfsd_fault_002_pos.ksh b/tests/sys/cddl/zfs/tests/zfsd/zfsd_fault_002_pos.ksh
new file mode 100644
index 000000000000..d8de1ceaff4b
--- /dev/null
+++ b/tests/sys/cddl/zfs/tests/zfsd/zfsd_fault_002_pos.ksh
@@ -0,0 +1,99 @@
+#!/usr/local/bin/ksh93 -p
+#
+# CDDL HEADER START
+#
+# The contents of this file are subject to the terms of the
+# Common Development and Distribution License (the "License").
+# You may not use this file except in compliance with the License.
+#
+# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
+# or http://www.opensolaris.org/os/licensing.
+# See the License for the specific language governing permissions
+# and limitations under the License.
+#
+# When distributing Covered Code, include this CDDL HEADER in each
+# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
+# If applicable, add the following below this CDDL HEADER, with the
+# fields enclosed by brackets "[]" replaced with your own identifying
+# information: Portions Copyright [yyyy] [name of copyright owner]
+#
+# CDDL HEADER END
+#
+
+#
+# Copyright (c) 2023 Axcient.  All rights reserved.
+# Use is subject to license terms.
+#
+# $FreeBSD$
+
+. $STF_SUITE/include/libtest.kshlib
+. $STF_SUITE/include/libgnop.kshlib
+
+################################################################################
+#
+# __stc_assertion_start
+#
+# ID: zfsd_fault_002_pos
+#
+# DESCRIPTION:
+#	If a vdev experiences delayed I/O, it will become faulted.
+#
+#
+# STRATEGY:
+#	1. Create a storage pool.  Use gnop vdevs so we can inject I/O delays.
+#	2. Inject I/O delays while doing I/O to the pool.
+#	3. Verify that the vdev becomes FAULTED.
+#	4. ONLINE it and verify that it resilvers and joins the pool.
+#
+# TESTABILITY: explicit
+#
+# TEST_AUTOMATION_LEVEL: automated
+#
+# __stc_assertion_end
+#
+###############################################################################
+
+verify_runnable "global"
+
+log_assert "ZFS will fault a vdev that experiences delayed I/O"
+
+ensure_zfsd_running
+
+DISK0_NOP=${DISK0}.nop
+DISK1_NOP=${DISK1}.nop
+
+log_must create_gnops $DISK0 $DISK1
+
+for type in "raidz" "mirror"; do
+	log_note "Testing raid type $type"
+
+	# Create a pool on the supplied disks
+	create_pool $TESTPOOL $type "$DISK0_NOP" "$DISK1_NOP"
+	log_must $ZFS create $TESTPOOL/$TESTFS
+
+	# Cause some IO delays writing to the pool
+	while true; do
+		# ZFS currently considers an I/O to be "slow" if it's delayed
+		# for 30 seconds (zio_slow_io_ms).
+		log_must gnop configure -d 31000 -x 100 "$DISK1_NOP"
+		$DD if=/dev/zero bs=128k count=1 >> \
+			/$TESTPOOL/$TESTFS/$TESTFILE 2> /dev/null
+		$FSYNC /$TESTPOOL/$TESTFS/$TESTFILE
+		# Check to see if the pool is faulted yet
+		$ZPOOL status $TESTPOOL | grep -q 'state: DEGRADED'
+		if [ $? == 0 ]
+		then
+			log_note "$TESTPOOL got degraded"
+			$ZPOOL status -s $TESTPOOL
+			break
+		fi
+	done
+
+	log_must check_state $TESTPOOL $TMPDISK "FAULTED"
+
+	log_must gnop configure -x 0 "$DISK1_NOP"
+	destroy_pool $TESTPOOL
+	log_must $RM -rf /$TESTPOOL
+done
+
+log_pass
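The numbers in the script line up as follows: gnop configure -d 31000 delays
requests by 31000 ms (with -x 100 apparently meaning 100% of requests, going
by the test's usage), just past ZFS's default 30-second slow-I/O cutoff, the
zio_slow_io_ms tunable, so each fsync'd write produces an
ereport.fs.zfs.delay event; the zio_delay field in those events is in
nanoseconds.  A sketch of that comparison follows; the >= and the unit
conversion are assumptions for illustration, not lifted from ZFS source.

    #include <cstdint>
    #include <iostream>

    /* Default slow-I/O cutoff, in ms (the zio_slow_io_ms tunable). */
    static const uint64_t zio_slow_io_ms = 30000;

    /* Would a zio with this delay (nanoseconds) be reported as slow? */
    static bool
    IsSlowIo(uint64_t zio_delay_ns)
    {
        return (zio_delay_ns >= zio_slow_io_ms * 1000000ULL);
    }

    int main()
    {
        /* zio_delay from the unit-test event above: ~31.0 seconds. */
        std::cout << "31.0s slow? " << IsSlowIo(31000041101ULL) << '\n';
        /* A 29-second delay would stay under the cutoff. */
        std::cout << "29.0s slow? " << IsSlowIo(29000000000ULL) << '\n';
        return (0);
    }

This is also why the loop fsyncs after every 128k write: each synchronous
write must actually complete through the delayed gnop provider before the
next delay event can be generated.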
diff --git a/tests/sys/cddl/zfs/tests/zfsd/zfsd_test.sh b/tests/sys/cddl/zfs/tests/zfsd/zfsd_test.sh
index b6dcfe97dd7b..7aa808c9eb82 100755
--- a/tests/sys/cddl/zfs/tests/zfsd/zfsd_test.sh
+++ b/tests/sys/cddl/zfs/tests/zfsd/zfsd_test.sh
@@ -56,6 +56,37 @@ zfsd_fault_001_pos_cleanup()
 }
 
 
+atf_test_case zfsd_fault_002_pos cleanup
+zfsd_fault_002_pos_head()
+{
+	atf_set "descr" "ZFS will fault a vdev that experiences delayed I/O"
+	atf_set "require.progs" "ksh93 gnop zfs zpool zfsd"
+	atf_set "timeout" 300
+}
+zfsd_fault_002_pos_body()
+{
+	. $(atf_get_srcdir)/../../include/default.cfg
+	. $(atf_get_srcdir)/../hotspare/hotspare.kshlib
+	. $(atf_get_srcdir)/../hotspare/hotspare.cfg
+	. $(atf_get_srcdir)/zfsd.cfg
+
+	verify_disk_count "$DISKS" 2
+	verify_zfsd_running
+	ksh93 $(atf_get_srcdir)/setup.ksh || atf_fail "Setup failed"
+	ksh93 $(atf_get_srcdir)/zfsd_fault_002_pos.ksh
+	if [[ $? != 0 ]]; then
+		save_artifacts
+		atf_fail "Testcase failed"
+	fi
+}
+zfsd_fault_002_pos_cleanup()
+{
+	. $(atf_get_srcdir)/../../include/default.cfg
+	. $(atf_get_srcdir)/zfsd.cfg
+
+	ksh93 $(atf_get_srcdir)/cleanup.ksh || atf_fail "Cleanup failed"
+}
+
 atf_test_case zfsd_degrade_001_pos cleanup
 zfsd_degrade_001_pos_head()
 {
@@ -575,6 +606,7 @@ zfsd_import_001_pos_cleanup()
 atf_init_test_cases()
 {
 	atf_add_test_case zfsd_fault_001_pos
+	atf_add_test_case zfsd_fault_002_pos
 	atf_add_test_case zfsd_degrade_001_pos
 	atf_add_test_case zfsd_degrade_002_pos
 	atf_add_test_case zfsd_hotspare_001_pos