From nobody Fri Jan 19 20:17:03 2024
Date: Fri, 19 Jan 2024 20:17:03 GMT
Message-Id: <202401192017.40JKH32F000090@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org,
	dev-commits-src-branches@FreeBSD.org
From: Alan Somers <asomers@FreeBSD.org>
Subject: git: e2ce586899ff - stable/13 - zfsd: fault disks that generate too many I/O delay events
List-Id: Commit messages for all branches of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all
X-Git-Committer: asomers
X-Git-Repository: src
X-Git-Refname: refs/heads/stable/13
X-Git-Reftype: branch
X-Git-Commit: e2ce586899ff94d8e3baf10de614dfe61d06a5e0
Auto-Submitted: auto-generated

The branch stable/13 has been updated by asomers:

URL: https://cgit.FreeBSD.org/src/commit/?id=e2ce586899ff94d8e3baf10de614dfe61d06a5e0

commit e2ce586899ff94d8e3baf10de614dfe61d06a5e0
Author:     Alan Somers <asomers@FreeBSD.org>
AuthorDate: 2023-07-12 20:46:27 +0000
Commit:     Alan Somers <asomers@FreeBSD.org>
CommitDate: 2024-01-19 20:16:48 +0000

    zfsd: fault disks that generate too many I/O delay events

    If ZFS reports that a disk had at least 8 I/O operations over 60s that
    were each delayed by at least 30s (implying a queue depth > 4 or I/O
    aggregation), fault that disk.  Disks that respond this slowly can
    degrade the entire system's performance.

    Sponsored by:	Axcient
    Reviewed by:	delphij
    Differential Revision:	https://reviews.freebsd.org/D42825

    (cherry picked from commit d565784a7ebaa59e26febdcfd4a60329786ea5f5)
---
 cddl/usr.sbin/zfsd/case_file.cc                    | 20 ++++-
 cddl/usr.sbin/zfsd/case_file.h                     |  6 +-
 cddl/usr.sbin/zfsd/tests/zfsd_unittest.cc          | 66 +++++++++++++++
 tests/sys/cddl/zfs/tests/zfsd/Makefile             |  1 +
 .../sys/cddl/zfs/tests/zfsd/zfsd_fault_002_pos.ksh | 99 ++++++++++++++++++++++
 tests/sys/cddl/zfs/tests/zfsd/zfsd_test.sh         | 32 +++++++
 6 files changed, 220 insertions(+), 4 deletions(-)

diff --git a/cddl/usr.sbin/zfsd/case_file.cc b/cddl/usr.sbin/zfsd/case_file.cc
index ad3bb0ffc538..f6b6a0228a50 100644
--- a/cddl/usr.sbin/zfsd/case_file.cc
+++ b/cddl/usr.sbin/zfsd/case_file.cc
@@ -442,7 +442,8 @@ CaseFile::ReEvaluate(const ZfsEvent &event)
 		consumed = true;
 	}
 	else if (event.Value("class") == "ereport.fs.zfs.io" ||
-	    event.Value("class") == "ereport.fs.zfs.checksum") {
+	    event.Value("class") == "ereport.fs.zfs.checksum" ||
+	    event.Value("class") == "ereport.fs.zfs.delay") {
 
 		m_tentativeEvents.push_front(event.DeepCopy());
 		RegisterCallout(event);
@@ -1143,6 +1144,13 @@ IsIOEvent(const Event* const event)
 	return ("ereport.fs.zfs.io" == event->Value("type"));
 }
 
+/* Does the argument event refer to an IO delay? */
+static bool
+IsDelayEvent(const Event* const event)
+{
+	return ("ereport.fs.zfs.delay" == event->Value("type"));
+}
+
 bool
 CaseFile::ShouldDegrade() const
 {
@@ -1153,8 +1161,14 @@ CaseFile::ShouldDegrade() const
 bool
 CaseFile::ShouldFault() const
 {
-	return (std::count_if(m_events.begin(), m_events.end(),
-	    IsIOEvent) > ZFS_DEGRADE_IO_COUNT);
+	bool should_fault_for_io, should_fault_for_delay;
+
+	should_fault_for_io = std::count_if(m_events.begin(), m_events.end(),
+	    IsIOEvent) > ZFS_DEGRADE_IO_COUNT;
+	should_fault_for_delay = std::count_if(m_events.begin(), m_events.end(),
+	    IsDelayEvent) > ZFS_FAULT_DELAY_COUNT;
+
+	return (should_fault_for_io || should_fault_for_delay);
 }
 
 nvlist_t *
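For readers skimming the patch, the heart of the change is that ShouldFault()
now ORs two counters over the same event list.  Below is a minimal standalone
sketch of that decision, not zfsd code: the two threshold names come from the
patch, while the plain string list and main() are purely illustrative.  The
60-second window from the commit message is not visible here; it is enforced
by zfsd's callout machinery (note RegisterCallout() above), which retires
events as they age out.

    #include <algorithm>
    #include <iostream>
    #include <list>
    #include <string>

    enum {
        ZFS_DEGRADE_IO_COUNT = 50,  /* existing I/O-error threshold */
        ZFS_FAULT_DELAY_COUNT = 8,  /* new delay-event threshold */
    };

    /* Sketch of the new policy: fault if either count exceeds its limit. */
    static bool
    ShouldFault(const std::list<std::string> &events)
    {
        long io_events = std::count(events.begin(), events.end(),
            std::string("ereport.fs.zfs.io"));
        long delay_events = std::count(events.begin(), events.end(),
            std::string("ereport.fs.zfs.delay"));

        return (io_events > ZFS_DEGRADE_IO_COUNT ||
            delay_events > ZFS_FAULT_DELAY_COUNT);
    }

    int main()
    {
        /* Nine delay events: one more than ZFS_FAULT_DELAY_COUNT. */
        std::list<std::string> events(9, "ereport.fs.zfs.delay");
        std::cout << (ShouldFault(events) ? "fault" : "healthy") << '\n';
        return (0);
    }

Note that a disk can fault on only 9 delay events while needing 51 hard I/O
errors: a delay event already implies a 30-second stall, so far fewer of them
justify removing the disk.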
diff --git a/cddl/usr.sbin/zfsd/case_file.h b/cddl/usr.sbin/zfsd/case_file.h
index 48eb33350a71..00d66ed76a67 100644
--- a/cddl/usr.sbin/zfsd/case_file.h
+++ b/cddl/usr.sbin/zfsd/case_file.h
@@ -224,7 +224,11 @@ protected:
 		 * to transition a vdev from healthy to degraded
 		 * status.
 		 */
-		ZFS_DEGRADE_IO_COUNT = 50
+		ZFS_DEGRADE_IO_COUNT = 50,
+		/**
+		 * The number of delay errors on a vdev required to fault it
+		 */
+		ZFS_FAULT_DELAY_COUNT = 8,
 	};
 
 	static CalloutFunc_t OnGracePeriodEnded;

diff --git a/cddl/usr.sbin/zfsd/tests/zfsd_unittest.cc b/cddl/usr.sbin/zfsd/tests/zfsd_unittest.cc
index caeb077a3de8..d76abb54c9ed 100644
--- a/cddl/usr.sbin/zfsd/tests/zfsd_unittest.cc
+++ b/cddl/usr.sbin/zfsd/tests/zfsd_unittest.cc
@@ -519,6 +519,72 @@ TEST_F(CaseFileTest, PoolDestroy)
 	EXPECT_TRUE(m_caseFile->ReEvaluate(*zfs_event));
 }
 
+/*
+ * A Vdev with a very large number of Delay errors should fault.
+ * For performance reasons, RefreshVdevState should be called at most once.
+ */
+TEST_F(CaseFileTest, VeryManyDelayErrors)
+{
+	EXPECT_CALL(*m_caseFile, RefreshVdevState())
+	    .Times(::testing::AtMost(1))
+	    .WillRepeatedly(::testing::Return(true));
+
+	for(int i=0; i<100; i++) {
+		stringstream evStringStream;
+		evStringStream <<
+		"!system=ZFS "
+		"class=ereport.fs.zfs.delay "
+		"ena=12091638756982918145 "
+		"parent_guid=13237004955564865395 "
+		"parent_type=raidz "
+		"pool=testpool.4415 "
+		"pool_context=0 "
+		"pool_failmode=wait "
+		"pool_guid=456 "
+		"pool_state=0 "
+		"subsystem=ZFS "
+		"time=";
+		evStringStream << i << "0000000000000000 ";
+		evStringStream << "timestamp=" << i << " ";
+		evStringStream <<
+		"type=ereport.fs.zfs.delay "
+		"vdev_ashift=12 "
+		"vdev_cksum_errors=0 "
+		"vdev_complete_ts=948336226469 "
+		"vdev_delays=77 "
+		"vdev_delta_ts=123998485899 "
+		"vdev_guid=123 "
+		"vdev_path=/dev/da400 "
+		"vdev_read_errors=0 "
+		"vdev_spare_guids= "
+		"vdev_type=disk "
+		"vdev_write_errors=0 "
+		"zio_blkid=622 "
+		"zio_delay=31000041101 "
+		"zio_delta=123998485899 "
+		"zio_err=0 "
+		"zio_flags=1572992 "
+		"zio_level=-2 "
+		"zio_object=0 "
+		"zio_objset=37 "
+		"zio_offset=25598976 "
+		"zio_pipeline=48234496 "
+		"zio_priority=3 "
+		"zio_size=1024 "
+		"zio_stage=33554432 "
+		"zio_timestamp=824337740570 ";
+		Event *event(Event::CreateEvent(*m_eventFactory,
+		    evStringStream.str()));
+		ZfsEvent *zfs_event = static_cast<ZfsEvent*>(event);
+		EXPECT_TRUE(m_caseFile->ReEvaluate(*zfs_event));
+		delete event;
+	}
+
+	m_caseFile->SpliceEvents();
+	EXPECT_FALSE(m_caseFile->ShouldDegrade());
+	EXPECT_TRUE(m_caseFile->ShouldFault());
+}
+
 /*
  * A Vdev with a very large number of IO errors should fault
  * For performance reasons, RefreshVdevState should be called at most once
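The unit test drives CaseFile::ReEvaluate() with each event flattened into a
single space-separated key=value string, which is why the whitespace fixes
above (pool_state and zio_size) matter: adjacent C++ string literals
concatenate, so a missing trailing space silently fuses two fields into one.
Here is a rough sketch of how such a string decomposes into fields, for
illustration only; zfsd's Event class has its own parser and this is not it.

    #include <iostream>
    #include <map>
    #include <sstream>
    #include <string>

    /* Split "key=value key=value ..." on whitespace, then on first '='. */
    static std::map<std::string, std::string>
    ParseEventString(const std::string &ev)
    {
        std::map<std::string, std::string> fields;
        std::istringstream iss(ev);
        std::string token;

        while (iss >> token) {
            std::string::size_type eq = token.find('=');
            if (eq != std::string::npos)
                fields[token.substr(0, eq)] = token.substr(eq + 1);
        }
        return (fields);
    }

    int main()
    {
        /* An abridged version of the event string the test synthesizes. */
        std::map<std::string, std::string> f = ParseEventString(
            "!system=ZFS class=ereport.fs.zfs.delay vdev_guid=123 "
            "zio_delay=31000041101");

        std::cout << f["class"] << " delayed " << f["zio_delay"] << " ns\n";
        return (0);
    }

Under this kind of whitespace-delimited parsing, a fused field such as
"pool_state=0subsystem=ZFS" would be read as one bogus value, which is the
failure mode the space fixes avoid.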
diff --git a/tests/sys/cddl/zfs/tests/zfsd/Makefile b/tests/sys/cddl/zfs/tests/zfsd/Makefile
index c328cafecb6c..e2b709dc25c0 100644
--- a/tests/sys/cddl/zfs/tests/zfsd/Makefile
+++ b/tests/sys/cddl/zfs/tests/zfsd/Makefile
@@ -21,6 +21,7 @@ ${PACKAGE}FILES+=	zfsd_autoreplace_003_pos.ksh
 ${PACKAGE}FILES+=	zfsd_degrade_001_pos.ksh
 ${PACKAGE}FILES+=	zfsd_degrade_002_pos.ksh
 ${PACKAGE}FILES+=	zfsd_fault_001_pos.ksh
+${PACKAGE}FILES+=	zfsd_fault_002_pos.ksh
 ${PACKAGE}FILES+=	zfsd_hotspare_001_pos.ksh
 ${PACKAGE}FILES+=	zfsd_hotspare_002_pos.ksh
 ${PACKAGE}FILES+=	zfsd_hotspare_003_pos.ksh

diff --git a/tests/sys/cddl/zfs/tests/zfsd/zfsd_fault_002_pos.ksh b/tests/sys/cddl/zfs/tests/zfsd/zfsd_fault_002_pos.ksh
new file mode 100644
index 000000000000..d8de1ceaff4b
--- /dev/null
+++ b/tests/sys/cddl/zfs/tests/zfsd/zfsd_fault_002_pos.ksh
@@ -0,0 +1,99 @@
+#!/usr/local/bin/ksh93 -p
+#
+# CDDL HEADER START
+#
+# The contents of this file are subject to the terms of the
+# Common Development and Distribution License (the "License").
+# You may not use this file except in compliance with the License.
+#
+# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
+# or http://www.opensolaris.org/os/licensing.
+# See the License for the specific language governing permissions
+# and limitations under the License.
+#
+# When distributing Covered Code, include this CDDL HEADER in each
+# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
+# If applicable, add the following below this CDDL HEADER, with the
+# fields enclosed by brackets "[]" replaced with your own identifying
+# information: Portions Copyright [yyyy] [name of copyright owner]
+#
+# CDDL HEADER END
+#
+
+#
+# Copyright (c) 2023 Axcient.  All rights reserved.
+# Use is subject to license terms.
+#
+# $FreeBSD$
+
+. $STF_SUITE/include/libtest.kshlib
+. $STF_SUITE/include/libgnop.kshlib
+
+################################################################################
+#
+# __stc_assertion_start
+#
+# ID: zfsd_fault_002_pos
+#
+# DESCRIPTION:
+#	If a vdev experiences delayed I/O, it will become faulted.
+#
+#
+# STRATEGY:
+#	1. Create a storage pool.  Use gnop vdevs so we can inject I/O delays.
+#	2. Inject I/O delays while doing I/O to the pool.
+#	3. Verify that the vdev becomes FAULTED.
+#	4. ONLINE it and verify that it resilvers and joins the pool.
+#
+# TESTABILITY: explicit
+#
+# TEST_AUTOMATION_LEVEL: automated
+#
+# __stc_assertion_end
+#
+###############################################################################
+
+verify_runnable "global"
+
+log_assert "ZFS will fault a vdev that experiences delayed I/O"
+
+ensure_zfsd_running
+
+DISK0_NOP=${DISK0}.nop
+DISK1_NOP=${DISK1}.nop
+
+log_must create_gnops $DISK0 $DISK1
+
+for type in "raidz" "mirror"; do
+	log_note "Testing raid type $type"
+
+	# Create a pool on the supplied disks
+	create_pool $TESTPOOL $type "$DISK0_NOP" "$DISK1_NOP"
+	log_must $ZFS create $TESTPOOL/$TESTFS
+
+	# Cause some IO delays writing to the pool
+	while true; do
+		# ZFS currently considers an I/O to be "slow" if it's delayed
+		# for 30 seconds (zio_slow_io_ms).
+		log_must gnop configure -d 31000 -x 100 "$DISK1_NOP"
+		$DD if=/dev/zero bs=128k count=1 >> \
+			/$TESTPOOL/$TESTFS/$TESTFILE 2> /dev/null
+		$FSYNC /$TESTPOOL/$TESTFS/$TESTFILE
+		# Check to see if the pool is faulted yet
+		$ZPOOL status $TESTPOOL | grep -q 'state: DEGRADED'
+		if [ $? == 0 ]
+		then
+			log_note "$TESTPOOL got degraded"
+			$ZPOOL status -s $TESTPOOL
+			break
+		fi
+	done
+
+	log_must check_state $TESTPOOL $TMPDISK "FAULTED"
+
+	log_must gnop configure -x 0 "$DISK1_NOP"
+	destroy_pool $TESTPOOL
+	log_must $RM -rf /$TESTPOOL
+done
+
+log_pass
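The numbers in the script line up as follows: gnop configure -d 31000 delays
requests by 31000 ms (with -x 100 apparently meaning 100% of requests, going
by the test's usage), just past ZFS's default 30-second slow-I/O cutoff, the
zio_slow_io_ms tunable, so each fsync'd write produces an
ereport.fs.zfs.delay event; the zio_delay field in those events is in
nanoseconds.  A sketch of that comparison follows; the >= and the unit
conversion are assumptions for illustration, not lifted from ZFS source.

    #include <cstdint>
    #include <iostream>

    /* Default slow-I/O cutoff, in ms (the zio_slow_io_ms tunable). */
    static const uint64_t zio_slow_io_ms = 30000;

    /* Would a zio with this delay (nanoseconds) be reported as slow? */
    static bool
    IsSlowIo(uint64_t zio_delay_ns)
    {
        return (zio_delay_ns >= zio_slow_io_ms * 1000000ULL);
    }

    int main()
    {
        /* zio_delay from the unit-test event above: ~31.0 seconds. */
        std::cout << "31.0s slow? " << IsSlowIo(31000041101ULL) << '\n';
        /* A 29-second delay would stay under the cutoff. */
        std::cout << "29.0s slow? " << IsSlowIo(29000000000ULL) << '\n';
        return (0);
    }

This is also why the loop fsyncs after every 128k write: each synchronous
write must actually complete through the delayed gnop provider before the
next delay event can be generated.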
diff --git a/tests/sys/cddl/zfs/tests/zfsd/zfsd_test.sh b/tests/sys/cddl/zfs/tests/zfsd/zfsd_test.sh
index b6dcfe97dd7b..7aa808c9eb82 100755
--- a/tests/sys/cddl/zfs/tests/zfsd/zfsd_test.sh
+++ b/tests/sys/cddl/zfs/tests/zfsd/zfsd_test.sh
@@ -56,6 +56,37 @@ zfsd_fault_001_pos_cleanup()
 }
 
 
+atf_test_case zfsd_fault_002_pos cleanup
+zfsd_fault_002_pos_head()
+{
+	atf_set "descr" "ZFS will fault a vdev that experiences delayed I/O"
+	atf_set "require.progs" "ksh93 gnop zfs zpool zfsd"
+	atf_set "timeout" 300
+}
+zfsd_fault_002_pos_body()
+{
+	. $(atf_get_srcdir)/../../include/default.cfg
+	. $(atf_get_srcdir)/../hotspare/hotspare.kshlib
+	. $(atf_get_srcdir)/../hotspare/hotspare.cfg
+	. $(atf_get_srcdir)/zfsd.cfg
+
+	verify_disk_count "$DISKS" 2
+	verify_zfsd_running
+	ksh93 $(atf_get_srcdir)/setup.ksh || atf_fail "Setup failed"
+	ksh93 $(atf_get_srcdir)/zfsd_fault_002_pos.ksh
+	if [[ $? != 0 ]]; then
+		save_artifacts
+		atf_fail "Testcase failed"
+	fi
+}
+zfsd_fault_002_pos_cleanup()
+{
+	. $(atf_get_srcdir)/../../include/default.cfg
+	. $(atf_get_srcdir)/zfsd.cfg
+
+	ksh93 $(atf_get_srcdir)/cleanup.ksh || atf_fail "Cleanup failed"
+}
+
 atf_test_case zfsd_degrade_001_pos cleanup
 zfsd_degrade_001_pos_head()
 {
@@ -575,6 +606,7 @@ zfsd_import_001_pos_cleanup()
 atf_init_test_cases()
 {
 	atf_add_test_case zfsd_fault_001_pos
+	atf_add_test_case zfsd_fault_002_pos
 	atf_add_test_case zfsd_degrade_001_pos
 	atf_add_test_case zfsd_degrade_002_pos
 	atf_add_test_case zfsd_hotspare_001_pos