From nobody Mon Aug 4 18:37:02 2025 X-Original-To: dev-commits-src-main@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4bwldp6wjVz63Mk3; Mon, 04 Aug 2025 18:37:02 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "R10" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4bwldp5txpz48PF; Mon, 04 Aug 2025 18:37:02 +0000 (UTC) (envelope-from git@FreeBSD.org) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1754332622; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=wvoqAbh0qWyiO3xCluoljg749joLnm9a7wOmGr6Xwl4=; b=VPWqDfAhryPLAf9xJzZCIsLYdh8RYKxkECF+BsosWphyit3cZhTu+l99W44/Yh0j2DklJp ZE5Kri2GzaZqBsX1XIacSS2RIC8Dyg54RG8MVZp9Ujs0QWz0tImralCXyxMNQrQw5i7n3o jqj29Gy8sK4f9sb1pzYmpie5HzLlkp/42X5DMKBCWCqiGJ2zd3NxBiDZ5WdeY/pjLzOeIh JYI3s+qsp99sUHexup2lM/hY725E4cJHE42t/De82QvJB9Uka0nkbqGbre3twQKmUZpy2K cead/ycHQPhRgp4oAUZu0YV0kFVJUIf9Uk7wyGqmRu+IidW7pZh7xr696ROJGQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1754332622; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=wvoqAbh0qWyiO3xCluoljg749joLnm9a7wOmGr6Xwl4=; b=b+ODMGKnfoVyGWtA6f8cjloRdq7nXj4XyVAJHAeF/ho2RdybG+ZdVOqktIPVdLv1OnFSkq SZIO7s3Cs4uSTj8hSgNb/PW2GJuC8scq1icuoKOoOaTN9rSC/NMpBFsQ/gD+AnQdyFm+D8 pZXeGGK1tKIk8+ipdQS73qV7fnt6e+74OAF6JH2LZqPaVdoIqKKC+kO0R2D1m5vfe7wFb4 VEjYYbeL8zK1N3XpgQjtJfHk8tolG6qE6RNaQYlPNyQtIGKRa7WQeyC/nCvFWzuLd0W5sG 3A17oRSN2VwBPWMv7Fj25Kl1KIi2BvLPpIVaN/4tC8vA87N3CV8gcGQVUFp/cA== ARC-Authentication-Results: i=1; mx1.freebsd.org; none ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1754332622; a=rsa-sha256; cv=none; b=QCJndMm2Kr9bONBS/klk0C8wDjyrbg/DwdVAvW2k7pW9z6/Hqm2JvzS4vtNqcg8+/UPGfn Jjti9KdbPMnOpEAj/dEhPxgQXBDpGUMc4nq+gqEJPG2OeU4/EPzGi5Z8veEB3ydJgqKBIN t1K/fr4QaiVYuDlVolvN6tZFxp7Rk30oFHf7b/8LrBn4W6+HTE3aYwM1DhDWHnW8xTwXpb sQVyrjQj3MUmMVWqzsG2SnGPaFRZFL7NFNrgUBpc3ZdxVo/iepwklyEeVe/tajCbSB8uKX HBKtfZtZ607QbTxZlEkh6jb3onIMeEgnyo43DBH67WZyfD8p/v/cJi3ljdRd6g== Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 4bwldp5CLgz11Lx; Mon, 04 Aug 2025 18:37:02 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org ([127.0.1.44]) by gitrepo.freebsd.org (8.18.1/8.18.1) with ESMTP id 574Ib2cl066997; Mon, 4 Aug 2025 18:37:02 GMT (envelope-from git@gitrepo.freebsd.org) Received: (from git@localhost) by gitrepo.freebsd.org (8.18.1/8.18.1/Submit) id 574Ib28u066994; Mon, 4 Aug 2025 18:37:02 GMT (envelope-from git) Date: Mon, 4 Aug 2025 18:37:02 GMT Message-Id: <202508041837.574Ib28u066994@gitrepo.freebsd.org> To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org From: Alan Somers Subject: git: a13ddd6210f3 - main - zfsd: don't try to fix an OFFLINE condition List-Id: Commit messages for the main branch of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main List-Help: List-Post: List-Subscribe: List-Unsubscribe: X-BeenThere: dev-commits-src-main@freebsd.org Sender: owner-dev-commits-src-main@FreeBSD.org MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Git-Committer: asomers X-Git-Repository: src X-Git-Refname: refs/heads/main X-Git-Reftype: branch X-Git-Commit: a13ddd6210f349dec341eba0506b5f2395a7a2d6 Auto-Submitted: auto-generated The branch main has been updated by asomers: URL: https://cgit.FreeBSD.org/src/commit/?id=a13ddd6210f349dec341eba0506b5f2395a7a2d6 commit a13ddd6210f349dec341eba0506b5f2395a7a2d6 Author: Alan Somers AuthorDate: 2025-05-02 12:45:32 +0000 Commit: Alan Somers CommitDate: 2025-08-04 16:00:52 +0000 zfsd: don't try to fix an OFFLINE condition If the system administrator does "zpool offline", he's doing it for a reason. zfsd shouldn't consider an offline disk to be an event that requires automatic healing. Don't online it in response to a GEOM event, and don't try to activate a hotspare to take over from it. MFC after: 2 weeks Sponsored by: ConnectWise --- cddl/usr.sbin/zfsd/case_file.cc | 18 +++++- cddl/usr.sbin/zfsd/zfsd_event.cc | 7 +++ tests/sys/cddl/zfs/tests/zfsd/Makefile | 2 + .../cddl/zfs/tests/zfsd/zfsd_offline_001_neg.ksh | 64 +++++++++++++++++++++ .../cddl/zfs/tests/zfsd/zfsd_offline_002_neg.ksh | 66 ++++++++++++++++++++++ tests/sys/cddl/zfs/tests/zfsd/zfsd_test.sh | 60 ++++++++++++++++++++ 6 files changed, 215 insertions(+), 2 deletions(-) diff --git a/cddl/usr.sbin/zfsd/case_file.cc b/cddl/usr.sbin/zfsd/case_file.cc index 7adfb08b75c6..852767aeb227 100644 --- a/cddl/usr.sbin/zfsd/case_file.cc +++ b/cddl/usr.sbin/zfsd/case_file.cc @@ -299,6 +299,15 @@ CaseFile::ReEvaluate(const string &devPath, const string &physPath, Vdev *vdev) PoolGUIDString().c_str(), VdevGUIDString().c_str()); return (/*consumed*/false); } + if (VdevState() == VDEV_STATE_OFFLINE) { + /* + * OFFLINE is an administrative decision. No need for zfsd to + * do anything. + */ + syslog(LOG_INFO, "CaseFile::ReEvaluate(%s,%s): Pool/Vdev ignored", + PoolGUIDString().c_str(), VdevGUIDString().c_str()); + return (/*consumed*/false); + } if (vdev != NULL && ( vdev->PoolGUID() == m_poolGUID @@ -401,7 +410,8 @@ CaseFile::ReEvaluate(const ZfsEvent &event) return (/*consumed*/true); } else if (event.Value("type") == "sysevent.fs.zfs.config_sync") { RefreshVdevState(); - if (VdevState() < VDEV_STATE_HEALTHY) + if (VdevState() < VDEV_STATE_HEALTHY && + VdevState() != VDEV_STATE_OFFLINE) consumed = ActivateSpare(); } @@ -694,6 +704,11 @@ CaseFile::CloseIfSolved() switch (VdevState()) { case VDEV_STATE_HEALTHY: /* No need to keep cases for healthy vdevs */ + case VDEV_STATE_OFFLINE: + /* + * Offline is a deliberate administrative action. zfsd + * doesn't need to do anything for this state. + */ Close(); return (true); case VDEV_STATE_REMOVED: @@ -710,7 +725,6 @@ CaseFile::CloseIfSolved() */ case VDEV_STATE_UNKNOWN: case VDEV_STATE_CLOSED: - case VDEV_STATE_OFFLINE: /* * Keep open? This may not be the correct behavior, * but it's what we've always done diff --git a/cddl/usr.sbin/zfsd/zfsd_event.cc b/cddl/usr.sbin/zfsd/zfsd_event.cc index 7a19b95abeed..afdabd99a8c3 100644 --- a/cddl/usr.sbin/zfsd/zfsd_event.cc +++ b/cddl/usr.sbin/zfsd/zfsd_event.cc @@ -355,6 +355,13 @@ ZfsEvent::Process() const Vdev vdev(zpl.front(), vdevConfig); caseFile = &CaseFile::Create(vdev); + if (caseFile->VdevState() == VDEV_STATE_OFFLINE) { + /* + * An administrator did this deliberately. It's not considered + * an error that zfsd must fix. + */ + return (false); + } if (caseFile->ReEvaluate(*this) == false) { stringstream msg; int priority = LOG_INFO; diff --git a/tests/sys/cddl/zfs/tests/zfsd/Makefile b/tests/sys/cddl/zfs/tests/zfsd/Makefile index e34e24b40906..588ca6e6c145 100644 --- a/tests/sys/cddl/zfs/tests/zfsd/Makefile +++ b/tests/sys/cddl/zfs/tests/zfsd/Makefile @@ -30,6 +30,8 @@ ${PACKAGE}FILES+= zfsd_hotspare_006_pos.ksh ${PACKAGE}FILES+= zfsd_hotspare_007_pos.ksh ${PACKAGE}FILES+= zfsd_hotspare_008_neg.ksh ${PACKAGE}FILES+= zfsd_import_001_pos.ksh +${PACKAGE}FILES+= zfsd_offline_001_neg.ksh +${PACKAGE}FILES+= zfsd_offline_002_neg.ksh ${PACKAGE}FILES+= zfsd_replace_001_pos.ksh ${PACKAGE}FILES+= zfsd_replace_002_pos.ksh ${PACKAGE}FILES+= zfsd_replace_003_pos.ksh diff --git a/tests/sys/cddl/zfs/tests/zfsd/zfsd_offline_001_neg.ksh b/tests/sys/cddl/zfs/tests/zfsd/zfsd_offline_001_neg.ksh new file mode 100644 index 000000000000..de7996976504 --- /dev/null +++ b/tests/sys/cddl/zfs/tests/zfsd/zfsd_offline_001_neg.ksh @@ -0,0 +1,64 @@ +#!/usr/local/bin/ksh93 -p +# +# CDDL HEADER START +# +# The contents of this file are subject to the terms of the +# Common Development and Distribution License (the "License"). +# You may not use this file except in compliance with the License. +# +# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE +# or http://www.opensolaris.org/os/licensing. +# See the License for the specific language governing permissions +# and limitations under the License. +# +# When distributing Covered Code, include this CDDL HEADER in each +# file and include the License file at usr/src/OPENSOLARIS.LICENSE. +# If applicable, add the following below this CDDL HEADER, with the +# fields enclosed by brackets "[]" replaced with your own identifying +# information: Portions Copyright [yyyy] [name of copyright owner] +# +# CDDL HEADER END +# + +# +# Copyright 2025 ConnectWise. All rights reserved. +# Use is subject to license terms. + +. $STF_SUITE/tests/hotspare/hotspare.kshlib + +verify_runnable "global" + +function cleanup +{ + $ZPOOL status $TESTPOOL + if poolexists $TESTPOOL ; then + destroy_pool $TESTPOOL + fi + + partition_cleanup +} + +function verify_assertion +{ + log_must $ZPOOL offline $TESTPOOL $FAULT_DISK + + # Wait a few seconds before verifying the state + $SLEEP 10 + log_must check_state $TESTPOOL "$FAULT_DISK" "OFFLINE" +} + +log_onexit cleanup + +log_assert "ZFSD will not automatically reactivate a disk which has been administratively offlined" + +ensure_zfsd_running + +typeset FAULT_DISK=$DISK0 +typeset POOLDEVS="$DISK0 $DISK1 $DISK2" +set -A MY_KEYWORDS mirror raidz1 +for keyword in "${MY_KEYWORDS[@]}" ; do + log_must create_pool $TESTPOOL $keyword $POOLDEVS + verify_assertion + + destroy_pool "$TESTPOOL" +done diff --git a/tests/sys/cddl/zfs/tests/zfsd/zfsd_offline_002_neg.ksh b/tests/sys/cddl/zfs/tests/zfsd/zfsd_offline_002_neg.ksh new file mode 100644 index 000000000000..7d8dfc62d365 --- /dev/null +++ b/tests/sys/cddl/zfs/tests/zfsd/zfsd_offline_002_neg.ksh @@ -0,0 +1,66 @@ +#!/usr/local/bin/ksh93 -p +# +# CDDL HEADER START +# +# The contents of this file are subject to the terms of the +# Common Development and Distribution License (the "License"). +# You may not use this file except in compliance with the License. +# +# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE +# or http://www.opensolaris.org/os/licensing. +# See the License for the specific language governing permissions +# and limitations under the License. +# +# When distributing Covered Code, include this CDDL HEADER in each +# file and include the License file at usr/src/OPENSOLARIS.LICENSE. +# If applicable, add the following below this CDDL HEADER, with the +# fields enclosed by brackets "[]" replaced with your own identifying +# information: Portions Copyright [yyyy] [name of copyright owner] +# +# CDDL HEADER END +# + +# +# Copyright 2025 ConnectWise. All rights reserved. +# Use is subject to license terms. + +. $STF_SUITE/tests/hotspare/hotspare.kshlib + +verify_runnable "global" + +function cleanup +{ + $ZPOOL status $TESTPOOL + if poolexists $TESTPOOL ; then + destroy_pool $TESTPOOL + fi + + partition_cleanup +} + +function verify_assertion +{ + log_must $ZPOOL offline $TESTPOOL $FAULT_DISK + + # Wait a few seconds before verifying the state + $SLEEP 10 + log_must check_state $TESTPOOL "$FAULT_DISK" "OFFLINE" + log_must check_state $TESTPOOL "$SPARE_DISK" "AVAIL" +} + +log_onexit cleanup + +log_assert "ZFSD will not automatically activate a spare when a disk has been administratively offlined" + +ensure_zfsd_running + +typeset FAULT_DISK=$DISK0 +typeset SPARE_DISK=$DISK3 +typeset POOLDEVS="$DISK0 $DISK1 $DISK2" +set -A MY_KEYWORDS mirror raidz1 +for keyword in "${MY_KEYWORDS[@]}" ; do + log_must create_pool $TESTPOOL $keyword $POOLDEVS spare $SPARE_DISK + verify_assertion + + destroy_pool "$TESTPOOL" +done diff --git a/tests/sys/cddl/zfs/tests/zfsd/zfsd_test.sh b/tests/sys/cddl/zfs/tests/zfsd/zfsd_test.sh index fe4ac4866ed3..b9924500a298 100755 --- a/tests/sys/cddl/zfs/tests/zfsd/zfsd_test.sh +++ b/tests/sys/cddl/zfs/tests/zfsd/zfsd_test.sh @@ -483,6 +483,64 @@ zfsd_autoreplace_003_pos_cleanup() ksh93 $(atf_get_srcdir)/cleanup.ksh || atf_fail "Cleanup failed" } +atf_test_case zfsd_offline_001_neg cleanup +zfsd_offline_001_neg_head() +{ + atf_set "descr" "ZFSD will not automatically reactivate a disk which has been administratively offlined" + atf_set "require.progs" "ksh93 zpool zfs" +} +zfsd_offline_001_neg_body() +{ + . $(atf_get_srcdir)/../../include/default.cfg + . $(atf_get_srcdir)/../hotspare/hotspare.cfg + . $(atf_get_srcdir)/zfsd.cfg + + verify_disk_count "$DISKS" 3 + verify_zfsd_running + ksh93 $(atf_get_srcdir)/setup.ksh || atf_fail "Setup failed" + ksh93 $(atf_get_srcdir)/zfsd_offline_001_neg.ksh + if [[ $? != 0 ]]; then + save_artifacts + atf_fail "Testcase failed" + fi +} +zfsd_offline_001_neg_cleanup() +{ + . $(atf_get_srcdir)/../../include/default.cfg + . $(atf_get_srcdir)/zfsd.cfg + + ksh93 $(atf_get_srcdir)/cleanup.ksh || atf_fail "Cleanup failed" +} + +atf_test_case zfsd_offline_002_neg cleanup +zfsd_offline_002_neg_head() +{ + atf_set "descr" "ZFSD will not automatically activate a spare when a disk has been administratively offlined" + atf_set "require.progs" "ksh93 zpool zfs" +} +zfsd_offline_002_neg_body() +{ + . $(atf_get_srcdir)/../../include/default.cfg + . $(atf_get_srcdir)/../hotspare/hotspare.cfg + . $(atf_get_srcdir)/zfsd.cfg + + verify_disk_count "$DISKS" 4 + verify_zfsd_running + ksh93 $(atf_get_srcdir)/setup.ksh || atf_fail "Setup failed" + ksh93 $(atf_get_srcdir)/zfsd_offline_002_neg.ksh + if [[ $? != 0 ]]; then + save_artifacts + atf_fail "Testcase failed" + fi +} +zfsd_offline_002_neg_cleanup() +{ + . $(atf_get_srcdir)/../../include/default.cfg + . $(atf_get_srcdir)/zfsd.cfg + + ksh93 $(atf_get_srcdir)/cleanup.ksh || atf_fail "Cleanup failed" +} + atf_test_case zfsd_replace_001_pos cleanup zfsd_replace_001_pos_head() { @@ -676,6 +734,8 @@ atf_init_test_cases() atf_add_test_case zfsd_autoreplace_001_neg atf_add_test_case zfsd_autoreplace_002_pos atf_add_test_case zfsd_autoreplace_003_pos + atf_add_test_case zfsd_offline_001_neg + atf_add_test_case zfsd_offline_002_neg atf_add_test_case zfsd_replace_001_pos atf_add_test_case zfsd_replace_002_pos atf_add_test_case zfsd_replace_003_pos