From nobody Wed Aug 20 14:53:41 2025 X-Original-To: dev-commits-src-main@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4c6Twj5xLsz64XD6; Wed, 20 Aug 2025 14:53:41 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "R10" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4c6Twj4ZBhz3m2b; Wed, 20 Aug 2025 14:53:41 +0000 (UTC) (envelope-from git@FreeBSD.org) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1755701621; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=JKEb69a5YlJTcfKJS6lBiBsTpSsxWwUmkz6onPG66eI=; b=OktFtj/5PQaL0CFIqIRlTeUNrXC88qBgb7/qkz7ifqrU4qV77m+ikgbS6dM+O6xnH+eHQW b9kRHNJruBdArA+bMuHLMar/kXSVbOqZVcOVnccwq3p4ZMjVCrD7+Gj575AlionbrLZR1C SdwC+Y6kNNotRECFXNUr6k7Xd3tzw4iT9YgiVa6nmHEFpdHb8N/xqJTEoCMttcA+V2te+I tatd2CLkKqtuPnUfMP5/qmXvC00sur0/75mG/0e2miP0k67HgceInDXL+zpfgocfL7so7U uexC5oqoqIymmSCAFVwlOkMNvhplVXc8ZekbG/gKY1ryQJv69PdVrR7MotrTKQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1755701621; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=JKEb69a5YlJTcfKJS6lBiBsTpSsxWwUmkz6onPG66eI=; b=sHvWpeVSV6NRelesEwxIbKFO/n/1006IYHyT6oG9bXW1ahNRGOTwizfxnbP1N7Otmf644J 4d61y/zrwiVfbV/VMSRcGY6CyaU3p7z5Sh3anNHc/+nvRJ33i5lUFYJbk3to52wSBFNULS i/fFc+EwO2gToLkYNc8IP3bw2vpXwLc33ROh45Cs0ka2qZSTAOWupmu5g2iXg9QITIKuIM KEtHF22qNah820TbLXJ6PFQg83lvL5+7/D6Es8VTbwlcsf8vmQ3q+PakU1jKRzofbKAxsP Rj3Agaj7tjsySgwXqvEZoYxrwLUT1/0uWOwkim+SqUVnrkJrkiu5unCEC/kpbA== ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1755701621; a=rsa-sha256; cv=none; b=XIATaJIwBRkpQhQdKB1Ronx73JNZfd1Sw6twkNKp9oJjldmyDNiZUv5IHcmps0hQ0a7EGd yCuLHNdeLeFeiP47bLl/AgxKaT2cgcHzVG7zU/3Kujw/f3zZDP3WfTYt4LqyozflevVXHi WGi27mOUUKOfitmQ7BXmErVjKqnNUn0peRtMgPxsOPiOYp5/GDYJoj2bknGBK7aR8/pg6m /3nkNhuYGy0oRcIZd+4qZPAAgVajCDNsOCqNigkpk1j6GgmMTY4pFEqB8GhJ9tcEJhe+my 7Y0352k6iiQMF3KRYQyfI7Q3lX4yma0YXaNaRgTn4T0TKE27PShbuceQkpVlLQ== ARC-Authentication-Results: i=1; mx1.freebsd.org; none Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 4c6Twj3CDbz2v7; Wed, 20 Aug 2025 14:53:41 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org ([127.0.1.44]) by gitrepo.freebsd.org (8.18.1/8.18.1) with ESMTP id 57KErfpj028551; Wed, 20 Aug 2025 14:53:41 GMT (envelope-from git@gitrepo.freebsd.org) Received: (from git@localhost) by gitrepo.freebsd.org (8.18.1/8.18.1/Submit) id 57KErfKE028548; Wed, 20 Aug 2025 14:53:41 GMT (envelope-from git) Date: Wed, 20 Aug 2025 14:53:41 GMT Message-Id: <202508201453.57KErfKE028548@gitrepo.freebsd.org> To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org From: Gleb Smirnoff Subject: git: 9dbd8ee11675 - main - libsa/zfs: refactor vdev tree for better resiliency against stale disks List-Id: Commit messages for the main branch of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main List-Help: List-Post: List-Subscribe: List-Unsubscribe: X-BeenThere: dev-commits-src-main@freebsd.org Sender: owner-dev-commits-src-main@FreeBSD.org MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Git-Committer: glebius X-Git-Repository: src X-Git-Refname: refs/heads/main X-Git-Reftype: branch X-Git-Commit: 9dbd8ee1167520eafdd373f6a995f1232e024b1e Auto-Submitted: auto-generated The branch main has been updated by glebius: URL: https://cgit.FreeBSD.org/src/commit/?id=9dbd8ee1167520eafdd373f6a995f1232e024b1e commit 9dbd8ee1167520eafdd373f6a995f1232e024b1e Author: Gleb Smirnoff AuthorDate: 2025-08-20 14:51:21 +0000 Commit: Gleb Smirnoff CommitDate: 2025-08-20 14:51:21 +0000 libsa/zfs: refactor vdev tree for better resiliency against stale disks Before this change in vdev_insert() we would avoid inserting a duplicate vdev to the list of children, however this duplicate being unlinked from the parent is still stored on the global list with initialized v_guid. Such leaked duplicate can later be returned by vdev_find(). After 6dd0803ffd31 such leaked vdev may be freed or pointing to a freed parent, which leads to a loader crash. Note that the leak problem was there before 6dd0803ffd31. First, in vdev_insert() free conflicting vdev and return the existing one. Update callers accordingly. There is only one caller that actually may encounter this condition. Second, eliminate global list of vdevs and make vdev_find() to work recursively on the tree that a caller must provide. Of course, a chance of GUID collision between members of different pools is extremely low. The main motivation here is just to increase code robustness and fully isolate the data structures of different pools being tasted by the loader, and make easier debugging of bugs like the one being fixed. Reviewed by: mav, imp Differential Revision: https://reviews.freebsd.org/D51912 Fixes: 6dd0803ffd31c60a84488d06928813353c6303d3 --- stand/libsa/zfs/zfsimpl.c | 50 ++++++++++++++++++++------------------------- sys/cddl/boot/zfs/zfsimpl.h | 1 - 2 files changed, 22 insertions(+), 29 deletions(-) diff --git a/stand/libsa/zfs/zfsimpl.c b/stand/libsa/zfs/zfsimpl.c index d923abeee56e..fb5b5fb4606f 100644 --- a/stand/libsa/zfs/zfsimpl.c +++ b/stand/libsa/zfs/zfsimpl.c @@ -106,11 +106,6 @@ typedef struct indirect_vsd { list_t iv_splits; /* list of indirect_split_t's */ } indirect_vsd_t; -/* - * List of all vdevs, chained through v_alllink. - */ -static vdev_list_t zfs_vdevs; - /* * List of supported read-incompatible ZFS features. Do not add here features * marked as ZFEATURE_FLAG_READONLY_COMPAT, they are irrelevant for read-only! @@ -167,7 +162,6 @@ vdev_indirect_mapping_entry_phys_t * static void zfs_init(void) { - STAILQ_INIT(&zfs_vdevs); STAILQ_INIT(&zfs_pools); dnode_cache_buf = malloc(SPA_MAXBLOCKSIZE); @@ -840,15 +834,18 @@ vdev_replacing_read(vdev_t *vdev, const blkptr_t *bp, void *buf, } static vdev_t * -vdev_find(uint64_t guid) +vdev_find(vdev_list_t *list, uint64_t guid) { - vdev_t *vdev; + vdev_t *vdev, *safe; - STAILQ_FOREACH(vdev, &zfs_vdevs, v_alllink) + STAILQ_FOREACH_SAFE(vdev, list, v_childlink, safe) { if (vdev->v_guid == guid) return (vdev); + if ((vdev = vdev_find(&vdev->v_children, guid)) != NULL) + return (vdev); + } - return (0); + return (NULL); } static vdev_t * @@ -871,7 +868,6 @@ vdev_create(uint64_t guid, vdev_read_t *_read) if (_read != NULL) { vic = &vdev->vdev_indirect_config; vic->vic_prev_indirect_vdev = UINT64_MAX; - STAILQ_INSERT_TAIL(&zfs_vdevs, vdev, v_alllink); } } @@ -1069,7 +1065,7 @@ vdev_child_count(vdev_t *vdev) /* * Insert vdev into top_vdev children list. List is ordered by v_id. */ -static void +static vdev_t * vdev_insert(vdev_t *top_vdev, vdev_t *vdev) { vdev_t *previous; @@ -1091,7 +1087,8 @@ vdev_insert(vdev_t *top_vdev, vdev_t *vdev) * This vdev was configured from label config, * do not insert duplicate. */ - return; + free(vdev); + return (previous); } else { STAILQ_INSERT_AFTER(&top_vdev->v_children, previous, vdev, v_childlink); @@ -1100,6 +1097,7 @@ vdev_insert(vdev_t *top_vdev, vdev_t *vdev) count = vdev_child_count(top_vdev); if (top_vdev->v_nchildren < count) top_vdev->v_nchildren = count; + return (vdev); } static int @@ -1111,7 +1109,7 @@ vdev_from_nvlist(spa_t *spa, uint64_t top_guid, uint64_t txg, int rc, nkids; /* Get top vdev. */ - top_vdev = vdev_find(top_guid); + top_vdev = vdev_find(&spa->spa_root_vdev->v_children, top_guid); if (top_vdev == NULL) { rc = vdev_init(top_guid, nvlist, &top_vdev); if (rc != 0) @@ -1119,7 +1117,7 @@ vdev_from_nvlist(spa_t *spa, uint64_t top_guid, uint64_t txg, top_vdev->v_spa = spa; top_vdev->v_top = top_vdev; top_vdev->v_txg = txg; - vdev_insert(spa->spa_root_vdev, top_vdev); + (void )vdev_insert(spa->spa_root_vdev, top_vdev); } /* Add children if there are any. */ @@ -1140,7 +1138,7 @@ vdev_from_nvlist(spa_t *spa, uint64_t top_guid, uint64_t txg, vdev->v_spa = spa; vdev->v_top = top_vdev; - vdev_insert(top_vdev, vdev); + vdev = vdev_insert(top_vdev, vdev); } } else { /* @@ -1227,14 +1225,14 @@ vdev_set_state(vdev_t *vdev) } static int -vdev_update_from_nvlist(uint64_t top_guid, const nvlist_t *nvlist) +vdev_update_from_nvlist(vdev_t *root, uint64_t top_guid, const nvlist_t *nvlist) { vdev_t *vdev; nvlist_t **kids = NULL; int rc, nkids; /* Update top vdev. */ - vdev = vdev_find(top_guid); + vdev = vdev_find(&root->v_children, top_guid); if (vdev != NULL) vdev_set_initial_state(vdev, nvlist); @@ -1250,7 +1248,7 @@ vdev_update_from_nvlist(uint64_t top_guid, const nvlist_t *nvlist) if (rc != 0) break; - vdev = vdev_find(guid); + vdev = vdev_find(&root->v_children, guid); if (vdev != NULL) vdev_set_initial_state(vdev, kids[i]); } @@ -1266,10 +1264,6 @@ vdev_update_from_nvlist(uint64_t top_guid, const nvlist_t *nvlist) return (rc); } -/* - * Shall not be called on root vdev, that is not linked into zfs_vdevs. - * See comment in vdev_create(). - */ static void vdev_free(struct vdev *vdev) { @@ -1277,7 +1271,6 @@ vdev_free(struct vdev *vdev) STAILQ_FOREACH_SAFE(kid, &vdev->v_children, v_childlink, safe) vdev_free(kid); - STAILQ_REMOVE(&zfs_vdevs, vdev, vdev, v_alllink); free(vdev); } @@ -1324,7 +1317,7 @@ vdev_init_from_nvlist(spa_t *spa, const nvlist_t *nvlist) NULL, &guid, NULL); if (rc != 0) break; - vdev = vdev_find(guid); + vdev = vdev_find(&spa->spa_root_vdev->v_children, guid); /* * Top level vdev is missing, create it. * XXXGL: how can this happen? @@ -1332,7 +1325,8 @@ vdev_init_from_nvlist(spa_t *spa, const nvlist_t *nvlist) if (vdev == NULL) rc = vdev_from_nvlist(spa, guid, 0, kids[i]); else - rc = vdev_update_from_nvlist(guid, kids[i]); + rc = vdev_update_from_nvlist(spa->spa_root_vdev, guid, + kids[i]); if (rc != 0) break; } @@ -2138,7 +2132,7 @@ vdev_probe(vdev_phys_read_t *_read, vdev_phys_write_t *_write, void *priv, * be some kind of alias (overlapping slices, dangerously dedicated * disks etc). */ - vdev = vdev_find(guid); + vdev = vdev_find(&spa->spa_root_vdev->v_children, guid); /* Has this vdev already been inited? */ if (vdev && vdev->v_phys_read) { nvlist_destroy(nvl); @@ -2154,7 +2148,7 @@ vdev_probe(vdev_phys_read_t *_read, vdev_phys_write_t *_write, void *priv, * We should already have created an incomplete vdev for this * vdev. Find it and initialise it with our read proc. */ - vdev = vdev_find(guid); + vdev = vdev_find(&spa->spa_root_vdev->v_children, guid); if (vdev != NULL) { vdev->v_phys_read = _read; vdev->v_phys_write = _write; diff --git a/sys/cddl/boot/zfs/zfsimpl.h b/sys/cddl/boot/zfs/zfsimpl.h index 915aeeda3c9e..d3bc090738ff 100644 --- a/sys/cddl/boot/zfs/zfsimpl.h +++ b/sys/cddl/boot/zfs/zfsimpl.h @@ -2021,7 +2021,6 @@ typedef struct vdev_indirect_config { typedef struct vdev { STAILQ_ENTRY(vdev) v_childlink; /* link in parent's child list */ - STAILQ_ENTRY(vdev) v_alllink; /* link in global vdev list */ vdev_list_t v_children; /* children of this vdev */ const char *v_name; /* vdev name */ uint64_t v_guid; /* vdev guid */