Date: Fri, 04 Feb 2011 10:50:45 -0700
From: "Justin T. Gibbs" <gibbs@scsiguy.com>
To: fs@FreeBSD.org
Subject: [PATCH] ZFS: Access vdevs by GUIDs when necessary/appropriate.
Message-ID: <4D4C3C75.2000803@scsiguy.com>
The attached patch addresses three issues:

o The zpool command (via libzfs) does not display vdev GUID information in
  all cases where it is the only safe way to refer to a vdev. For example,
  once a vdev is removed, there is no guarantee that it will return via the
  same devfs node. The zpool command already returned GUID information for
  vdevs found to be missing at system boot time. This change extends that
  behavior to vdevs that go missing some time after a pool is initially
  imported/mounted.

o The zpool command does not allow a vdev to be referenced by GUID for
  operations such as "online" or "replace". This is especially frustrating
  when the vdev you wish to replace was "last seen at" a devfs path that is
  currently owned by an active member of the pool. In this situation I know
  of no way to properly replace a vdev, or in the case of an online, to
  ensure that the device is associated with the correct in-core vdev
  metadata.

o The ZFS vdev geom provider uses an incomplete heuristic to determine if a
  vdev open is for a "new device" or one that has been seen/labeled before.
  This heuristic fails during a "zpool online" operation, allowing the open
  to associate with a geom provider at the "last seen at" device path even
  when a device with a proper match of the pool and vdev GUIDs referenced in
  the open exists in the system at a different device path.

These patches are against -current. Checkin comments indicating what I
changed and why are also in the attached file. I haven't reviewed the
userland aspects of the pending v28 import yet, but I know that the vdev
geom changes are appropriate there too.

I would appreciate review and comments on these changes before I commit them
to -current and -stable.

Thanks,
Justin

Content-Type: text/plain; name="vdev_guids.diffs"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="vdev_guids.diffs"

Change 477305 by justing@justing-ns1 on 2011/02/03 14:32:58

	Allow zpool operations on vdevs specified by GUID.

	cddl/contrib/opensolaris/cmd/zpool/zpool_main.c:
		The "zpool status" command reports the "last seen at" device
		node path when the vdev name is being reported by GUID.
		Augment this code to assume a GUID is reported when a device
		goes missing after initial boot, in addition to the previous
		behavior of doing this for devices that weren't seen at boot.

	cddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c:
		o In zpool_vdev_name(), report recently missing devices by
		  GUID. There is no guarantee they will return at their
		  previous location.

		o In vdev_to_nvlist_iter(), remove the test for the
		  ZPOOL_CONFIG_NOT_PRESENT vdev attribute before allowing a
		  search by GUID. The search parameter unambiguously
		  indicates the desired search behavior, and this test made
		  it impossible to search for an active (or removed but
		  active at boot) device by GUID.

Affected files ...

... //depot/SpectraBSD/head/cddl/contrib/opensolaris/cmd/zpool/zpool_main.c#5 edit
... //depot/SpectraBSD/head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c#6 edit

Differences ...
==== //depot/SpectraBSD/head/cddl/contrib/opensolaris/cmd/zpool/zpool_main.c#5 (text) ====

@@ -1071,7 +1071,8 @@
 	}
 
 	if (nvlist_lookup_uint64(nv, ZPOOL_CONFIG_NOT_PRESENT,
-	    &notpresent) == 0) {
+	    &notpresent) == 0 ||
+	    vs->vs_state <= VDEV_STATE_CANT_OPEN) {
 		char *path;
 		verify(nvlist_lookup_string(nv, ZPOOL_CONFIG_PATH, &path) == 0);
 		(void) printf(" was %s", path);

==== //depot/SpectraBSD/head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c#6 (text) ====

@@ -1391,11 +1391,9 @@
 	verify(nvlist_lookup_uint64(nv, ZPOOL_CONFIG_GUID, &theguid) == 0);
 
-	if (search == NULL &&
-	    nvlist_lookup_uint64(nv, ZPOOL_CONFIG_NOT_PRESENT, &present) == 0) {
+	if (search == NULL) {
 		/*
-		 * If the device has never been present since import, the only
-		 * reliable way to match the vdev is by GUID.
+		 * Search by GUID.
 		 */
 		if (theguid == guid)
 			return (nv);
@@ -2396,9 +2394,17 @@
 	char buf[64];
 	vdev_stat_t *vs;
 	uint_t vsc;
+	int have_stats;
 
-	if (nvlist_lookup_uint64(nv, ZPOOL_CONFIG_NOT_PRESENT,
-	    &value) == 0) {
+	have_stats = nvlist_lookup_uint64_array(nv, ZPOOL_CONFIG_STATS,
+	    (uint64_t **)&vs, &vsc) == 0;
+
+	/*
+	 * If the device is not currently present, assume it will not
+	 * come back at the same device path.  Display the device by GUID.
+	 */
+	if (nvlist_lookup_uint64(nv, ZPOOL_CONFIG_NOT_PRESENT, &value) == 0 ||
+	    have_stats != 0 && vs->vs_state <= VDEV_STATE_CANT_OPEN) {
 		verify(nvlist_lookup_uint64(nv, ZPOOL_CONFIG_GUID,
 		    &value) == 0);
 		(void) snprintf(buf, sizeof (buf), "%llu",
@@ -2412,8 +2418,7 @@
 	 * open a misbehaving device, which can have undesirable
 	 * effects.
 	 */
-	if ((nvlist_lookup_uint64_array(nv, ZPOOL_CONFIG_STATS,
-	    (uint64_t **)&vs, &vsc) != 0 ||
+	if ((have_stats == 0 ||
 	    vs->vs_state >= VDEV_STATE_DEGRADED) &&
 	    zhp != NULL &&
 	    nvlist_lookup_string(nv, ZPOOL_CONFIG_DEVID, &devid) == 0) {

Change 477306 by justing@justing-ns1 on 2011/02/03 14:49:16

	Modify the geom vdev provider's open behavior so that it will only
	unconditionally open a device by path when the open is part of a
	pool create or device add operation and a search of all known geom
	providers' label data doesn't yield a device with matching pool and
	vdev GUIDs.

	This fixes a bug where the wrong disk could be associated with a
	vdev's configuration data when device devfs paths change due to
	insert and remove events.  While ZFS detects this kind of mix-up
	and immediately flags the device as faulted before the confusion
	can cause permanent data loss, a reboot was necessary in order to
	resurrect the configuration.

	sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c:
		o When opening by GUID, require both the pool and vdev GUIDs
		  to match.  While it is highly unlikely for two vdevs to
		  have the same vdev GUID, the ZFS storage pool allocator
		  only guarantees they are unique within a pool.

		o Modify the open behavior to:
		  - Open by the recorded device path with GUID matching.
		  - If that fails, search all geom providers for a device
		    with matching GUIDs.
		  - If that fails and we are opening a "new to a pool
		    configuration" vdev, open by path.
		  - Otherwise fail the open.

Affected files ...

... //depot/SpectraBSD/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c#8 edit

Differences ...
==== //depot/SpectraBSD/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c#8 (text) ====

@@ -160,20 +160,26 @@
 	}
 }
 
-static uint64_t
-nvlist_get_guid(nvlist_t *list)
+static void
+nvlist_get_guids(nvlist_t *list, uint64_t *pguid, uint64_t *vguid)
 {
 	nvpair_t *elem = NULL;
-	uint64_t value;
 
+	*vguid = 0;
+	*pguid = 0;
 	while ((elem = nvlist_next_nvpair(list, elem)) != NULL) {
-		if (nvpair_type(elem) == DATA_TYPE_UINT64 &&
-		    strcmp(nvpair_name(elem), "guid") == 0) {
-			VERIFY(nvpair_value_uint64(elem, &value) == 0);
-			return (value);
+		if (nvpair_type(elem) != DATA_TYPE_UINT64)
+			continue;
+
+		if (strcmp(nvpair_name(elem), ZPOOL_CONFIG_POOL_GUID) == 0) {
+			VERIFY(nvpair_value_uint64(elem, pguid) == 0);
+		} else if (strcmp(nvpair_name(elem), ZPOOL_CONFIG_GUID) == 0) {
+			VERIFY(nvpair_value_uint64(elem, vguid) == 0);
 		}
+
+		if (*pguid != 0 && *vguid != 0)
+			break;
 	}
-	return (0);
 }
@@ -211,8 +217,8 @@
 	return (error);
 }
 
-static uint64_t
-vdev_geom_read_guid(struct g_consumer *cp)
+static void
+vdev_geom_read_guids(struct g_consumer *cp, uint64_t *pguid, uint64_t *vguid)
 {
 	struct g_provider *pp;
 	vdev_label_t *label;
@@ -220,16 +226,17 @@
 	size_t buflen;
 	uint64_t psize;
 	off_t offset, size;
-	uint64_t guid;
 	int error, l, len, iszvol;
 
 	g_topology_assert_not();
 
+	*pguid = 0;
+	*vguid = 0;
 	pp = cp->provider;
 	ZFS_LOG(1, "Reading guid from %s...", pp->name);
 	if (g_getattr("ZFS::iszvol", cp, &iszvol) == 0 && iszvol) {
 		ZFS_LOG(1, "Skipping ZVOL-based provider %s.", pp->name);
-		return (0);
+		return;
 	}
 
 	psize = pp->mediasize;
@@ -238,7 +245,6 @@
 	size = sizeof(*label) + pp->sectorsize -
 	    ((sizeof(*label) - 1) % pp->sectorsize) - 1;
 
-	guid = 0;
 	label = kmem_alloc(size, KM_SLEEP);
 	buflen = sizeof(label->vl_vdev_phys.vp_nvlist);
@@ -256,20 +262,21 @@
 		if (nvlist_unpack(buf, buflen, &config, 0) != 0)
 			continue;
 
-		guid = nvlist_get_guid(config);
+		nvlist_get_guids(config, pguid, vguid);
 		nvlist_free(config);
-		if (guid != 0)
+		if (*pguid != 0 && *vguid != 0)
 			break;
 	}
 
 	kmem_free(label, size);
-	if (guid != 0)
-		ZFS_LOG(1, "guid for %s is %ju", pp->name, (uintmax_t)guid);
-	return (guid);
+	if (*pguid != 0 && *vguid != 0)
+		ZFS_LOG(1, "guid for %s is %ju:%ju", pp->name,
+		    (uintmax_t)*pguid, (uintmax_t)*vguid);
 }
 
 struct vdev_geom_find {
-	uint64_t guid;
+	uint64_t pool_guid;
+	uint64_t vdev_guid;
 	struct g_consumer *cp;
 };
@@ -289,7 +296,8 @@
 	struct g_geom *gp, *zgp;
 	struct g_provider *pp;
 	struct g_consumer *zcp;
-	uint64_t guid;
+	uint64_t pguid;
+	uint64_t vguid;
 
 	g_topology_assert();
@@ -315,15 +323,16 @@
 					continue;
 				}
 				g_topology_unlock();
-				guid = vdev_geom_read_guid(zcp);
+				vdev_geom_read_guids(zcp, &pguid, &vguid);
 				g_topology_lock();
 				g_access(zcp, -1, 0, 0);
 				g_detach(zcp);
-				if (guid != ap->guid)
+				if (pguid != ap->pool_guid ||
+				    vguid != ap->vdev_guid)
 					continue;
 				ap->cp = vdev_geom_attach(pp);
 				if (ap->cp == NULL) {
-					printf("ZFS WARNING: Unable to attach to %s.\n",
+					ZFS_LOG(1, "ZFS WARNING: Unable to attach to %s.",
 					    pp->name);
 					continue;
 				}
@@ -338,13 +347,14 @@
 }
 
 static struct g_consumer *
-vdev_geom_attach_by_guid(uint64_t guid)
+vdev_geom_attach_by_guid(uint64_t pool_guid, uint64_t vdev_guid)
 {
 	struct vdev_geom_find *ap;
 	struct g_consumer *cp;
 
 	ap = kmem_zalloc(sizeof(*ap), KM_SLEEP);
-	ap->guid = guid;
+	ap->pool_guid = pool_guid;
+	ap->vdev_guid = vdev_guid;
 	g_waitfor_event(vdev_geom_attach_by_guid_event, ap, M_WAITOK, NULL);
 	cp = ap->cp;
 	kmem_free(ap, sizeof(*ap));
@@ -352,14 +362,14 @@
 }
 
 static struct g_consumer *
-vdev_geom_open_by_guid(vdev_t *vd)
+vdev_geom_open_by_guids(vdev_t *vd)
 {
 	struct g_consumer *cp;
 	char *buf;
 	size_t len;
 
 	ZFS_LOG(1, "Searching by guid [%ju].", (uintmax_t)vd->vdev_guid);
-	cp = vdev_geom_attach_by_guid(vd->vdev_guid);
+	cp = vdev_geom_attach_by_guid(spa_guid(vd->vdev_spa), vd->vdev_guid);
 	if (cp != NULL) {
 		len = strlen(cp->provider->name) + strlen("/dev/") + 1;
 		buf = kmem_alloc(len, KM_SLEEP);
@@ -368,10 +378,12 @@
 		spa_strfree(vd->vdev_path);
 		vd->vdev_path = buf;
 
-		ZFS_LOG(1, "Attach by guid [%ju] succeeded, provider %s.",
+		ZFS_LOG(1, "Attach by guid [%ju:%ju] succeeded, provider %s.",
+		    (uintmax_t)spa_guid(vd->vdev_spa),
 		    (uintmax_t)vd->vdev_guid, vd->vdev_path);
 	} else {
-		ZFS_LOG(1, "Search by guid [%ju] failed.",
+		ZFS_LOG(1, "Search by guid [%ju:%ju] failed.",
+		    (uintmax_t)spa_guid(vd->vdev_spa),
 		    (uintmax_t)vd->vdev_guid);
 	}
@@ -383,7 +395,8 @@
 {
 	struct g_provider *pp;
 	struct g_consumer *cp;
-	uint64_t guid;
+	uint64_t pguid;
+	uint64_t vguid;
 
 	cp = NULL;
 	g_topology_lock();
@@ -393,14 +406,17 @@
 		cp = vdev_geom_attach(pp);
 		if (cp != NULL && check_guid) {
 			g_topology_unlock();
-			guid = vdev_geom_read_guid(cp);
+			vdev_geom_read_guids(cp, &pguid, &vguid);
 			g_topology_lock();
-			if (guid != vd->vdev_guid) {
+			if (pguid != spa_guid(vd->vdev_spa) ||
+			    vguid != vd->vdev_guid) {
 				vdev_geom_detach(cp, 0);
 				cp = NULL;
 				ZFS_LOG(1, "guid mismatch for provider %s: "
-				    "%ju != %ju.", vd->vdev_path,
-				    (uintmax_t)vd->vdev_guid, (uintmax_t)guid);
+				    "%ju:%ju != %ju:%ju.", vd->vdev_path,
+				    (uintmax_t)spa_guid(vd->vdev_spa),
+				    (uintmax_t)vd->vdev_guid,
+				    (uintmax_t)pguid, (uintmax_t)vguid);
 			} else {
 				ZFS_LOG(1, "guid match for provider %s.",
 				    vd->vdev_path);
@@ -434,22 +450,42 @@
 	error = 0;
 
 	/*
-	 * If we're creating pool, just find GEOM provider by its name
-	 * and ignore GUID mismatches.
+	 * Try using the recorded path for this device, but only
+	 * accept it if its label data contains the expected GUIDs.
 	 */
-	if (vd->vdev_spa->spa_load_state == SPA_LOAD_NONE)
+	cp = vdev_geom_open_by_path(vd, 1);
+	if (cp == NULL) {
+		/*
+		 * The device at vd->vdev_path doesn't have the
+		 * expected GUIDs.  The disks might have merely
+		 * moved around so try all other GEOM providers
+		 * to find one with the right GUIDs.
+		 */
+		cp = vdev_geom_open_by_guids(vd);
+	}
+
+	if (cp == NULL &&
+	    vd->vdev_prevstate == VDEV_STATE_UNKNOWN &&
+	    vd->vdev_spa->spa_load_state == SPA_LOAD_NONE) {
+		/*
+		 * We are dealing with a vdev that hasn't been previously
+		 * opened (since boot), and we are not loading an
+		 * existing pool configuration.  This looks like a
+		 * vdev add operation to a new or existing pool.
+		 * Assume the user knows what he/she is doing and find
+		 * GEOM provider by its name, ignoring GUID mismatches.
+		 *
+		 * XXPOLICY: It would be safer to only allow a device
+		 * that is devoid of label information to be
+		 * opened in this fashion.  This would require
+		 * a new option to the zpool command line tool
+		 * allowing the label information to be wiped
+		 * on a device, and augmented error reporting
+		 * so the user can understand why their request
+		 * failed and the required steps to repurpose
+		 * the device.
+		 */
 		cp = vdev_geom_open_by_path(vd, 0);
-	else {
-		cp = vdev_geom_open_by_path(vd, 1);
-		if (cp == NULL) {
-			/*
-			 * The device at vd->vdev_path doesn't have the
-			 * expected guid. The disks might have merely
-			 * moved around so try all other GEOM providers
-			 * to find one with the right guid.
-			 */
-			cp = vdev_geom_open_by_guid(vd);
-		}
 	}
 
 	if (cp == NULL) {
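For readers skimming the patch, the following standalone C sketch models the
naming decision Change 477305 adds to zpool_vdev_name(): a vdev is displayed
by its GUID when it carries the ZPOOL_CONFIG_NOT_PRESENT attribute or when
its reported state has dropped to VDEV_STATE_CANT_OPEN or below. The
model_* struct, enum, and helper below are simplified stand-ins of my own
invention, not the libzfs/nvlist API; only the decision logic mirrors the
patch.

/*
 * Simplified model of the zpool_vdev_name() change in Change 477305.
 * All model_* names are illustrative stand-ins; the real code consults
 * an nvlist via ZPOOL_CONFIG_NOT_PRESENT and ZPOOL_CONFIG_STATS.
 */
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

enum model_vdev_state {		/* mirrors the ordering of vdev_state_t */
	MODEL_STATE_UNKNOWN = 0,
	MODEL_STATE_CLOSED,
	MODEL_STATE_OFFLINE,
	MODEL_STATE_REMOVED,
	MODEL_STATE_CANT_OPEN,
	MODEL_STATE_FAULTED,
	MODEL_STATE_DEGRADED,
	MODEL_STATE_HEALTHY
};

struct model_vdev {
	uint64_t	guid;
	const char	*path;		/* "last seen at" device path */
	int		not_present;	/* ZPOOL_CONFIG_NOT_PRESENT analogue */
	int		have_stats;	/* vdev stats available? */
	enum model_vdev_state state;
};

/* Return the name a zpool-status-like report would display for this vdev. */
static const char *
model_vdev_name(const struct model_vdev *vd, char *buf, size_t buflen)
{
	/*
	 * Missing since import, or missing since it was last opened:
	 * the path may now belong to a different disk, so the GUID is
	 * the only safe handle.
	 */
	if (vd->not_present ||
	    (vd->have_stats && vd->state <= MODEL_STATE_CANT_OPEN)) {
		(void) snprintf(buf, buflen, "%" PRIu64, vd->guid);
		return (buf);
	}
	return (vd->path);
}

int
main(void)
{
	char buf[64];
	struct model_vdev healthy = {
		.guid = 123456789, .path = "/dev/da3",
		.not_present = 0, .have_stats = 1,
		.state = MODEL_STATE_HEALTHY
	};
	struct model_vdev missing = healthy;

	missing.state = MODEL_STATE_REMOVED;	/* went away after import */

	printf("healthy vdev shown as: %s\n",
	    model_vdev_name(&healthy, buf, sizeof (buf)));
	printf("missing vdev shown as: %s\n",
	    model_vdev_name(&missing, buf, sizeof (buf)));
	return (0);
}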
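As a companion to the checkin comment for Change 477306, here is a
self-contained C sketch of the new open ordering: recorded path with GUID
check, then a GUID search across all providers, then a plain path open only
for never-before-opened vdevs during a pool create or device add. Everything
named model_*, the device table, and the helper functions are fabricated
stand-ins for the real GEOM/SPA machinery; only the fallback ordering
reflects the patch.

/*
 * Standalone model of the open ordering introduced in vdev_geom_open()
 * by Change 477306.  All names here are illustrative stand-ins.
 */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

struct model_device {
	const char	*path;
	uint64_t	pool_guid;
	uint64_t	vdev_guid;
};

struct model_vdev {
	const char	*path;			/* recorded device path */
	uint64_t	pool_guid;
	uint64_t	vdev_guid;
	int		previously_opened;	/* vdev_prevstate != UNKNOWN */
	int		loading_pool;		/* spa_load_state != NONE */
};

/* A tiny stand-in for the system's GEOM provider list. */
static struct model_device devices[] = {
	{ "/dev/da3", 111, 201 },	/* another pool member now owns da3 */
	{ "/dev/da7", 111, 202 },	/* the disk we are looking for */
};

static struct model_device *
open_by_path(const struct model_vdev *vd, int check_guid)
{
	for (size_t i = 0; i < sizeof (devices) / sizeof (devices[0]); i++) {
		if (strcmp(devices[i].path, vd->path) != 0)
			continue;
		if (check_guid && (devices[i].pool_guid != vd->pool_guid ||
		    devices[i].vdev_guid != vd->vdev_guid))
			return (NULL);	/* path exists but GUIDs mismatch */
		return (&devices[i]);
	}
	return (NULL);
}

static struct model_device *
open_by_guids(const struct model_vdev *vd)
{
	for (size_t i = 0; i < sizeof (devices) / sizeof (devices[0]); i++) {
		if (devices[i].pool_guid == vd->pool_guid &&
		    devices[i].vdev_guid == vd->vdev_guid)
			return (&devices[i]);
	}
	return (NULL);
}

static struct model_device *
model_vdev_open(const struct model_vdev *vd)
{
	struct model_device *dev;

	/* 1. Recorded path, accepted only with matching pool/vdev GUIDs. */
	dev = open_by_path(vd, 1);

	/* 2. The disks may have moved; search everything by GUIDs. */
	if (dev == NULL)
		dev = open_by_guids(vd);

	/* 3. Brand-new vdev during create/add: trust the path. */
	if (dev == NULL && !vd->previously_opened && !vd->loading_pool)
		dev = open_by_path(vd, 0);

	return (dev);	/* NULL means the open fails */
}

int
main(void)
{
	/* "zpool online" of a disk last seen at da3 that is now da7. */
	struct model_vdev vd = { "/dev/da3", 111, 202, 1, 0 };
	struct model_device *dev = model_vdev_open(&vd);

	printf("vdev %ju attached to %s\n", (uintmax_t)vd.vdev_guid,
	    dev != NULL ? dev->path : "(nothing - open failed)");
	return (0);
}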