Date:      Fri, 22 May 2020 16:51:00 +0000 (UTC)
From:      Andriy Gapon <avg@FreeBSD.org>
To:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-12@freebsd.org
Subject:   svn commit: r361391 - in stable/12: cddl/contrib/opensolaris/cmd/zdb cddl/contrib/opensolaris/cmd/zfs cddl/contrib/opensolaris/cmd/zpool cddl/contrib/opensolaris/cmd/ztest cddl/contrib/opensolaris/...
Message-ID:  <202005221651.04MGp0ET010395@repo.freebsd.org>

Author: avg
Date: Fri May 22 16:51:00 2020
New Revision: 361391
URL: https://svnweb.freebsd.org/changeset/base/361391

Log:
  MFC r354941,r354948: 10601 10757 Pool allocation classes
  
  MFV r354382,r354385: 10601 10757 Pool allocation classes
  
  illumos/illumos-gate@663207adb1669640c01c5ec6949ce78fd806efae
  https://github.com/illumos/illumos-gate/commit/663207adb1669640c01c5ec6949ce78fd806efae
  
  10601 Pool allocation classes
  https://www.illumos.org/issues/10601
    illumos port of ZoL Pool allocation classes. Includes at least these two
    commits:
    441709695 Pool allocation classes misplacing small file blocks
    cc99f275a Pool allocation classes
  
  10757 Add -gLp to zpool subcommands for alt vdev names
  https://www.illumos.org/issues/10757
    Port from ZoL of
    d2f3e292d Add -gLp to zpool subcommands for alt vdev names
    Note that a subsequent ZoL commit changed -p to -P
    a77f29f93 Change full path subcommand flag from -p to -P
  
  Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com>
  Portions contributed by: Håkan Johansson <f96hajo@chalmers.se>
  Portions contributed by: Richard Yao <ryao@gentoo.org>
  Portions contributed by: Chunwei Chen <david.chen@nutanix.com>
  Portions contributed by: loli10K <ezomori.nozomu@gmail.com>
  Author: Don Brady <don.brady@delphix.com>
  
  11541 allocation_classes feature must be enabled to add log device
  
  illumos/illumos-gate@c1064fd7ce62fe763a4475e9988ffea3b22137de
  https://github.com/illumos/illumos-gate/commit/c1064fd7ce62fe763a4475e9988ffea3b22137de
  
  https://www.illumos.org/issues/11541
    After the allocation_classes feature was integrated, one can no longer add a
    log device to a pool unless that feature is enabled. There is an explicit check
    for this, but it is unnecessary in the case of log devices, so we should handle
    this better instead of forcing the feature to be enabled.
  
  Author: Jerry Jelinek <jerry.jelinek@joyent.com>
  
  FreeBSD notes.
  I faithfully added the new -g, -L, -P flags, but only -g does something:
  vdev GUIDs are displayed instead of device names.  -L, resolve symlinks,
  and -P, display full disk paths, do nothing at the moment.
  The use of special vdevs is backward compatible for read-only access, so
  root pools should be bootable, but exercise caution.
  
  MFV r354383: 10592 misc. metaslab and vdev related ZoL bug fixes
  
  illumos/illumos-gate@555d674d5d4b8191dc83723188349d28278b2431
  https://github.com/illumos/illumos-gate/commit/555d674d5d4b8191dc83723188349d28278b2431
  
  https://www.illumos.org/issues/10592
    This is a collection of recent fixes from ZoL:
    8eef997679b Error path in metaslab_load_impl() forgets to drop ms_sync_lock
    928e8ad47d3 Introduce auxiliary metaslab histograms
    425d3237ee8 Get rid of space_map_update() for ms_synced_length
    6c926f426a2 Simplify log vdev removal code
    21e7cf5da89 zdb -L should skip leak detection altogether
    df72b8bebe0 Rename range_tree_verify to range_tree_verify_not_present
    75058f33034 Remove unused vdev_t fields
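
  Among the zdb changes in this merge is a new counter for ditto copies
  that land in the same metaslab of one vdev.  The test reduces to
  comparing metaslab indices, sketched below.  This is a stand-alone
  illustration, not the committed code: the version in zdb.c takes the
  spa and vdev id and reads vdev_ms_shift from the top-level vdev, while
  this hypothetical variant takes ms_shift directly so it compiles on
  its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-alone variant of the same-metaslab test.
 * Assumption: the caller supplies ms_shift, where each metaslab
 * covers 2^ms_shift bytes of the vdev's address space.
 */
static bool
same_metaslab(uint64_t ms_shift, uint64_t off1, uint64_t off2)
{
	/* Shifting by 64 or more would be undefined for a uint64_t. */
	assert(ms_shift < 64);

	/* The metaslab index of an offset is offset >> ms_shift. */
	return ((off1 >> ms_shift) == (off2 >> ms_shift));
}
```

  With ms_shift = 29 (512 MiB metaslabs, chosen only for illustration),
  offsets 0 and 2^29 - 1 share a metaslab, while 2^29 - 1 and 2^29 do not.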

Modified:
  stable/12/cddl/contrib/opensolaris/cmd/zdb/zdb.8
  stable/12/cddl/contrib/opensolaris/cmd/zdb/zdb.c
  stable/12/cddl/contrib/opensolaris/cmd/zfs/zfs.8
  stable/12/cddl/contrib/opensolaris/cmd/zpool/zpool-features.7
  stable/12/cddl/contrib/opensolaris/cmd/zpool/zpool.8
  stable/12/cddl/contrib/opensolaris/cmd/zpool/zpool_main.c
  stable/12/cddl/contrib/opensolaris/cmd/zpool/zpool_vdev.c
  stable/12/cddl/contrib/opensolaris/cmd/ztest/ztest.c
  stable/12/cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h
  stable/12/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_dataset.c
  stable/12/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c
  stable/12/cddl/contrib/opensolaris/lib/libzpool/common/util.c
  stable/12/stand/libsa/zfs/zfsimpl.c
  stable/12/sys/cddl/contrib/opensolaris/common/zfs/zfeature_common.c
  stable/12/sys/cddl/contrib/opensolaris/common/zfs/zfeature_common.h
  stable/12/sys/cddl/contrib/opensolaris/common/zfs/zfs_prop.c
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/range_tree.c
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_checkpoint.c
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu.h
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu_objset.h
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab.h
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab_impl.h
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/range_tree.h
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa_impl.h
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/space_map.h
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev.h
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_indirect.c
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_indirect_mapping.c
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_initialize.c
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_removal.c
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c
  stable/12/sys/cddl/contrib/opensolaris/uts/common/sys/fs/zfs.h
Directory Properties:
  stable/12/   (props changed)

Modified: stable/12/cddl/contrib/opensolaris/cmd/zdb/zdb.8
==============================================================================
--- stable/12/cddl/contrib/opensolaris/cmd/zdb/zdb.8	Fri May 22 16:29:09 2020	(r361390)
+++ stable/12/cddl/contrib/opensolaris/cmd/zdb/zdb.8	Fri May 22 16:51:00 2020	(r361391)
@@ -10,7 +10,7 @@
 .\"
 .\"
 .\" Copyright 2012, Richard Lowe.
-.\" Copyright (c) 2012, 2017 by Delphix. All rights reserved.
+.\" Copyright (c) 2012, 2018 by Delphix. All rights reserved.
 .\" Copyright 2017 Nexenta Systems, Inc.
 .\"
 .Dd October 06, 2017
@@ -187,7 +187,7 @@ If the
 .Fl u
 option is also specified, also display the uberblocks on this device.
 .It Fl L
-Disable leak tracing and the loading of space maps.
+Disable leak detection and the loading of space maps.
 By default,
 .Nm
 verifies that all non-free blocks are referenced, which can be very expensive.

Modified: stable/12/cddl/contrib/opensolaris/cmd/zdb/zdb.c
==============================================================================
--- stable/12/cddl/contrib/opensolaris/cmd/zdb/zdb.c	Fri May 22 16:29:09 2020	(r361390)
+++ stable/12/cddl/contrib/opensolaris/cmd/zdb/zdb.c	Fri May 22 16:51:00 2020	(r361391)
@@ -21,7 +21,7 @@
 
 /*
  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
- * Copyright (c) 2011, 2017 by Delphix. All rights reserved.
+ * Copyright (c) 2011, 2018 by Delphix. All rights reserved.
  * Copyright (c) 2014 Integros [integros.com]
  * Copyright 2017 Nexenta Systems, Inc.
  * Copyright (c) 2017, 2018 Lawrence Livermore National Security, LLC.
@@ -785,18 +785,21 @@ dump_spacemap(objset_t *os, space_map_t *sm)
 		return;
 
 	(void) printf("space map object %llu:\n",
-	    (longlong_t)sm->sm_phys->smp_object);
-	(void) printf("  smp_objsize = 0x%llx\n",
-	    (longlong_t)sm->sm_phys->smp_objsize);
+	    (longlong_t)sm->sm_object);
+	(void) printf("  smp_length = 0x%llx\n",
+	    (longlong_t)sm->sm_phys->smp_length);
 	(void) printf("  smp_alloc = 0x%llx\n",
 	    (longlong_t)sm->sm_phys->smp_alloc);
 
+	if (dump_opt['d'] < 6 && dump_opt['m'] < 4)
+		return;
+
 	/*
 	 * Print out the freelist entries in both encoded and decoded form.
 	 */
 	uint8_t mapshift = sm->sm_shift;
 	int64_t alloc = 0;
-	uint64_t word;
+	uint64_t word, entry_id = 0;
 	for (uint64_t offset = 0; offset < space_map_length(sm);
 	    offset += sizeof (word)) {
 
@@ -804,11 +807,12 @@ dump_spacemap(objset_t *os, space_map_t *sm)
 		    sizeof (word), &word, DMU_READ_PREFETCH));
 
 		if (sm_entry_is_debug(word)) {
-			(void) printf("\t    [%6llu] %s: txg %llu, pass %llu\n",
-			    (u_longlong_t)(offset / sizeof (word)),
+			(void) printf("\t    [%6llu] %s: txg %llu pass %llu\n",
+			    (u_longlong_t)entry_id,
 			    ddata[SM_DEBUG_ACTION_DECODE(word)],
 			    (u_longlong_t)SM_DEBUG_TXG_DECODE(word),
 			    (u_longlong_t)SM_DEBUG_SYNCPASS_DECODE(word));
+			entry_id++;
 			continue;
 		}
 
@@ -846,7 +850,7 @@ dump_spacemap(objset_t *os, space_map_t *sm)
 
 		(void) printf("\t    [%6llu]    %c  range:"
 		    " %010llx-%010llx  size: %06llx vdev: %06llu words: %u\n",
-		    (u_longlong_t)(offset / sizeof (word)),
+		    (u_longlong_t)entry_id,
 		    entry_type, (u_longlong_t)entry_off,
 		    (u_longlong_t)(entry_off + entry_run),
 		    (u_longlong_t)entry_run,
@@ -856,8 +860,9 @@ dump_spacemap(objset_t *os, space_map_t *sm)
 			alloc += entry_run;
 		else
 			alloc -= entry_run;
+		entry_id++;
 	}
-	if ((uint64_t)alloc != space_map_allocated(sm)) {
+	if (alloc != space_map_allocated(sm)) {
 		(void) printf("space_map_object alloc (%lld) INCONSISTENT "
 		    "with space map summary (%lld)\n",
 		    (longlong_t)space_map_allocated(sm), (longlong_t)alloc);
@@ -921,23 +926,30 @@ dump_metaslab(metaslab_t *msp)
 		    SPACE_MAP_HISTOGRAM_SIZE, sm->sm_shift);
 	}
 
-	if (dump_opt['d'] > 5 || dump_opt['m'] > 3) {
-		ASSERT(msp->ms_size == (1ULL << vd->vdev_ms_shift));
-
-		dump_spacemap(spa->spa_meta_objset, msp->ms_sm);
-	}
+	ASSERT(msp->ms_size == (1ULL << vd->vdev_ms_shift));
+	dump_spacemap(spa->spa_meta_objset, msp->ms_sm);
 }
 
 static void
 print_vdev_metaslab_header(vdev_t *vd)
 {
-	(void) printf("\tvdev %10llu\n\t%-10s%5llu   %-19s   %-15s   %-10s\n",
-	    (u_longlong_t)vd->vdev_id,
+	vdev_alloc_bias_t alloc_bias = vd->vdev_alloc_bias;
+	const char *bias_str;
+
+	bias_str = (alloc_bias == VDEV_BIAS_LOG || vd->vdev_islog) ?
+	    VDEV_ALLOC_BIAS_LOG :
+	    (alloc_bias == VDEV_BIAS_SPECIAL) ? VDEV_ALLOC_BIAS_SPECIAL :
+	    (alloc_bias == VDEV_BIAS_DEDUP) ? VDEV_ALLOC_BIAS_DEDUP :
+	    vd->vdev_islog ? "log" : "";
+
+	(void) printf("\tvdev %10llu   %s\n"
+	    "\t%-10s%5llu   %-19s   %-15s   %-12s\n",
+	    (u_longlong_t)vd->vdev_id, bias_str,
 	    "metaslabs", (u_longlong_t)vd->vdev_ms_count,
 	    "offset", "spacemap", "free");
-	(void) printf("\t%15s   %19s   %15s   %10s\n",
+	(void) printf("\t%15s   %19s   %15s   %12s\n",
 	    "---------------", "-------------------",
-	    "---------------", "-------------");
+	    "---------------", "------------");
 }
 
 static void
@@ -953,7 +965,7 @@ dump_metaslab_groups(spa_t *spa)
 		vdev_t *tvd = rvd->vdev_child[c];
 		metaslab_group_t *mg = tvd->vdev_mg;
 
-		if (mg->mg_class != mc)
+		if (mg == NULL || mg->mg_class != mc)
 			continue;
 
 		metaslab_group_histogram_verify(mg);
@@ -2807,6 +2819,7 @@ typedef struct zdb_blkstats {
 	uint64_t zb_count;
 	uint64_t zb_gangs;
 	uint64_t zb_ditto_samevdev;
+	uint64_t zb_ditto_same_ms;
 	uint64_t zb_psize_histogram[PSIZE_HISTO_SIZE];
 } zdb_blkstats_t;
 
@@ -2846,6 +2859,16 @@ typedef struct zdb_cb {
 	uint32_t	**zcb_vd_obsolete_counts;
 } zdb_cb_t;
 
+/* test if two DVA offsets from same vdev are within the same metaslab */
+static boolean_t
+same_metaslab(spa_t *spa, uint64_t vdev, uint64_t off1, uint64_t off2)
+{
+	vdev_t *vd = vdev_lookup_top(spa, vdev);
+	uint64_t ms_shift = vd->vdev_ms_shift;
+
+	return ((off1 >> ms_shift) == (off2 >> ms_shift));
+}
+
 static void
 zdb_count_block(zdb_cb_t *zcb, zilog_t *zilog, const blkptr_t *bp,
     dmu_object_type_t type)
@@ -2857,6 +2880,8 @@ zdb_count_block(zdb_cb_t *zcb, zilog_t *zilog, const b
 	if (zilog && zil_bp_tree_add(zilog, bp) != 0)
 		return;
 
+	spa_config_enter(zcb->zcb_spa, SCL_CONFIG, FTAG, RW_READER);
+
 	for (int i = 0; i < 4; i++) {
 		int l = (i < 2) ? BP_GET_LEVEL(bp) : ZB_TOTAL;
 		int t = (i & 1) ? type : ZDB_OT_TOTAL;
@@ -2882,8 +2907,15 @@ zdb_count_block(zdb_cb_t *zcb, zilog_t *zilog, const b
 		switch (BP_GET_NDVAS(bp)) {
 		case 2:
 			if (DVA_GET_VDEV(&bp->blk_dva[0]) ==
-			    DVA_GET_VDEV(&bp->blk_dva[1]))
+			    DVA_GET_VDEV(&bp->blk_dva[1])) {
 				zb->zb_ditto_samevdev++;
+
+				if (same_metaslab(zcb->zcb_spa,
+				    DVA_GET_VDEV(&bp->blk_dva[0]),
+				    DVA_GET_OFFSET(&bp->blk_dva[0]),
+				    DVA_GET_OFFSET(&bp->blk_dva[1])))
+					zb->zb_ditto_same_ms++;
+			}
 			break;
 		case 3:
 			equal = (DVA_GET_VDEV(&bp->blk_dva[0]) ==
@@ -2892,13 +2924,37 @@ zdb_count_block(zdb_cb_t *zcb, zilog_t *zilog, const b
 			    DVA_GET_VDEV(&bp->blk_dva[2])) +
 			    (DVA_GET_VDEV(&bp->blk_dva[1]) ==
 			    DVA_GET_VDEV(&bp->blk_dva[2]));
-			if (equal != 0)
+			if (equal != 0) {
 				zb->zb_ditto_samevdev++;
+
+				if (DVA_GET_VDEV(&bp->blk_dva[0]) ==
+				    DVA_GET_VDEV(&bp->blk_dva[1]) &&
+				    same_metaslab(zcb->zcb_spa,
+				    DVA_GET_VDEV(&bp->blk_dva[0]),
+				    DVA_GET_OFFSET(&bp->blk_dva[0]),
+				    DVA_GET_OFFSET(&bp->blk_dva[1])))
+					zb->zb_ditto_same_ms++;
+				else if (DVA_GET_VDEV(&bp->blk_dva[0]) ==
+				    DVA_GET_VDEV(&bp->blk_dva[2]) &&
+				    same_metaslab(zcb->zcb_spa,
+				    DVA_GET_VDEV(&bp->blk_dva[0]),
+				    DVA_GET_OFFSET(&bp->blk_dva[0]),
+				    DVA_GET_OFFSET(&bp->blk_dva[2])))
+					zb->zb_ditto_same_ms++;
+				else if (DVA_GET_VDEV(&bp->blk_dva[1]) ==
+				    DVA_GET_VDEV(&bp->blk_dva[2]) &&
+				    same_metaslab(zcb->zcb_spa,
+				    DVA_GET_VDEV(&bp->blk_dva[1]),
+				    DVA_GET_OFFSET(&bp->blk_dva[1]),
+				    DVA_GET_OFFSET(&bp->blk_dva[2])))
+					zb->zb_ditto_same_ms++;
+			}
 			break;
 		}
-
 	}
 
+	spa_config_exit(zcb->zcb_spa, SCL_CONFIG, FTAG);
+
 	if (BP_IS_EMBEDDED(bp)) {
 		zcb->zcb_embedded_blocks[BPE_GET_ETYPE(bp)]++;
 		zcb->zcb_embedded_histogram[BPE_GET_ETYPE(bp)]
@@ -3086,6 +3142,8 @@ zdb_ddt_leak_init(spa_t *spa, zdb_cb_t *zcb)
 	ddt_entry_t dde;
 	int error;
 
+	ASSERT(!dump_opt['L']);
+
 	bzero(&ddb, sizeof (ddb));
 	while ((error = ddt_walk(spa, &ddb, &dde)) == 0) {
 		blkptr_t blk;
@@ -3109,12 +3167,10 @@ zdb_ddt_leak_init(spa_t *spa, zdb_cb_t *zcb)
 				zcb->zcb_dedup_blocks++;
 			}
 		}
-		if (!dump_opt['L']) {
-			ddt_t *ddt = spa->spa_ddt[ddb.ddb_checksum];
-			ddt_enter(ddt);
-			VERIFY(ddt_lookup(ddt, &blk, B_TRUE) != NULL);
-			ddt_exit(ddt);
-		}
+		ddt_t *ddt = spa->spa_ddt[ddb.ddb_checksum];
+		ddt_enter(ddt);
+		VERIFY(ddt_lookup(ddt, &blk, B_TRUE) != NULL);
+		ddt_exit(ddt);
 	}
 
 	ASSERT(error == ENOENT);
@@ -3156,6 +3212,9 @@ claim_segment_cb(void *arg, uint64_t offset, uint64_t 
 static void
 zdb_claim_removing(spa_t *spa, zdb_cb_t *zcb)
 {
+	if (dump_opt['L'])
+		return;
+
 	if (spa->spa_vdev_removal == NULL)
 		return;
 
@@ -3247,7 +3306,6 @@ zdb_load_obsolete_counts(vdev_t *vd)
 		space_map_t *prev_obsolete_sm = NULL;
 		VERIFY0(space_map_open(&prev_obsolete_sm, spa->spa_meta_objset,
 		    scip->scip_prev_obsolete_sm_object, 0, vd->vdev_asize, 0));
-		space_map_update(prev_obsolete_sm);
 		vdev_indirect_mapping_load_obsolete_spacemap(vim, counts,
 		    prev_obsolete_sm);
 		space_map_close(prev_obsolete_sm);
@@ -3341,9 +3399,9 @@ zdb_leak_init_vdev_exclude_checkpoint(vdev_t *vd, zdb_
 
 	VERIFY0(space_map_open(&checkpoint_sm, spa_meta_objset(spa),
 	    checkpoint_sm_obj, 0, vd->vdev_asize, vd->vdev_ashift));
-	space_map_update(checkpoint_sm);
 
 	VERIFY0(space_map_iterate(checkpoint_sm,
+	    space_map_length(checkpoint_sm),
 	    checkpoint_sm_exclude_entry_cb, &cseea));
 	space_map_close(checkpoint_sm);
 
@@ -3353,6 +3411,8 @@ zdb_leak_init_vdev_exclude_checkpoint(vdev_t *vd, zdb_
 static void
 zdb_leak_init_exclude_checkpoint(spa_t *spa, zdb_cb_t *zcb)
 {
+	ASSERT(!dump_opt['L']);
+
 	vdev_t *rvd = spa->spa_root_vdev;
 	for (uint64_t c = 0; c < rvd->vdev_children; c++) {
 		ASSERT3U(c, ==, rvd->vdev_child[c]->vdev_id);
@@ -3449,6 +3509,8 @@ load_indirect_ms_allocatable_tree(vdev_t *vd, metaslab
 static void
 zdb_leak_init_prepare_indirect_vdevs(spa_t *spa, zdb_cb_t *zcb)
 {
+	ASSERT(!dump_opt['L']);
+
 	vdev_t *rvd = spa->spa_root_vdev;
 	for (uint64_t c = 0; c < rvd->vdev_children; c++) {
 		vdev_t *vd = rvd->vdev_child[c];
@@ -3495,67 +3557,63 @@ zdb_leak_init(spa_t *spa, zdb_cb_t *zcb)
 {
 	zcb->zcb_spa = spa;
 
-	if (!dump_opt['L']) {
-		dsl_pool_t *dp = spa->spa_dsl_pool;
-		vdev_t *rvd = spa->spa_root_vdev;
+	if (dump_opt['L'])
+		return;
 
-		/*
-		 * We are going to be changing the meaning of the metaslab's
-		 * ms_allocatable.  Ensure that the allocator doesn't try to
-		 * use the tree.
-		 */
-		spa->spa_normal_class->mc_ops = &zdb_metaslab_ops;
-		spa->spa_log_class->mc_ops = &zdb_metaslab_ops;
+	dsl_pool_t *dp = spa->spa_dsl_pool;
+	vdev_t *rvd = spa->spa_root_vdev;
 
-		zcb->zcb_vd_obsolete_counts =
-		    umem_zalloc(rvd->vdev_children * sizeof (uint32_t *),
-		    UMEM_NOFAIL);
+	/*
+	 * We are going to be changing the meaning of the metaslab's
+	 * ms_allocatable.  Ensure that the allocator doesn't try to
+	 * use the tree.
+	 */
+	spa->spa_normal_class->mc_ops = &zdb_metaslab_ops;
+	spa->spa_log_class->mc_ops = &zdb_metaslab_ops;
 
-		/*
-		 * For leak detection, we overload the ms_allocatable trees
-		 * to contain allocated segments instead of free segments.
-		 * As a result, we can't use the normal metaslab_load/unload
-		 * interfaces.
-		 */
-		zdb_leak_init_prepare_indirect_vdevs(spa, zcb);
-		load_concrete_ms_allocatable_trees(spa, SM_ALLOC);
+	zcb->zcb_vd_obsolete_counts =
+	    umem_zalloc(rvd->vdev_children * sizeof (uint32_t *),
+	    UMEM_NOFAIL);
 
-		/*
-		 * On load_concrete_ms_allocatable_trees() we loaded all the
-		 * allocated entries from the ms_sm to the ms_allocatable for
-		 * each metaslab. If the pool has a checkpoint or is in the
-		 * middle of discarding a checkpoint, some of these blocks
-		 * may have been freed but their ms_sm may not have been
-		 * updated because they are referenced by the checkpoint. In
-		 * order to avoid false-positives during leak-detection, we
-		 * go through the vdev's checkpoint space map and exclude all
-		 * its entries from their relevant ms_allocatable.
-		 *
-		 * We also aggregate the space held by the checkpoint and add
-		 * it to zcb_checkpoint_size.
-		 *
-		 * Note that at this point we are also verifying that all the
-		 * entries on the checkpoint_sm are marked as allocated in
-		 * the ms_sm of their relevant metaslab.
-		 * [see comment in checkpoint_sm_exclude_entry_cb()]
-		 */
-		zdb_leak_init_exclude_checkpoint(spa, zcb);
+	/*
+	 * For leak detection, we overload the ms_allocatable trees
+	 * to contain allocated segments instead of free segments.
+	 * As a result, we can't use the normal metaslab_load/unload
+	 * interfaces.
+	 */
+	zdb_leak_init_prepare_indirect_vdevs(spa, zcb);
+	load_concrete_ms_allocatable_trees(spa, SM_ALLOC);
 
-		/* for cleaner progress output */
-		(void) fprintf(stderr, "\n");
+	/*
+	 * On load_concrete_ms_allocatable_trees() we loaded all the
+	 * allocated entries from the ms_sm to the ms_allocatable for
+	 * each metaslab. If the pool has a checkpoint or is in the
+	 * middle of discarding a checkpoint, some of these blocks
+	 * may have been freed but their ms_sm may not have been
+	 * updated because they are referenced by the checkpoint. In
+	 * order to avoid false-positives during leak-detection, we
+	 * go through the vdev's checkpoint space map and exclude all
+	 * its entries from their relevant ms_allocatable.
+	 *
+	 * We also aggregate the space held by the checkpoint and add
+	 * it to zcb_checkpoint_size.
+	 *
+	 * Note that at this point we are also verifying that all the
+	 * entries on the checkpoint_sm are marked as allocated in
+	 * the ms_sm of their relevant metaslab.
+	 * [see comment in checkpoint_sm_exclude_entry_cb()]
+	 */
+	zdb_leak_init_exclude_checkpoint(spa, zcb);
+	ASSERT3U(zcb->zcb_checkpoint_size, ==, spa_get_checkpoint_space(spa));
 
-		if (bpobj_is_open(&dp->dp_obsolete_bpobj)) {
-			ASSERT(spa_feature_is_enabled(spa,
-			    SPA_FEATURE_DEVICE_REMOVAL));
-			(void) bpobj_iterate_nofree(&dp->dp_obsolete_bpobj,
-			    increment_indirect_mapping_cb, zcb, NULL);
-		}
-	} else {
-		/*
-		 * If leak tracing is disabled, we still need to consider
-		 * any checkpointed space in our space verification.
-		 */
-		zcb->zcb_checkpoint_size += spa_get_checkpoint_space(spa);
+	/* for cleaner progress output */
+	(void) fprintf(stderr, "\n");
+
+	if (bpobj_is_open(&dp->dp_obsolete_bpobj)) {
+		ASSERT(spa_feature_is_enabled(spa,
+		    SPA_FEATURE_DEVICE_REMOVAL));
+		(void) bpobj_iterate_nofree(&dp->dp_obsolete_bpobj,
+		    increment_indirect_mapping_cb, zcb, NULL);
 	}
 
 	spa_config_enter(spa, SCL_CONFIG, FTAG, RW_READER);
@@ -3636,52 +3694,58 @@ zdb_check_for_obsolete_leaks(vdev_t *vd, zdb_cb_t *zcb
 static boolean_t
 zdb_leak_fini(spa_t *spa, zdb_cb_t *zcb)
 {
+	if (dump_opt['L'])
+		return (B_FALSE);
+
 	boolean_t leaks = B_FALSE;
-	if (!dump_opt['L']) {
-		vdev_t *rvd = spa->spa_root_vdev;
-		for (unsigned c = 0; c < rvd->vdev_children; c++) {
-			vdev_t *vd = rvd->vdev_child[c];
-			metaslab_group_t *mg = vd->vdev_mg;
 
-			if (zcb->zcb_vd_obsolete_counts[c] != NULL) {
-				leaks |= zdb_check_for_obsolete_leaks(vd, zcb);
-			}
+	vdev_t *rvd = spa->spa_root_vdev;
+	for (unsigned c = 0; c < rvd->vdev_children; c++) {
+		vdev_t *vd = rvd->vdev_child[c];
+#if DEBUG
+		metaslab_group_t *mg = vd->vdev_mg;
+#endif
 
-			for (uint64_t m = 0; m < vd->vdev_ms_count; m++) {
-				metaslab_t *msp = vd->vdev_ms[m];
-				ASSERT3P(mg, ==, msp->ms_group);
+		if (zcb->zcb_vd_obsolete_counts[c] != NULL) {
+			leaks |= zdb_check_for_obsolete_leaks(vd, zcb);
+		}
 
-				/*
-				 * ms_allocatable has been overloaded
-				 * to contain allocated segments. Now that
-				 * we finished traversing all blocks, any
-				 * block that remains in the ms_allocatable
-				 * represents an allocated block that we
-				 * did not claim during the traversal.
-				 * Claimed blocks would have been removed
-				 * from the ms_allocatable.  For indirect
-				 * vdevs, space remaining in the tree
-				 * represents parts of the mapping that are
-				 * not referenced, which is not a bug.
-				 */
-				if (vd->vdev_ops == &vdev_indirect_ops) {
-					range_tree_vacate(msp->ms_allocatable,
-					    NULL, NULL);
-				} else {
-					range_tree_vacate(msp->ms_allocatable,
-					    zdb_leak, vd);
-				}
+		for (uint64_t m = 0; m < vd->vdev_ms_count; m++) {
+			metaslab_t *msp = vd->vdev_ms[m];
+			ASSERT3P(mg, ==, msp->ms_group);
 
-				if (msp->ms_loaded) {
-					msp->ms_loaded = B_FALSE;
-				}
+			/*
+			 * ms_allocatable has been overloaded
+			 * to contain allocated segments. Now that
+			 * we finished traversing all blocks, any
+			 * block that remains in the ms_allocatable
+			 * represents an allocated block that we
+			 * did not claim during the traversal.
+			 * Claimed blocks would have been removed
+			 * from the ms_allocatable.  For indirect
+			 * vdevs, space remaining in the tree
+			 * represents parts of the mapping that are
+			 * not referenced, which is not a bug.
+			 */
+			if (vd->vdev_ops == &vdev_indirect_ops) {
+				range_tree_vacate(msp->ms_allocatable,
+				    NULL, NULL);
+			} else {
+				range_tree_vacate(msp->ms_allocatable,
+				    zdb_leak, vd);
 			}
+
+			if (msp->ms_loaded) {
+				msp->ms_loaded = B_FALSE;
+			}
 		}
 
-		umem_free(zcb->zcb_vd_obsolete_counts,
-		    rvd->vdev_children * sizeof (uint32_t *));
-		zcb->zcb_vd_obsolete_counts = NULL;
 	}
+
+	umem_free(zcb->zcb_vd_obsolete_counts,
+	    rvd->vdev_children * sizeof (uint32_t *));
+	zcb->zcb_vd_obsolete_counts = NULL;
+
 	return (leaks);
 }
 
@@ -3709,6 +3773,7 @@ dump_block_stats(spa_t *spa)
 	uint64_t norm_alloc, norm_space, total_alloc, total_found;
 	int flags = TRAVERSE_PRE | TRAVERSE_PREFETCH_METADATA | TRAVERSE_HARD;
 	boolean_t leaks = B_FALSE;
+	int err;
 
 	bzero(&zcb, sizeof (zcb));
 	(void) printf("\nTraversing all blocks %s%s%s%s%s...\n\n",
@@ -3719,13 +3784,18 @@ dump_block_stats(spa_t *spa)
 	    !dump_opt['L'] ? "nothing leaked " : "");
 
 	/*
-	 * Load all space maps as SM_ALLOC maps, then traverse the pool
-	 * claiming each block we discover.  If the pool is perfectly
-	 * consistent, the space maps will be empty when we're done.
-	 * Anything left over is a leak; any block we can't claim (because
-	 * it's not part of any space map) is a double allocation,
-	 * reference to a freed block, or an unclaimed log block.
+	 * When leak detection is enabled we load all space maps as SM_ALLOC
+	 * maps, then traverse the pool claiming each block we discover. If
+	 * the pool is perfectly consistent, the segment trees will be empty
+	 * when we're done. Anything left over is a leak; any block we can't
+	 * claim (because it's not part of any space map) is a double
+	 * allocation, reference to a freed block, or an unclaimed log block.
+	 *
+	 * When leak detection is disabled (-L option) we still traverse the
+	 * pool claiming each block we discover, but we skip opening any space
+	 * maps.
 	 */
+	bzero(&zcb, sizeof (zdb_cb_t));
 	zdb_leak_init(spa, &zcb);
 
 	/*
@@ -3751,8 +3821,10 @@ dump_block_stats(spa_t *spa)
 		flags |= TRAVERSE_PREFETCH_DATA;
 
 	zcb.zcb_totalasize = metaslab_class_get_alloc(spa_normal_class(spa));
+	zcb.zcb_totalasize += metaslab_class_get_alloc(spa_special_class(spa));
+	zcb.zcb_totalasize += metaslab_class_get_alloc(spa_dedup_class(spa));
 	zcb.zcb_start = zcb.zcb_lastprint = gethrtime();
-	zcb.zcb_haderrors |= traverse_pool(spa, 0, flags, zdb_blkptr_cb, &zcb);
+	err = traverse_pool(spa, 0, flags, zdb_blkptr_cb, &zcb);
 
 	/*
 	 * If we've traversed the data blocks then we need to wait for those
@@ -3768,6 +3840,12 @@ dump_block_stats(spa_t *spa)
 		}
 	}
 
+	/*
+	 * Done after zio_wait() since zcb_haderrors is modified in
+	 * zdb_blkptr_done()
+	 */
+	zcb.zcb_haderrors |= err;
+
 	if (zcb.zcb_haderrors) {
 		(void) printf("\nError counts:\n\n");
 		(void) printf("\t%5s  %s\n", "errno", "count");
@@ -3789,15 +3867,17 @@ dump_block_stats(spa_t *spa)
 	norm_alloc = metaslab_class_get_alloc(spa_normal_class(spa));
 	norm_space = metaslab_class_get_space(spa_normal_class(spa));
 
-	total_alloc = norm_alloc + metaslab_class_get_alloc(spa_log_class(spa));
+	total_alloc = norm_alloc +
+	    metaslab_class_get_alloc(spa_log_class(spa)) +
+	    metaslab_class_get_alloc(spa_special_class(spa)) +
+	    metaslab_class_get_alloc(spa_dedup_class(spa));
 	total_found = tzb->zb_asize - zcb.zcb_dedup_asize +
 	    zcb.zcb_removing_size + zcb.zcb_checkpoint_size;
 
-	if (total_found == total_alloc) {
-		if (!dump_opt['L'])
-			(void) printf("\n\tNo leaks (block sum matches space"
-			    " maps exactly)\n");
-	} else {
+	if (total_found == total_alloc && !dump_opt['L']) {
+		(void) printf("\n\tNo leaks (block sum matches space"
+		    " maps exactly)\n");
+	} else if (!dump_opt['L']) {
 		(void) printf("block traversal size %llu != alloc %llu "
 		    "(%s %lld)\n",
 		    (u_longlong_t)total_found,
@@ -3811,31 +3891,50 @@ dump_block_stats(spa_t *spa)
 		return (2);
 
 	(void) printf("\n");
-	(void) printf("\tbp count:      %10llu\n",
+	(void) printf("\t%-16s %14llu\n", "bp count:",
 	    (u_longlong_t)tzb->zb_count);
-	(void) printf("\tganged count:  %10llu\n",
+	(void) printf("\t%-16s %14llu\n", "ganged count:",
 	    (longlong_t)tzb->zb_gangs);
-	(void) printf("\tbp logical:    %10llu      avg: %6llu\n",
+	(void) printf("\t%-16s %14llu      avg: %6llu\n", "bp logical:",
 	    (u_longlong_t)tzb->zb_lsize,
 	    (u_longlong_t)(tzb->zb_lsize / tzb->zb_count));
-	(void) printf("\tbp physical:   %10llu      avg:"
-	    " %6llu     compression: %6.2f\n",
-	    (u_longlong_t)tzb->zb_psize,
+	(void) printf("\t%-16s %14llu      avg: %6llu     compression: %6.2f\n",
+	    "bp physical:", (u_longlong_t)tzb->zb_psize,
 	    (u_longlong_t)(tzb->zb_psize / tzb->zb_count),
 	    (double)tzb->zb_lsize / tzb->zb_psize);
-	(void) printf("\tbp allocated:  %10llu      avg:"
-	    " %6llu     compression: %6.2f\n",
-	    (u_longlong_t)tzb->zb_asize,
+	(void) printf("\t%-16s %14llu      avg: %6llu     compression: %6.2f\n",
+	    "bp allocated:", (u_longlong_t)tzb->zb_asize,
 	    (u_longlong_t)(tzb->zb_asize / tzb->zb_count),
 	    (double)tzb->zb_lsize / tzb->zb_asize);
-	(void) printf("\tbp deduped:    %10llu    ref>1:"
-	    " %6llu   deduplication: %6.2f\n",
-	    (u_longlong_t)zcb.zcb_dedup_asize,
+	(void) printf("\t%-16s %14llu    ref>1: %6llu   deduplication: %6.2f\n",
+	    "bp deduped:", (u_longlong_t)zcb.zcb_dedup_asize,
 	    (u_longlong_t)zcb.zcb_dedup_blocks,
 	    (double)zcb.zcb_dedup_asize / tzb->zb_asize + 1.0);
-	(void) printf("\tSPA allocated: %10llu     used: %5.2f%%\n",
+	(void) printf("\t%-16s %14llu     used: %5.2f%%\n", "Normal class:",
 	    (u_longlong_t)norm_alloc, 100.0 * norm_alloc / norm_space);
 
+	if (spa_special_class(spa)->mc_rotor != NULL) {
+		uint64_t alloc = metaslab_class_get_alloc(
+		    spa_special_class(spa));
+		uint64_t space = metaslab_class_get_space(
+		    spa_special_class(spa));
+
+		(void) printf("\t%-16s %14llu     used: %5.2f%%\n",
+		    "Special class", (u_longlong_t)alloc,
+		    100.0 * alloc / space);
+	}
+
+	if (spa_dedup_class(spa)->mc_rotor != NULL) {
+		uint64_t alloc = metaslab_class_get_alloc(
+		    spa_dedup_class(spa));
+		uint64_t space = metaslab_class_get_space(
+		    spa_dedup_class(spa));
+
+		(void) printf("\t%-16s %14llu     used: %5.2f%%\n",
+		    "Dedup class", (u_longlong_t)alloc,
+		    100.0 * alloc / space);
+	}
+
 	for (bp_embedded_type_t i = 0; i < NUM_BP_EMBEDDED_TYPES; i++) {
 		if (zcb.zcb_embedded_blocks[i] == 0)
 			continue;
@@ -3857,6 +3956,10 @@ dump_block_stats(spa_t *spa)
 		(void) printf("\tDittoed blocks on same vdev: %llu\n",
 		    (longlong_t)tzb->zb_ditto_samevdev);
 	}
+	if (tzb->zb_ditto_same_ms != 0) {
+		(void) printf("\tDittoed blocks in same metaslab: %llu\n",
+		    (longlong_t)tzb->zb_ditto_same_ms);
+	}
 
 	for (uint64_t v = 0; v < spa->spa_root_vdev->vdev_children; v++) {
 		vdev_t *vd = spa->spa_root_vdev->vdev_child[v];
@@ -4114,7 +4217,6 @@ verify_device_removal_feature_counts(spa_t *spa)
 			    spa->spa_meta_objset,
 			    scip->scip_prev_obsolete_sm_object,
 			    0, vd->vdev_asize, 0));
-			space_map_update(prev_obsolete_sm);
 			dump_spacemap(spa->spa_meta_objset, prev_obsolete_sm);
 			(void) printf("\n");
 			space_map_close(prev_obsolete_sm);
@@ -4320,7 +4422,8 @@ verify_checkpoint_sm_entry_cb(space_map_entry_t *sme, 
 	 * their respective ms_allocateable trees should not contain them.
 	 */
 	mutex_enter(&ms->ms_lock);
-	range_tree_verify(ms->ms_allocatable, sme->sme_offset, sme->sme_run);
+	range_tree_verify_not_present(ms->ms_allocatable,
+	    sme->sme_offset, sme->sme_run);
 	mutex_exit(&ms->ms_lock);
 
 	return (0);
@@ -4383,7 +4486,6 @@ verify_checkpoint_vdev_spacemaps(spa_t *checkpoint, sp
 		VERIFY0(space_map_open(&checkpoint_sm, spa_meta_objset(current),
 		    checkpoint_sm_obj, 0, current_vd->vdev_asize,
 		    current_vd->vdev_ashift));
-		space_map_update(checkpoint_sm);
 
 		verify_checkpoint_sm_entry_cb_arg_t vcsec;
 		vcsec.vcsec_vd = ckpoint_vd;
@@ -4391,6 +4493,7 @@ verify_checkpoint_vdev_spacemaps(spa_t *checkpoint, sp
 		vcsec.vcsec_num_entries =
 		    space_map_length(checkpoint_sm) / sizeof (uint64_t);
 		VERIFY0(space_map_iterate(checkpoint_sm,
+		    space_map_length(checkpoint_sm),
 		    verify_checkpoint_sm_entry_cb, &vcsec));
 		dump_spacemap(current->spa_meta_objset, checkpoint_sm);
 		space_map_close(checkpoint_sm);
@@ -4470,7 +4573,7 @@ verify_checkpoint_ms_spacemaps(spa_t *checkpoint, spa_
 			 * are part of the checkpoint were freed by mistake.
 			 */
 			range_tree_walk(ckpoint_msp->ms_allocatable,
-			    (range_tree_func_t *)range_tree_verify,
+			    (range_tree_func_t *)range_tree_verify_not_present,
 			    current_msp->ms_allocatable);
 		}
 	}
@@ -4482,6 +4585,8 @@ verify_checkpoint_ms_spacemaps(spa_t *checkpoint, spa_
 static void
 verify_checkpoint_blocks(spa_t *spa)
 {
+	ASSERT(!dump_opt['L']);
+
 	spa_t *checkpoint_spa;
 	char *checkpoint_pool;
 	nvlist_t *config = NULL;
@@ -4547,7 +4652,6 @@ dump_leftover_checkpoint_blocks(spa_t *spa)
 
 		VERIFY0(space_map_open(&checkpoint_sm, spa_meta_objset(spa),
 		    checkpoint_sm_obj, 0, vd->vdev_asize, vd->vdev_ashift));
-		space_map_update(checkpoint_sm);
 		dump_spacemap(spa->spa_meta_objset, checkpoint_sm);
 		space_map_close(checkpoint_sm);
 	}
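
Editor's note on the dump_block_stats() hunk above: the "Special class" and "Dedup class" lines print the bytes allocated in a class and that figure as a percentage of the class's total space. The following is a minimal standalone sketch of that arithmetic, not code from zdb. The helper names class_used_percent() and format_class_line() are hypothetical, and the zero-space guard is an added assumption (the hunk instead relies on mc_rotor being non-NULL).

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of zdb: percentage of a class's space
 * that is allocated. Returns 0.0 for an empty class so the caller never
 * divides by zero (an assumption added for this sketch).
 */
double
class_used_percent(uint64_t alloc, uint64_t space)
{
	if (space == 0)
		return (0.0);
	return (100.0 * (double)alloc / (double)space);
}

/*
 * Hypothetical helper: format one summary line with the same format
 * string the hunk passes to printf(). Returns the snprintf() result.
 */
int
format_class_line(char *buf, size_t len, const char *name,
    uint64_t alloc, uint64_t space)
{
	return (snprintf(buf, len, "\t%-16s %14llu     used: %5.2f%%\n",
	    name, (unsigned long long)alloc,
	    class_used_percent(alloc, space)));
}
```

Calling format_class_line(buf, sizeof (buf), "Special class", alloc, space) with values obtained elsewhere would produce a line shaped like the zdb output.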

Modified: stable/12/cddl/contrib/opensolaris/cmd/zfs/zfs.8
==============================================================================
--- stable/12/cddl/contrib/opensolaris/cmd/zfs/zfs.8	Fri May 22 16:29:09 2020	(r361390)
+++ stable/12/cddl/contrib/opensolaris/cmd/zfs/zfs.8	Fri May 22 16:51:00 2020	(r361391)
@@ -1134,8 +1134,23 @@ This feature must be enabled to be used
 .Po see
 .Xr zpool-features 7
 .Pc .
+.It Sy special_small_blocks Ns = Ns Ar size
+This value represents the threshold block size for including small file
+blocks into the special allocation class.
+Blocks smaller than or equal to this value will be assigned to the special
+allocation class while greater blocks will be assigned to the regular class.
+Valid values are zero or a power of two from 512B up to 128K.
+The default size is 0 which means no small file blocks will be allocated in
+the special class.
+.Pp
+Before setting this property, a special class vdev must be added to the
+pool.
+See
+.Xr zpool 8
+for more details on the special allocation class.
 .It Sy mountpoint Ns = Ns Ar path | Cm none | legacy
-Controls the mount point used for this file system. See the
+Controls the mount point used for this file system.
+See the
 .Qq Sx Mount Points
 section for more information on how this property is used.
 .Pp
@@ -3021,7 +3036,7 @@ property of the filesystem or volume which is received
 To use this flag, the storage pool must have the
 .Sy extensible_dataset
 feature enabled.  See
-.Xr zpool-features 5
+.Xr zpool-features 7
 for details on ZFS feature flags.
 .El
 .It Xo
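
Editor's note on the special_small_blocks text added to zfs.8 above: the property accepts zero or a power of two from 512 bytes to 128K, and blocks no larger than the threshold are assigned to the special class. The following is a minimal sketch of those two rules as stated in the man page, not the ZFS implementation. The function and macro names are hypothetical, and the sketch ignores everything else the real allocator considers, such as whether a special vdev exists or whether the class is full.

```c
#include <stdbool.h>
#include <stdint.h>

/* Limits taken from the man page text; the macro names are hypothetical. */
#define	SSB_MIN_SIZE	512ULL
#define	SSB_MAX_SIZE	(128ULL * 1024ULL)

/*
 * Hypothetical validator: zero (feature off) or a power of two in
 * [512, 128K] is accepted, everything else is rejected.
 */
bool
special_small_blocks_valid(uint64_t size)
{
	if (size == 0)
		return (true);
	if (size < SSB_MIN_SIZE || size > SSB_MAX_SIZE)
		return (false);
	/* A power of two has exactly one bit set. */
	return ((size & (size - 1)) == 0);
}

/*
 * Hypothetical classifier: a file block is a candidate for the special
 * class when the threshold is non-zero and the block is no larger than it.
 */
bool
block_is_special_candidate(uint64_t block_size, uint64_t threshold)
{
	return (threshold != 0 && block_size <= threshold);
}
```

With a threshold of 8K, for example, a 4K or 8K block would be a candidate for the special class and a 16K block would stay in the regular class.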

Modified: stable/12/cddl/contrib/opensolaris/cmd/zpool/zpool-features.7
==============================================================================
--- stable/12/cddl/contrib/opensolaris/cmd/zpool/zpool-features.7	Fri May 22 16:29:09 2020	(r361390)
+++ stable/12/cddl/contrib/opensolaris/cmd/zpool/zpool-features.7	Fri May 22 16:51:00 2020	(r361391)
@@ -632,6 +632,25 @@ and will return to being
 once all filesystems that have ever had their checksum set to
 .Sy skein
 are destroyed.
+.It Sy allocation_classes
+.Bl -column "READ\-ONLY COMPATIBLE" "com.intel:allocation_classes"
+.It GUID Ta com.intel:allocation_classes
+.It READ\-ONLY COMPATIBLE Ta yes
+.It DEPENDENCIES Ta none
+.El
+.Pp
+This feature enables support for separate allocation classes.
+.Pp
+This feature becomes
+.Sy active
+when a dedicated allocation class vdev
+(dedup or special) is created with
+.Dq zpool create
+or
+.Dq zpool add .
+With device removal, it can be returned to the
+.Sy enabled
+state if all the top-level vdevs from an allocation class are removed.
 .El
 .Sh SEE ALSO
 .Xr zpool 8

Modified: stable/12/cddl/contrib/opensolaris/cmd/zpool/zpool.8
==============================================================================
--- stable/12/cddl/contrib/opensolaris/cmd/zpool/zpool.8	Fri May 22 16:29:09 2020	(r361390)
+++ stable/12/cddl/contrib/opensolaris/cmd/zpool/zpool.8	Fri May 22 16:51:00 2020	(r361391)
@@ -24,6 +24,8 @@
 .\" Copyright (c) 2012, 2017 by Delphix. All Rights Reserved.
 .\" Copyright 2017 Nexenta Systems, Inc.
 .\" Copyright (c) 2017 Datto Inc.
+.\" Copyright (c) 2017 George Melikov. All Rights Reserved.
+.\" Copyright 2019 Joyent, Inc.
 .\"
 .\" $FreeBSD$
 .\"
@@ -38,7 +40,7 @@
 .Op Fl \&?
 .Nm
 .Cm add
-.Op Fl fn
+.Op Fl fgLnP
 .Ar pool vdev ...
 .Nm
 .Cm attach
@@ -127,17 +129,19 @@
 .Op Ar device Ns ...
 .Nm
 .Cm iostat
-.Op Fl T Cm d Ns | Ns Cm u
 .Op Fl v
+.Op Fl T Cm d Ns | Ns Cm u
+.Op Fl gLP
 .Op Ar pool
 .Ar ...
+.Op Ar interval Op Ar count
 .Nm
 .Cm labelclear
 .Op Fl f
 .Ar device
 .Nm
 .Cm list
-.Op Fl Hpv
+.Op Fl HgLpPv
 .Op Fl o Ar property Ns Op , Ns Ar ...
 .Op Fl T Cm d Ns | Ns Cm u
 .Op Ar pool
@@ -179,7 +183,7 @@
 .Ar property Ns = Ns Ar value pool
 .Nm
 .Cm split
-.Op Fl n
+.Op Fl gLnP
 .Op Fl R Ar altroot
 .Op Fl o Ar mntopts
 .Op Fl o Ar property Ns = Ns Ar value
@@ -187,7 +191,7 @@
 .Op Ar device ...
 .Nm
 .Cm status
-.Op Fl Dvx
+.Op Fl DgLPvx
 .Op Fl T Cm d Ns | Ns Cm u
 .Op Ar pool
 .Ar ...
@@ -320,11 +324,27 @@ types are not supported for the intent log. For more i
 see the
 .Qq Sx Intent Log
 section.
+.It Sy dedup
+A device dedicated solely for allocating dedup data.
+The redundancy of this device should match the redundancy of the other normal
+devices in the pool.
+If more than one dedup device is specified, then allocations are load-balanced
+between devices.
+.It Sy special
+A device dedicated solely for allocating various kinds of internal metadata,
+and optionally small file data.
+The redundancy of this device should match the redundancy of the other normal
+devices in the pool.
+If more than one special device is specified, then allocations are
+load-balanced between devices.
+.Pp
+For more information on special allocations, see the
+.Sx Special Allocation Class
+section.
 .It Sy cache
-A device used to cache storage pool data. A cache device cannot be configured
-as a mirror or
-.No raidz
-group. For more information, see the
+A device used to cache storage pool data.
+A cache device cannot be configured as a mirror or raidz group.
+For more information, see the
 .Qq Sx Cache Devices
 section.
 .El
@@ -602,6 +622,31 @@ zfs properties) may be unenforceable while a checkpoin
 checkpoint is allowed to consume the dataset's reservation.
 Finally, data that is part of the checkpoint but has been freed in the
 current state of the pool won't be scanned during a scrub.
+.Ss Special Allocation Class
+The allocations in the special class are dedicated to specific block types.
+By default this includes all metadata, the indirect blocks of user data, and
+any dedup data.
+The class can also be provisioned to accept a limited percentage of small file
+data blocks.
+.Pp
+A pool must always have at least one general (non-specified) vdev before
+other devices can be assigned to the special class.
+If the special class becomes full, then allocations intended for it will spill
+back into the normal class.
+.Pp
+Dedup data can be excluded from the special class by setting the
+.Sy vfs.zfs.ddt_data_is_special
+sysctl to false (0).
+.Pp
+Inclusion of small file blocks in the special class is opt-in.
+Each dataset can control the size of small file blocks allowed in the special
+class by setting the
+.Sy special_small_blocks
+dataset property.
+It defaults to zero so you must opt-in by setting it to a non-zero value.
+See
+.Xr zfs 8
+for more info on setting this property.
 .Ss Properties
 Each pool has several properties associated with it. Some properties are
 read-only statistics while others are configurable and change the behavior of
@@ -872,7 +917,7 @@ Displays a help message.
 .It Xo
 .Nm
 .Cm add
-.Op Fl fn
+.Op Fl fgLnP
 .Ar pool vdev ...
 .Xc
 .Pp
@@ -891,11 +936,30 @@ Forces use of
 .Ar vdev ,
 even if they appear in use or specify a conflicting replication level.
 Not all devices can be overridden in this manner.
+.It Fl g
+Display
+.Ar vdev
+GUIDs instead of the normal device names.
+These GUIDs can be used in place of
+device names for the zpool detach/offline/remove/replace commands.
+.It Fl L
+Display real paths for
+.Ar vdev Ns s
+resolving all symbolic links.
+This can be used to look up the current block
+device name regardless of the /dev/disk/ path used to open it.
 .It Fl n
 Displays the configuration that would be used without actually adding the
 .Ar vdev Ns s.
-The actual pool creation can still fail due to insufficient privileges or device
-sharing.
+The actual pool creation can still fail due to insufficient privileges or
+device sharing.
+.It Fl P
+Display full paths for
+.Ar vdev Ns s
+instead of only the last component of the path.
+This can be used in conjunction with the
+.Fl L
+flag.
 .El
 .It Xo
 .Nm
@@ -1512,7 +1576,7 @@ with no flags on the relevant target devices.
 .Nm
 .Cm iostat
 .Op Fl T Cm d Ns | Ns Cm u
-.Op Fl v
+.Op Fl gLPv
 .Op Ar pool
 .Ar ...
 .Op Ar interval Op Ar count
@@ -1544,10 +1608,25 @@ Use modifier
 .Cm u
 for unixtime
 .Pq equals Qq Ic date +%s .
+.It Fl g
+Display vdev GUIDs instead of the normal device names.
+These GUIDs can be used in place of device names for the zpool
+detach/offline/remove/replace commands.
+.It Fl L
+Display real paths for vdevs resolving all symbolic links.
+This can be used to look up the current block device name regardless of the
+.Pa /dev/disk/
+path used to open it.
+.It Fl P
+Display full paths for vdevs instead of only the last component of
+the path.
+This can be used in conjunction with the
+.Fl L
+flag.
 .It Fl v
-Verbose statistics. Reports usage statistics for individual
-.No vdev Ns s
-within the pool, in addition to the pool-wide statistics.
+Verbose statistics.

*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***


