From owner-svn-src-all@FreeBSD.ORG Mon Dec 10 14:37:23 2012 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 2E2376A5; Mon, 10 Dec 2012 14:37:23 +0000 (UTC) (envelope-from mm@FreeBSD.org) Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:1900:2254:2068::e6a:0]) by mx1.freebsd.org (Postfix) with ESMTP id 0E5348FC08; Mon, 10 Dec 2012 14:37:23 +0000 (UTC) Received: from svn.freebsd.org (localhost [127.0.0.1]) by svn.freebsd.org (8.14.5/8.14.5) with ESMTP id qBAEbNDM034019; Mon, 10 Dec 2012 14:37:23 GMT (envelope-from mm@svn.freebsd.org) Received: (from mm@localhost) by svn.freebsd.org (8.14.5/8.14.5/Submit) id qBAEbIYY033985; Mon, 10 Dec 2012 14:37:18 GMT (envelope-from mm@svn.freebsd.org) Message-Id: <201212101437.qBAEbIYY033985@svn.freebsd.org> From: Martin Matuska Date: Mon, 10 Dec 2012 14:37:18 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-8@freebsd.org Subject: svn commit: r244088 - in stable/8: cddl/contrib/opensolaris/cmd/zfs cddl/contrib/opensolaris/cmd/zpool cddl/contrib/opensolaris/cmd/ztest sys/cddl/contrib/opensolaris/common/zfs sys/cddl/contrib/op... X-SVN-Group: stable-8 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Dec 2012 14:37:23 -0000 Author: mm Date: Mon Dec 10 14:37:18 2012 New Revision: 244088 URL: http://svnweb.freebsd.org/changeset/base/244088 Log: MFC recent ZFS changes from illumos: 243503, 243524, 243525, 243560, 243561 MFC r243503: Illumos 13879:4eac7a87eff2 3329 spa_sync() spends 10-20% of its time in spa_free_sync_cb() 3330 space_seg_t should have its own kmem_cache 3331 deferred frees should happen after sync_pass 1 3335 make SYNC_PASS_* constants tunable New loader-only tunables: vfs.zfs.sync_pass_deferred_free vfs.zfs.sync_pass_dont_compress vfs.zfs.sync_pass_rewrite References: https://www.illumos.org/issues/3329 https://www.illumos.org/issues/3330 https://www.illumos.org/issues/3331 https://www.illumos.org/issues/3335 MFC r243524: Import the zio nop-write improvement from Illumos. To reduce I/O, nop-write omits overwriting data if the checksum (cryptographically secure) of new data matches the checksum of existing data. It also saves space if snapshots are in use. It currently works only on datasets with enabled compression, disabled deduplication and sha256 checksums. IllumOS 13887:196932ec9e6a and 13888:7204b3392a58 3236 zio nop-write References: https://www.illumos.org/issues/3236 MFC r243525: Add loader(8) tunable to enable/disable nopwrite functionality: vfs.zfs.nopwrite_enabled MFC r243560: Introduce a new dataset aclmode setting "restricted" to protect ACL's being destroyed or corrupted by a drive-by chmod. illumos-gate 13889:a67716f16746 3254 add support in zfs for aclmode=restricted MFC r243561: Update manpage dates in zfs.8 and zpool.8 Modified: stable/8/cddl/contrib/opensolaris/cmd/zfs/zfs.8 stable/8/cddl/contrib/opensolaris/cmd/zpool/zpool.8 stable/8/cddl/contrib/opensolaris/cmd/ztest/ztest.c stable/8/sys/cddl/contrib/opensolaris/common/zfs/zfs_prop.c stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dbuf.h stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu.h stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab_impl.h stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/space_map.h stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_impl.h stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c Directory Properties: stable/8/cddl/contrib/opensolaris/ (props changed) stable/8/cddl/contrib/opensolaris/cmd/zfs/ (props changed) stable/8/sys/ (props changed) stable/8/sys/cddl/ (props changed) stable/8/sys/cddl/contrib/opensolaris/ (props changed) Modified: stable/8/cddl/contrib/opensolaris/cmd/zfs/zfs.8 ============================================================================== --- stable/8/cddl/contrib/opensolaris/cmd/zfs/zfs.8 Mon Dec 10 14:36:48 2012 (r244087) +++ stable/8/cddl/contrib/opensolaris/cmd/zfs/zfs.8 Mon Dec 10 14:37:18 2012 (r244088) @@ -26,7 +26,7 @@ .\" .\" $FreeBSD$ .\" -.Dd September 5, 2012 +.Dd November 26, 2012 .Dt ZFS 8 .Os .Sh NAME @@ -753,7 +753,7 @@ If no inheritable .Tn ACE Ns s exist that affect the mode, then the mode is set in accordance to the requested mode from the application. -.It Sy aclmode Ns = Ns Cm discard | groupmask | passthrough +.It Sy aclmode Ns = Ns Cm discard | groupmask | passthrough | restricted Controls how an .Tn ACL is modified during @@ -783,6 +783,32 @@ indicates that no changes are made to th other than creating or updating the necessary .Tn ACL entries to represent the new mode of the file or directory. +An +.Sy aclmode +property of +.Cm restricted +will cause the +.Xr chmod 2 +operation to return an error when used on any file or directory which has +a non-trivial +.Tn ACL +whose entries can not be represented by a mode. +.Xr chmod 2 +is required to change the set user ID, set group ID, or sticky bits on a file +or directory, as they do not have equivalent +.Tn ACL +entries. +In order to use +.Xr chmod 2 +on a file or directory with a non-trivial +.Tn ACL +when +.Sy aclmode +is set to +.Cm restricted , +you must first remove all +.Tn ACL +entries which do not represent the current mode. .It Sy atime Ns = Ns Cm on | off Controls whether the access time for files is updated when they are read. Turning this property off avoids producing write traffic when reading files and Modified: stable/8/cddl/contrib/opensolaris/cmd/zpool/zpool.8 ============================================================================== --- stable/8/cddl/contrib/opensolaris/cmd/zpool/zpool.8 Mon Dec 10 14:36:48 2012 (r244087) +++ stable/8/cddl/contrib/opensolaris/cmd/zpool/zpool.8 Mon Dec 10 14:37:18 2012 (r244088) @@ -25,7 +25,7 @@ .\" .\" $FreeBSD$ .\" -.Dd November 28, 2011 +.Dd November 15, 2012 .Dt ZPOOL 8 .Os .Sh NAME Modified: stable/8/cddl/contrib/opensolaris/cmd/ztest/ztest.c ============================================================================== --- stable/8/cddl/contrib/opensolaris/cmd/ztest/ztest.c Mon Dec 10 14:36:48 2012 (r244087) +++ stable/8/cddl/contrib/opensolaris/cmd/ztest/ztest.c Mon Dec 10 14:37:18 2012 (r244088) @@ -204,6 +204,7 @@ enum ztest_io_type { ZTEST_IO_WRITE_ZEROES, ZTEST_IO_TRUNCATE, ZTEST_IO_SETATTR, + ZTEST_IO_REWRITE, ZTEST_IO_TYPES }; @@ -1859,6 +1860,12 @@ ztest_get_data(void *arg, lr_write_t *lr DMU_READ_NO_PREFETCH); if (error == 0) { + blkptr_t *obp = dmu_buf_get_blkptr(db); + if (obp) { + ASSERT(BP_IS_HOLE(bp)); + *bp = *obp; + } + zgd->zgd_db = db; zgd->zgd_bp = bp; @@ -2004,6 +2011,9 @@ ztest_remove(ztest_ds_t *zd, ztest_od_t continue; } + /* + * No object was found. + */ if (od->od_object == 0) continue; @@ -2119,6 +2129,7 @@ ztest_prealloc(ztest_ds_t *zd, uint64_t static void ztest_io(ztest_ds_t *zd, uint64_t object, uint64_t offset) { + int err; ztest_block_tag_t wbt; dmu_object_info_t doi; enum ztest_io_type io_type; @@ -2171,6 +2182,25 @@ ztest_io(ztest_ds_t *zd, uint64_t object case ZTEST_IO_SETATTR: (void) ztest_setattr(zd, object); break; + + case ZTEST_IO_REWRITE: + (void) rw_rdlock(&ztest_name_lock); + err = ztest_dsl_prop_set_uint64(zd->zd_name, + ZFS_PROP_CHECKSUM, spa_dedup_checksum(ztest_spa), + B_FALSE); + VERIFY(err == 0 || err == ENOSPC); + err = ztest_dsl_prop_set_uint64(zd->zd_name, + ZFS_PROP_COMPRESSION, + ztest_random_dsl_prop(ZFS_PROP_COMPRESSION), + B_FALSE); + VERIFY(err == 0 || err == ENOSPC); + (void) rw_unlock(&ztest_name_lock); + + VERIFY0(dmu_read(zd->zd_os, object, offset, blocksize, data, + DMU_READ_NO_PREFETCH)); + + (void) ztest_write(zd, object, offset, blocksize, data); + break; } (void) rw_unlock(&zd->zd_zilog_lock); @@ -2258,7 +2288,12 @@ ztest_zil_remount(ztest_ds_t *zd, uint64 { objset_t *os = zd->zd_os; - VERIFY(mutex_lock(&zd->zd_dirobj_lock) == 0); + /* + * We grab the zd_dirobj_lock to ensure that no other thread is + * updating the zil (i.e. adding in-memory log records) and the + * zd_zilog_lock to block any I/O. + */ + VERIFY0(mutex_lock(&zd->zd_dirobj_lock)); (void) rw_wrlock(&zd->zd_zilog_lock); /* zfsvfs_teardown() */ @@ -4906,8 +4941,8 @@ ztest_ddt_repair(ztest_ds_t *zd, uint64_ */ for (int i = 0; i < copies; i++) { uint64_t offset = i * blocksize; - VERIFY(dmu_buf_hold(os, object, offset, FTAG, &db, - DMU_READ_NO_PREFETCH) == 0); + VERIFY0(dmu_buf_hold(os, object, offset, FTAG, &db, + DMU_READ_NO_PREFETCH)); ASSERT(db->db_offset == offset); ASSERT(db->db_size == blocksize); ASSERT(ztest_pattern_match(db->db_data, db->db_size, pattern) || @@ -4923,8 +4958,8 @@ ztest_ddt_repair(ztest_ds_t *zd, uint64_ /* * Find out what block we got. */ - VERIFY(dmu_buf_hold(os, object, 0, FTAG, &db, - DMU_READ_NO_PREFETCH) == 0); + VERIFY0(dmu_buf_hold(os, object, 0, FTAG, &db, + DMU_READ_NO_PREFETCH)); blk = *((dmu_buf_impl_t *)db)->db_blkptr; dmu_buf_rele(db, FTAG); @@ -5602,6 +5637,8 @@ ztest_freeze(void) kernel_init(FREAD | FWRITE); VERIFY3U(0, ==, spa_open(ztest_opts.zo_pool, &spa, FTAG)); VERIFY3U(0, ==, ztest_dataset_open(0)); + spa->spa_debug = B_TRUE; + ztest_spa = spa; /* * Force the first log block to be transactionally allocated. Modified: stable/8/sys/cddl/contrib/opensolaris/common/zfs/zfs_prop.c ============================================================================== --- stable/8/sys/cddl/contrib/opensolaris/common/zfs/zfs_prop.c Mon Dec 10 14:36:48 2012 (r244087) +++ stable/8/sys/cddl/contrib/opensolaris/common/zfs/zfs_prop.c Mon Dec 10 14:37:18 2012 (r244088) @@ -109,6 +109,7 @@ zfs_prop_init(void) { "discard", ZFS_ACL_DISCARD }, { "groupmask", ZFS_ACL_GROUPMASK }, { "passthrough", ZFS_ACL_PASSTHROUGH }, + { "restricted", ZFS_ACL_RESTRICTED }, { NULL } }; @@ -217,7 +218,8 @@ zfs_prop_init(void) "hidden | visible", "SNAPDIR", snapdir_table); zprop_register_index(ZFS_PROP_ACLMODE, "aclmode", ZFS_ACL_DISCARD, PROP_INHERIT, ZFS_TYPE_FILESYSTEM, - "discard | groupmask | passthrough", "ACLMODE", acl_mode_table); + "discard | groupmask | passthrough | restricted", "ACLMODE", + acl_mode_table); zprop_register_index(ZFS_PROP_ACLINHERIT, "aclinherit", ZFS_ACL_RESTRICTED, PROP_INHERIT, ZFS_TYPE_FILESYSTEM, "discard | noallow | restricted | passthrough | passthrough-x", Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c ============================================================================== --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c Mon Dec 10 14:36:48 2012 (r244087) +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c Mon Dec 10 14:37:18 2012 (r244088) @@ -3606,6 +3606,12 @@ arc_write_done(zio_t *zio) arc_hdr_destroy(exists); exists = buf_hash_insert(hdr, &hash_lock); ASSERT3P(exists, ==, NULL); + } else if (zio->io_flags & ZIO_FLAG_NOPWRITE) { + /* nopwrite */ + ASSERT(zio->io_prop.zp_nopwrite); + if (!BP_EQUAL(&zio->io_bp_orig, zio->io_bp)) + panic("bad nopwrite, hdr=%p exists=%p", + (void *)hdr, (void *)exists); } else { /* Dedup */ ASSERT(hdr->b_datacnt == 1); Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c ============================================================================== --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c Mon Dec 10 14:36:48 2012 (r244087) +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c Mon Dec 10 14:37:18 2012 (r244088) @@ -768,13 +768,15 @@ dbuf_unoverride(dbuf_dirty_record_t *dr) ASSERT(db->db_data_pending != dr); /* free this block */ - if (!BP_IS_HOLE(bp)) { + if (!BP_IS_HOLE(bp) && !dr->dt.dl.dr_nopwrite) { spa_t *spa; DB_GET_SPA(&spa, db); zio_free(spa, txg, bp); } dr->dt.dl.dr_override_state = DR_NOT_OVERRIDDEN; + dr->dt.dl.dr_nopwrite = B_FALSE; + /* * Release the already-written buffer, so we leave it in * a consistent dirty state. Note that all callers are @@ -2172,6 +2174,13 @@ dmu_buf_freeable(dmu_buf_t *dbuf) return (res); } +blkptr_t * +dmu_buf_get_blkptr(dmu_buf_t *db) +{ + dmu_buf_impl_t *dbi = (dmu_buf_impl_t *)db; + return (dbi->db_blkptr); +} + static void dbuf_check_blkptr(dnode_t *dn, dmu_buf_impl_t *db) { @@ -2514,7 +2523,11 @@ dbuf_write_done(zio_t *zio, arc_buf_t *b ASSERT0(zio->io_error); ASSERT(db->db_blkptr == bp); - if (zio->io_flags & ZIO_FLAG_IO_REWRITE) { + /* + * For nopwrites and rewrites we ensure that the bp matches our + * original and bypass all the accounting. + */ + if (zio->io_flags & (ZIO_FLAG_IO_REWRITE | ZIO_FLAG_NOPWRITE)) { ASSERT(BP_EQUAL(bp, bp_orig)); } else { objset_t *os; @@ -2705,7 +2718,7 @@ dbuf_write(dbuf_dirty_record_t *dr, arc_ mutex_enter(&db->db_mtx); dr->dt.dl.dr_override_state = DR_NOT_OVERRIDDEN; zio_write_override(dr->dr_zio, &dr->dt.dl.dr_overridden_by, - dr->dt.dl.dr_copies); + dr->dt.dl.dr_copies, dr->dt.dl.dr_nopwrite); mutex_exit(&db->db_mtx); } else if (db->db_state == DB_NOFILL) { ASSERT(zp.zp_checksum == ZIO_CHECKSUM_OFF); Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c ============================================================================== --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c Mon Dec 10 14:36:48 2012 (r244087) +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c Mon Dec 10 14:37:18 2012 (r244088) @@ -40,11 +40,21 @@ #include #include #include +#include #include #ifdef _KERNEL #include #endif +/* + * Enable/disable nopwrite feature. + */ +int zfs_nopwrite_enabled = 1; +SYSCTL_DECL(_vfs_zfs); +TUNABLE_INT("vfs.zfs.nopwrite_enabled", &zfs_nopwrite_enabled); +SYSCTL_INT(_vfs_zfs, OID_AUTO, nopwrite_enabled, CTLFLAG_RDTUN, + &zfs_nopwrite_enabled, 0, "Enable nopwrite feature"); + const dmu_object_type_info_t dmu_ot[DMU_OT_NUMTYPES] = { { DMU_BSWAP_UINT8, TRUE, "unallocated" }, { DMU_BSWAP_ZAP, TRUE, "object directory" }, @@ -1287,6 +1297,16 @@ dmu_sync_done(zio_t *zio, arc_buf_t *buf mutex_enter(&db->db_mtx); ASSERT(dr->dt.dl.dr_override_state == DR_IN_DMU_SYNC); if (zio->io_error == 0) { + dr->dt.dl.dr_nopwrite = !!(zio->io_flags & ZIO_FLAG_NOPWRITE); + if (dr->dt.dl.dr_nopwrite) { + blkptr_t *bp = zio->io_bp; + blkptr_t *bp_orig = &zio->io_bp_orig; + uint8_t chksum = BP_GET_CHECKSUM(bp_orig); + + ASSERT(BP_EQUAL(bp, bp_orig)); + ASSERT(zio->io_prop.zp_compress != ZIO_COMPRESS_OFF); + ASSERT(zio_checksum_table[chksum].ci_dedup); + } dr->dt.dl.dr_overridden_by = *zio->io_bp; dr->dt.dl.dr_override_state = DR_OVERRIDDEN; dr->dt.dl.dr_copies = zio->io_prop.zp_copies; @@ -1308,11 +1328,22 @@ dmu_sync_late_arrival_done(zio_t *zio) { blkptr_t *bp = zio->io_bp; dmu_sync_arg_t *dsa = zio->io_private; + blkptr_t *bp_orig = &zio->io_bp_orig; if (zio->io_error == 0 && !BP_IS_HOLE(bp)) { - ASSERT(zio->io_bp->blk_birth == zio->io_txg); - ASSERT(zio->io_txg > spa_syncing_txg(zio->io_spa)); - zio_free(zio->io_spa, zio->io_txg, zio->io_bp); + /* + * If we didn't allocate a new block (i.e. ZIO_FLAG_NOPWRITE) + * then there is nothing to do here. Otherwise, free the + * newly allocated block in this txg. + */ + if (zio->io_flags & ZIO_FLAG_NOPWRITE) { + ASSERT(BP_EQUAL(bp, bp_orig)); + } else { + ASSERT(BP_IS_HOLE(bp_orig) || !BP_EQUAL(bp, bp_orig)); + ASSERT(zio->io_bp->blk_birth == zio->io_txg); + ASSERT(zio->io_txg > spa_syncing_txg(zio->io_spa)); + zio_free(zio->io_spa, zio->io_txg, zio->io_bp); + } } dmu_tx_commit(dsa->dsa_tx); @@ -1357,7 +1388,7 @@ dmu_sync_late_arrival(zio_t *pio, objset * * Return values: * - * EEXIST: this txg has already been synced, so there's nothing to to. + * EEXIST: this txg has already been synced, so there's nothing to do. * The caller should not log the write. * * ENOENT: the block was dbuf_free_range()'d, so there's nothing to do. @@ -1389,7 +1420,6 @@ dmu_sync(zio_t *pio, uint64_t txg, dmu_s dnode_t *dn; ASSERT(pio != NULL); - ASSERT(BP_IS_HOLE(bp)); ASSERT(txg != 0); SET_BOOKMARK(&zb, ds->ds_object, @@ -1444,6 +1474,23 @@ dmu_sync(zio_t *pio, uint64_t txg, dmu_s return (ENOENT); } + ASSERT(dr->dr_next == NULL || dr->dr_next->dr_txg < txg); + + /* + * Assume the on-disk data is X, the current syncing data is Y, + * and the current in-memory data is Z (currently in dmu_sync). + * X and Z are identical but Y is has been modified. Normally, + * when X and Z are the same we will perform a nopwrite but if Y + * is different we must disable nopwrite since the resulting write + * of Y to disk can free the block containing X. If we allowed a + * nopwrite to occur the block pointing to Z would reference a freed + * block. Since this is a rare case we simplify this by disabling + * nopwrite if the current dmu_sync-ing dbuf has been modified in + * a previous transaction. + */ + if (dr->dr_next) + zp.zp_nopwrite = B_FALSE; + ASSERT(dr->dr_txg == txg); if (dr->dt.dl.dr_override_state == DR_IN_DMU_SYNC || dr->dt.dl.dr_override_state == DR_OVERRIDDEN) { @@ -1519,7 +1566,6 @@ dmu_object_set_compress(objset_t *os, ui int zfs_mdcomp_disable = 0; TUNABLE_INT("vfs.zfs.mdcomp_disable", &zfs_mdcomp_disable); -SYSCTL_DECL(_vfs_zfs); SYSCTL_INT(_vfs_zfs, OID_AUTO, mdcomp_disable, CTLFLAG_RW, &zfs_mdcomp_disable, 0, "Disable metadata compression"); @@ -1532,15 +1578,27 @@ dmu_write_policy(objset_t *os, dnode_t * enum zio_checksum checksum = os->os_checksum; enum zio_compress compress = os->os_compress; enum zio_checksum dedup_checksum = os->os_dedup_checksum; - boolean_t dedup; + boolean_t dedup = B_FALSE; + boolean_t nopwrite = B_FALSE; boolean_t dedup_verify = os->os_dedup_verify; int copies = os->os_copies; /* - * Determine checksum setting. + * We maintain different write policies for each of the following + * types of data: + * 1. metadata + * 2. preallocated blocks (i.e. level-0 blocks of a dump device) + * 3. all other level 0 blocks */ if (ismd) { /* + * XXX -- we should design a compression algorithm + * that specializes in arrays of bps. + */ + compress = zfs_mdcomp_disable ? ZIO_COMPRESS_EMPTY : + ZIO_COMPRESS_LZJB; + + /* * Metadata always gets checksummed. If the data * checksum is multi-bit correctable, and it's not a * ZBT-style checksum, then it's suitable for metadata @@ -1550,45 +1608,47 @@ dmu_write_policy(objset_t *os, dnode_t * if (zio_checksum_table[checksum].ci_correctable < 1 || zio_checksum_table[checksum].ci_eck) checksum = ZIO_CHECKSUM_FLETCHER_4; - } else { - checksum = zio_checksum_select(dn->dn_checksum, checksum); - } + } else if (wp & WP_NOFILL) { + ASSERT(level == 0); - /* - * Determine compression setting. - */ - if (ismd) { /* - * XXX -- we should design a compression algorithm - * that specializes in arrays of bps. + * If we're writing preallocated blocks, we aren't actually + * writing them so don't set any policy properties. These + * blocks are currently only used by an external subsystem + * outside of zfs (i.e. dump) and not written by the zio + * pipeline. */ - compress = zfs_mdcomp_disable ? ZIO_COMPRESS_EMPTY : - ZIO_COMPRESS_LZJB; + compress = ZIO_COMPRESS_OFF; + checksum = ZIO_CHECKSUM_OFF; } else { compress = zio_compress_select(dn->dn_compress, compress); - } - /* - * Determine dedup setting. If we are in dmu_sync(), we won't - * actually dedup now because that's all done in syncing context; - * but we do want to use the dedup checkum. If the checksum is not - * strong enough to ensure unique signatures, force dedup_verify. - */ - dedup = (!ismd && dedup_checksum != ZIO_CHECKSUM_OFF); - if (dedup) { - checksum = dedup_checksum; - if (!zio_checksum_table[checksum].ci_dedup) - dedup_verify = 1; - } + checksum = (dedup_checksum == ZIO_CHECKSUM_OFF) ? + zio_checksum_select(dn->dn_checksum, checksum) : + dedup_checksum; - if (wp & WP_DMU_SYNC) - dedup = 0; + /* + * Determine dedup setting. If we are in dmu_sync(), + * we won't actually dedup now because that's all + * done in syncing context; but we do want to use the + * dedup checkum. If the checksum is not strong + * enough to ensure unique signatures, force + * dedup_verify. + */ + if (dedup_checksum != ZIO_CHECKSUM_OFF) { + dedup = (wp & WP_DMU_SYNC) ? B_FALSE : B_TRUE; + if (!zio_checksum_table[checksum].ci_dedup) + dedup_verify = B_TRUE; + } - if (wp & WP_NOFILL) { - ASSERT(!ismd && level == 0); - checksum = ZIO_CHECKSUM_OFF; - compress = ZIO_COMPRESS_OFF; - dedup = B_FALSE; + /* + * Enable nopwrite if we have a cryptographically secure + * checksum that has no known collisions (i.e. SHA-256) + * and compression is enabled. We don't enable nopwrite if + * dedup is enabled as the two features are mutually exclusive. + */ + nopwrite = (!dedup && zio_checksum_table[checksum].ci_dedup && + compress != ZIO_COMPRESS_OFF && zfs_nopwrite_enabled); } zp->zp_checksum = checksum; @@ -1598,6 +1658,7 @@ dmu_write_policy(objset_t *os, dnode_t * zp->zp_copies = MIN(copies + ismd, spa_max_replication(os->os_spa)); zp->zp_dedup = dedup; zp->zp_dedup_verify = dedup && dedup_verify; + zp->zp_nopwrite = nopwrite; } int Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c ============================================================================== --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c Mon Dec 10 14:36:48 2012 (r244087) +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c Mon Dec 10 14:37:18 2012 (r244088) @@ -440,7 +440,6 @@ dsl_pool_sync(dsl_pool_t *dp, uint64_t t * clean up our in-memory structures accumulated while syncing: * * - move dead blocks from the pending deadlist to the on-disk deadlist - * - clean up zil records * - release hold from dsl_dataset_dirty() */ while (ds = list_remove_head(&synced_datasets)) { Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c ============================================================================== --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c Mon Dec 10 14:36:48 2012 (r244087) +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c Mon Dec 10 14:37:18 2012 (r244088) @@ -881,8 +881,9 @@ metaslab_activate(metaslab_t *msp, uint6 if ((msp->ms_weight & METASLAB_ACTIVE_MASK) == 0) { space_map_load_wait(sm); if (!sm->sm_loaded) { - int error = space_map_load(sm, sm_ops, SM_FREE, - &msp->ms_smo, + space_map_obj_t *smo = &msp->ms_smo; + + int error = space_map_load(sm, sm_ops, SM_FREE, smo, spa_meta_objset(msp->ms_group->mg_vd->vdev_spa)); if (error) { metaslab_group_sort(msp->ms_group, msp, 0); Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c ============================================================================== --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c Mon Dec 10 14:36:48 2012 (r244087) +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c Mon Dec 10 14:37:18 2012 (r244088) @@ -138,6 +138,7 @@ boolean_t zio_taskq_sysdc = B_TRUE; /* u uint_t zio_taskq_basedc = 80; /* base duty cycle */ boolean_t spa_create_process = B_TRUE; /* no process ==> no sysdc */ +extern int zfs_sync_pass_deferred_free; /* * This (illegal) pool name is used when temporarily importing a spa_t in order @@ -6227,7 +6228,7 @@ spa_sync(spa_t *spa, uint64_t txg) spa_errlog_sync(spa, txg); dsl_pool_sync(dp, txg); - if (pass <= SYNC_PASS_DEFERRED_FREE) { + if (pass < zfs_sync_pass_deferred_free) { zio_t *zio = zio_root(spa, NULL, NULL, 0); bplist_iterate(free_bpl, spa_free_sync_cb, zio, tx); Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c ============================================================================== --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c Mon Dec 10 14:36:48 2012 (r244087) +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c Mon Dec 10 14:37:18 2012 (r244088) @@ -1619,6 +1619,7 @@ spa_init(int mode) #endif /* illumos */ refcount_sysinit(); unique_init(); + space_map_init(); zio_init(); dmu_init(); zil_init(); @@ -1641,6 +1642,7 @@ spa_fini(void) zil_fini(); dmu_fini(); zio_fini(); + space_map_fini(); unique_fini(); refcount_fini(); Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c ============================================================================== --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c Mon Dec 10 14:36:48 2012 (r244087) +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c Mon Dec 10 14:37:18 2012 (r244088) @@ -32,6 +32,23 @@ #include #include +static kmem_cache_t *space_seg_cache; + +void +space_map_init(void) +{ + ASSERT(space_seg_cache == NULL); + space_seg_cache = kmem_cache_create("space_seg_cache", + sizeof (space_seg_t), 0, NULL, NULL, NULL, NULL, NULL, 0); +} + +void +space_map_fini(void) +{ + kmem_cache_destroy(space_seg_cache); + space_seg_cache = NULL; +} + /* * Space map routines. * NOTE: caller is responsible for all locking. @@ -124,7 +141,7 @@ space_map_add(space_map_t *sm, uint64_t avl_remove(sm->sm_pp_root, ss_after); } ss_after->ss_start = ss_before->ss_start; - kmem_free(ss_before, sizeof (*ss_before)); + kmem_cache_free(space_seg_cache, ss_before); ss = ss_after; } else if (merge_before) { ss_before->ss_end = end; @@ -137,7 +154,7 @@ space_map_add(space_map_t *sm, uint64_t avl_remove(sm->sm_pp_root, ss_after); ss = ss_after; } else { - ss = kmem_alloc(sizeof (*ss), KM_SLEEP); + ss = kmem_cache_alloc(space_seg_cache, KM_SLEEP); ss->ss_start = start; ss->ss_end = end; avl_insert(&sm->sm_root, ss, where); @@ -183,7 +200,7 @@ space_map_remove(space_map_t *sm, uint64 avl_remove(sm->sm_pp_root, ss); if (left_over && right_over) { - newseg = kmem_alloc(sizeof (*newseg), KM_SLEEP); + newseg = kmem_cache_alloc(space_seg_cache, KM_SLEEP); newseg->ss_start = end; newseg->ss_end = ss->ss_end; ss->ss_end = start; @@ -196,7 +213,7 @@ space_map_remove(space_map_t *sm, uint64 ss->ss_start = end; } else { avl_remove(&sm->sm_root, ss); - kmem_free(ss, sizeof (*ss)); + kmem_cache_free(space_seg_cache, ss); ss = NULL; } @@ -236,7 +253,7 @@ space_map_vacate(space_map_t *sm, space_ while ((ss = avl_destroy_nodes(&sm->sm_root, &cookie)) != NULL) { if (func != NULL) func(mdest, ss->ss_start, ss->ss_end - ss->ss_start); - kmem_free(ss, sizeof (*ss)); + kmem_cache_free(space_seg_cache, ss); } sm->sm_space = 0; } @@ -408,7 +425,7 @@ space_map_sync(space_map_t *sm, uint8_t spa_t *spa = dmu_objset_spa(os); void *cookie = NULL; space_seg_t *ss; - uint64_t bufsize, start, size, run_len; + uint64_t bufsize, start, size, run_len, delta, sm_space; uint64_t *entry, *entry_map, *entry_map_end; ASSERT(MUTEX_HELD(sm->sm_lock)); @@ -437,11 +454,13 @@ space_map_sync(space_map_t *sm, uint8_t SM_DEBUG_SYNCPASS_ENCODE(spa_sync_pass(spa)) | SM_DEBUG_TXG_ENCODE(dmu_tx_get_txg(tx)); + delta = 0; + sm_space = sm->sm_space; while ((ss = avl_destroy_nodes(&sm->sm_root, &cookie)) != NULL) { size = ss->ss_end - ss->ss_start; start = (ss->ss_start - sm->sm_start) >> sm->sm_shift; - sm->sm_space -= size; + delta += size; size >>= sm->sm_shift; while (size) { @@ -463,7 +482,7 @@ space_map_sync(space_map_t *sm, uint8_t start += run_len; size -= run_len; } - kmem_free(ss, sizeof (*ss)); + kmem_cache_free(space_seg_cache, ss); } if (entry != entry_map) { @@ -475,8 +494,15 @@ space_map_sync(space_map_t *sm, uint8_t smo->smo_objsize += size; } + /* + * Ensure that the space_map's accounting wasn't changed + * while we were in the middle of writing it out. + */ + VERIFY3U(sm->sm_space, ==, sm_space); + zio_buf_free(entry_map, bufsize); + sm->sm_space -= delta; VERIFY0(sm->sm_space); } Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dbuf.h ============================================================================== --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dbuf.h Mon Dec 10 14:36:48 2012 (r244087) +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dbuf.h Mon Dec 10 14:37:18 2012 (r244088) @@ -20,6 +20,7 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. */ #ifndef _SYS_DBUF_H @@ -130,6 +131,7 @@ typedef struct dbuf_dirty_record { blkptr_t dr_overridden_by; override_states_t dr_override_state; uint8_t dr_copies; + boolean_t dr_nopwrite; } dl; } dt; } dbuf_dirty_record_t; Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu.h ============================================================================== --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu.h Mon Dec 10 14:36:48 2012 (r244087) +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu.h Mon Dec 10 14:37:18 2012 (r244088) @@ -505,6 +505,11 @@ void dmu_evict_user(objset_t *os, dmu_bu void *dmu_buf_get_user(dmu_buf_t *db); /* + * Returns the blkptr associated with this dbuf, or NULL if not set. + */ +struct blkptr *dmu_buf_get_blkptr(dmu_buf_t *db); + +/* * Indicate that you are going to modify the buffer's data (db_data). * * The transaction (tx) must be assigned to a txg (ie. you've called Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab_impl.h ============================================================================== --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab_impl.h Mon Dec 10 14:36:48 2012 (r244087) +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab_impl.h Mon Dec 10 14:37:18 2012 (r244088) @@ -21,7 +21,10 @@ /* * Copyright 2009 Sun Microsystems, Inc. All rights reserved. * Use is subject to license terms. - * Copyright (c) 2011 by Delphix. All rights reserved. + */ + +/* + * Copyright (c) 2012 by Delphix. All rights reserved. */ #ifndef _SYS_METASLAB_IMPL_H Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h ============================================================================== --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h Mon Dec 10 14:36:48 2012 (r244087) +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h Mon Dec 10 14:37:18 2012 (r244088) @@ -489,14 +489,6 @@ extern int spa_scan_stop(spa_t *spa); extern void spa_sync(spa_t *spa, uint64_t txg); /* only for DMU use */ extern void spa_sync_allpools(void); -/* - * DEFERRED_FREE must be large enough that regular blocks are not - * deferred. XXX so can't we change it back to 1? - */ -#define SYNC_PASS_DEFERRED_FREE 2 /* defer frees after this pass */ -#define SYNC_PASS_DONT_COMPRESS 4 /* don't compress after this pass */ -#define SYNC_PASS_REWRITE 1 /* rewrite new bps after this pass */ - /* spa namespace global mutex */ extern kmutex_t spa_namespace_lock; Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/space_map.h ============================================================================== --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/space_map.h Mon Dec 10 14:36:48 2012 (r244087) +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/space_map.h Mon Dec 10 14:37:18 2012 (r244088) @@ -23,6 +23,10 @@ * Use is subject to license terms. */ +/* + * Copyright (c) 2012 by Delphix. All rights reserved. + */ + #ifndef _SYS_SPACE_MAP_H #define _SYS_SPACE_MAP_H @@ -136,6 +140,8 @@ struct space_map_ops { typedef void space_map_func_t(space_map_t *sm, uint64_t start, uint64_t size); +extern void space_map_init(void); +extern void space_map_fini(void); extern void space_map_create(space_map_t *sm, uint64_t start, uint64_t size, uint8_t shift, kmutex_t *lp); extern void space_map_destroy(space_map_t *sm); Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h ============================================================================== --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h Mon Dec 10 14:36:48 2012 (r244087) +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h Mon Dec 10 14:37:18 2012 (r244088) @@ -186,7 +186,9 @@ enum zio_flag { ZIO_FLAG_RAW = 1 << 21, ZIO_FLAG_GANG_CHILD = 1 << 22, ZIO_FLAG_DDT_CHILD = 1 << 23, - ZIO_FLAG_GODFATHER = 1 << 24 + ZIO_FLAG_GODFATHER = 1 << 24, + ZIO_FLAG_NOPWRITE = 1 << 25, + ZIO_FLAG_REEXECUTED = 1 << 26, }; #define ZIO_FLAG_MUSTSUCCEED 0 @@ -285,8 +287,9 @@ typedef struct zio_prop { dmu_object_type_t zp_type; uint8_t zp_level; uint8_t zp_copies; - uint8_t zp_dedup; - uint8_t zp_dedup_verify; + boolean_t zp_dedup; + boolean_t zp_dedup_verify; + boolean_t zp_nopwrite; } zio_prop_t; typedef struct zio_cksum_report zio_cksum_report_t; @@ -454,7 +457,8 @@ extern zio_t *zio_rewrite(zio_t *pio, sp void *data, uint64_t size, zio_done_func_t *done, void *private, int priority, enum zio_flag flags, zbookmark_t *zb); -extern void zio_write_override(zio_t *zio, blkptr_t *bp, int copies); +extern void zio_write_override(zio_t *zio, blkptr_t *bp, int copies, + boolean_t nopwrite); extern void zio_free(spa_t *spa, uint64_t txg, const blkptr_t *bp); Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_impl.h ============================================================================== --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_impl.h Mon Dec 10 14:36:48 2012 (r244087) +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_impl.h Mon Dec 10 14:37:18 2012 (r244088) @@ -23,6 +23,10 @@ * Use is subject to license terms. */ +/* + * Copyright (c) 2012 by Delphix. All rights reserved. + */ + #ifndef _ZIO_IMPL_H #define _ZIO_IMPL_H @@ -34,6 +38,70 @@ extern "C" { #endif /* + * XXX -- Describe ZFS I/O pipleine here. Fill in as needed. + * + * The ZFS I/O pipeline is comprised of various stages which are defined + * in the zio_stage enum below. The individual stages are used to construct + * these basic I/O operations: Read, Write, Free, Claim, and Ioctl. + * + * I/O operations: (XXX - provide detail for each of the operations) + * + * Read: + * Write: + * Free: + * Claim: + * Ioctl: + * + * Although the most common pipeline are used by the basic I/O operations + * above, there are some helper pipelines (one could consider them + * sub-pipelines) which are used internally by the ZIO module and are + * explained below: + * + * Interlock Pipeline: + * The interlock pipeline is the most basic pipeline and is used by all + * of the I/O operations. The interlock pipeline does not perform any I/O + * and is used to coordinate the dependencies between I/Os that are being + * issued (i.e. the parent/child relationship). + * + * Vdev child Pipeline: + * The vdev child pipeline is responsible for performing the physical I/O. + * It is in this pipeline where the I/O are queued and possibly cached. + * + * In addition to performing I/O, the pipeline is also responsible for + * data transformations. The transformations performed are based on the + * specific properties that user may have selected and modify the + * behavior of the pipeline. Examples of supported transformations are + * compression, dedup, and nop writes. Transformations will either modify + * the data or the pipeline. This list below further describes each of + * the supported transformations: + * + * Compression: + * ZFS supports three different flavors of compression -- gzip, lzjb, and + * zle. Compression occurs as part of the write pipeline and is performed + * in the ZIO_STAGE_WRITE_BP_INIT stage. + * + * Dedup: + * Dedup reads are handled by the ZIO_STAGE_DDT_READ_START and + * ZIO_STAGE_DDT_READ_DONE stages. These stages are added to an existing + * read pipeline if the dedup bit is set on the block pointer. + * Writing a dedup block is performed by the ZIO_STAGE_DDT_WRITE stage + * and added to a write pipeline if a user has enabled dedup on that + * particular dataset. + * + * NOP Write: + * The NOP write feature is performed by the ZIO_STAGE_NOP_WRITE stage + * and is added to an existing write pipeline if a crypographically + * secure checksum (i.e. SHA256) is enabled and compression is turned on. + * The NOP write stage will compare the checksums of the current data + * on-disk (level-0 blocks only) and the data that is currently being written. + * If the checksum values are identical then the pipeline is converted to + * an interlock pipeline skipping block allocation and bypassing the + * physical I/O. The nop write feature can handle writes in either + * syncing or open context (i.e. zil writes) and as a result is mutually + * exclusive with dedup. + */ + +/* * zio pipeline stage definitions */ enum zio_stage { @@ -46,27 +114,29 @@ enum zio_stage { ZIO_STAGE_CHECKSUM_GENERATE = 1 << 5, /* -W--- */ - ZIO_STAGE_DDT_READ_START = 1 << 6, /* R---- */ - ZIO_STAGE_DDT_READ_DONE = 1 << 7, /* R---- */ - ZIO_STAGE_DDT_WRITE = 1 << 8, /* -W--- */ - ZIO_STAGE_DDT_FREE = 1 << 9, /* --F-- */ + ZIO_STAGE_NOP_WRITE = 1 << 6, /* -W--- */ - ZIO_STAGE_GANG_ASSEMBLE = 1 << 10, /* RWFC- */ - ZIO_STAGE_GANG_ISSUE = 1 << 11, /* RWFC- */ + ZIO_STAGE_DDT_READ_START = 1 << 7, /* R---- */ + ZIO_STAGE_DDT_READ_DONE = 1 << 8, /* R---- */ + ZIO_STAGE_DDT_WRITE = 1 << 9, /* -W--- */ + ZIO_STAGE_DDT_FREE = 1 << 10, /* --F-- */ - ZIO_STAGE_DVA_ALLOCATE = 1 << 12, /* -W--- */ - ZIO_STAGE_DVA_FREE = 1 << 13, /* --F-- */ - ZIO_STAGE_DVA_CLAIM = 1 << 14, /* ---C- */ + ZIO_STAGE_GANG_ASSEMBLE = 1 << 11, /* RWFC- */ + ZIO_STAGE_GANG_ISSUE = 1 << 12, /* RWFC- */ - ZIO_STAGE_READY = 1 << 15, /* RWFCI */ + ZIO_STAGE_DVA_ALLOCATE = 1 << 13, /* -W--- */ + ZIO_STAGE_DVA_FREE = 1 << 14, /* --F-- */ + ZIO_STAGE_DVA_CLAIM = 1 << 15, /* ---C- */ - ZIO_STAGE_VDEV_IO_START = 1 << 16, /* RW--I */ - ZIO_STAGE_VDEV_IO_DONE = 1 << 17, /* RW--- */ - ZIO_STAGE_VDEV_IO_ASSESS = 1 << 18, /* RW--I */ + ZIO_STAGE_READY = 1 << 16, /* RWFCI */ - ZIO_STAGE_CHECKSUM_VERIFY = 1 << 19, /* R---- */ + ZIO_STAGE_VDEV_IO_START = 1 << 17, /* RW--I */ + ZIO_STAGE_VDEV_IO_DONE = 1 << 18, /* RW--- */ + ZIO_STAGE_VDEV_IO_ASSESS = 1 << 19, /* RW--I */ - ZIO_STAGE_DONE = 1 << 20 /* RWFCI */ + ZIO_STAGE_CHECKSUM_VERIFY = 1 << 20, /* R---- */ + + ZIO_STAGE_DONE = 1 << 21 /* RWFCI */ }; #define ZIO_INTERLOCK_STAGES \ @@ -143,6 +213,7 @@ enum zio_stage { #define ZIO_FREE_PIPELINE \ (ZIO_INTERLOCK_STAGES | \ ZIO_STAGE_FREE_BP_INIT | \ + ZIO_STAGE_ISSUE_ASYNC | \ ZIO_STAGE_DVA_FREE) #define ZIO_DDT_FREE_PIPELINE \ Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c ============================================================================== --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c Mon Dec 10 14:36:48 2012 (r244087) +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c Mon Dec 10 14:37:18 2012 (r244088) @@ -1207,6 +1207,12 @@ zfs_get_data(void *arg, lr_write_t *lr, DMU_READ_NO_PREFETCH); if (error == 0) { + blkptr_t *obp = dmu_buf_get_blkptr(db); + if (obp) { + ASSERT(BP_IS_HOLE(bp)); + *bp = *obp; + } + zgd->zgd_db = db; zgd->zgd_bp = bp; @@ -3255,6 +3261,12 @@ top: uint64_t acl_obj; new_mode = (pmode & S_IFMT) | (vap->va_mode & ~S_IFMT); + if (zp->z_zfsvfs->z_acl_mode == ZFS_ACL_RESTRICTED && + !(zp->z_pflags & ZFS_ACL_TRIVIAL)) { + err = EPERM; + goto out; + } + if (err = zfs_acl_chmod_setattr(zp, &aclp, new_mode)) goto out; Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c ============================================================================== --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c Mon Dec 10 14:36:48 2012 (r244087) +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c Mon Dec 10 14:37:18 2012 (r244088) @@ -89,6 +89,31 @@ extern vmem_t *zio_alloc_arena; extern int zfs_mg_alloc_failures; /* + * The following actions directly effect the spa's sync-to-convergence logic. + * The values below define the sync pass when we start performing the action. + * Care should be taken when changing these values as they directly impact + * spa_sync() performance. Tuning these values may introduce subtle performance + * pathologies and should only be done in the context of performance analysis. + * These tunables will eventually be removed and replaced with #defines once + * enough analysis has been done to determine optimal values. + * + * The 'zfs_sync_pass_deferred_free' pass must be greater than 1 to ensure that + * regular blocks are not deferred. + */ +int zfs_sync_pass_deferred_free = 2; /* defer frees starting in this pass */ +TUNABLE_INT("vfs.zfs.sync_pass_deferred_free", &zfs_sync_pass_deferred_free); +SYSCTL_INT(_vfs_zfs, OID_AUTO, sync_pass_deferred_free, CTLFLAG_RDTUN, + &zfs_sync_pass_deferred_free, 0, "defer frees starting in this pass"); +int zfs_sync_pass_dont_compress = 5; /* don't compress starting in this pass */ +TUNABLE_INT("vfs.zfs.sync_pass_dont_compress", &zfs_sync_pass_dont_compress); +SYSCTL_INT(_vfs_zfs, OID_AUTO, sync_pass_dont_compress, CTLFLAG_RDTUN, + &zfs_sync_pass_dont_compress, 0, "don't compress starting in this pass"); +int zfs_sync_pass_rewrite = 2; /* rewrite new bps starting in this pass */ +TUNABLE_INT("vfs.zfs.sync_pass_rewrite", &zfs_sync_pass_rewrite); +SYSCTL_INT(_vfs_zfs, OID_AUTO, sync_pass_rewrite, CTLFLAG_RDTUN, + &zfs_sync_pass_rewrite, 0, "rewrite new bps starting in this pass"); + +/* * An allocating zio is one that either currently has the DVA allocate * stage set or will have it later in its lifetime. */ @@ -650,9 +675,7 @@ zio_write(zio_t *pio, spa_t *spa, uint64 *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***