From owner-svn-src-all@FreeBSD.ORG Sat Nov 8 06:43:45 2014 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 43D50CE6; Sat, 8 Nov 2014 06:43:45 +0000 (UTC) Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:1900:2254:2068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 2BF0392F; Sat, 8 Nov 2014 06:43:45 +0000 (UTC) Received: from svn.freebsd.org ([127.0.1.70]) by svn.freebsd.org (8.14.9/8.14.9) with ESMTP id sA86hjUd028515; Sat, 8 Nov 2014 06:43:45 GMT (envelope-from delphij@FreeBSD.org) Received: (from delphij@localhost) by svn.freebsd.org (8.14.9/8.14.9/Submit) id sA86hbDc028453; Sat, 8 Nov 2014 06:43:37 GMT (envelope-from delphij@FreeBSD.org) Message-Id: <201411080643.sA86hbDc028453@svn.freebsd.org> X-Authentication-Warning: svn.freebsd.org: delphij set sender to delphij@FreeBSD.org using -f From: Xin LI Date: Sat, 8 Nov 2014 06:43:37 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-vendor@freebsd.org Subject: svn commit: r274273 - vendor-sys/illumos/dist/common/zfs vendor-sys/illumos/dist/uts/common/fs/zfs vendor-sys/illumos/dist/uts/common/fs/zfs/sys vendor-sys/illumos/dist/uts/common/sys/fs vendor/ill... X-SVN-Group: vendor-sys MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 08 Nov 2014 06:43:45 -0000 Author: delphij Date: Sat Nov 8 06:43:33 2014 New Revision: 274273 URL: https://svnweb.freebsd.org/changeset/base/274273 Log: 5027 zfs large block support Reviewed by: Alek Pinchuk Reviewed by: George Wilson Reviewed by: Josef 'Jeff' Sipek Reviewed by: Richard Elling Reviewed by: Saso Kiselkov Reviewed by: Brian Behlendorf Approved by: Dan McDonald Author: Matthew Ahrens illumos/illumos-gate@b515258426fed6c7311fd3f1dea697cfbd4085c6 Modified: vendor-sys/illumos/dist/common/zfs/zfeature_common.c vendor-sys/illumos/dist/common/zfs/zfeature_common.h vendor-sys/illumos/dist/common/zfs/zfs_prop.c vendor-sys/illumos/dist/common/zfs/zpool_prop.c vendor-sys/illumos/dist/uts/common/fs/zfs/bpobj.c vendor-sys/illumos/dist/uts/common/fs/zfs/bptree.c vendor-sys/illumos/dist/uts/common/fs/zfs/dbuf.c vendor-sys/illumos/dist/uts/common/fs/zfs/dmu_objset.c vendor-sys/illumos/dist/uts/common/fs/zfs/dmu_send.c vendor-sys/illumos/dist/uts/common/fs/zfs/dmu_tx.c vendor-sys/illumos/dist/uts/common/fs/zfs/dnode.c vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_dataset.c vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_deadlist.c vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_destroy.c vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_pool.c vendor-sys/illumos/dist/uts/common/fs/zfs/metaslab.c vendor-sys/illumos/dist/uts/common/fs/zfs/sa.c vendor-sys/illumos/dist/uts/common/fs/zfs/spa.c vendor-sys/illumos/dist/uts/common/fs/zfs/spa_history.c vendor-sys/illumos/dist/uts/common/fs/zfs/spa_misc.c vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dmu.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dmu_objset.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dmu_send.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dsl_dataset.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/spa.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/zap_impl.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/zfs_ioctl.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/zfs_znode.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/zil.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/zil_impl.h vendor-sys/illumos/dist/uts/common/fs/zfs/vdev.c vendor-sys/illumos/dist/uts/common/fs/zfs/vdev_queue.c vendor-sys/illumos/dist/uts/common/fs/zfs/vdev_raidz.c vendor-sys/illumos/dist/uts/common/fs/zfs/zap_micro.c vendor-sys/illumos/dist/uts/common/fs/zfs/zfs_ioctl.c vendor-sys/illumos/dist/uts/common/fs/zfs/zfs_log.c vendor-sys/illumos/dist/uts/common/fs/zfs/zfs_vfsops.c vendor-sys/illumos/dist/uts/common/fs/zfs/zfs_vnops.c vendor-sys/illumos/dist/uts/common/fs/zfs/zfs_znode.c vendor-sys/illumos/dist/uts/common/fs/zfs/zil.c vendor-sys/illumos/dist/uts/common/fs/zfs/zio.c vendor-sys/illumos/dist/uts/common/fs/zfs/zvol.c vendor-sys/illumos/dist/uts/common/sys/fs/zfs.h Changes in other areas also in this revision: Modified: vendor/illumos/dist/cmd/zdb/zdb.c vendor/illumos/dist/cmd/zfs/zfs_main.c vendor/illumos/dist/cmd/zstreamdump/zstreamdump.c vendor/illumos/dist/cmd/ztest/ztest.c vendor/illumos/dist/lib/libzfs/common/libzfs.h vendor/illumos/dist/lib/libzfs/common/libzfs_dataset.c vendor/illumos/dist/lib/libzfs/common/libzfs_sendrecv.c vendor/illumos/dist/lib/libzfs_core/common/libzfs_core.c vendor/illumos/dist/lib/libzfs_core/common/libzfs_core.h vendor/illumos/dist/man/man1m/zfs.1m vendor/illumos/dist/man/man5/zpool-features.5 Modified: vendor-sys/illumos/dist/common/zfs/zfeature_common.c ============================================================================== --- vendor-sys/illumos/dist/common/zfs/zfeature_common.c Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/common/zfs/zfeature_common.c Sat Nov 8 06:43:33 2014 (r274273) @@ -57,7 +57,8 @@ valid_char(char c, boolean_t after_colon { return ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || - c == (after_colon ? '_' : '.')); + (after_colon && c == '_') || + (!after_colon && (c == '.' || c == '-'))); } /* @@ -221,4 +222,13 @@ zpool_feature_init(void) "com.delphix:embedded_data", "embedded_data", "Blocks which compress very well use even less space.", B_FALSE, B_TRUE, B_TRUE, NULL); + + static const spa_feature_t large_blocks_deps[] = { + SPA_FEATURE_EXTENSIBLE_DATASET, + SPA_FEATURE_NONE + }; + zfeature_register(SPA_FEATURE_LARGE_BLOCKS, + "org.open-zfs:large_blocks", "large_blocks", + "Support for blocks larger than 128KB.", B_FALSE, B_FALSE, B_FALSE, + large_blocks_deps); } Modified: vendor-sys/illumos/dist/common/zfs/zfeature_common.h ============================================================================== --- vendor-sys/illumos/dist/common/zfs/zfeature_common.h Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/common/zfs/zfeature_common.h Sat Nov 8 06:43:33 2014 (r274273) @@ -51,6 +51,7 @@ typedef enum spa_feature { SPA_FEATURE_EMBEDDED_DATA, SPA_FEATURE_BOOKMARKS, SPA_FEATURE_FS_SS_LIMIT, + SPA_FEATURE_LARGE_BLOCKS, SPA_FEATURES } spa_feature_t; Modified: vendor-sys/illumos/dist/common/zfs/zfs_prop.c ============================================================================== --- vendor-sys/illumos/dist/common/zfs/zfs_prop.c Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/common/zfs/zfs_prop.c Sat Nov 8 06:43:33 2014 (r274273) @@ -397,8 +397,8 @@ zfs_prop_init(void) /* inherit number properties */ zprop_register_number(ZFS_PROP_RECORDSIZE, "recordsize", - SPA_MAXBLOCKSIZE, PROP_INHERIT, - ZFS_TYPE_FILESYSTEM, "512 to 128k, power of 2", "RECSIZE"); + SPA_OLD_MAXBLOCKSIZE, PROP_INHERIT, + ZFS_TYPE_FILESYSTEM, "512 to 1M, power of 2", "RECSIZE"); /* hidden properties */ zprop_register_hidden(ZFS_PROP_CREATETXG, "createtxg", PROP_TYPE_NUMBER, Modified: vendor-sys/illumos/dist/common/zfs/zpool_prop.c ============================================================================== --- vendor-sys/illumos/dist/common/zfs/zpool_prop.c Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/common/zfs/zpool_prop.c Sat Nov 8 06:43:33 2014 (r274273) @@ -127,6 +127,8 @@ zpool_prop_init(void) /* hidden properties */ zprop_register_hidden(ZPOOL_PROP_NAME, "name", PROP_TYPE_STRING, PROP_READONLY, ZFS_TYPE_POOL, "NAME"); + zprop_register_hidden(ZPOOL_PROP_MAXBLOCKSIZE, "maxblocksize", + PROP_TYPE_NUMBER, PROP_READONLY, ZFS_TYPE_POOL, "MAXBLOCKSIZE"); } /* Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/bpobj.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/bpobj.c Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/bpobj.c Sat Nov 8 06:43:33 2014 (r274273) @@ -43,7 +43,7 @@ bpobj_alloc_empty(objset_t *os, int bloc if (!spa_feature_is_active(spa, SPA_FEATURE_EMPTY_BPOBJ)) { ASSERT0(dp->dp_empty_bpobj); dp->dp_empty_bpobj = - bpobj_alloc(os, SPA_MAXBLOCKSIZE, tx); + bpobj_alloc(os, SPA_OLD_MAXBLOCKSIZE, tx); VERIFY(zap_add(os, DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_EMPTY_BPOBJ, sizeof (uint64_t), 1, @@ -396,7 +396,8 @@ bpobj_enqueue_subobj(bpobj_t *bpo, uint6 dmu_buf_will_dirty(bpo->bpo_dbuf, tx); if (bpo->bpo_phys->bpo_subobjs == 0) { bpo->bpo_phys->bpo_subobjs = dmu_object_alloc(bpo->bpo_os, - DMU_OT_BPOBJ_SUBOBJ, SPA_MAXBLOCKSIZE, DMU_OT_NONE, 0, tx); + DMU_OT_BPOBJ_SUBOBJ, SPA_OLD_MAXBLOCKSIZE, + DMU_OT_NONE, 0, tx); } dmu_object_info_t doi; Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/bptree.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/bptree.c Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/bptree.c Sat Nov 8 06:43:33 2014 (r274273) @@ -65,7 +65,7 @@ bptree_alloc(objset_t *os, dmu_tx_t *tx) bptree_phys_t *bt; obj = dmu_object_alloc(os, DMU_OTN_UINT64_METADATA, - SPA_MAXBLOCKSIZE, DMU_OTN_UINT64_METADATA, + SPA_OLD_MAXBLOCKSIZE, DMU_OTN_UINT64_METADATA, sizeof (bptree_phys_t), tx); /* Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/dbuf.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/dbuf.c Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/dbuf.c Sat Nov 8 06:43:33 2014 (r274273) @@ -2022,10 +2022,8 @@ dbuf_spill_set_blksz(dmu_buf_t *db_fake, return (SET_ERROR(ENOTSUP)); if (blksz == 0) blksz = SPA_MINBLOCKSIZE; - if (blksz > SPA_MAXBLOCKSIZE) - blksz = SPA_MAXBLOCKSIZE; - else - blksz = P2ROUNDUP(blksz, SPA_MINBLOCKSIZE); + ASSERT3U(blksz, <=, spa_maxblocksize(dmu_objset_spa(db->db_objset))); + blksz = P2ROUNDUP(blksz, SPA_MINBLOCKSIZE); DB_DNODE_ENTER(db); dn = DB_DNODE(db); Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/dmu_objset.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/dmu_objset.c Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/dmu_objset.c Sat Nov 8 06:43:33 2014 (r274273) @@ -255,6 +255,14 @@ logbias_changed_cb(void *arg, uint64_t n zil_set_logbias(os->os_zil, newval); } +static void +recordsize_changed_cb(void *arg, uint64_t newval) +{ + objset_t *os = arg; + + os->os_recordsize = newval; +} + void dmu_objset_byteswap(void *buf, size_t size) { @@ -384,6 +392,11 @@ dmu_objset_open_impl(spa_t *spa, dsl_dat ZFS_PROP_REDUNDANT_METADATA), redundant_metadata_changed_cb, os); } + if (err == 0) { + err = dsl_prop_register(ds, + zfs_prop_to_name(ZFS_PROP_RECORDSIZE), + recordsize_changed_cb, os); + } } if (err != 0) { VERIFY(arc_buf_remove_ref(os->os_phys_buf, @@ -642,6 +655,9 @@ dmu_objset_evict(objset_t *os) VERIFY0(dsl_prop_unregister(ds, zfs_prop_to_name(ZFS_PROP_REDUNDANT_METADATA), redundant_metadata_changed_cb, os)); + VERIFY0(dsl_prop_unregister(ds, + zfs_prop_to_name(ZFS_PROP_RECORDSIZE), + recordsize_changed_cb, os)); } VERIFY0(dsl_prop_unregister(ds, zfs_prop_to_name(ZFS_PROP_PRIMARYCACHE), Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/dmu_send.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/dmu_send.c Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/dmu_send.c Sat Nov 8 06:43:33 2014 (r274273) @@ -206,11 +206,12 @@ dump_write(dmu_sendarg_t *dsp, dmu_objec drrw->drr_offset = offset; drrw->drr_length = blksz; drrw->drr_toguid = dsp->dsa_toguid; - if (BP_IS_EMBEDDED(bp)) { + if (bp == NULL || BP_IS_EMBEDDED(bp)) { /* - * There's no pre-computed checksum of embedded BP's, so - * (like fletcher4-checkummed blocks) userland will have - * to compute a dedup-capable checksum itself. + * There's no pre-computed checksum for partial-block + * writes or embedded BP's, so (like + * fletcher4-checkummed blocks) userland will have to + * compute a dedup-capable checksum itself. */ drrw->drr_checksumtype = ZIO_CHECKSUM_OFF; } else { @@ -372,6 +373,10 @@ dump_dnode(dmu_sendarg_t *dsp, uint64_t drro->drr_compress = dnp->dn_compress; drro->drr_toguid = dsp->dsa_toguid; + if (!(dsp->dsa_featureflags & DMU_BACKUP_FEATURE_LARGE_BLOCKS) && + drro->drr_blksz > SPA_OLD_MAXBLOCKSIZE) + drro->drr_blksz = SPA_OLD_MAXBLOCKSIZE; + if (dump_bytes(dsp, dsp->dsa_drr, sizeof (dmu_replay_record_t)) != 0) return (SET_ERROR(EINTR)); @@ -491,6 +496,7 @@ backup_cb(spa_t *spa, zilog_t *zilog, co uint32_t aflags = ARC_WAIT; arc_buf_t *abuf; int blksz = BP_GET_LSIZE(bp); + uint64_t offset; ASSERT3U(blksz, ==, dnp->dn_datablkszsec << SPA_MINBLOCKSHIFT); ASSERT0(zb->zb_level); @@ -511,8 +517,24 @@ backup_cb(spa_t *spa, zilog_t *zilog, co } } - err = dump_write(dsp, type, zb->zb_object, zb->zb_blkid * blksz, - blksz, bp, abuf->b_data); + offset = zb->zb_blkid * blksz; + + if (!(dsp->dsa_featureflags & + DMU_BACKUP_FEATURE_LARGE_BLOCKS) && + blksz > SPA_OLD_MAXBLOCKSIZE) { + char *buf = abuf->b_data; + while (blksz > 0 && err == 0) { + int n = MIN(blksz, SPA_OLD_MAXBLOCKSIZE); + err = dump_write(dsp, type, zb->zb_object, + offset, n, NULL, buf); + offset += n; + buf += n; + blksz -= n; + } + } else { + err = dump_write(dsp, type, zb->zb_object, + offset, blksz, bp, abuf->b_data); + } (void) arc_buf_remove_ref(abuf, &abuf); } @@ -526,7 +548,7 @@ backup_cb(spa_t *spa, zilog_t *zilog, co static int dmu_send_impl(void *tag, dsl_pool_t *dp, dsl_dataset_t *ds, zfs_bookmark_phys_t *fromzb, boolean_t is_clone, boolean_t embedok, - int outfd, vnode_t *vp, offset_t *off) + boolean_t large_block_ok, int outfd, vnode_t *vp, offset_t *off) { objset_t *os; dmu_replay_record_t *drr; @@ -561,6 +583,8 @@ dmu_send_impl(void *tag, dsl_pool_t *dp, } #endif + if (large_block_ok && ds->ds_large_blocks) + featureflags |= DMU_BACKUP_FEATURE_LARGE_BLOCKS; if (embedok && spa_feature_is_active(dp->dp_spa, SPA_FEATURE_EMBEDDED_DATA)) { featureflags |= DMU_BACKUP_FEATURE_EMBED_DATA; @@ -656,7 +680,8 @@ out: int dmu_send_obj(const char *pool, uint64_t tosnap, uint64_t fromsnap, - boolean_t embedok, int outfd, vnode_t *vp, offset_t *off) + boolean_t embedok, boolean_t large_block_ok, + int outfd, vnode_t *vp, offset_t *off) { dsl_pool_t *dp; dsl_dataset_t *ds; @@ -690,18 +715,19 @@ dmu_send_obj(const char *pool, uint64_t zb.zbm_guid = fromds->ds_phys->ds_guid; is_clone = (fromds->ds_dir != ds->ds_dir); dsl_dataset_rele(fromds, FTAG); - err = dmu_send_impl(FTAG, dp, ds, &zb, is_clone, embedok, - outfd, vp, off); + err = dmu_send_impl(FTAG, dp, ds, &zb, is_clone, + embedok, large_block_ok, outfd, vp, off); } else { - err = dmu_send_impl(FTAG, dp, ds, NULL, B_FALSE, embedok, - outfd, vp, off); + err = dmu_send_impl(FTAG, dp, ds, NULL, B_FALSE, + embedok, large_block_ok, outfd, vp, off); } dsl_dataset_rele(ds, FTAG); return (err); } int -dmu_send(const char *tosnap, const char *fromsnap, boolean_t embedok, +dmu_send(const char *tosnap, const char *fromsnap, + boolean_t embedok, boolean_t large_block_ok, int outfd, vnode_t *vp, offset_t *off) { dsl_pool_t *dp; @@ -768,11 +794,11 @@ dmu_send(const char *tosnap, const char dsl_pool_rele(dp, FTAG); return (err); } - err = dmu_send_impl(FTAG, dp, ds, &zb, is_clone, embedok, - outfd, vp, off); + err = dmu_send_impl(FTAG, dp, ds, &zb, is_clone, + embedok, large_block_ok, outfd, vp, off); } else { - err = dmu_send_impl(FTAG, dp, ds, NULL, B_FALSE, embedok, - outfd, vp, off); + err = dmu_send_impl(FTAG, dp, ds, NULL, B_FALSE, + embedok, large_block_ok, outfd, vp, off); } if (owned) dsl_dataset_disown(ds, FTAG); @@ -972,6 +998,15 @@ dmu_recv_begin_check(void *arg, dmu_tx_t !spa_feature_is_enabled(dp->dp_spa, SPA_FEATURE_LZ4_COMPRESS)) return (SET_ERROR(ENOTSUP)); + /* + * The receiving code doesn't know how to translate large blocks + * to smaller ones, so the pool must have the LARGE_BLOCKS + * feature enabled if the stream has LARGE_BLOCKS. + */ + if ((featureflags & DMU_BACKUP_FEATURE_LARGE_BLOCKS) && + !spa_feature_is_enabled(dp->dp_spa, SPA_FEATURE_LARGE_BLOCKS)) + return (SET_ERROR(ENOTSUP)); + error = dsl_dataset_hold(dp, tofs, FTAG, &ds); if (error == 0) { /* target fs already exists; recv into temp clone */ @@ -1097,6 +1132,13 @@ dmu_recv_begin_sync(void *arg, dmu_tx_t } VERIFY0(dsl_dataset_own_obj(dp, dsobj, dmu_recv_tag, &newds)); + if ((DMU_GET_FEATUREFLAGS(drrb->drr_versioninfo) & + DMU_BACKUP_FEATURE_LARGE_BLOCKS) && + !newds->ds_large_blocks) { + dsl_dataset_activate_large_blocks_sync_impl(dsobj, tx); + newds->ds_large_blocks = B_TRUE; + } + dmu_buf_will_dirty(newds->ds_dbuf, tx); newds->ds_phys->ds_flags |= DS_FLAG_INCONSISTENT; @@ -1222,6 +1264,7 @@ restore_read(struct restorearg *ra, int /* some things will require 8-byte alignment, so everything must */ ASSERT0(len % 8); + ASSERT3U(len, <=, ra->bufsize); while (done < len) { ssize_t resid; @@ -1361,7 +1404,7 @@ restore_object(struct restorearg *ra, ob drro->drr_compress >= ZIO_COMPRESS_FUNCTIONS || P2PHASE(drro->drr_blksz, SPA_MINBLOCKSIZE) || drro->drr_blksz < SPA_MINBLOCKSIZE || - drro->drr_blksz > SPA_MAXBLOCKSIZE || + drro->drr_blksz > spa_maxblocksize(dmu_objset_spa(os)) || drro->drr_bonuslen > DN_MAX_BONUSLEN) { return (SET_ERROR(EINVAL)); } @@ -1634,7 +1677,7 @@ restore_spill(struct restorearg *ra, obj int err; if (drrs->drr_length < SPA_MINBLOCKSIZE || - drrs->drr_length > SPA_MAXBLOCKSIZE) + drrs->drr_length > spa_maxblocksize(dmu_objset_spa(os))) return (SET_ERROR(EINVAL)); data = restore_read(ra, drrs->drr_length, NULL); @@ -1721,7 +1764,7 @@ dmu_recv_stream(dmu_recv_cookie_t *drc, ra.cksum = drc->drc_cksum; ra.vp = vp; ra.voff = *voffp; - ra.bufsize = 1<<20; + ra.bufsize = SPA_MAXBLOCKSIZE; ra.buf = kmem_alloc(ra.bufsize, KM_SLEEP); /* these were verified in dmu_recv_begin */ Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/dmu_tx.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/dmu_tx.c Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/dmu_tx.c Sat Nov 8 06:43:33 2014 (r274273) @@ -224,7 +224,7 @@ dmu_tx_count_write(dmu_tx_hold_t *txh, u return; min_bs = SPA_MINBLOCKSHIFT; - max_bs = SPA_MAXBLOCKSHIFT; + max_bs = highbit64(txh->txh_tx->tx_objset->os_recordsize) - 1; min_ibs = DN_MIN_INDBLKSHIFT; max_ibs = DN_MAX_INDBLKSHIFT; @@ -293,6 +293,14 @@ dmu_tx_count_write(dmu_tx_hold_t *txh, u */ ASSERT(dn->dn_datablkshift != 0); min_bs = max_bs = dn->dn_datablkshift; + } else { + /* + * The blocksize can increase up to the recordsize, + * or if it is already more than the recordsize, + * up to the next power of 2. + */ + min_bs = highbit64(dn->dn_datablksz - 1); + max_bs = MAX(max_bs, highbit64(dn->dn_datablksz - 1)); } /* @@ -750,11 +758,11 @@ dmu_tx_hold_zap(dmu_tx_t *tx, uint64_t o bp = &dn->dn_phys->dn_blkptr[0]; if (dsl_dataset_block_freeable(dn->dn_objset->os_dsl_dataset, bp, bp->blk_birth)) - txh->txh_space_tooverwrite += SPA_MAXBLOCKSIZE; + txh->txh_space_tooverwrite += MZAP_MAX_BLKSZ; else - txh->txh_space_towrite += SPA_MAXBLOCKSIZE; + txh->txh_space_towrite += MZAP_MAX_BLKSZ; if (!BP_IS_HOLE(bp)) - txh->txh_space_tounref += SPA_MAXBLOCKSIZE; + txh->txh_space_tounref += MZAP_MAX_BLKSZ; return; } @@ -1543,18 +1551,18 @@ dmu_tx_hold_spill(dmu_tx_t *tx, uint64_t /* If blkptr doesn't exist then add space to towrite */ if (!(dn->dn_phys->dn_flags & DNODE_FLAG_SPILL_BLKPTR)) { - txh->txh_space_towrite += SPA_MAXBLOCKSIZE; + txh->txh_space_towrite += SPA_OLD_MAXBLOCKSIZE; } else { blkptr_t *bp; bp = &dn->dn_phys->dn_spill; if (dsl_dataset_block_freeable(dn->dn_objset->os_dsl_dataset, bp, bp->blk_birth)) - txh->txh_space_tooverwrite += SPA_MAXBLOCKSIZE; + txh->txh_space_tooverwrite += SPA_OLD_MAXBLOCKSIZE; else - txh->txh_space_towrite += SPA_MAXBLOCKSIZE; + txh->txh_space_towrite += SPA_OLD_MAXBLOCKSIZE; if (!BP_IS_HOLE(bp)) - txh->txh_space_tounref += SPA_MAXBLOCKSIZE; + txh->txh_space_tounref += SPA_OLD_MAXBLOCKSIZE; } } Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/dnode.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/dnode.c Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/dnode.c Sat Nov 8 06:43:33 2014 (r274273) @@ -510,10 +510,10 @@ dnode_allocate(dnode_t *dn, dmu_object_t { int i; + ASSERT3U(blocksize, <=, + spa_maxblocksize(dmu_objset_spa(dn->dn_objset))); if (blocksize == 0) blocksize = 1 << zfs_default_bs; - else if (blocksize > SPA_MAXBLOCKSIZE) - blocksize = SPA_MAXBLOCKSIZE; else blocksize = P2ROUNDUP(blocksize, SPA_MINBLOCKSIZE); @@ -594,7 +594,8 @@ dnode_reallocate(dnode_t *dn, dmu_object int nblkptr; ASSERT3U(blocksize, >=, SPA_MINBLOCKSIZE); - ASSERT3U(blocksize, <=, SPA_MAXBLOCKSIZE); + ASSERT3U(blocksize, <=, + spa_maxblocksize(dmu_objset_spa(dn->dn_objset))); ASSERT0(blocksize % SPA_MINBLOCKSIZE); ASSERT(dn->dn_object != DMU_META_DNODE_OBJECT || dmu_tx_private_ok(tx)); ASSERT(tx->tx_txg != 0); @@ -1347,10 +1348,9 @@ dnode_set_blksz(dnode_t *dn, uint64_t si dmu_buf_impl_t *db; int err; + ASSERT3U(size, <=, spa_maxblocksize(dmu_objset_spa(dn->dn_objset))); if (size == 0) size = SPA_MINBLOCKSIZE; - if (size > SPA_MAXBLOCKSIZE) - size = SPA_MAXBLOCKSIZE; else size = P2ROUNDUP(size, SPA_MINBLOCKSIZE); Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_dataset.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_dataset.c Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_dataset.c Sat Nov 8 06:43:33 2014 (r274273) @@ -50,6 +50,17 @@ #include #include +/* + * The SPA supports block sizes up to 16MB. However, very large blocks + * can have an impact on i/o latency (e.g. tying up a spinning disk for + * ~300ms), and also potentially on the memory allocator. Therefore, + * we do not allow the recordsize to be set larger than zfs_max_recordsize + * (default 1MB). Larger blocks can be created by changing this tunable, + * and pools with larger blocks can always be imported and used, regardless + * of this setting. + */ +int zfs_max_recordsize = 1 * 1024 * 1024; + #define SWITCH64(x, y) \ { \ uint64_t __tmp = (x); \ @@ -59,8 +70,6 @@ #define DS_REF_MAX (1ULL << 62) -#define DSL_DEADLIST_BLOCKSIZE SPA_MAXBLOCKSIZE - /* * Figure out how much of this delta should be propogated to the dsl_dir * layer. If there's a refreservation, that space has already been @@ -110,6 +119,8 @@ dsl_dataset_block_born(dsl_dataset_t *ds ds->ds_phys->ds_compressed_bytes += compressed; ds->ds_phys->ds_uncompressed_bytes += uncompressed; ds->ds_phys->ds_unique_bytes += used; + if (BP_GET_LSIZE(bp) > SPA_OLD_MAXBLOCKSIZE) + ds->ds_need_large_blocks = B_TRUE; mutex_exit(&ds->ds_lock); dsl_dir_diduse_space(ds->ds_dir, DD_USED_HEAD, delta, compressed, uncompressed, tx); @@ -387,6 +398,14 @@ dsl_dataset_hold_obj(dsl_pool_t *dp, uin list_create(&ds->ds_sendstreams, sizeof (dmu_sendarg_t), offsetof(dmu_sendarg_t, dsa_link)); + if (doi.doi_type == DMU_OTN_ZAP_METADATA) { + err = zap_contains(mos, dsobj, DS_FIELD_LARGE_BLOCKS); + if (err == 0) + ds->ds_large_blocks = B_TRUE; + else + ASSERT3U(err, ==, ENOENT); + } + if (err == 0) { err = dsl_dir_hold_obj(dp, ds->ds_phys->ds_dir_obj, NULL, ds, &ds->ds_dir); @@ -700,6 +719,9 @@ dsl_dataset_create_sync_dd(dsl_dir_t *dd dsphys->ds_flags |= origin->ds_phys->ds_flags & (DS_FLAG_INCONSISTENT | DS_FLAG_CI_DATASET); + if (origin->ds_large_blocks) + dsl_dataset_activate_large_blocks_sync_impl(dsobj, tx); + dmu_buf_will_dirty(origin->ds_dbuf, tx); origin->ds_phys->ds_num_children++; @@ -1213,6 +1235,9 @@ dsl_dataset_snapshot_sync_impl(dsl_datas dsphys->ds_bp = ds->ds_phys->ds_bp; dmu_buf_rele(dbuf, FTAG); + if (ds->ds_large_blocks) + dsl_dataset_activate_large_blocks_sync_impl(dsobj, tx); + ASSERT3U(ds->ds_prev != 0, ==, ds->ds_phys->ds_prev_snap_obj != 0); if (ds->ds_prev) { uint64_t next_clones_obj = @@ -1486,6 +1511,11 @@ dsl_dataset_sync(dsl_dataset_t *ds, zio_ ds->ds_phys->ds_fsid_guid = ds->ds_fsid_guid; dmu_objset_sync(ds->ds_objset, zio, tx); + + if (ds->ds_need_large_blocks && !ds->ds_large_blocks) { + dsl_dataset_activate_large_blocks_sync_impl(ds->ds_object, tx); + ds->ds_large_blocks = B_TRUE; + } } static void @@ -3128,6 +3158,77 @@ dsl_dataset_space_wouldfree(dsl_dataset_ return (err); } +static int +dsl_dataset_activate_large_blocks_check(void *arg, dmu_tx_t *tx) +{ + const char *dsname = arg; + dsl_dataset_t *ds; + dsl_pool_t *dp = dmu_tx_pool(tx); + int error = 0; + + if (!spa_feature_is_enabled(dp->dp_spa, SPA_FEATURE_LARGE_BLOCKS)) + return (SET_ERROR(ENOTSUP)); + + ASSERT(spa_feature_is_enabled(dp->dp_spa, + SPA_FEATURE_EXTENSIBLE_DATASET)); + + error = dsl_dataset_hold(dp, dsname, FTAG, &ds); + if (error != 0) + return (error); + + if (ds->ds_large_blocks) + error = EALREADY; + dsl_dataset_rele(ds, FTAG); + + return (error); +} + +void +dsl_dataset_activate_large_blocks_sync_impl(uint64_t dsobj, dmu_tx_t *tx) +{ + spa_t *spa = dmu_tx_pool(tx)->dp_spa; + objset_t *mos = dmu_tx_pool(tx)->dp_meta_objset; + uint64_t zero = 0; + + spa_feature_incr(spa, SPA_FEATURE_LARGE_BLOCKS, tx); + dmu_object_zapify(mos, dsobj, DMU_OT_DSL_DATASET, tx); + + VERIFY0(zap_add(mos, dsobj, DS_FIELD_LARGE_BLOCKS, + sizeof (zero), 1, &zero, tx)); +} + +static void +dsl_dataset_activate_large_blocks_sync(void *arg, dmu_tx_t *tx) +{ + const char *dsname = arg; + dsl_dataset_t *ds; + + VERIFY0(dsl_dataset_hold(dmu_tx_pool(tx), dsname, FTAG, &ds)); + + dsl_dataset_activate_large_blocks_sync_impl(ds->ds_object, tx); + ASSERT(!ds->ds_large_blocks); + ds->ds_large_blocks = B_TRUE; + dsl_dataset_rele(ds, FTAG); +} + +int +dsl_dataset_activate_large_blocks(const char *dsname) +{ + int error; + + error = dsl_sync_task(dsname, + dsl_dataset_activate_large_blocks_check, + dsl_dataset_activate_large_blocks_sync, (void *)dsname, + 1, ZFS_SPACE_CHECK_RESERVED); + + /* + * EALREADY indicates that this dataset already supports large blocks. + */ + if (error == EALREADY) + error = 0; + return (error); +} + /* * Return TRUE if 'earlier' is an earlier snapshot in 'later's timeline. * For example, they could both be snapshots of the same filesystem, and Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_deadlist.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_deadlist.c Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_deadlist.c Sat Nov 8 06:43:33 2014 (r274273) @@ -143,7 +143,7 @@ uint64_t dsl_deadlist_alloc(objset_t *os, dmu_tx_t *tx) { if (spa_version(dmu_objset_spa(os)) < SPA_VERSION_DEADLISTS) - return (bpobj_alloc(os, SPA_MAXBLOCKSIZE, tx)); + return (bpobj_alloc(os, SPA_OLD_MAXBLOCKSIZE, tx)); return (zap_create(os, DMU_OT_DEADLIST, DMU_OT_DEADLIST_HDR, sizeof (dsl_deadlist_phys_t), tx)); } @@ -180,7 +180,7 @@ dle_enqueue(dsl_deadlist_t *dl, dsl_dead { if (dle->dle_bpobj.bpo_object == dmu_objset_pool(dl->dl_os)->dp_empty_bpobj) { - uint64_t obj = bpobj_alloc(dl->dl_os, SPA_MAXBLOCKSIZE, tx); + uint64_t obj = bpobj_alloc(dl->dl_os, SPA_OLD_MAXBLOCKSIZE, tx); bpobj_close(&dle->dle_bpobj); bpobj_decr_empty(dl->dl_os, tx); VERIFY3U(0, ==, bpobj_open(&dle->dle_bpobj, dl->dl_os, obj)); @@ -254,7 +254,7 @@ dsl_deadlist_add_key(dsl_deadlist_t *dl, dle = kmem_alloc(sizeof (*dle), KM_SLEEP); dle->dle_mintxg = mintxg; - obj = bpobj_alloc_empty(dl->dl_os, SPA_MAXBLOCKSIZE, tx); + obj = bpobj_alloc_empty(dl->dl_os, SPA_OLD_MAXBLOCKSIZE, tx); VERIFY3U(0, ==, bpobj_open(&dle->dle_bpobj, dl->dl_os, obj)); avl_add(&dl->dl_tree, dle); @@ -338,7 +338,7 @@ dsl_deadlist_clone(dsl_deadlist_t *dl, u if (dle->dle_mintxg >= maxtxg) break; - obj = bpobj_alloc_empty(dl->dl_os, SPA_MAXBLOCKSIZE, tx); + obj = bpobj_alloc_empty(dl->dl_os, SPA_OLD_MAXBLOCKSIZE, tx); VERIFY3U(0, ==, zap_add_int_key(dl->dl_os, newobj, dle->dle_mintxg, obj, tx)); } Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_destroy.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_destroy.c Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_destroy.c Sat Nov 8 06:43:33 2014 (r274273) @@ -264,6 +264,10 @@ dsl_destroy_snapshot_sync_impl(dsl_datas obj = ds->ds_object; + if (ds->ds_large_blocks) { + ASSERT0(zap_contains(mos, obj, DS_FIELD_LARGE_BLOCKS)); + spa_feature_decr(dp->dp_spa, SPA_FEATURE_LARGE_BLOCKS, tx); + } if (ds->ds_phys->ds_prev_snap_obj != 0) { ASSERT3P(ds->ds_prev, ==, NULL); VERIFY0(dsl_dataset_hold_obj(dp, @@ -720,6 +724,9 @@ dsl_destroy_head_sync_impl(dsl_dataset_t ASSERT0(ds->ds_reserved); } + if (ds->ds_large_blocks) + spa_feature_decr(dp->dp_spa, SPA_FEATURE_LARGE_BLOCKS, tx); + dsl_scan_ds_destroyed(ds, tx); obj = ds->ds_object; Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_pool.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_pool.c Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_pool.c Sat Nov 8 06:43:33 2014 (r274273) @@ -367,7 +367,7 @@ dsl_pool_create(spa_t *spa, nvlist_t *zp FREE_DIR_NAME, &dp->dp_free_dir)); /* create and open the free_bplist */ - obj = bpobj_alloc(dp->dp_meta_objset, SPA_MAXBLOCKSIZE, tx); + obj = bpobj_alloc(dp->dp_meta_objset, SPA_OLD_MAXBLOCKSIZE, tx); VERIFY(zap_add(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_FREE_BPOBJ, sizeof (uint64_t), 1, &obj, tx) == 0); VERIFY0(bpobj_open(&dp->dp_free_bpobj, @@ -792,7 +792,7 @@ dsl_pool_upgrade_dir_clones(dsl_pool_t * * subobj support. So call dmu_object_alloc() directly. */ obj = dmu_object_alloc(dp->dp_meta_objset, DMU_OT_BPOBJ, - SPA_MAXBLOCKSIZE, DMU_OT_BPOBJ_HDR, sizeof (bpobj_phys_t), tx); + SPA_OLD_MAXBLOCKSIZE, DMU_OT_BPOBJ_HDR, sizeof (bpobj_phys_t), tx); VERIFY0(zap_add(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_FREE_BPOBJ, sizeof (uint64_t), 1, &obj, tx)); VERIFY0(bpobj_open(&dp->dp_free_bpobj, dp->dp_meta_objset, obj)); Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/metaslab.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/metaslab.c Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/metaslab.c Sat Nov 8 06:43:33 2014 (r274273) @@ -125,7 +125,7 @@ int metaslab_debug_unload = 0; * an allocation of this size then it switches to using more * aggressive strategy (i.e search by size rather than offset). */ -uint64_t metaslab_df_alloc_threshold = SPA_MAXBLOCKSIZE; +uint64_t metaslab_df_alloc_threshold = SPA_OLD_MAXBLOCKSIZE; /* * The minimum free space, in percent, which must be available Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/sa.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/sa.c Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/sa.c Sat Nov 8 06:43:33 2014 (r274273) @@ -500,7 +500,7 @@ sa_resize_spill(sa_handle_t *hdl, uint32 if (size == 0) { blocksize = SPA_MINBLOCKSIZE; - } else if (size > SPA_MAXBLOCKSIZE) { + } else if (size > SPA_OLD_MAXBLOCKSIZE) { ASSERT(0); return (SET_ERROR(EFBIG)); } else { @@ -675,7 +675,7 @@ sa_build_layouts(sa_handle_t *hdl, sa_bu hdrsize = sa_find_sizes(sa, attr_desc, attr_count, hdl->sa_bonus, SA_BONUS, &i, &used, &spilling); - if (used > SPA_MAXBLOCKSIZE) + if (used > SPA_OLD_MAXBLOCKSIZE) return (SET_ERROR(EFBIG)); VERIFY(0 == dmu_set_bonus(hdl->sa_bonus, spilling ? @@ -699,7 +699,7 @@ sa_build_layouts(sa_handle_t *hdl, sa_bu attr_count - i, hdl->sa_spill, SA_SPILL, &i, &spill_used, &dummy); - if (spill_used > SPA_MAXBLOCKSIZE) + if (spill_used > SPA_OLD_MAXBLOCKSIZE) return (SET_ERROR(EFBIG)); buf_space = hdl->sa_spill->db_size - spillhdrsize; Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/spa.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/spa.c Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/spa.c Sat Nov 8 06:43:33 2014 (r274273) @@ -267,6 +267,14 @@ spa_prop_get_config(spa_t *spa, nvlist_t spa_prop_add_list(*nvp, ZPOOL_PROP_ALTROOT, spa->spa_root, 0, ZPROP_SRC_LOCAL); + if (spa_feature_is_enabled(spa, SPA_FEATURE_LARGE_BLOCKS)) { + spa_prop_add_list(*nvp, ZPOOL_PROP_MAXBLOCKSIZE, NULL, + MIN(zfs_max_recordsize, SPA_MAXBLOCKSIZE), ZPROP_SRC_NONE); + } else { + spa_prop_add_list(*nvp, ZPOOL_PROP_MAXBLOCKSIZE, NULL, + SPA_OLD_MAXBLOCKSIZE, ZPROP_SRC_NONE); + } + if ((dp = list_head(&spa->spa_config_list)) != NULL) { if (dp->scd_path == NULL) { spa_prop_add_list(*nvp, ZPOOL_PROP_CACHEFILE, @@ -481,7 +489,7 @@ spa_prop_validate(spa_t *spa, nvlist_t * if (!error) { objset_t *os; - uint64_t compress; + uint64_t propval; if (strval == NULL || strval[0] == '\0') { objnum = zpool_prop_default_numeric( @@ -492,15 +500,25 @@ spa_prop_validate(spa_t *spa, nvlist_t * if (error = dmu_objset_hold(strval, FTAG, &os)) break; - /* Must be ZPL and not gzip compressed. */ + /* + * Must be ZPL, and its property settings + * must be supported by GRUB (compression + * is not gzip, and large blocks are not used). + */ if (dmu_objset_type(os) != DMU_OST_ZFS) { error = SET_ERROR(ENOTSUP); } else if ((error = dsl_prop_get_int_ds(dmu_objset_ds(os), zfs_prop_to_name(ZFS_PROP_COMPRESSION), - &compress)) == 0 && - !BOOTFS_COMPRESS_VALID(compress)) { + &propval)) == 0 && + !BOOTFS_COMPRESS_VALID(propval)) { + error = SET_ERROR(ENOTSUP); + } else if ((error = + dsl_prop_get_int_ds(dmu_objset_ds(os), + zfs_prop_to_name(ZFS_PROP_RECORDSIZE), + &propval)) == 0 && + propval > SPA_OLD_MAXBLOCKSIZE) { error = SET_ERROR(ENOTSUP); } else { objnum = dmu_objset_id(os); Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/spa_history.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/spa_history.c Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/spa_history.c Sat Nov 8 06:43:33 2014 (r274273) @@ -90,7 +90,7 @@ spa_history_create_obj(spa_t *spa, dmu_t ASSERT(spa->spa_history == 0); spa->spa_history = dmu_object_alloc(mos, DMU_OT_SPA_HISTORY, - SPA_MAXBLOCKSIZE, DMU_OT_SPA_HISTORY_OFFSETS, + SPA_OLD_MAXBLOCKSIZE, DMU_OT_SPA_HISTORY_OFFSETS, sizeof (spa_history_phys_t), tx); VERIFY(zap_add(mos, DMU_POOL_DIRECTORY_OBJECT, Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/spa_misc.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/spa_misc.c Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/spa_misc.c Sat Nov 8 06:43:33 2014 (r274273) @@ -1963,3 +1963,12 @@ spa_debug_enabled(spa_t *spa) { return (spa->spa_debug); } + +int +spa_maxblocksize(spa_t *spa) +{ + if (spa_feature_is_enabled(spa, SPA_FEATURE_LARGE_BLOCKS)) + return (SPA_MAXBLOCKSIZE); + else + return (SPA_OLD_MAXBLOCKSIZE); +} Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dmu.h ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dmu.h Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dmu.h Sat Nov 8 06:43:33 2014 (r274273) @@ -249,7 +249,7 @@ void zfs_znode_byteswap(void *buf, size_ * The maximum number of bytes that can be accessed as part of one * operation, including metadata. */ -#define DMU_MAX_ACCESS (10<<20) /* 10MB */ +#define DMU_MAX_ACCESS (32 * 1024 * 1024) /* 32MB */ #define DMU_MAX_DELETEBLKCNT (20480) /* ~5MB of indirect blocks */ #define DMU_USERUSED_OBJECT (-1ULL) @@ -637,6 +637,7 @@ void xuio_stat_wbuf_copied(); void xuio_stat_wbuf_nocopy(); extern int zfs_prefetch_disable; +extern int zfs_max_recordsize; /* * Asynchronously try to read in the data. Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dmu_objset.h ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dmu_objset.h Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dmu_objset.h Sat Nov 8 06:43:33 2014 (r274273) @@ -95,6 +95,7 @@ struct objset { zfs_cache_type_t os_secondary_cache; zfs_sync_type_t os_sync; zfs_redundant_metadata_type_t os_redundant_metadata; + int os_recordsize; /* no lock needed: */ struct dmu_tx *os_synctx; /* XXX sketchy */ Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dmu_send.h ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dmu_send.h Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dmu_send.h Sat Nov 8 06:43:33 2014 (r274273) @@ -37,12 +37,14 @@ struct dsl_dataset; struct drr_begin; struct avl_tree; -int dmu_send(const char *tosnap, const char *fromsnap, boolean_t embedok, +int dmu_send(const char *tosnap, const char *fromsnap, + boolean_t embedok, boolean_t large_block_ok, int outfd, struct vnode *vp, offset_t *off); int dmu_send_estimate(struct dsl_dataset *ds, struct dsl_dataset *fromds, uint64_t *sizep); int dmu_send_obj(const char *pool, uint64_t tosnap, uint64_t fromsnap, - boolean_t embedok, int outfd, vnode_t *vp, offset_t *off); + boolean_t embedok, boolean_t large_block_ok, + int outfd, struct vnode *vp, offset_t *off); typedef struct dmu_recv_cookie { struct dsl_dataset *drc_ds; Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dsl_dataset.h ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dsl_dataset.h Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dsl_dataset.h Sat Nov 8 06:43:33 2014 (r274273) @@ -83,6 +83,13 @@ struct dsl_pool; #define DS_FIELD_BOOKMARK_NAMES "com.delphix:bookmarks" /* + * This field is present (with value=0) if this dataset may contain large + * blocks (>128KB). If it is present, then this dataset + * is counted in the refcount of the SPA_FEATURE_LARGE_BLOCKS feature. + */ +#define DS_FIELD_LARGE_BLOCKS "org.open-zfs:large_blocks" + +/* * DS_FLAG_CI_DATASET is set if the dataset contains a file system whose * name lookups should be performed case-insensitively. */ @@ -135,6 +142,8 @@ typedef struct dsl_dataset { /* only used in syncing context, only valid for non-snapshots: */ struct dsl_dataset *ds_prev; uint64_t ds_bookmarks; /* DMU_OTN_ZAP_METADATA */ + boolean_t ds_large_blocks; + boolean_t ds_need_large_blocks; /* has internal locking: */ dsl_deadlist_t ds_deadlist; @@ -244,6 +253,8 @@ int dsl_dataset_space_written(dsl_datase int dsl_dataset_space_wouldfree(dsl_dataset_t *firstsnap, dsl_dataset_t *last, uint64_t *usedp, uint64_t *compp, uint64_t *uncompp); boolean_t dsl_dataset_is_dirty(dsl_dataset_t *ds); +int dsl_dataset_activate_large_blocks(const char *dsname); +void dsl_dataset_activate_large_blocks_sync_impl(uint64_t dsobj, dmu_tx_t *tx); int dsl_dsobj_to_dsname(char *pname, uint64_t obj, char *buf); Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/sys/spa.h ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/sys/spa.h Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/sys/spa.h Sat Nov 8 06:43:33 2014 (r274273) @@ -94,17 +94,26 @@ _NOTE(CONSTCOND) } while (0) _NOTE(CONSTCOND) } while (0) /* - * We currently support nine block sizes, from 512 bytes to 128K. - * We could go higher, but the benefits are near-zero and the cost - * of COWing a giant block to modify one byte would become excessive. + * We currently support block sizes from 512 bytes to 16MB. + * The benefits of larger blocks, and thus larger IO, need to be weighed + * against the cost of COWing a giant block to modify one byte, and the + * large latency of reading or writing a large block. + * + * Note that although blocks up to 16MB are supported, the recordsize + * property can not be set larger than zfs_max_recordsize (default 1MB). + * See the comment near zfs_max_recordsize in dsl_dataset.c for details. + * + * Note that although the LSIZE field of the blkptr_t can store sizes up + * to 32MB, the dnode's dn_datablkszsec can only store sizes up to + * 32MB - 512 bytes. Therefore, we limit SPA_MAXBLOCKSIZE to 16MB. */ #define SPA_MINBLOCKSHIFT 9 -#define SPA_MAXBLOCKSHIFT 17 +#define SPA_OLD_MAXBLOCKSHIFT 17 +#define SPA_MAXBLOCKSHIFT 24 #define SPA_MINBLOCKSIZE (1ULL << SPA_MINBLOCKSHIFT) +#define SPA_OLD_MAXBLOCKSIZE (1ULL << SPA_OLD_MAXBLOCKSHIFT) #define SPA_MAXBLOCKSIZE (1ULL << SPA_MAXBLOCKSHIFT) -#define SPA_BLOCKSIZES (SPA_MAXBLOCKSHIFT - SPA_MINBLOCKSHIFT + 1) - /* * Size of block to hold the configuration data (a packed nvlist) */ @@ -781,6 +790,7 @@ extern boolean_t spa_has_slogs(spa_t *sp extern boolean_t spa_is_root(spa_t *spa); extern boolean_t spa_writeable(spa_t *spa); extern boolean_t spa_has_pending_synctask(spa_t *spa); +extern int spa_maxblocksize(spa_t *spa); extern int spa_mode(spa_t *spa); extern uint64_t strtonum(const char *str, char **nptr); Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/sys/zap_impl.h ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/sys/zap_impl.h Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/sys/zap_impl.h Sat Nov 8 06:43:33 2014 (r274273) @@ -42,8 +42,7 @@ extern int fzap_default_block_shift; #define MZAP_ENT_LEN 64 #define MZAP_NAME_LEN (MZAP_ENT_LEN - 8 - 4 - 2) -#define MZAP_MAX_BLKSHIFT SPA_MAXBLOCKSHIFT -#define MZAP_MAX_BLKSZ (1 << MZAP_MAX_BLKSHIFT) +#define MZAP_MAX_BLKSZ SPA_OLD_MAXBLOCKSIZE #define ZAP_NEED_CD (-1U) Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/sys/zfs_ioctl.h ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/sys/zfs_ioctl.h Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/sys/zfs_ioctl.h Sat Nov 8 06:43:33 2014 (r274273) @@ -85,13 +85,16 @@ typedef enum drr_headertype { /* flags #3 - #15 are reserved for incompatible closed-source implementations */ #define DMU_BACKUP_FEATURE_EMBED_DATA (1<<16) #define DMU_BACKUP_FEATURE_EMBED_DATA_LZ4 (1<<17) +/* flag #18 is reserved for a Delphix feature */ +#define DMU_BACKUP_FEATURE_LARGE_BLOCKS (1<<19) /* * Mask of all supported backup features */ #define DMU_BACKUP_FEATURE_MASK (DMU_BACKUP_FEATURE_DEDUP | \ DMU_BACKUP_FEATURE_DEDUPPROPS | DMU_BACKUP_FEATURE_SA_SPILL | \ - DMU_BACKUP_FEATURE_EMBED_DATA | DMU_BACKUP_FEATURE_EMBED_DATA_LZ4) + DMU_BACKUP_FEATURE_EMBED_DATA | DMU_BACKUP_FEATURE_EMBED_DATA_LZ4 | \ + DMU_BACKUP_FEATURE_LARGE_BLOCKS) /* Are all features in the given flag word currently supported? */ #define DMU_STREAM_SUPPORTED(x) (!((x) & ~DMU_BACKUP_FEATURE_MASK)) Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/sys/zfs_znode.h ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/sys/zfs_znode.h Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/sys/zfs_znode.h Sat Nov 8 06:43:33 2014 (r274273) @@ -136,8 +136,6 @@ extern "C" { #define ZFS_SHARES_DIR "SHARES" #define ZFS_SA_ATTRS "SA_ATTRS" -#define ZFS_MAX_BLOCKSIZE (SPA_MAXBLOCKSIZE) - /* * Path component length * Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/sys/zil.h ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/sys/zil.h Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/sys/zil.h Sat Nov 8 06:43:33 2014 (r274273) @@ -90,7 +90,6 @@ typedef struct zil_chain { } zil_chain_t; #define ZIL_MIN_BLKSZ 4096ULL -#define ZIL_MAX_BLKSZ SPA_MAXBLOCKSIZE /* * The words of a log block checksum. Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/sys/zil_impl.h ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/sys/zil_impl.h Sat Nov 8 06:34:37 2014 (r274272) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/sys/zil_impl.h Sat Nov 8 06:43:33 2014 (r274273) @@ -139,7 +139,7 @@ typedef struct zil_bp_node { avl_node_t zn_node; } zil_bp_node_t; -#define ZIL_MAX_LOG_DATA (SPA_MAXBLOCKSIZE - sizeof (zil_chain_t) - \ +#define ZIL_MAX_LOG_DATA (SPA_OLD_MAXBLOCKSIZE - sizeof (zil_chain_t) - \ sizeof (lr_write_t)) #ifdef __cplusplus *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***