Date: Thu, 10 Mar 2016 09:57:06 +0000 From: Steven Hartland <steven@multiplay.co.uk> To: Alexander Motin <mav@FreeBSD.org>, svn-src-head@freebsd.org Subject: Re: svn commit: r296610 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs Message-ID: <56E144F2.3000609@freebsd.org> In-Reply-To: <201603100901.u2A91JLq082451@repo.freebsd.org> References: <201603100901.u2A91JLq082451@repo.freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
When processing the MFC of this I noticed there's lots of merge info @ sys/cddl/contrib/opensolaris. I've removed this from the MFC request but wondered if we should clean up HEAD so this doesn't accidentally get merged? On 10/03/2016 09:01, Alexander Motin wrote: > Author: mav > Date: Thu Mar 10 09:01:19 2016 > New Revision: 296610 > URL: https://svnweb.freebsd.org/changeset/base/296610 > > Log: > MFV r296609: 6370 ZFS send fails to transmit some holes > > Reviewed by: Matthew Ahrens <mahrens@delphix.com> > Reviewed by: Chris Williamson <chris.williamson@delphix.com> > Reviewed by: Stefan Ring <stefanrin@gmail.com> > Reviewed by: Steven Burgess <sburgess@datto.com> > Reviewed by: Arne Jansen <sensille@gmx.net> > Approved by: Robert Mustacchi <rm@joyent.com> > Author: Paul Dagnelie <pcd@delphix.com> > > In certain circumstances, "zfs send -i" (incremental send) can produce a > stream which will result in incorrect sparse file contents on the > target. > > The problem manifests as regions of the received file that should be > sparse (and read a zero-filled) actually contain data from a file that > was deleted (and which happened to share this file's object ID). > > Note: this can happen only with filesystems (not zvols, because they do > not free (and thus can not reuse) object IDs). > > Note: This can happen only if, since the incremental source (FromSnap), > a file was deleted and then another file was created, and the new file > is sparse (i.e. has areas that were never written to and should be > implicitly zero-filled). > > We suspect that this was introduced by 4370 (applies only if hole_birth > feature is enabled), and made worse by 5243 (applies if hole_birth > feature is disabled, and we never send any holes). > > The bug is caused by the hole birth feature. When an object is deleted > and replaced, all the holes in the object have birth time zero. However, > zfs send cannot tell that the holes are new since the file was replaced, > so it doesn't send them in an incremental. As a result, you can end up > with invalid data when you receive incremental send streams. As a > short-term fix, we can always send holes with birth time 0 (unless it's > a zvol or a dataset where we can guarantee that no objects have been > reused). > > Closes #37 > > openzfs/openzfs@adef853162e83f7cdf6b2d9af9756d434a9c743b > > Modified: > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_object.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c > Directory Properties: > head/sys/cddl/contrib/opensolaris/ (props changed) > > Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_object.c > ============================================================================== > --- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_object.c Thu Mar 10 08:56:18 2016 (r296609) > +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_object.c Thu Mar 10 09:01:19 2016 (r296610) > @@ -20,7 +20,7 @@ > */ > /* > * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. > - * Copyright (c) 2013, 2014 by Delphix. All rights reserved. > + * Copyright (c) 2013, 2015 by Delphix. All rights reserved. > * Copyright 2014 HybridCluster. All rights reserved. > */ > > @@ -50,6 +50,12 @@ dmu_object_alloc(objset_t *os, dmu_objec > * reasonably sparse (at most 1/4 full). Look from the > * beginning once, but after that keep looking from here. > * If we can't find one, just keep going from here. > + * > + * Note that dmu_traverse depends on the behavior that we use > + * multiple blocks of the dnode object before going back to > + * reuse objects. Any change to this algorithm should preserve > + * that property or find another solution to the issues > + * described in traverse_visitbp. > */ > if (P2PHASE(object, L2_dnode_count) == 0) { > uint64_t offset = restarted ? object << DNODE_SHIFT : 0; > > Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c > ============================================================================== > --- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c Thu Mar 10 08:56:18 2016 (r296609) > +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c Thu Mar 10 09:01:19 2016 (r296610) > @@ -20,7 +20,7 @@ > */ > /* > * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. > - * Copyright (c) 2012, 2014 by Delphix. All rights reserved. > + * Copyright (c) 2012, 2016 by Delphix. All rights reserved. > * Copyright (c) 2015 Chunwei Chen. All rights reserved. > */ > > @@ -63,6 +63,7 @@ typedef struct traverse_data { > uint64_t td_hole_birth_enabled_txg; > blkptr_cb_t *td_func; > void *td_arg; > + boolean_t td_realloc_possible; > } traverse_data_t; > > static int traverse_dnode(traverse_data_t *td, const dnode_phys_t *dnp, > @@ -232,18 +233,30 @@ traverse_visitbp(traverse_data_t *td, co > > if (bp->blk_birth == 0) { > /* > - * Since this block has a birth time of 0 it must be a > - * hole created before the SPA_FEATURE_HOLE_BIRTH > - * feature was enabled. If SPA_FEATURE_HOLE_BIRTH > - * was enabled before the min_txg for this traveral we > - * know the hole must have been created before the > - * min_txg for this traveral, so we can skip it. If > - * SPA_FEATURE_HOLE_BIRTH was enabled after the min_txg > - * for this traveral we cannot tell if the hole was > - * created before or after the min_txg for this > - * traversal, so we cannot skip it. > + * Since this block has a birth time of 0 it must be one of > + * two things: a hole created before the > + * SPA_FEATURE_HOLE_BIRTH feature was enabled, or a hole > + * which has always been a hole in an object. > + * > + * If a file is written sparsely, then the unwritten parts of > + * the file were "always holes" -- that is, they have been > + * holes since this object was allocated. However, we (and > + * our callers) can not necessarily tell when an object was > + * allocated. Therefore, if it's possible that this object > + * was freed and then its object number reused, we need to > + * visit all the holes with birth==0. > + * > + * If it isn't possible that the object number was reused, > + * then if SPA_FEATURE_HOLE_BIRTH was enabled before we wrote > + * all the blocks we will visit as part of this traversal, > + * then this hole must have always existed, so we can skip > + * it. We visit blocks born after (exclusive) td_min_txg. > + * > + * Note that the meta-dnode cannot be reallocated. > */ > - if (td->td_hole_birth_enabled_txg < td->td_min_txg) > + if ((!td->td_realloc_possible || > + zb->zb_object == DMU_META_DNODE_OBJECT) && > + td->td_hole_birth_enabled_txg <= td->td_min_txg) > return (0); > } else if (bp->blk_birth <= td->td_min_txg) { > return (0); > @@ -338,6 +351,15 @@ traverse_visitbp(traverse_data_t *td, co > objset_phys_t *osp = buf->b_data; > prefetch_dnode_metadata(td, &osp->os_meta_dnode, zb->zb_objset, > DMU_META_DNODE_OBJECT); > + /* > + * See the block comment above for the goal of this variable. > + * If the maxblkid of the meta-dnode is 0, then we know that > + * we've never had more than DNODES_PER_BLOCK objects in the > + * dataset, which means we can't have reused any object ids. > + */ > + if (osp->os_meta_dnode.dn_maxblkid == 0) > + td->td_realloc_possible = B_FALSE; > + > if (arc_buf_size(buf) >= sizeof (objset_phys_t)) { > prefetch_dnode_metadata(td, &osp->os_groupused_dnode, > zb->zb_objset, DMU_GROUPUSED_OBJECT); > @@ -544,12 +566,13 @@ traverse_impl(spa_t *spa, dsl_dataset_t > td.td_pfd = &pd; > td.td_flags = flags; > td.td_paused = B_FALSE; > + td.td_realloc_possible = (txg_start == 0 ? B_FALSE : B_TRUE); > > if (spa_feature_is_active(spa, SPA_FEATURE_HOLE_BIRTH)) { > VERIFY(spa_feature_enabled_txg(spa, > SPA_FEATURE_HOLE_BIRTH, &td.td_hole_birth_enabled_txg)); > } else { > - td.td_hole_birth_enabled_txg = 0; > + td.td_hole_birth_enabled_txg = UINT64_MAX; > } > > pd.pd_flags = flags; >
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?56E144F2.3000609>