From owner-svn-src-head@freebsd.org Thu Mar 10 10:04:13 2016 Return-Path: Delivered-To: svn-src-head@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5BF61ACA7B1 for ; Thu, 10 Mar 2016 10:04:13 +0000 (UTC) (envelope-from mavbsd@gmail.com) Received: from mail-lb0-x22b.google.com (mail-lb0-x22b.google.com [IPv6:2a00:1450:4010:c04::22b]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id C58D42FE for ; Thu, 10 Mar 2016 10:04:12 +0000 (UTC) (envelope-from mavbsd@gmail.com) Received: by mail-lb0-x22b.google.com with SMTP id xr8so101196026lbb.1 for ; Thu, 10 Mar 2016 02:04:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:subject :references:in-reply-to:content-transfer-encoding; bh=fF33Z4RoXEM5aAfLndk/ZRTZWO5Geex8RuiH0gnKllg=; b=rpZjyJiJ3NZGRjX44gn6jAZbVG8XwWcw96kyht1nSQpc2zLGJXxOQttGffT7vmcg8X ljOHd07v4yhnXApJVWJ3mn5z+LA6v1vqhvTqGYhksGEUi11FJmji9hLHWZ8hzAw+rWCy jP+g5AzBNCgg91tTkr0fLzhBO6F1frR5QFmphzhF1FDSl7PpgFwqcg4iA+GNBcUzv4G5 nccDmyu8UMhYUjEbghogqd7rCs1kVyNFArX22pgM/UUa0mTFcij6eRfuit0WHrqIVSjF 7IwFb+3UMCmtxXY5UYZLZ3aLjSSCoXhjdQcUSe3y8U/3kUBUAEBfT83YRQy1QdYmmXAn hFTA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:message-id:date:from:user-agent :mime-version:to:subject:references:in-reply-to :content-transfer-encoding; bh=fF33Z4RoXEM5aAfLndk/ZRTZWO5Geex8RuiH0gnKllg=; b=B+MkMhW0DZbuhp1ECUV65azhVANw8GlR/X0Quc1ky6TO5ukS+n7AsVA6pgNIb8Jp/+ Ae0xMgcLJiAVpVAChiDTBpoFl0VjzHGAOBDZ/S+V3iR5nXoG6F8C6u/dEmOfa6jE/47w jlj5+tTIZuULgNxYiDVNIzlyB0I19IwjiAiafWSW62PfqU6Bk+BpitYDlCpvMc2GI5+L u+9SnPMQ3RIti41P3CP10wlh3ovQzFHYotPIgxImT/JmKEISuWfd0T8iVEvDKf5FGMJq AAE3Rg78+QCaB05eI2vsH7VI4+dg3ebCZoJThTBHKALr91mhA730/O1TgU6+rh7Wuxwh Rlqw== X-Gm-Message-State: AD7BkJJTvBhP29i+oN2fr81sSHSHuVvnuC0USjuC+CVgxtJZiagSzhzE+k6ZQRF+3KYoaw== X-Received: by 10.112.16.133 with SMTP id g5mr703401lbd.134.1457604250861; Thu, 10 Mar 2016 02:04:10 -0800 (PST) Received: from mavbook.mavhome.dp.ua (mavhome.mavhome.dp.ua. [213.227.240.37]) by smtp.googlemail.com with ESMTPSA id pi5sm406911lbb.41.2016.03.10.02.04.09 (version=TLSv1/SSLv3 cipher=OTHER); Thu, 10 Mar 2016 02:04:10 -0800 (PST) Sender: Alexander Motin Message-ID: <56E14697.6080706@FreeBSD.org> Date: Thu, 10 Mar 2016 12:04:07 +0200 From: Alexander Motin User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0 MIME-Version: 1.0 To: Steven Hartland , svn-src-head@freebsd.org Subject: Re: svn commit: r296610 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs References: <201603100901.u2A91JLq082451@repo.freebsd.org> <56E144F2.3000609@freebsd.org> In-Reply-To: <56E144F2.3000609@freebsd.org> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Mar 2016 10:04:13 -0000 On 10.03.16 11:57, Steven Hartland wrote: > When processing the MFC of this I noticed there's lots of merge info @ > sys/cddl/contrib/opensolaris. > > I've removed this from the MFC request but wondered if we should clean > up HEAD so this doesn't accidentally get merged? Merge info @ sys/cddl/contrib/opensolaris in head is valid one, tracking process of MFVs from respective vendor branch to head. Not that I use it often, but is it incorrect? > On 10/03/2016 09:01, Alexander Motin wrote: >> Author: mav >> Date: Thu Mar 10 09:01:19 2016 >> New Revision: 296610 >> URL: https://svnweb.freebsd.org/changeset/base/296610 >> >> Log: >> MFV r296609: 6370 ZFS send fails to transmit some holes >> Reviewed by: Matthew Ahrens >> Reviewed by: Chris Williamson >> Reviewed by: Stefan Ring >> Reviewed by: Steven Burgess >> Reviewed by: Arne Jansen >> Approved by: Robert Mustacchi >> Author: Paul Dagnelie >> In certain circumstances, "zfs send -i" (incremental send) can >> produce a >> stream which will result in incorrect sparse file contents on the >> target. >> The problem manifests as regions of the received file that >> should be >> sparse (and read a zero-filled) actually contain data from a file that >> was deleted (and which happened to share this file's object ID). >> Note: this can happen only with filesystems (not zvols, because >> they do >> not free (and thus can not reuse) object IDs). >> Note: This can happen only if, since the incremental source >> (FromSnap), >> a file was deleted and then another file was created, and the new file >> is sparse (i.e. has areas that were never written to and should be >> implicitly zero-filled). >> We suspect that this was introduced by 4370 (applies only if >> hole_birth >> feature is enabled), and made worse by 5243 (applies if hole_birth >> feature is disabled, and we never send any holes). >> The bug is caused by the hole birth feature. When an object is >> deleted >> and replaced, all the holes in the object have birth time zero. >> However, >> zfs send cannot tell that the holes are new since the file was >> replaced, >> so it doesn't send them in an incremental. As a result, you can end up >> with invalid data when you receive incremental send streams. As a >> short-term fix, we can always send holes with birth time 0 (unless >> it's >> a zvol or a dataset where we can guarantee that no objects have been >> reused). >> Closes #37 >> openzfs/openzfs@adef853162e83f7cdf6b2d9af9756d434a9c743b >> >> Modified: >> head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_object.c >> head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c >> Directory Properties: >> head/sys/cddl/contrib/opensolaris/ (props changed) >> >> Modified: >> head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_object.c >> ============================================================================== >> >> --- >> head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_object.c >> Thu Mar 10 08:56:18 2016 (r296609) >> +++ >> head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_object.c >> Thu Mar 10 09:01:19 2016 (r296610) >> @@ -20,7 +20,7 @@ >> */ >> /* >> * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All >> rights reserved. >> - * Copyright (c) 2013, 2014 by Delphix. All rights reserved. >> + * Copyright (c) 2013, 2015 by Delphix. All rights reserved. >> * Copyright 2014 HybridCluster. All rights reserved. >> */ >> @@ -50,6 +50,12 @@ dmu_object_alloc(objset_t *os, dmu_objec >> * reasonably sparse (at most 1/4 full). Look from the >> * beginning once, but after that keep looking from here. >> * If we can't find one, just keep going from here. >> + * >> + * Note that dmu_traverse depends on the behavior that we use >> + * multiple blocks of the dnode object before going back to >> + * reuse objects. Any change to this algorithm should preserve >> + * that property or find another solution to the issues >> + * described in traverse_visitbp. >> */ >> if (P2PHASE(object, L2_dnode_count) == 0) { >> uint64_t offset = restarted ? object << DNODE_SHIFT : 0; >> >> Modified: >> head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c >> ============================================================================== >> >> --- >> head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c >> Thu Mar 10 08:56:18 2016 (r296609) >> +++ >> head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c >> Thu Mar 10 09:01:19 2016 (r296610) >> @@ -20,7 +20,7 @@ >> */ >> /* >> * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All >> rights reserved. >> - * Copyright (c) 2012, 2014 by Delphix. All rights reserved. >> + * Copyright (c) 2012, 2016 by Delphix. All rights reserved. >> * Copyright (c) 2015 Chunwei Chen. All rights reserved. >> */ >> @@ -63,6 +63,7 @@ typedef struct traverse_data { >> uint64_t td_hole_birth_enabled_txg; >> blkptr_cb_t *td_func; >> void *td_arg; >> + boolean_t td_realloc_possible; >> } traverse_data_t; >> static int traverse_dnode(traverse_data_t *td, const dnode_phys_t >> *dnp, >> @@ -232,18 +233,30 @@ traverse_visitbp(traverse_data_t *td, co >> if (bp->blk_birth == 0) { >> /* >> - * Since this block has a birth time of 0 it must be a >> - * hole created before the SPA_FEATURE_HOLE_BIRTH >> - * feature was enabled. If SPA_FEATURE_HOLE_BIRTH >> - * was enabled before the min_txg for this traveral we >> - * know the hole must have been created before the >> - * min_txg for this traveral, so we can skip it. If >> - * SPA_FEATURE_HOLE_BIRTH was enabled after the min_txg >> - * for this traveral we cannot tell if the hole was >> - * created before or after the min_txg for this >> - * traversal, so we cannot skip it. >> + * Since this block has a birth time of 0 it must be one of >> + * two things: a hole created before the >> + * SPA_FEATURE_HOLE_BIRTH feature was enabled, or a hole >> + * which has always been a hole in an object. >> + * >> + * If a file is written sparsely, then the unwritten parts of >> + * the file were "always holes" -- that is, they have been >> + * holes since this object was allocated. However, we (and >> + * our callers) can not necessarily tell when an object was >> + * allocated. Therefore, if it's possible that this object >> + * was freed and then its object number reused, we need to >> + * visit all the holes with birth==0. >> + * >> + * If it isn't possible that the object number was reused, >> + * then if SPA_FEATURE_HOLE_BIRTH was enabled before we wrote >> + * all the blocks we will visit as part of this traversal, >> + * then this hole must have always existed, so we can skip >> + * it. We visit blocks born after (exclusive) td_min_txg. >> + * >> + * Note that the meta-dnode cannot be reallocated. >> */ >> - if (td->td_hole_birth_enabled_txg < td->td_min_txg) >> + if ((!td->td_realloc_possible || >> + zb->zb_object == DMU_META_DNODE_OBJECT) && >> + td->td_hole_birth_enabled_txg <= td->td_min_txg) >> return (0); >> } else if (bp->blk_birth <= td->td_min_txg) { >> return (0); >> @@ -338,6 +351,15 @@ traverse_visitbp(traverse_data_t *td, co >> objset_phys_t *osp = buf->b_data; >> prefetch_dnode_metadata(td, &osp->os_meta_dnode, zb->zb_objset, >> DMU_META_DNODE_OBJECT); >> + /* >> + * See the block comment above for the goal of this variable. >> + * If the maxblkid of the meta-dnode is 0, then we know that >> + * we've never had more than DNODES_PER_BLOCK objects in the >> + * dataset, which means we can't have reused any object ids. >> + */ >> + if (osp->os_meta_dnode.dn_maxblkid == 0) >> + td->td_realloc_possible = B_FALSE; >> + >> if (arc_buf_size(buf) >= sizeof (objset_phys_t)) { >> prefetch_dnode_metadata(td, &osp->os_groupused_dnode, >> zb->zb_objset, DMU_GROUPUSED_OBJECT); >> @@ -544,12 +566,13 @@ traverse_impl(spa_t *spa, dsl_dataset_t >> td.td_pfd = &pd; >> td.td_flags = flags; >> td.td_paused = B_FALSE; >> + td.td_realloc_possible = (txg_start == 0 ? B_FALSE : B_TRUE); >> if (spa_feature_is_active(spa, SPA_FEATURE_HOLE_BIRTH)) { >> VERIFY(spa_feature_enabled_txg(spa, >> SPA_FEATURE_HOLE_BIRTH, &td.td_hole_birth_enabled_txg)); >> } else { >> - td.td_hole_birth_enabled_txg = 0; >> + td.td_hole_birth_enabled_txg = UINT64_MAX; >> } >> pd.pd_flags = flags; >> > -- Alexander Motin