From owner-svn-src-all@freebsd.org Wed Sep 27 15:29:18 2017 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 79F69E072BC; Wed, 27 Sep 2017 15:29:18 +0000 (UTC) (envelope-from asomers@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 55D827779D; Wed, 27 Sep 2017 15:29:18 +0000 (UTC) (envelope-from asomers@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id v8RFTH50010819; Wed, 27 Sep 2017 15:29:17 GMT (envelope-from asomers@FreeBSD.org) Received: (from asomers@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id v8RFTHEr010818; Wed, 27 Sep 2017 15:29:17 GMT (envelope-from asomers@FreeBSD.org) Message-Id: <201709271529.v8RFTHEr010818@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: asomers set sender to asomers@FreeBSD.org using -f From: Alan Somers Date: Wed, 27 Sep 2017 15:29:17 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-11@freebsd.org Subject: svn commit: r324063 - stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs X-SVN-Group: stable-11 X-SVN-Commit-Author: asomers X-SVN-Commit-Paths: stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs X-SVN-Commit-Revision: 324063 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Sep 2017 15:29:18 -0000 Author: asomers Date: Wed Sep 27 15:29:17 2017 New Revision: 324063 URL: https://svnweb.freebsd.org/changeset/base/324063 Log: MFC r323813: MFV r323789: 8473 scrub does not detect errors on active spares illumos/illumos-gate@554675eee75dd2d7398d960aa5c81083ceb8505a https://github.com/illumos/illumos-gate/commit/554675eee75dd2d7398d960aa5c81083ceb8505a https://www.illumos.org/issues/8473 Scrubbing is supposed to detect and repair all errors in the pool. However, it wrongly ignores active spare devices. The problem can easily be reproduced in OpenZFS at git rev 0ef125d with these commands: truncate -s 64m /tmp/a /tmp/b /tmp/c sudo zpool create testpool mirror /tmp/a /tmp/b spare /tmp/c sudo zpool replace testpool /tmp/a /tmp/c /bin/dd if=/dev/zero bs=1024k count=63 oseek=1 conv=notrunc of=/tmp/c sync sudo zpool scrub testpool zpool status testpool # Will show 0 errors, which is wrong sudo zpool offline testpool /tmp/a sudo zpool scrub testpool zpool status testpool # Will show errors on /tmp/c, # which should've already been fixed FreeBSD head is partially affected: the first scrub will detect some errors, but the second scrub will detect more. Reviewed by: Andy Stormont Reviewed by: Matt Ahrens Reviewed by: George Wilson Approved by: Richard Lowe Sponsored by: Spectra Logic Corp Modified: stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c Directory Properties: stable/11/ (props changed) Modified: stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c ============================================================================== --- stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c Wed Sep 27 15:07:41 2017 (r324062) +++ stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c Wed Sep 27 15:29:17 2017 (r324063) @@ -29,6 +29,9 @@ #include #include +#include +#include +#include #include #include #include @@ -52,7 +55,7 @@ typedef struct mirror_map { int *mm_preferred; int mm_preferred_cnt; int mm_children; - boolean_t mm_replacing; + boolean_t mm_resilvering; boolean_t mm_root; mirror_child_t mm_child[]; } mirror_map_t; @@ -119,13 +122,13 @@ vdev_mirror_map_size(int children) } static inline mirror_map_t * -vdev_mirror_map_alloc(int children, boolean_t replacing, boolean_t root) +vdev_mirror_map_alloc(int children, boolean_t resilvering, boolean_t root) { mirror_map_t *mm; mm = kmem_zalloc(vdev_mirror_map_size(children), KM_SLEEP); mm->mm_children = children; - mm->mm_replacing = replacing; + mm->mm_resilvering = resilvering; mm->mm_root = root; mm->mm_preferred = (int *)((uintptr_t)mm + offsetof(mirror_map_t, mm_child[children])); @@ -217,9 +220,39 @@ vdev_mirror_map_init(zio_t *zio) mc->mc_offset = DVA_GET_OFFSET(&dva[c]); } } else { - mm = vdev_mirror_map_alloc(vd->vdev_children, - (vd->vdev_ops == &vdev_replacing_ops || - vd->vdev_ops == &vdev_spare_ops), B_FALSE); + /* + * If we are resilvering, then we should handle scrub reads + * differently; we shouldn't issue them to the resilvering + * device because it might not have those blocks. + * + * We are resilvering iff: + * 1) We are a replacing vdev (ie our name is "replacing-1" or + * "spare-1" or something like that), and + * 2) The pool is currently being resilvered. + * + * We cannot simply check vd->vdev_resilver_txg, because it's + * not set in this path. + * + * Nor can we just check our vdev_ops; there are cases (such as + * when a user types "zpool replace pool odev spare_dev" and + * spare_dev is in the spare list, or when a spare device is + * automatically used to replace a DEGRADED device) when + * resilvering is complete but both the original vdev and the + * spare vdev remain in the pool. That behavior is intentional. + * It helps implement the policy that a spare should be + * automatically removed from the pool after the user replaces + * the device that originally failed. + * + * If a spa load is in progress, then spa_dsl_pool may be + * uninitialized. But we shouldn't be resilvering during a spa + * load anyway. + */ + boolean_t replacing = (vd->vdev_ops == &vdev_replacing_ops || + vd->vdev_ops == &vdev_spare_ops) && + spa_load_state(vd->vdev_spa) == SPA_LOAD_NONE && + dsl_scan_resilvering(vd->vdev_spa->spa_dsl_pool); + mm = vdev_mirror_map_alloc(vd->vdev_children, replacing, + B_FALSE); for (c = 0; c < mm->mm_children; c++) { mc = &mm->mm_child[c]; mc->mc_vd = vd->vdev_child[c]; @@ -448,7 +481,7 @@ vdev_mirror_io_start(zio_t *zio) mm = vdev_mirror_map_init(zio); if (zio->io_type == ZIO_TYPE_READ) { - if ((zio->io_flags & ZIO_FLAG_SCRUB) && !mm->mm_replacing && + if ((zio->io_flags & ZIO_FLAG_SCRUB) && !mm->mm_resilvering && mm->mm_children > 1) { /* * For scrubbing reads we need to allocate a read @@ -589,7 +622,7 @@ vdev_mirror_io_done(zio_t *zio) if (good_copies && spa_writeable(zio->io_spa) && (unexpected_errors || (zio->io_flags & ZIO_FLAG_RESILVER) || - ((zio->io_flags & ZIO_FLAG_SCRUB) && mm->mm_replacing))) { + ((zio->io_flags & ZIO_FLAG_SCRUB) && mm->mm_resilvering))) { /* * Use the good data we have in hand to repair damaged children. */