From owner-svn-src-all@freebsd.org Mon Sep 5 20:48:30 2016 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 86095B96ADA; Mon, 5 Sep 2016 20:48:30 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citapm.icyb.net.ua (citapm.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 1A9B37B5; Mon, 5 Sep 2016 20:48:28 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citapm.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA21361; Mon, 05 Sep 2016 23:48:27 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1bh0od-000KTn-0x; Mon, 05 Sep 2016 23:48:27 +0300 Subject: Re: svn commit: r305331 - in head/sys/cddl/contrib/opensolaris/uts/common: fs/zfs fs/zfs/sys sys/fs To: Alexander Motin , src-committers@FreeBSD.org, svn-src-all@FreeBSD.org, svn-src-head@FreeBSD.org References: <201609031004.u83A4bI0097105@repo.freebsd.org> From: Andriy Gapon Message-ID: Date: Mon, 5 Sep 2016 23:47:30 +0300 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <201609031004.u83A4bI0097105@repo.freebsd.org> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 05 Sep 2016 20:48:30 -0000 On 03/09/2016 13:04, Alexander Motin wrote: > Author: mav > Date: Sat Sep 3 10:04:37 2016 > New Revision: 305331 > URL: https://svnweb.freebsd.org/changeset/base/305331 > > Log: > MFV r304155: 7090 zfs should improve allocation order and throttle allocations > > illumos/illumos-gate@0f7643c7376dd69a08acbfc9d1d7d548b10c846a > https://github.com/illumos/illumos-gate/commit/0f7643c7376dd69a08acbfc9d1d7d548b > 10c846a > > https://www.illumos.org/issues/7090 > When write I/Os are issued, they are issued in block order but the ZIO pipelin > e > will drive them asynchronously through the allocation stage which can result i > n > blocks being allocated out-of-order. It would be nice to preserve as much of > the logical order as possible. > In addition, the allocations are equally scattered across all top-level VDEVs > but not all top-level VDEVs are created equally. The pipeline should be able t > o > detect devices that are more capable of handling allocations and should > allocate more blocks to those devices. This allows for dynamic allocation > distribution when devices are imbalanced as fuller devices will tend to be > slower than empty devices. > The change includes a new pool-wide allocation queue which would throttle and > order allocations in the ZIO pipeline. The queue would be ordered by issued > time and offset and would provide an initial amount of allocation of work to > each top-level vdev. The allocation logic utilizes a reservation system to > reserve allocations that will be performed by the allocator. Once an allocatio > n > is successfully completed it's scheduled on a given top-level vdev. Each top- > level vdev maintains a maximum number of allocations that it can handle > (mg_alloc_queue_depth). The pool-wide reserved allocations (top-levels * > mg_alloc_queue_depth) are distributed across the top-level vdevs metaslab > groups and round robin across all eligible metaslab groups to distribute the > work. As top-levels complete their work, they receive additional work from the > pool-wide allocation queue until the allocation queue is emptied. > > Reviewed by: Adam Leventhal > Reviewed by: Alex Reece > Reviewed by: Christopher Siden > Reviewed by: Dan Kimmel > Reviewed by: Matthew Ahrens > Reviewed by: Paul Dagnelie > Reviewed by: Prakash Surya > Reviewed by: Sebastien Roy > Approved by: Robert Mustacchi > Author: George Wilson > > Modified: > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/refcount.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab.h > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab_impl.h > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/refcount.h > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa_impl.h > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_impl.h > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_cache.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c > head/sys/cddl/contrib/opensolaris/uts/common/sys/fs/zfs.h > Directory Properties: > head/sys/cddl/contrib/opensolaris/ (props changed) ... > @@ -3720,6 +3719,7 @@ spa_create(const char *pool, nvlist_t *n > spa->spa_uberblock.ub_txg = txg - 1; > spa->spa_uberblock.ub_version = version; > spa->spa_ubsync = spa->spa_uberblock; > + spa->spa_load_state = SPA_LOAD_CREATE; > > /* > * Create "The Godfather" zio to hold all async IOs > @@ -3905,6 +3905,7 @@ spa_create(const char *pool, nvlist_t *n > */ > spa_evicting_os_wait(spa); > spa->spa_minref = refcount_count(&spa->spa_refcount); > + spa->spa_load_state = SPA_LOAD_NONE; > > mutex_exit(&spa_namespace_lock); > > @@ -5615,7 +5616,7 @@ spa_nvlist_lookup_by_guid(nvlist_t **nvp ... Alexander, I belive that this commit accidentally breaks the following scenario: zpool create tank /dev/xxx zpool destroy tank zpool create tank /dev/xxx It seems that vdev_geom code is unaware of SPA_LOAD_CREATE state and it would try to match a device GUID, if it can be read, in addition to a name. -- Andriy Gapon