From owner-svn-src-all@freebsd.org Thu Dec 8 15:58:05 2016 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4EB2DC6DCF0; Thu, 8 Dec 2016 15:58:05 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 045AC1D6F; Thu, 8 Dec 2016 15:58:04 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id uB8Fw4Mc027509; Thu, 8 Dec 2016 15:58:04 GMT (envelope-from mav@FreeBSD.org) Received: (from mav@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id uB8Fw4xA027508; Thu, 8 Dec 2016 15:58:04 GMT (envelope-from mav@FreeBSD.org) Message-Id: <201612081558.uB8Fw4xA027508@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: mav set sender to mav@FreeBSD.org using -f From: Alexander Motin Date: Thu, 8 Dec 2016 15:58:04 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: svn commit: r309714 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs X-SVN-Group: head MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Dec 2016 15:58:05 -0000 Author: mav Date: Thu Dec 8 15:58:03 2016 New Revision: 309714 URL: https://svnweb.freebsd.org/changeset/base/309714 Log: Fix spa_alloc_tree sorting by offset in r305331. Original commit "7090 zfs should improve allocation order" declares alloc queue sorted by time and offset. But in practice io_offset is always zero, so sorting happened only by time, while order of writes with equal time was completely random. On Illumos this did not affected much thanks to using high resolution timestamps. On FreeBSD due to using much faster but low resolution timestamps it caused bad data placement on disks, affecting further read performance. This change switches zio_timestamp_compare() from comparing uninitialized io_offset to really populated io_bookmark values. I haven't decided yet what to do with timestampts, but on simple tests this change gives the same peformance results by just making code to work as declared. MFC after: 1 week Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c ============================================================================== --- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c Thu Dec 8 12:42:36 2016 (r309713) +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c Thu Dec 8 15:58:03 2016 (r309714) @@ -563,9 +563,24 @@ zio_timestamp_compare(const void *x1, co if (z1->io_queued_timestamp > z2->io_queued_timestamp) return (1); - if (z1->io_offset < z2->io_offset) + if (z1->io_bookmark.zb_objset < z2->io_bookmark.zb_objset) return (-1); - if (z1->io_offset > z2->io_offset) + if (z1->io_bookmark.zb_objset > z2->io_bookmark.zb_objset) + return (1); + + if (z1->io_bookmark.zb_object < z2->io_bookmark.zb_object) + return (-1); + if (z1->io_bookmark.zb_object > z2->io_bookmark.zb_object) + return (1); + + if (z1->io_bookmark.zb_level < z2->io_bookmark.zb_level) + return (-1); + if (z1->io_bookmark.zb_level > z2->io_bookmark.zb_level) + return (1); + + if (z1->io_bookmark.zb_blkid < z2->io_bookmark.zb_blkid) + return (-1); + if (z1->io_bookmark.zb_blkid > z2->io_bookmark.zb_blkid) return (1); if (z1 < z2)