From owner-svn-src-stable@freebsd.org Wed Jul 26 16:18:20 2017 Return-Path: Delivered-To: svn-src-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 72FDFD986F0; Wed, 26 Jul 2017 16:18:20 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4BFD368AE4; Wed, 26 Jul 2017 16:18:20 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id v6QGIJDU063688; Wed, 26 Jul 2017 16:18:19 GMT (envelope-from mav@FreeBSD.org) Received: (from mav@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id v6QGIJV8063686; Wed, 26 Jul 2017 16:18:19 GMT (envelope-from mav@FreeBSD.org) Message-Id: <201707261618.v6QGIJV8063686@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: mav set sender to mav@FreeBSD.org using -f From: Alexander Motin Date: Wed, 26 Jul 2017 16:18:19 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-11@freebsd.org Subject: svn commit: r321532 - stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs X-SVN-Group: stable-11 X-SVN-Commit-Author: mav X-SVN-Commit-Paths: stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs X-SVN-Commit-Revision: 321532 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SVN commit messages for all the -stable branches of the src tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Jul 2017 16:18:20 -0000 Author: mav Date: Wed Jul 26 16:18:19 2017 New Revision: 321532 URL: https://svnweb.freebsd.org/changeset/base/321532 Log: MFC r317237: MFV 316870 7448 ZFS doesn't notice when disk vdevs have no write cache illumos/illumos-gate@295438ba3230419314faaa889a2616f561658bd5 https://github.com/illumos/illumos-gate/commit/295438ba3230419314faaa889a2616f56 1658bd5 https://www.illumos.org/issues/7448 I built a SmartOS image with all the NVMe commits including 7372 (support NVMe volatile write cache) and repeated my dd testing: > #!/bin/bash > for i in `seq 1 1000`; do > dd if=/dev/zero of=file00 bs=1M count=102400 oflag=sync & > dd if=/dev/zero of=file01 bs=1M count=102400 oflag=sync & > wait > rm file00 file01 > done > Previously each dd command took ~145 seconds to finish, now it takes ~400 seconds. Eventually I figured out it is 7372 that causes unnecessary nvme_bd_sync() executions which wasted CPU cycles. If a NVMe device doesn't support a write cache, the nvme_bd_sync function will return ENOTSUP to indicate this to upper layers. It seems this returned value is ignored by ZFS, and as such this bug is not really specific to NVMe. In vdev_disk_io_start() ZFS sends the flush to the disk driver (blkdev) with a callback to vdev_disk_ioctl_done(). As nvme filled in the bd_sync_cache function pointer, blkdev will not return ENOTSUP, as the nvme driver in general does support cache flush. Instead it will issue an asynchronous flush to nvme and immediately return 0, and hence ZFS will not se t vdev_nowritecache here. The nvme driver will at some point process the cache flush command, and if there is no write cache on the device it will return ENOTSUP, which will be delivered to the vdev_disk_ioctl_done() callback. This function will not check the error code and not set nowritecache. The right place to check the error code from the cache flush is in zio_vdev_io_assess(). This would catch both cases, synchronous and asynchronous cache flushes. This would also be independent of the implementation detail that some drivers can return ENOTSUP immediately. Reviewed by: Dan Fields Reviewed by: Alek Pinchuk Reviewed by: George Wilson Approved by: Dan McDonald Author: Hans Rosenfeld Modified: stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_disk.c stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c Directory Properties: stable/11/ (props changed) Modified: stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_disk.c ============================================================================== --- stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_disk.c Wed Jul 26 16:16:43 2017 (r321531) +++ stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_disk.c Wed Jul 26 16:18:19 2017 (r321532) @@ -21,7 +21,7 @@ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2012, 2015 by Delphix. All rights reserved. - * Copyright 2013 Nexenta Systems, Inc. All rights reserved. + * Copyright 2016 Nexenta Systems, Inc. All rights reserved. * Copyright (c) 2013 Joyent, Inc. All rights reserved. */ @@ -743,16 +743,6 @@ vdev_disk_io_start(zio_t *zio) return; } - if (error == ENOTSUP || error == ENOTTY) { - /* - * If we get ENOTSUP or ENOTTY, we know that - * no future attempts will ever succeed. - * In this case we set a persistent bit so - * that we don't bother with the ioctl in the - * future. - */ - vd->vdev_nowritecache = B_TRUE; - } zio->io_error = error; break; Modified: stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c ============================================================================== --- stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c Wed Jul 26 16:16:43 2017 (r321531) +++ stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c Wed Jul 26 16:18:19 2017 (r321532) @@ -3302,6 +3302,16 @@ zio_vdev_io_assess(zio_t *zio) vd->vdev_cant_write = B_TRUE; } + /* + * If a cache flush returns ENOTSUP or ENOTTY, we know that no future + * attempts will ever succeed. In this case we set a persistent bit so + * that we don't bother with it in the future. + */ + if ((zio->io_error == ENOTSUP || zio->io_error == ENOTTY) && + zio->io_type == ZIO_TYPE_IOCTL && + zio->io_cmd == DKIOCFLUSHWRITECACHE && vd != NULL) + vd->vdev_nowritecache = B_TRUE; + if (zio->io_error) zio->io_pipeline = ZIO_INTERLOCK_PIPELINE;