Date: Fri, 11 Sep 2015 09:07:47 -0700
Subject: zfs_trim_enabled destroys zio_free() performance
From: Matthew Ahrens
To: freebsd-fs, Alexander Motin

I discovered that when destroying a ZFS snapshot, we can end up using
several seconds of CPU via this stack trace:

    kernel`spinlock_exit+0x2d
    kernel`taskqueue_enqueue+0x12c
    zfs.ko`zio_issue_async+0x7c
    zfs.ko`zio_execute+0x162
    zfs.ko`dsl_scan_free_block_cb+0x15f
    zfs.ko`bpobj_iterate_impl+0x25d
    zfs.ko`bpobj_iterate_impl+0x46e
    zfs.ko`dsl_scan_sync+0x152
    zfs.ko`spa_sync+0x5c1
    zfs.ko`txg_sync_thread+0x3a6
    kernel`fork_exit+0x9a
    kernel`0xffffffff80d0acbe
    6558 ms

This is not good for performance since, in addition to the CPU cost, it
doesn't allow the sync thread to do anything else, and this is observable
as periods where we don't do any write i/o to disk for several seconds.

The problem is that when zfs_trim_enabled is set (which it is by default),
zio_free_sync() always sets ZIO_STAGE_ISSUE_ASYNC, causing the free to be
dispatched to a taskq. Since each task completes very quickly, there is a
large locking and context-switching overhead; we would be better off just
processing the free in the caller's context.

I'm not sure exactly why we need to go async when trim is enabled, but it
seems like at least we should not bother going async if trim is not
actually being used (e.g. with an all-spinning-disk pool). It would also
be worth investigating not going async even when trim is useful (e.g. on
SSD-based pools).

Here is the relevant code:

zio_free_sync():

    if (zfs_trim_enabled)
        stage |= ZIO_STAGE_ISSUE_ASYNC | ZIO_STAGE_VDEV_IO_START |
            ZIO_STAGE_VDEV_IO_ASSESS;
    /*
     * GANG and DEDUP blocks can induce a read (for the gang block header,
     * or the DDT), so issue them asynchronously so that this thread is
     * not tied up.
     */
    else if (BP_IS_GANG(bp) || BP_GET_DEDUP(bp))
        stage |= ZIO_STAGE_ISSUE_ASYNC;

--matt
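
For illustration only, here is a rough, standalone sketch of what "don't go async unless trim is actually in use" might look like. It is not a patch against the real tree. The sk_ names, the stage bit values, the pared-down block pointer struct, and in particular the pool_uses_trim flag are all hypothetical stand-ins, and the sketch assumes the two VDEV_IO stages are only needed for issuing trim.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in stage bits; the values are illustrative, not the real ZIO_STAGE_* ones. */
enum {
	SK_STAGE_ISSUE_ASYNC    = 1u << 0,
	SK_STAGE_VDEV_IO_START  = 1u << 1,
	SK_STAGE_VDEV_IO_ASSESS = 1u << 2
};

/* Hypothetical, pared-down view of a block pointer. */
struct sk_bp {
	bool is_gang;   /* stands in for BP_IS_GANG(bp) */
	bool is_dedup;  /* stands in for BP_GET_DEDUP(bp) */
};

/*
 * Compute the extra pipeline stages for a free.
 *
 * trim_enabled mirrors zfs_trim_enabled. pool_uses_trim is an assumed,
 * hypothetical flag meaning "some vdev in this pool can actually take
 * trim requests"; nothing by that name is claimed to exist in ZFS.
 */
static uint32_t
sk_free_stage(const struct sk_bp *bp, bool trim_enabled, bool pool_uses_trim)
{
	uint32_t stage = 0;

	assert(bp != NULL);

	if (trim_enabled && pool_uses_trim) {
		/* Trim will really be issued: keep the existing behavior. */
		stage |= SK_STAGE_ISSUE_ASYNC | SK_STAGE_VDEV_IO_START |
		    SK_STAGE_VDEV_IO_ASSESS;
	} else if (bp->is_gang || bp->is_dedup) {
		/* May induce a read, so still hand off to a taskq. */
		stage |= SK_STAGE_ISSUE_ASYNC;
	}
	/* Otherwise: no extra stages, the free runs in the caller's context. */
	return (stage);
}
```

With this shape, an ordinary block on an all-spinning-disk pool gets no extra stages even with zfs_trim_enabled set, while gang and dedup blocks still go async as before.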