From: Steven Hartland
Subject: Re: zfs_trim_enabled destroys zio_free() performance
To: freebsd-fs@freebsd.org
Message-ID: <55F57228.4090500@freebsd.org>
Date: Sun, 13 Sep 2015 13:55:04 +0100

On 11/09/2015 17:07, Matthew Ahrens wrote:
> I discovered that when destroying a ZFS snapshot, we can end up using
> several seconds of CPU via this stack trace:
>
>     kernel`spinlock_exit+0x2d
>     kernel`taskqueue_enqueue+0x12c
>     zfs.ko`zio_issue_async+0x7c
>     zfs.ko`zio_execute+0x162
>     zfs.ko`dsl_scan_free_block_cb+0x15f
>     zfs.ko`bpobj_iterate_impl+0x25d
>     zfs.ko`bpobj_iterate_impl+0x46e
>     zfs.ko`dsl_scan_sync+0x152
>     zfs.ko`spa_sync+0x5c1
>     zfs.ko`txg_sync_thread+0x3a6
>     kernel`fork_exit+0x9a
>     kernel`0xffffffff80d0acbe
>     6558 ms
>
> This is not good for performance since, in addition to the CPU cost, it
> doesn't allow the sync thread to do anything else, and this is observable
> as periods where we don't do any write i/o to disk for several seconds.
>
> The problem is that when zfs_trim_enabled is set (which it is by default),
> zio_free_sync() always sets ZIO_STAGE_ISSUE_ASYNC, causing the free to be
> dispatched to a taskq. Since each task completes very quickly, there is a
> large locking and context switching overhead -- we would be better off just
> processing the free in the caller's context.
>
> I'm not sure exactly why we need to go async when trim is enabled, but it
> seems like at least we should not bother going async if trim is not
> actually being used (e.g. with an all-spinning-disk pool). It would also
> be worth investigating not going async even when trim is useful (e.g. on
> SSD-based pools).
>
> Here is the relevant code:
>
> zio_free_sync():
>     if (zfs_trim_enabled)
>         stage |= ZIO_STAGE_ISSUE_ASYNC | ZIO_STAGE_VDEV_IO_START |
>             ZIO_STAGE_VDEV_IO_ASSESS;
>     /*
>      * GANG and DEDUP blocks can induce a read (for the gang block header,
>      * or the DDT), so issue them asynchronously so that this thread is
>      * not tied up.
>      */
>     else if (BP_IS_GANG(bp) || BP_GET_DEDUP(bp))
>         stage |= ZIO_STAGE_ISSUE_ASYNC;

TRIM requests are queued, combined, and only actioned after some time in
the TRIM thread, as they are quite expensive, which is why I believe it was
thought async was required. However, all this will do is trigger a call to
trim_map_free for leaf vdevs, which will be either:
1. A no-op if vdev_notrim is set (spinning rust)
2. An insert into the trim AVL

The processing of the zio should therefore always be quick, so I don't see
why we couldn't execute it sync.

I've set a test going on my head box removing ZIO_STAGE_ISSUE_ASYNC to see
if I get any strange behaviour.

    Regards
    Steve
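
For illustration only, the stage selection with ZIO_STAGE_ISSUE_ASYNC dropped
from the trim case could look roughly like the standalone sketch below. This
is not the actual zio.c change: the SKETCH_STAGE_* bit values are placeholders,
free_stage_flags() is a hypothetical helper, and turning the "else if" into a
plain "if" (so gang and dedup frees still go async when trim is enabled) is an
assumption about how the two cases would need to combine.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Placeholder bit values for illustration only; the real ZIO_STAGE_*
 * constants live in the ZFS sources and differ from these.
 */
#define	SKETCH_STAGE_ISSUE_ASYNC	(1u << 0)
#define	SKETCH_STAGE_VDEV_IO_START	(1u << 1)
#define	SKETCH_STAGE_VDEV_IO_ASSESS	(1u << 2)

/*
 * Hypothetical helper: returns the extra pipeline stages a free would
 * request if the trim case no longer forced an async dispatch.
 */
static uint32_t
free_stage_flags(bool trim_enabled, bool is_gang, bool is_dedup)
{
	uint32_t stage = 0;

	/* Trim still needs the vdev stages so the free reaches the leaf vdevs. */
	if (trim_enabled)
		stage |= SKETCH_STAGE_VDEV_IO_START |
		    SKETCH_STAGE_VDEV_IO_ASSESS;

	/*
	 * Gang and dedup frees can induce a read, so they stay async
	 * regardless of the trim setting (assumption, see above).
	 */
	if (is_gang || is_dedup)
		stage |= SKETCH_STAGE_ISSUE_ASYNC;

	return (stage);
}
```

With this shape, a plain free on a trim-enabled pool requests only the vdev
stages and stays in the caller's context, while gang and dedup frees are
still handed to a taskq.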