Date:        Mon, 14 Sep 2015 16:41:42 -0700
From:        Sean Chittenden <seanc@groupon.com>
To:          Steven Hartland <killing@multiplay.co.uk>
Cc:          FreeBSD Filesystems <freebsd-fs@freebsd.org>, Matthew Ahrens <mahrens@delphix.com>
Subject:     Re: zfs_trim_enabled destroys zio_free() performance
Message-ID:  <CACfj5vKcTofNK6XdNAyGTx1NrFo=ptW3_U6c7XYnv7dDS3OJNA@mail.gmail.com>
In-Reply-To: <55F57439.8060000@multiplay.co.uk>
References:  <CAJjvXiE2mRT4=kPMk3gwiT-3ykeAhaYBx6Tw6HgXhs2=XZWWFg@mail.gmail.com> <55F308B7.3020302@FreeBSD.org> <55F57439.8060000@multiplay.co.uk>
Random industry note: we've had issues with trim-enabled hosts where deleting a
moderately sized dataset (~1TB) would cause the box to trip over the deadman
timer. When the host comes back up, it almost immediately panics again because
the trim commands are reissued, so the box panics in a loop. Disabling TRIM
breaks this cycle. At the very least, getting trim to obey a different timer
would be useful.  -sc

> panic: I/O to pool 'tank' appears to be hung on vdev guid
> 1181753144268412659 at '/dev/da0p1'.
> cpuid = 13
> KDB: stack backtrace:
> #0 0xffffffff805df950 at kdb_backtrace+0x60
> #1 0xffffffff805a355d at panic+0x17d
> #2 0xffffffff81034db3 at vdev_deadman+0x123
> #3 0xffffffff81034cc0 at vdev_deadman+0x30
> #4 0xffffffff81034cc0 at vdev_deadman+0x30
> #5 0xffffffff810298e5 at spa_deadman+0x85
> #6 0xffffffff805b8ca5 at softclock_call_cc+0x165
> #7 0xffffffff805b90b4 at softclock+0x94
> #8 0xffffffff805716cb at intr_event_execute_handlers+0xab
> #9 0xffffffff80571b16 at ithread_loop+0x96
> #10 0xffffffff8056f19a at fork_exit+0x9a
> #11 0xffffffff807a817e at fork_trampoline+0xe
> Uptime: 59s

On Sun, Sep 13, 2015 at 6:03 AM, Steven Hartland <killing@multiplay.co.uk>
wrote:
>
> Do you remember if this was causing a deadlock or something similar
> that's easy to provoke?
>
> Regards
> Steve
>
>
> On 11/09/2015 18:00, Alexander Motin wrote:
>>
>> Hi.
>>
>> The code in question was added by me at r253992. The commit message says
>> it was made to decouple locks. I don't remember many more details, but
>> maybe it can be redone some other way.
>>
>> On 11.09.2015 19:07, Matthew Ahrens wrote:
>>>
>>> I discovered that when destroying a ZFS snapshot, we can end up using
>>> several seconds of CPU via this stack trace:
>>>
>>>   kernel`spinlock_exit+0x2d
>>>   kernel`taskqueue_enqueue+0x12c
>>>   zfs.ko`zio_issue_async+0x7c
>>>   zfs.ko`zio_execute+0x162
>>>   zfs.ko`dsl_scan_free_block_cb+0x15f
>>>   zfs.ko`bpobj_iterate_impl+0x25d
>>>   zfs.ko`bpobj_iterate_impl+0x46e
>>>   zfs.ko`dsl_scan_sync+0x152
>>>   zfs.ko`spa_sync+0x5c1
>>>   zfs.ko`txg_sync_thread+0x3a6
>>>   kernel`fork_exit+0x9a
>>>   kernel`0xffffffff80d0acbe
>>>     6558 ms
>>>
>>> This is not good for performance since, in addition to the CPU cost, it
>>> doesn't allow the sync thread to do anything else, and this is
>>> observable as periods where we don't do any write i/o to disk for
>>> several seconds.
>>>
>>> The problem is that when zfs_trim_enabled is set (which it is by
>>> default), zio_free_sync() always sets ZIO_STAGE_ISSUE_ASYNC, causing the
>>> free to be dispatched to a taskq. Since each task completes very
>>> quickly, there is a large locking and context switching overhead -- we
>>> would be better off just processing the free in the caller's context.
>>>
>>> I'm not sure exactly why we need to go async when trim is enabled, but
>>> it seems like at least we should not bother going async if trim is not
>>> actually being used (e.g. with an all-spinning-disk pool). It would
>>> also be worth investigating not going async even when trim is useful
>>> (e.g. on SSD-based pools).
>>>
>>> Here is the relevant code:
>>>
>>> zio_free_sync():
>>>         if (zfs_trim_enabled)
>>>                 stage |= ZIO_STAGE_ISSUE_ASYNC | ZIO_STAGE_VDEV_IO_START |
>>>                     ZIO_STAGE_VDEV_IO_ASSESS;
>>>         /*
>>>          * GANG and DEDUP blocks can induce a read (for the gang block
>>>          * header, or the DDT), so issue them asynchronously so that
>>>          * this thread is not tied up.
>>>          */
>>>         else if (BP_IS_GANG(bp) || BP_GET_DEDUP(bp))
>>>                 stage |= ZIO_STAGE_ISSUE_ASYNC;
>>>
>>> --matt
>>
>>
>
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

--
Sean Chittenden
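Matthew's observation, that each dispatched free is dominated by locking and
context-switch overhead rather than by the work itself, can be illustrated
outside the kernel. The program below is a minimal userland sketch, not ZFS or
FreeBSD kernel code: every name in it is invented for the example, and the
mutex/condvar ring is only a stand-in for the per-item lock/enqueue/wakeup
pattern that a taskq dispatch implies, not a model of taskqueue(9) itself. It
times one million trivially cheap work items done inline in the caller against
the same items pushed one at a time through the ring to a worker thread.

/*
 * taskq_overhead.c: standalone illustration, not ZFS or FreeBSD kernel code.
 * Times trivially cheap "frees" done inline versus pushed one at a time
 * through a mutex/condvar ring to a worker thread (a stand-in for the
 * per-item lock/enqueue/wakeup cost of a taskq dispatch).
 * Build: cc -O2 -o taskq_overhead taskq_overhead.c -lpthread
 */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define	NITEMS	1000000
#define	QSIZE	1024

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv_notempty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t cv_notfull = PTHREAD_COND_INITIALIZER;
static long ring[QSIZE];
static int qhead, qtail, qlen;
static volatile long sink;

/* Stand-in for the deferred work: about as cheap as queuing one free zio. */
static void
do_item(long v)
{
	sink += v;
}

/* Consumer: drains the ring one item at a time, like a taskq thread. */
static void *
worker(void *arg)
{
	long v;

	for (;;) {
		pthread_mutex_lock(&lock);
		while (qlen == 0)
			pthread_cond_wait(&cv_notempty, &lock);
		v = ring[qhead];
		qhead = (qhead + 1) % QSIZE;
		qlen--;
		pthread_cond_signal(&cv_notfull);
		pthread_mutex_unlock(&lock);
		if (v < 0)
			return (NULL);	/* sentinel: producer is done */
		do_item(v);
	}
}

/* Producer: per item, take a lock, enqueue, wake the worker. */
static void
dispatch(long v)
{
	pthread_mutex_lock(&lock);
	while (qlen == QSIZE)
		pthread_cond_wait(&cv_notfull, &lock);
	ring[qtail] = v;
	qtail = (qtail + 1) % QSIZE;
	qlen++;
	pthread_cond_signal(&cv_notempty);
	pthread_mutex_unlock(&lock);
}

static double
now(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (ts.tv_sec + ts.tv_nsec / 1e9);
}

int
main(void)
{
	pthread_t tid;
	double t0, t_inline, t_queued;
	long i;

	/* Inline: the caller (think: the txg sync thread) does the work itself. */
	t0 = now();
	for (i = 0; i < NITEMS; i++)
		do_item(i);
	t_inline = now() - t0;

	/* Queued: every item pays for a lock round trip and a wakeup. */
	pthread_create(&tid, NULL, worker, NULL);
	t0 = now();
	for (i = 0; i < NITEMS; i++)
		dispatch(i);
	dispatch(-1);		/* tell the worker to exit */
	pthread_join(tid, NULL);
	t_queued = now() - t0;

	printf("inline: %.3fs  queued: %.3fs  (%d items)\n",
	    t_inline, t_queued, NITEMS);
	return (0);
}

The absolute numbers will vary by machine; the point is only that when each
item is as cheap as queuing a free zio, the queue machinery itself becomes the
dominant cost, which is what the spinlock_exit and taskqueue_enqueue frames in
the profile above are showing.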
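On the workaround side, the mitigation Sean describes (turning TRIM off) and
the deadman behaviour are both controlled by tunables on FreeBSD of this
vintage. The names below are what 10.x-era systems appear to expose and the
values are illustrative only; they may differ between releases, so verify
them with 'sysctl -d vfs.zfs' before depending on them.

# /boot/loader.conf
vfs.zfs.trim.enabled=0                # disable TRIM entirely (the workaround above)

# or keep TRIM and relax the deadman rather than panicking on slow TRIM bursts:
vfs.zfs.deadman_enabled=0
vfs.zfs.deadman_synctime_ms=3600000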