Date: Thu, 9 Jul 2020 22:34:30 -0500
From: Josh Paetzel <jpaetzel@FreeBSD.org>
To: Alan Somers <asomers@freebsd.org>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Re: Right-sizing the geli thread pool
Message-ID: <49D059B5-9A35-4EB5-9811-AFB024DA0566@FreeBSD.org>
In-Reply-To: <CAOtMX2g0UTT1wG%2B_rUNssVvaJH1LfG-UoEGvYhYGQZVn26dNFA@mail.gmail.com>
References: <CAOtMX2g0UTT1wG%2B_rUNssVvaJH1LfG-UoEGvYhYGQZVn26dNFA@mail.gmail.com>

> On Jul 9, 2020, at 4:27 PM, Alan Somers <asomers@freebsd.org> wrote:
>
> Currently, geli creates a separate thread pool for each provider, and by
> default each thread pool contains one thread per cpu. On a large server
> with many encrypted disks, that can balloon into a very large number of
> threads! I have a patch in progress that switches from per-provider thread
> pools to a single thread pool for the entire module. Happily, I see read
> IOPs increase by up to 60%. But to my surprise, write IOPs _decrease_ by
> up to 25%. dtrace suggests that the CPU usage is dominated by the
> vmem_free call in biodone, as in the below stack.
>
> kernel`lock_delay+0x32
> kernel`biodone+0x88
> kernel`g_io_deliver+0x214
> geom_eli.ko`g_eli_write_done+0xf6
> kernel`g_io_deliver+0x214
> kernel`md_kthread+0x275
> kernel`fork_exit+0x7e
> kernel`0xffffffff8104784e
>
> I only have one idea for how to improve things from here. The geli thread
> pool is still fed by a single global bio queue. That could cause cache
> thrashing, if bios get moved between cores too often. I think a superior
> design would be to use a separate bio queue for each geli thread, and use
> work-stealing to balance them. However,
>
> 1) That doesn't explain why this change benefits reads more than writes, and
> 2) work-stealing is hard to get right, and I can't find any examples in the
> kernel.
>
> Can anybody offer tips or code for implementing work stealing? Or any
> other suggestions about why my write performance is suffering? I would
> like to get this change committed, but not without resolving that issue.
>
> -Alan
> __

Alan,

Several years ago I spent a bunch of time optimizing geli+ZFS performance. Nothing as ambitious as what you are doing though.

I have some hand-wavy theories about the write performance and how cache thrash would be more expensive for writes than reads.
The default configuration is essentially pathological for systems with large numbers of disks. But that doesn’t really explain why your change drops performance. However, I’ll send you over some dtrace stuff I have at work tomorrow. It’s pretty sophisticated and should let you visualize the entire I/O pipeline. (You’ll have to add the geli part.)

What I discovered is that without a histogram-based auto tuner it was not possible to tune for optimal performance for dynamic workloads.

As to your question about work stealing, I’ve got nothing there.

Thanks,

Josh Paetzel
FreeBSD - The Power to Serve
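
For readers following the thread, the "separate bio queue for each geli thread, balanced by work-stealing" idea from the quoted message can be modeled roughly as below. This is only a userspace Python sketch of the scheduling policy, not geli or kernel code, and it says nothing about the lock or cache costs under discussion. The names WorkStealingQueues, submit and take are hypothetical, and the choice to have the owner take the oldest item while a thief takes the newest from the longest peer queue is an assumption made for illustration.

```python
import threading
from collections import deque


class WorkStealingQueues:
    """Hypothetical model: one queue per worker; idle workers steal from peers."""

    def __init__(self, nworkers):
        self.queues = [deque() for _ in range(nworkers)]
        self.locks = [threading.Lock() for _ in range(nworkers)]

    def submit(self, worker, item):
        """Append item to the queue owned by `worker`."""
        with self.locks[worker]:
            self.queues[worker].append(item)

    def take(self, worker):
        """Return (item, source_queue_index), or None if all queues are empty."""
        # Fast path: the worker's own queue, oldest item first.
        with self.locks[worker]:
            if self.queues[worker]:
                return self.queues[worker].popleft(), worker
        # Slow path: try peers, longest queue first. The lengths are read
        # without the lock, so they are only a hint; emptiness is re-checked
        # under the victim's lock before stealing.
        victims = sorted(
            (i for i in range(len(self.queues)) if i != worker),
            key=lambda i: len(self.queues[i]),
            reverse=True,
        )
        for v in victims:
            with self.locks[v]:
                if self.queues[v]:
                    # Steal the newest item, from the end opposite the owner's.
                    return self.queues[v].pop(), v
        return None


if __name__ == "__main__":
    pool = WorkStealingQueues(2)
    for bio in ("w0", "w1", "w2"):
        pool.submit(0, bio)
    print(pool.take(0))  # ('w0', 0): worker 0 takes its own oldest item
    print(pool.take(1))  # ('w2', 0): worker 1 is idle and steals from worker 0
```

In this model a worker only touches another worker's queue when its own is empty, so under balanced load each queue is accessed almost exclusively by one thread.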