Date:      Thu, 9 Jul 2020 22:34:30 -0500
From:      Josh Paetzel <jpaetzel@FreeBSD.org>
To:        Alan Somers <asomers@freebsd.org>
Cc:        FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: Right-sizing the geli thread pool
Message-ID:  <49D059B5-9A35-4EB5-9811-AFB024DA0566@FreeBSD.org>
In-Reply-To: <CAOtMX2g0UTT1wG%2B_rUNssVvaJH1LfG-UoEGvYhYGQZVn26dNFA@mail.gmail.com>
References:  <CAOtMX2g0UTT1wG%2B_rUNssVvaJH1LfG-UoEGvYhYGQZVn26dNFA@mail.gmail.com>



> 
> On Jul 9, 2020, at 4:27 PM, Alan Somers <asomers@freebsd.org> wrote:
> 
> Currently, geli creates a separate thread pool for each provider, and by
> default each thread pool contains one thread per cpu.  On a large server
> with many encrypted disks, that can balloon into a very large number of
> threads!  I have a patch in progress that switches from per-provider
> thread pools to a single thread pool for the entire module.  Happily, I
> see read IOPs increase by up to 60%.  But to my surprise, write IOPs
> _decrease_ by up to 25%.  dtrace suggests that the CPU usage is dominated
> by the vmem_free call in biodone, as in the below stack.
> 
>              kernel`lock_delay+0x32
>              kernel`biodone+0x88
>              kernel`g_io_deliver+0x214
>              geom_eli.ko`g_eli_write_done+0xf6
>              kernel`g_io_deliver+0x214
>              kernel`md_kthread+0x275
>              kernel`fork_exit+0x7e
>              kernel`0xffffffff8104784e
> 
> I only have one idea for how to improve things from here.  The geli
> thread pool is still fed by a single global bio queue.  That could cause
> cache thrashing, if bios get moved between cores too often.  I think a
> superior design would be to use a separate bio queue for each geli
> thread, and use work-stealing to balance them.  However,
> 
> 1) That doesn't explain why this change benefits reads more than writes, and
> 2) work-stealing is hard to get right, and I can't find any examples in
> the kernel.
> 
> Can anybody offer tips or code for implementing work stealing?  Or any
> other suggestions about why my write performance is suffering?  I would
> like to get this change committed, but not without resolving that issue.
> 
> -Alan
> __

Alan,

Several years ago I spent a bunch of time optimizing geli+ZFS performance.

Nothing as ambitious as what you are doing though.

I have some hand-wavy theories about the write performance and how cache
thrash would be more expensive for writes than reads.  The default
configuration is essentially pathological for systems with large numbers
of disks.  But that doesn't really explain why your change drops
performance.  However, I'll send you over some dtrace stuff I have at work
tomorrow.  It's pretty sophisticated and should let you visualize the
entire I/O pipeline.  (You'll have to add the geli part)

What I discovered is that without a histogram-based auto-tuner it was not
possible to tune for optimal performance under dynamic workloads.

As to your question about work stealing, I've got nothing there.

Thanks,

Josh Paetzel
FreeBSD - The Power to Serve




