Date:      Sun, 04 Apr 2021 19:01:44 +0000
From:      "Poul-Henning Kamp" <phk@phk.freebsd.dk>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        Warner Losh <imp@bsdimp.com>, Mateusz Guzik <mjguzik@gmail.com>, FreeBSD CURRENT <freebsd-current@freebsd.org>
Subject:   Re: [SOLVED] Re: Strange behavior after running under high load
Message-ID:  <11447.1617562904@critter.freebsd.dk>
In-Reply-To: <YGn8%2BW/ipcysamdI@kib.kiev.ua>
References:  <58bea0f0-5c3d-4263-ebee-f939a7e169e9@freebsd.org> <494d4aab-487b-83c9-03f3-10cf470081c5@freebsd.org> <CAGudoHHDBxOWc_u6=c1v8x%2Bw-yfYEhv_-BALCj5t95HkobCZeA@mail.gmail.com> <81671.1617432659@critter.freebsd.dk> <CAGudoHFp4x3C7fzh-SM4DQ%2B7t3YuREuknUBd-VaO=%2Bs2th4J6A@mail.gmail.com> <CANCZdfrthB8QLbeF%2Bfux9i1H2_jF6LRppkYe1dhEt7URBo4qSw@mail.gmail.com> <YGn8%2BW/ipcysamdI@kib.kiev.ua>

--------
Konstantin Belousov writes:

> But what would you provide as the input for PID controller, and what would be the targets?

Viewing this purely as a vnode-related issue is wrong; this is about memory allocation in general.

We may or may not want a PID regulator, but putting it on vnode counts would not improve things, precisely because, as you point out, the amount of memory a vnode ties up has enormous variance.


We should focus on the end goal: To ensure "sufficient" memory can always be allocated for any purpose "without major delay".


Architecturally there are three major problems:

A) While each subsystem generally has a good idea about memory that can be released "without major delay", the information does not trickle up through a summarizing NUMA-aware tree.

B) We lack a nuanced call-back to tell the subsystems to release some of their memory "without major delay".

C) We have never attempted to enlist userland, where jemalloc often hangs on to a lot of unused VM pages.
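
To make A) and B) concrete, here is a minimal userland sketch, not kernel code. Every name in it (struct memnode, memnode_attach, memnode_report, memnode_reclaim, release_exact) is hypothetical and invented for illustration; nothing like it exists in the tree today. Leaves stand for subsystems, interior nodes for NUMA domains or the whole system. Leaves report how much they can release cheaply, the figure is summed up the tree, and a reclaim request walks back down through a per-leaf call-back.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node: a subsystem (leaf), a NUMA domain, or the root. */
struct memnode {
	const char	*name;
	struct memnode	*parent;
	struct memnode	*child;		/* first child, NULL for leaves */
	struct memnode	*sibling;	/* next child of the same parent */
	size_t		 cheap;		/* bytes releasable "without major delay" */
	/* Leaf only: release up to 'want' bytes, return bytes released. */
	size_t		(*release)(struct memnode *self, size_t want);
};

/* Link a child under a parent and add its estimate to every ancestor. */
static void
memnode_attach(struct memnode *parent, struct memnode *child)
{
	struct memnode *p;

	child->parent = parent;
	child->sibling = parent->child;
	parent->child = child;
	for (p = parent; p != NULL; p = p->parent)
		p->cheap += child->cheap;
}

/* A leaf publishes a new estimate; the change trickles up the tree. */
static void
memnode_report(struct memnode *leaf, size_t cheap)
{
	struct memnode *p;
	size_t old = leaf->cheap;

	leaf->cheap = cheap;
	for (p = leaf->parent; p != NULL; p = p->parent)
		p->cheap = p->cheap - old + cheap;
}

/* Ask the subtree under 'n' for up to 'want' bytes; return bytes released. */
static size_t
memnode_reclaim(struct memnode *n, size_t want)
{
	struct memnode *c;
	size_t got = 0;

	if (n->release != NULL) {
		/* Never ask a leaf for more than it said was cheap. */
		if (want > n->cheap)
			want = n->cheap;
		got = n->release(n, want);
		assert(got <= n->cheap);
		memnode_report(n, n->cheap - got);
		return (got);
	}
	for (c = n->child; c != NULL && got < want; c = c->sibling)
		got += memnode_reclaim(c, want - got);
	return (got);
}

/* Stand-in call-back for the sketch: pretends to free exactly 'want'. */
static size_t
release_exact(struct memnode *self, size_t want)
{
	(void)self;
	return (want);
}
```

The point of the sketch is only that the request carries an amount and the answer carries what was actually released, so the caller can stop as soon as it has enough. A real version would also need locking and some notion of cost per leaf.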


As far as vnodes go:


It used to be that "without major delay" meant "without disk I/O", which in turn led to the "dirty buffers/VM pages" heuristic.

With microsecond SSD backing store, that heuristic is not only invalid, it is downright harmful in many cases.

GEOM maintains estimates of per-provider latency and VM+VFS should use that to schedule write-back so that more of it happens outside rush-hour, in order to increase the amount of memory which can be released "without major delay".
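
As a rough illustration of the idea, and not a description of any existing interface, the decision could look like the sketch below. The function name writeback_budget, its parameters, and the "twice the idle latency means rush-hour" threshold are all assumptions made up for this example; how the latency estimate would actually be obtained from GEOM is left open.

```c
#include <stdint.h>

/*
 * Hypothetical policy: how many dirty pages to push to one provider
 * during one scheduling slice.
 *
 *   lat_now_us   current per-request latency estimate for the provider
 *   lat_idle_us  latency estimate for the same provider when unloaded
 *   dirty        dirty pages currently backed by this provider
 *   slice_us     how much device time write-back may use this slice
 */
static uint64_t
writeback_budget(uint64_t lat_now_us, uint64_t lat_idle_us,
    uint64_t dirty, uint64_t slice_us)
{
	uint64_t n;

	/* No estimate yet: do nothing rather than guess. */
	if (lat_now_us == 0 || lat_idle_us == 0)
		return (0);

	/* Assumed threshold: above 2x idle latency counts as rush-hour. */
	if (lat_now_us > 2 * lat_idle_us)
		return (0);

	/* Writes that fit in the slice, pessimistically assuming no overlap. */
	n = slice_us / lat_now_us;
	return (n < dirty ? n : dirty);
}
```

With a 100 microsecond device and a 10 millisecond slice this allows 100 pages per slice when the device is idle and none once its latency climbs past 200 microseconds, so cleaning happens while the device is quiet.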

Today that happens largely as a side effect of the periodic syncer, which does a really bad job at it, because it still expects VAX-era hardware performance and workloads.


-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


