Date: Tue, 18 May 2021 16:00:14 -0600
From: Alan Somers <asomers@freebsd.org>
To: Mark Johnston <markj@freebsd.org>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Re: The pagedaemon evicts ARC before scanning the inactive page list
Message-ID: <CAOtMX2he1YBidG=zF=iUQw%2BOs7p=gWMk-sab00NVr0nNs=Cwog@mail.gmail.com>
In-Reply-To: <YKQ1biSSGbluuy5f@nuc>
References: <CAOtMX2gvkrYS0zYYYtjD%2BAaqv62MzFYFhWPHjLDGXA1=H7LfCg@mail.gmail.com>
 <YKQ1biSSGbluuy5f@nuc>
On Tue, May 18, 2021 at 3:45 PM Mark Johnston <markj@freebsd.org> wrote:

> On Tue, May 18, 2021 at 03:07:44PM -0600, Alan Somers wrote:
> > I'm using ZFS on servers with tons of RAM and running FreeBSD
> > 12.2-RELEASE.  Sometimes they get into a pathological situation where most
> > of that RAM sits unused.  For example, right now one of them has:
> >
> > 2 GB   Active
> > 529 GB Inactive
> > 16 GB  Free
> > 99 GB  ARC total
> > 469 GB ARC max
> > 86 GB  ARC target
> >
> > When a server gets into this situation, it stays there for days, with the
> > ARC target barely budging.  All that inactive memory never gets reclaimed
> > and put to good use.  Frequently the server never recovers until a reboot.
> >
> > I have a theory for what's going on.  Ever since r334508^ the pagedaemon
> > sends the vm_lowmem event _before_ it scans the inactive page list.  If the
> > ARC frees enough memory, then vm_pageout_scan_inactive won't need to free
> > any.  Is that order really correct?  For reference, here's the relevant
> > code, from vm_pageout_worker:
>
> That was the case even before r334508.  Note that prior to that revision
> vm_pageout_scan_inactive() would trigger vm_lowmem if pass > 0, before
> scanning the inactive queue.  During a memory shortage we have pass > 0.
> pass == 0 only when the page daemon is scanning the active queue.
>
> > shortage = pidctrl_daemon(&vmd->vmd_pid, vmd->vmd_free_count);
> > if (shortage > 0) {
> >         ofree = vmd->vmd_free_count;
> >         if (vm_pageout_lowmem() && vmd->vmd_free_count > ofree)
> >                 shortage -= min(vmd->vmd_free_count - ofree,
> >                     (u_int)shortage);
> >         target_met = vm_pageout_scan_inactive(vmd, shortage,
> >             &addl_shortage);
> > } else
> >         addl_shortage = 0;
> >
> > Raising vfs.zfs.arc_min seems to work around the problem.  But ideally
> > that wouldn't be necessary.
>
> vm_lowmem is too primitive: it doesn't tell subscribing subsystems
> anything about the magnitude of the shortage.  At the same time, the VM
> doesn't know much about how much memory they are consuming.  A better
> strategy, at least for the ARC, would be to reclaim memory based on the
> relative memory consumption of each subsystem.  In your case, when the
> page daemon goes to reclaim memory, it should use the inactive queue to
> make up ~85% of the shortfall and reclaim the rest from the ARC.  Even
> better would be if the ARC could use the page cache as a second-level
> cache, like the buffer cache does.
>
> Today I believe the ARC treats vm_lowmem as a signal to shed some
> arbitrary fraction of evictable data.  If the ARC is able to quickly
> answer the question, "how much memory can I release if asked?", then
> the page daemon could use that to determine how much of its reclamation
> target should come from the ARC vs. the page cache.

I guess I don't understand why you would ever free from the ARC rather
than from the inactive list.  When is inactive memory ever useful?
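
For concreteness, this is the shape of a vm_lowmem consumer.  All a
subscriber receives is a flags word saying what *kind* of shortage
occurred, never how large it is, which is what makes the interface "too
primitive" for proportional reclamation.  A minimal sketch; the handler
body is illustrative and not taken from the ARC code:

    #include <sys/param.h>
    #include <sys/eventhandler.h>
    #include <vm/vm.h>
    #include <vm/vm_pageout.h>      /* VM_LOW_KMEM, VM_LOW_PAGES */

    /*
     * The flags argument identifies the kind of shortage, but carries
     * no magnitude, so each consumer must guess how much to give back.
     */
    static void
    example_lowmem(void *arg __unused, int flags)
    {
            if ((flags & VM_LOW_PAGES) != 0) {
                    /* Page shortage: shed "some" memory; how much is a guess. */
            }
            if ((flags & VM_LOW_KMEM) != 0) {
                    /* Kernel address space shortage. */
            }
    }
    EVENTHANDLER_DEFINE(vm_lowmem, example_lowmem, NULL,
        EVENTHANDLER_PRI_FIRST);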
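
And a rough sketch of the proportional split Mark describes, under the
assumption that the ARC could cheaply answer "how much can I release?".
arc_evictable_bytes() and arc_reclaim() are hypothetical names; no such
KPIs exist today:

    #include <sys/param.h>
    #include <vm/vm.h>
    #include <vm/vm_page.h>         /* PQ_INACTIVE */
    #include <vm/vm_pagequeue.h>    /* struct vm_domain */

    uint64_t arc_evictable_bytes(void);     /* hypothetical KPI */
    void     arc_reclaim(uint64_t bytes);   /* hypothetical KPI */

    /*
     * Sketch: split the page-daemon shortage (in pages, as computed by
     * pidctrl_daemon()) between the ARC and the inactive queue in
     * proportion to how much reclaimable memory each one holds.
     */
    static int
    apportion_shortage(struct vm_domain *vmd, int shortage)
    {
            uint64_t arc, inact, total;
            int arc_share;

            arc = arc_evictable_bytes();
            inact = (uint64_t)vmd->vmd_pagequeues[PQ_INACTIVE].pq_cnt *
                PAGE_SIZE;
            total = arc + inact;
            if (total == 0)
                    return (shortage);

            /*
             * With the numbers from this thread -- 529 GB inactive and
             * 99 GB of ARC -- about 84% of the target comes from the
             * inactive queue, i.e. Mark's ~85%.
             */
            arc_share = (int)((uint64_t)shortage * arc / total);
            arc_reclaim((uint64_t)arc_share * PAGE_SIZE);
            return (shortage - arc_share);  /* rest from the inactive scan */
    }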