Date: Sun, 24 Apr 2022 04:06:06 +0000
From: Michael Jung <mikej@paymentallianceintl.com>
To: Mark Millard <marklmi@yahoo.com>, Pete Wright <pete@nomadlogic.org>
Cc: freebsd-current <freebsd-current@freebsd.org>
Subject: RE: Chasing OOM Issues - good sysctl metrics to use?
Message-ID: <e2322e858bfb4f1caadaafedfb3ec6a7@MAIL-HUB.pai.local>
In-Reply-To: <DD98C932-A07F-4097-AE7F-D9CEF0BB6AEE@yahoo.com>
References: <83A713B9-A973-4C97-ACD6-830DF6A50B76.ref@yahoo.com> <83A713B9-A973-4C97-ACD6-830DF6A50B76@yahoo.com> <a5b2e248-3298-80e3-4bb6-742c8431f064@nomadlogic.org> <94B2E2FD-2371-4FEA-8E01-F37103F63CC0@yahoo.com> <0fcb5a4a-5517-e57b-2b69-4f3b3b10589a@nomadlogic.org> <DD98C932-A07F-4097-AE7F-D9CEF0BB6AEE@yahoo.com>
Hi:

I guess I'm kind of hijacking this thread, but I think what I'm
going to ask is close enough to the topic at hand to ask here instead
of starting a new thread and referencing this one.

I use sysutils/monit to watch running processes and then restart them
as I want.

I use protect(1) to make sure that monit does not die.

In /etc/rc.local: "protect -i monit"

protect seems in the end to simply use

  PROC_SPROTECT with the INHERIT flag, as documented in procctl(2).

So I followed a bit of code, which I guess is cool if I got it right,
but I know about .0001% about system internals.

Can anyone speak to how protect(1) works and whether it is in itself
prone to what has been discussed?

For my use case, is protect "good enough" or do I really need to do
tuning like what has been talked about?

If protect is the right answer and someone could explain how it does
its thing at a slightly higher technical level, I would love to hear
more about why I'm either doing it wrong, why what I'm doing is ok, or
why I should really be doing something completely different and the
reason I should be doing it differently.

I suspect there are many that would like to know this but would never ask,
at least not on list.

Always the seeker of new knowledge.

Thanks in advance.

--mikej

From: owner-freebsd-current@freebsd.org [mailto:owner-freebsd-current@freebsd.org] On Behalf Of Mark Millard
Sent: Saturday, April 23, 2022 3:32 PM
To: Pete Wright <pete@nomadlogic.org>
Cc: freebsd-current <freebsd-current@freebsd.org>
Subject: Re: Chasing OOM Issues - good sysctl metrics to use?

On 2022-Apr-23, at 10:26, Pete Wright <pete@nomadlogic.org> wrote:

> On 4/22/22 18:46, Mark Millard wrote:
>> On 2022-Apr-22, at 16:42, Pete Wright <pete@nomadlogic.org> wrote:
>>
>>> On 4/21/22 21:18, Mark Millard wrote:
>>>> Messages in the console out would be appropriate
>>>> to report. Messages might also be available via
>>>> the following at appropriate times:
>>> that is what is frustrating. i will get notification that the processes are killed:
>>> Apr 22 09:55:15 topanga kernel: pid 76242 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory
>>> Apr 22 09:55:19 topanga kernel: pid 76288 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory
>>> Apr 22 09:55:20 topanga kernel: pid 76259 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
>>> Apr 22 09:55:22 topanga kernel: pid 76252 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
>>> Apr 22 09:55:23 topanga kernel: pid 76267 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
>>> Apr 22 09:55:24 topanga kernel: pid 76234 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory
>>> Apr 22 09:55:26 topanga kernel: pid 76275 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
>> Those messages are not reporting being out of swap
>> as such. They are reporting sustained low free RAM
>> despite a number of less drastic attempts to gain
>> back free RAM (to above some threshold).
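For anyone wanting to check whether a log shows only this one "killed:" reason, here is a minimal sketch in Python that tallies the kill messages per command and reason. The regular expression is only inferred from the seven lines quoted above, and the names KILL_RE, tally_kills and SAMPLE_LOG are made up for this illustration; other FreeBSD versions may word the message differently.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this thread (an assumption,
# not an official format definition).
KILL_RE = re.compile(
    r"kernel: pid (?P<pid>\d+) \((?P<comm>[^)]+)\), jid (?P<jid>\d+), "
    r"uid (?P<uid>\d+), was killed: (?P<reason>.+)$"
)

def tally_kills(lines):
    """Count kernel kill messages per (command, reason) pair."""
    counts = Counter()
    for line in lines:
        match = KILL_RE.search(line.strip())
        if match:
            counts[(match.group("comm"), match.group("reason"))] += 1
    return counts

# The seven messages quoted in the thread.
SAMPLE_LOG = [
    "Apr 22 09:55:15 topanga kernel: pid 76242 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory",
    "Apr 22 09:55:19 topanga kernel: pid 76288 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory",
    "Apr 22 09:55:20 topanga kernel: pid 76259 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory",
    "Apr 22 09:55:22 topanga kernel: pid 76252 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory",
    "Apr 22 09:55:23 topanga kernel: pid 76267 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory",
    "Apr 22 09:55:24 topanga kernel: pid 76234 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory",
    "Apr 22 09:55:26 topanga kernel: pid 76275 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory",
]

if __name__ == "__main__":
    for (comm, reason), n in sorted(tally_kills(SAMPLE_LOG).items()):
        print(f"{comm}: {n} x {reason}")
```

If more than one distinct reason shows up in the tally, the single-reason assumption behind the suggestion in this thread would not hold for that log.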
>>
>> FreeBSD does not swap out the kernel stacks for
>> processes that stay in a runnable state: it just
>> continues to page. Thus just one large process
>> that has a huge working set of active pages can
>> lead to OOM kills in a context where no other set
>> of processes would be enough to gain the free
>> RAM required. Such contexts are not really a
>> swap issue.
>
> Thank you for this clarification/explanation - that totally makes sense!
>
>>
>> Based on there being only 1 "killed:" reason,
>> I have a suggestion that should allow delaying
>> such kills for a long time. That in turn may
>> help with investigating without actually
>> suffering the kills during the activity: more
>> time with low free RAM to observe.
>
> Great idea thank-you! and thanks for the example settings and descriptions as well.
>> But those are large but finite activities. If
>> you want to leave something running for days,
>> weeks, months, or whatever that produces the
>> sustained low free RAM conditions, the problem
>> will eventually happen. Ultimately one may have
>> to exit and restart such processes once in a
>> while, exiting enough of them to give a little
>> time with sufficient free RAM.
> perfect - since this is a workstation my run-time for these processes is probably a week as i update my system and pkgs over the weekend, then dog food current during the work week.
>
>>> yes i have a 2GB of swap that resides on a nvme device.
>> I assume a partition style. Otherwise there are other
>> issues involved --that likely should be avoided by
>> switching to partition style.
>
> so i kinda lied - initially i had just a 2G swap, but i added a second 20G swap a while ago to have enough space to capture some cores while testing drm-kmod work. based on this comment i am going to only use the 20G file backed swap and see how that goes.
>
> this is my fstab entry currently for the file backed swap:
> md99 none swap sw,file=/root/swap1,late 0 0

I think you may have taken my suggestion backwards . . .

Unfortunately, vnode (file) based swap space should be *avoided*
and partitions are what should be used in order to avoid deadlocks:

On 2017-Feb-13, at 7:20 PM, Konstantin Belousov <kostikbel at gmail.com> wrote
on the freebsd-arm list:

QUOTE
swapfile write requires the write request to come through the filesystem
write path, which might require the filesystem to allocate more memory
and read some data. E.g. it is known that any ZFS write request
allocates memory, and that write request on large UFS file might require
allocating and reading an indirect block buffer to find the block number
of the written block, if the indirect block was not yet read.

As result, swapfile swapping is more prone to the trivial and unavoidable
deadlocks where the pagedaemon thread, which produces free memory, needs
more free memory to make a progress. Swap write on the raw partition over
simple partitioning scheme directly over HBA are usually safe, while e.g.
zfs over geli over umass is the worst construction.
END QUOTE

The developers handbook has a section on debugging deadlocks that he
referenced in a response to another report (on freebsd-hackers).

https://docs.freebsd.org/en/books/developers-handbook/kerneldebug/#kerneldebug-deadlocks

>>>> ZFS (so with ARC)? UFS? Both?
>>> i am using ZFS and am setting my vfs.zfs.arc.max to 10G. i have also experienced this crash with that set to the default unlimited value as well.
>> I use ZFS on systems with at least 8 GiBytes of RAM,
>> but I've never tuned ZFS. So I'm not much help for
>> that side of things.
>
> since we started this thread I've gone ahead and removed the zfs.arc.max setting since it's cruft at this point. i initially added it to test a configuration i deployed to a server hosting a bunch of VMs.
>
>> I'm hoping that vm.pageout_oom_seq=120 (or more) makes it
>> so you do not have to have identified everything up front
>> and can explore easier.
>>
>>
>> Note that vm.pageout_oom_seq is both a loader tunable
>> and a writeable runtime tunable:
>>
>> # sysctl -T vm.pageout_oom_seq
>> vm.pageout_oom_seq: 120
>> amd64_ZFS amd64 1400053 1400053 # sysctl -W vm.pageout_oom_seq
>> vm.pageout_oom_seq: 120
>>
>> So you can use it to extend the time when the
>> machine is already running.
>
> fantastic. thanks again for taking your time and sharing your knowledge and experience with me Mark!
>
> these types of journeys are why i run current on my daily driver, it really helps me better understand the OS so that i can be a better admin on the "real" servers i run for work. it's also just fun to learn stuff too heh.
>

===
Mark Millard
marklmi at yahoo.com
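As a way to spot the file-backed swap configuration that the thread advises against, here is a minimal sketch in Python that scans fstab text for swap entries carrying a file= option. The md99 line is the one quoted in the thread; the /dev/nvd0p3 partition line and the names file_backed_swap and SAMPLE_FSTAB are assumptions made up for contrast, not taken from anyone's system.

```python
def file_backed_swap(fstab_text):
    """Return (device, backing_file) pairs for swap entries that use file=."""
    hits = []
    for raw in fstab_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        # fstab columns: device, mountpoint, fstype, options, dump, pass
        if len(fields) < 4 or fields[2] != "swap":
            continue
        for opt in fields[3].split(","):
            if opt.startswith("file="):
                hits.append((fields[0], opt[len("file="):]))
    return hits

SAMPLE_FSTAB = """\
# Device        Mountpoint  FStype  Options                     Dump  Pass#
/dev/nvd0p3     none        swap    sw                          0     0
md99            none        swap    sw,file=/root/swap1,late    0     0
"""

if __name__ == "__main__":
    for device, path in file_backed_swap(SAMPLE_FSTAB):
        print(f"{device} is file-backed swap on {path}")
```

This only reads text in the fstab layout; it does not inspect which swap devices are actually active on a running system.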
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?e2322e858bfb4f1caadaafedfb3ec6a7>