Date: Sun, 19 Aug 2018 23:37:27 -0700
From: Mark Millard <marklmi@yahoo.com>
To: gurenchan@gmail.com, FreeBSD Current <freebsd-current@freebsd.org>
Subject: Re: building LLVM threads gets killed
Message-ID: <048B761D-E1A0-4EE3-AA55-E2FFBD19F9F6@yahoo.com>
In-Reply-To: <C03FB387-9AF1-4767-A0A9-ACBDD37B6A48@yahoo.com>
References: <C03FB387-9AF1-4767-A0A9-ACBDD37B6A48@yahoo.com>
[In part a resend from the right Email account. In part adding a note
about another Mark Johnston patch for reporting information.]

On 2018-Aug-19, at 11:25 PM, Mark Millard <marklmi26-fbsd at yahoo.com> wrote:

> blubee blubeeme gurenchan at gmail.com wrote on
> Mon Aug 20 03:02:01 UTC 2018 :
> 
>> I am running current compiling LLVM60 and when it comes to linking
>> basically all the processes on my computer gets killed; Chrome, Firefox and
>> some of the LLVM threads as well
> 
>> . . .
> 
>> last pid: 20965;  load averages: 0.64, 5.79, 7.73  up 12+01:35:46  11:00:36
>> 76 processes: 1 running, 75 sleeping
>> CPU:  0.8% user,  0.5% nice,  1.0% system,  0.0% interrupt, 98.1% idle
>> Mem: 10G Active, 3G Inact, 100M Laundry, 13G Wired, 6G Free
>> ARC: 4G Total, 942M MFU, 1G MRU, 1M Anon, 43M Header, 2G Other
>>      630M Compressed, 2G Uncompressed, 2.74:1 Ratio
>> Swap: 2G Total, 1G Used, 739M Free, 63% Inuse
>> . . .
> 
> The timing of that top output relative to the first or
> any OOM kill of a process is not clear. After? Just
> before? How long before? What things were like leading up
> to the first kill is of interest.
> 
> Folks that deal with this are likely to want to know
> if you got console messages (or /var/log/messages content)
> such as:
> 
> pid 49735 (c++), uid 0, was killed: out of swap space
> 
> (Note: "out of swap space" can be a misnomer for having
> low Free RAM for "too long" [vm.pageout_oom_seq based],
> even with swap unused or little used.)
> 
> And: were you also getting messages like:
> 
> swap_pager_getswapspace(4): failed
> 
> and/or:
> 
> swap_pager: out of swap space
> 
> (These indicate the "killed: out of swap space" is not
> necessarily a misnomer relative to swap space, even if
> low free RAM over a time drives the process kills.)
> 
> How about messages like:
> 
> swap_pager: indefinite wait buffer: bufobj: 0, blkno: 28139, size: 65536
> 
> or any I/O error reports or retry reports?
> 
> 
> Notes:
> 
> Mark Johnston published a patch used for some investigations of
> the OOM killing:
> 
> https://people.freebsd.org/~markj/patches/slow_swap.diff
> 
> But this is tied to the I/O swap latencies involved and whether they
> are driving some time frames. It just adds more reporting to
> the console (and /var/log/messages). It is not a fix. It may
> not be likely to report much for your context.
> 
> 
> vm.pageout_oom_seq controls the "how long is low free RAM
> tolerated" (my phrasing), though the units are not directly
> time. In various arm contexts with small boards, going from
> the default of 12 to 120 allowed things to complete or get
> much farther. So:
> 
> sysctl vm.pageout_oom_seq=120
> 
> but 120 is not the limit: it is a C int parameter.
> 
> I'll note that "low free RAM" is as FreeBSD classifies it,
> whatever the details are.
> 
> 
> Most of the arm examples have been small memory contexts
> and many of them likely avoid ZFS and use UFS instead.
> ZFS and its ARC add an additional complicated context to
> this type of issue. There are lots of reports
> around of the ARC growing too big. I do not know the
> status of -r336196 relative to ZFS/ARC memory management
> or whether more recent versions have improvements. (I do not
> use ZFS normally.) I've seen messages making suggestions
> for controlling the growth but I'm no ZFS expert.
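[A minimal sketch of making the above tuning persistent, assuming the
stock FreeBSD mechanisms (/etc/sysctl.conf and /boot/loader.conf); the
ARC cap figure is only an illustrative example, not a recommendation
for this particular machine:

# one-time, takes effect immediately:
sysctl vm.pageout_oom_seq=120

# persist across reboots:
echo 'vm.pageout_oom_seq=120' >> /etc/sysctl.conf

# if ZFS ARC growth is suspected, a cap can be set for the next boot
# (example value only):
echo 'vfs.zfs.arc_max="8G"' >> /boot/loader.conf

Whether vfs.zfs.arc_max can also be adjusted at runtime depends on the
FreeBSD version in use.]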
> 
> 
> Just to give an idea what is sufficient to build
> devel/llvm60:
> 
> I will note that on a Pine64+ 2GB (so 2 GiBytes of RAM
> in an aarch64 context with 4 cores, 1 HW-thread per core)
> running -r337400, and using UFS on a USB drive and a
> swap partition on that drive too, I have built devel/llvm60
> 2 times via poudriere-devel: just one builder
> allowed but it being allowed to use all 4 cores in
> parallel, about 14.5 hr each time. (Different USB media
> each time.) This did require the:
> 
> sysctl vm.pageout_oom_seq=120
> 
> Mark Johnston's slow_swap.diff patch code did not
> report any I/O latency problems in the swap subsystem.
> 
> I've also built lang/gcc8 2 times, about 12.5 hrs
> each time.
> 
> No ZFS, no ARC, no Chrome, no FireFox. Nothing else
> major going on beyond the devel/llvm60 build (or, later,
> the lang/gcc8 build) in each case.

Mark Johnston, in the investigation for the arm context,
also had us use the following patch:

diff --git a/sys/vm/vm_pageout.c b/sys/vm/vm_pageout.c
index 264c98203c51..9c7ebcf451ec 100644
--- a/sys/vm/vm_pageout.c
+++ b/sys/vm/vm_pageout.c
@@ -1670,6 +1670,8 @@ vm_pageout_mightbe_oom(struct vm_domain *vmd, int page_shortage,
 	 * start OOM.  Initiate the selection and signaling of the
 	 * victim.
 	 */
+	printf("v_free_count: %u, v_inactive_count: %u\n",
+	    vmd->vmd_free_count, vmd->vmd_pagequeues[PQ_INACTIVE].pq_cnt);
 	vm_pageout_oom(VM_OOM_MEM);
 	/*

This patch is not about the I/O latencies but about the
free RAM and inactive RAM at exactly the point of the
OOM kill activity.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)