From: Mark Millard
Date: Wed, 11 May 2022 12:52:28 -0700
Subject: Re: Chasing OOM Issues - good sysctl metrics to use?
To: Jan Mikkelsen, Pete Wright
Cc: freebsd-current
Message-Id: <0E44A609-A040-4801-B3FA-E0B410F0C3D3@yahoo.com>
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On 2022-May-10, at
20:31, Mark Millard wrote:

> On 2022-May-10, at 17:49, Mark Millard wrote:
>
>> On 2022-May-10, at 11:49, Mark Millard wrote:
>>
>>> On 2022-May-10, at 08:47, Jan Mikkelsen wrote:
>>>
>>>> On 10 May 2022, at 10:01, Mark Millard wrote:
>>>>>
>>>>> On 2022-Apr-29, at 13:57, Mark Millard wrote:
>>>>>
>>>>>> On 2022-Apr-29, at 13:41, Pete Wright wrote:
>>>>>>>
>>>>>>>> . . .
>>>>>>>
>>>>>>> d'oh - went out for lunch and workstation locked up. i *knew* i
>>>>>>> shouldn't have said anything lol.
>>>>>>
>>>>>> Any interesting console messages ( or dmesg -a or
>>>>>> /var/log/messages )?
>>>>>>
>>>>>
>>>>> I've been doing some testing of a patch by tijl at FreeBSD.org
>>>>> and have reproduced both hang-ups (ZFS/ARC context) and kills
>>>>> (UFS/noARC and ZFS/ARC) for "was killed: failed to reclaim
>>>>> memory", both with and without the patch. This is with only a
>>>>> tiny fraction of the swap partition(s) enabled being put to
>>>>> use. So far, the testing was deliberately with
>>>>> vm.pageout_oom_seq=12 (the default value). My testing has been
>>>>> with main [so: 14].
>>>>>
>>>>> But I also learned how to avoid the hang-ups that I got --but
>>>>> it costs making kills more likely/quicker, other things being
>>>>> equal.
>>>>>
>>>>> I discovered that the hang-ups that I got were from all the
>>>>> processes that I interact with the system via ending up with
>>>>> the process's kernel threads swapped out and not being
>>>>> swapped in (including sshd, so no new ssh connections). In
>>>>> some contexts I only had escaping into the kernel debugger
>>>>> available, not even ^T would work. Other times ^T did work.
>>>>>
>>>>> So, when I'm willing to risk kills in order to maintain
>>>>> the ability to interact normally, I now use in
>>>>> /etc/sysctl.conf :
>>>>>
>>>>> vm.swap_enabled=0
>>>>
>>>> I have been looking at an OOM related issue. Ignoring the actual
>>>> leak, the problem leads to a process being killed because the
>>>> system was out of memory. This is fine. After that, however, the
>>>> system console was black with a single block cursor and the
>>>> console keyboard was unresponsive. Caps lock and num lock didn’t
>>>> toggle their lights when pressed.
>>>>
>>>> Using an ssh session, the system looked fine. USB events for the
>>>> keyboard being disconnected and reconnected appeared but the
>>>> keyboard stayed unresponsive.
>>>>
>>>> Setting vm.swap_enabled=0, as you did above, resolved this
>>>> problem. After the process was killed a perfectly normal console
>>>> returned.
>>>>
>>>> The interesting thing is that this test system is configured with
>>>> no swap space.
>>>>
>>>> This is on 13.1-RC5.
>>>>
>>>>> This disables swapping out of process kernel stacks. It
>>>>> is just that, with that option removed for gaining free RAM,
>>>>> there are fewer options tried before a kill is initiated. It
>>>>> is not a loader-time tunable but is writable, thus the
>>>>> /etc/sysctl.conf placement.
>>>>
>>>> Is that really what it does? From a quick look at the code in
>>>> vm/vm_swapout.c, it seems a little more complex.
>>>
>>> I was going by its description:
>>>
>>> # sysctl -d vm.swap_enabled
>>> vm.swap_enabled: Enable entire process swapout
>>>
>>> Based on the below, it appears that the description
>>> presumes vm.swap_idle_enabled==0 (the default). In
>>> my context vm.swap_idle_enabled==0 . Looks like I
>>> should also list:
>>>
>>> vm.swap_idle_enabled=0
>>>
>>> in my /etc/sysctl.conf with a reminder comment that the
>>> pair of =0's are required for avoiding the observed
>>> hang-ups.
>>>
>>>
>>> The analysis goes like . . .
>>>
>>> I see in the code that vm.swap_enabled !=0 causes
>>> VM_SWAP_NORMAL :
>>>
>>> void
>>> vm_swapout_run(void)
>>> {
>>>
>>>     if (vm_swap_enabled)
>>>         vm_req_vmdaemon(VM_SWAP_NORMAL);
>>> }
>>>
>>> and that in turn leads vm_daemon to:
>>>
>>>     if (swapout_flags != 0) {
>>>         /*
>>>          * Drain the per-CPU page queue batches as a deadlock
>>>          * avoidance measure.
>>>          */
>>>         if ((swapout_flags & VM_SWAP_NORMAL) != 0)
>>>             vm_page_pqbatch_drain();
>>>         swapout_procs(swapout_flags);
>>>     }
>>>
>>> Note: vm.swap_idle_enabled==0 && vm.swap_enabled==0 ends
>>> up with swapout_flags==0. vm.swap_idle. . . defaults seem
>>> to be (in my context):
>>>
>>> # sysctl -a | grep swap_idle
>>> vm.swap_idle_threshold2: 10
>>> vm.swap_idle_threshold1: 2
>>> vm.swap_idle_enabled: 0
>>>
>>> For reference:
>>>
>>> /*
>>>  * Idle process swapout -- run once per second when pagedaemons are
>>>  * reclaiming pages.
>>>  */
>>> void
>>> vm_swapout_run_idle(void)
>>> {
>>>     static long lsec;
>>>
>>>     if (!vm_swap_idle_enabled || time_second == lsec)
>>>         return;
>>>     vm_req_vmdaemon(VM_SWAP_IDLE);
>>>     lsec = time_second;
>>> }
>>>
>>> [So vm.swap_idle_enabled==0 avoids VM_SWAP_IDLE status.]
>>>
>>> static void
>>> vm_req_vmdaemon(int req)
>>> {
>>>     static int lastrun = 0;
>>>
>>>     mtx_lock(&vm_daemon_mtx);
>>>     vm_pageout_req_swapout |= req;
>>>     if ((ticks > (lastrun + hz)) || (ticks < lastrun)) {
>>>         wakeup(&vm_daemon_needed);
>>>         lastrun = ticks;
>>>     }
>>>     mtx_unlock(&vm_daemon_mtx);
>>> }
>>>
>>> [So VM_SWAP_IDLE and VM_SWAP_NORMAL are independent bits
>>> in vm_pageout_req_swapout.]
>>>
>>> vm_daemon does:
>>>
>>>     mtx_lock(&vm_daemon_mtx);
>>>     msleep(&vm_daemon_needed, &vm_daemon_mtx, PPAUSE, "psleep",
>>>         vm_daemon_timeout);
>>>     swapout_flags = vm_pageout_req_swapout;
>>>     vm_pageout_req_swapout = 0;
>>>     mtx_unlock(&vm_daemon_mtx);
>>>
>>> So vm_pageout_req_swapout is regenerated after that
>>> each time.
>>>
>>> I'll not show the code for vm.swap_idle_enabled!=0 .
>>>
>>
>> Well, with continued experiments I got an example of
>> a hangup for which looking via the db> prompt did not
>> show any swapping out of process kernel stacks
>> ( vm.swap_enabled=0 was the context, so expected ).
>> The environment was ZFS (so with ARC).
>>
>> But this was testing with vm.pageout_oom_seq=120 instead
>> of the default vm.pageout_oom_seq=12 . It may be that,
>> if let sit long enough, things would have unhung (external
>> perspective).
>>
>> It is part of what I'm experimenting with so we will see.
>>
>
> Looks like I might have overreacted, in that for my
> current tests there can be brief periods of delayed
> response, but things respond in a little bit.
> Definitely not like the hang-ups I was getting with
> vm.swap_enabled=1 .
>

The following is based on using vm.pageout_oom_seq=120
which greatly delays kills. (I've never waited long
enough to see one.) vm.pageout_oom_seq=12 tends to get a
kill fairly quickly, making the below hard to observe.

More testing has shown it can hang up with use of
vm.swap_enabled=0 with vm.swap_idle_enabled=0 --but
the details I've observed suggest a livelock rather
than a deadlock. It appears that the likes of (db> use
output extractions):

 1171  1168  1168     0  R+      CPU 2                       stress
 1170  1168  1168     0  R+      CPU 0                       stress

and:

   18     0     0     0  RL      (threaded)                  [pagedaemon]
100120                   Run     CPU 1                       [dom0]
100132                   D       launds  0xffff000000f1dc44  [laundry: dom0]
100133                   D       umarcl  0xffff0000007d8424  [uma]

stay busy, drawing power much as when I have just those
significantly active and the system is not hung up.
(30.6W..30.8W or so range, where idle is more like 26W
and more general activity being involved ends up with
the power jumping around over a wider range, for
example.)

I have observed non-hung-up tests where the 2 stress
processes using the memory were getting around 99% in
top and [pagedaemon{dom0}] was getting around 90%
but a grep was getting more like 0.04%. This looks like
a near-livelock, and it is what inspired checking whether
the other evidence also suggested a livelock for a
hang-up. Looking via db> use has always looked like the
above. (Sometimes I've used 3 memory-using stress
processes but now usually 2, typically leaving one CPU
idle.)

That in turn led to monitoring the power, ending up
as mentioned above.

I have also observed hang-up-like cases where the top
that had been running would sometimes get individual
screen updates many minutes apart. With the power
usage pattern it again seems like a (near) livelock.

Relative to avoiding hang-ups, so far it seems that
use of vm.swap_enabled=0 with vm.swap_idle_enabled=0
makes hang-ups less likely/less frequent/harder to
produce examples of. But it is no guarantee of lack of
a hang-up. It does change the cause of the hang-up
(in that it avoids processes with kernel stacks swapped
out being involved).

What I do to avoid rebooting for a hang-up I'm done
with is to kill the memory-using stress processes
via db> use and then c out of the kernel debugger
(i.e., continue). So far the system has always
returned to normal in response.

===
Mark Millard
marklmi at yahoo.com