From: Mark Millard
Subject: Re: Chasing OOM Issues - good sysctl metrics to use?
Date: Thu, 21 Apr 2022 21:18:37 -0700
To: pete@nomadlogic.org, freebsd-current

Pete Wright wrote on
Date: Thu, 21 Apr 2022 19:16:42 -0700 :

> on my workstation running CURRENT (amd64/32g of ram) i've been running
> into a scenario where after 4 or 5 days of daily use I get an OOM event
> and both chromium and firefox are killed. then in the next day or so
> the system will become very unresponsive in the morning when i unlock my
> screensaver, forcing a manual power cycle.
>
> one thing i've noticed is growing swap usage but plenty of free and
> inactive memory as well as a GB or so of memory in the Laundry state
> according to top. my understanding is that seeing swap usage grow over
> time is expected and doesn't necessarily indicate a problem. but what
> concerns me is the system locking up while seeing quite a bit of disk
> i/o (maybe from paging back in?).
>
> in order to help chase this down i've set up
> prometheus_sysctl_exporter(8) to send data to a local prometheus
> instance. the goal is to examine memory utilization over time to help
> detect any issues. so my question is this:
>
> what OIDs would be useful to help diagnose weird memory
> issues like this?
>
> i'm currently looking at:
> sysctl_vm_domain_0_stats_laundry
> sysctl_vm_domain_0_stats_active
> sysctl_vm_domain_0_stats_free_count
> sysctl_vm_domain_0_stats_inactive_pps
>
>
> thanks in advance - and i'd be happy to share my data if anyone is
> interested :)

Messages in the console output would be appropriate to report.
Messages might also be available via the following at appropriate
times:

# dmesg -a
. . .

or:

# more /var/log/messages
. . .

Generally, messages from after the boot is complete are more relevant.
Messages like the following are some examples that would be of interest:

pid . . .(c++), jid . . ., uid . . ., was killed: failed to reclaim memory
pid . . .(c++), jid . . ., uid . . ., was killed: a thread waited too long to allocate a page
pid . . .(c++), jid . . ., uid . . ., was killed: out of swap space

(That last is somewhat of a misnomer for the internal issue that leads
to it.)

I'm hoping you got one or more messages of the above kinds. But others
are also relevant:

. . . kernel: swap_pager: out of swap space
. . . kernel: swp_pager_getswapspace(7): failed
. . . kernel: swap_pager: indefinite wait buffer: bufobj: . . ., blkno: . . ., size: . . .
(Those messages do not announce a process kill but give some evidence
about the context.)

Some messages whose text partly matches actually identify somewhat
different contexts, so each message type is relevant. There may be
other types of messages that are relevant. The sequencing of the
messages could also be relevant.

Do you have any swap partitions set up and in use? The details could
be relevant. Do you have swap set up some other way than via a swap
partition? No swap at all? If one or more swap partitions are in use,
anything that suggests the speed/latency characteristics of the I/O to
the drive could be relevant.

ZFS (so with ARC)? UFS? Both?

The first block of lines from a top display could be relevant,
particularly when the system is clearly progressing towards having the
problem. (Once the problem has happened, it is too late.) (I just
picked top as a way to get a lot of the information together
automatically.)

These sorts of things might help folks help you.

===
Mark Millard
marklmi at yahoo.com
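For pulling the kill and swap_pager message types listed above out of a saved log while keeping their original order (since the sequencing can matter), something like the following minimal sketch might help. It assumes a POSIX sh and grep and that syslog writes kernel messages to /var/log/messages. The function name scan_oom_messages is hypothetical, and the patterns are only the fixed text fragments quoted in this message, so any other relevant message types would have to be added by hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): print, in original order,
# the lines of a log that match the OOM/swap message types quoted above.
# Usage: scan_oom_messages [logfile]   (defaults to /var/log/messages)
scan_oom_messages() {
    logfile="${1:-/var/log/messages}"
    # -E applies extended regular expressions to every -e pattern;
    # each pattern is a fixed fragment of one quoted message type.
    # grep exits 1 when nothing matches, which is not an error here.
    grep -E \
        -e 'was killed: failed to reclaim memory' \
        -e 'was killed: a thread waited too long to allocate a page' \
        -e 'was killed: out of swap space' \
        -e 'swap_pager: out of swap space' \
        -e 'swp_pager_getswapspace\([0-9]+\): failed' \
        -e 'swap_pager: indefinite wait buffer' \
        "$logfile"
}
```

For example, `scan_oom_messages /var/log/messages` would list the matching lines, and output saved from `dmesg -a` to a file could be scanned the same way. Because grep preserves input order, the relative sequencing of the messages is kept intact.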