From: Pete Wright
To: FreeBSD Current
Date: Thu, 21 Apr 2022 19:16:42 -0700
Subject: Chasing OOM Issues - good sysctl metrics to use?
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

hello - on my workstation running CURRENT (amd64, 32G of RAM) i've been running into a scenario where, after 4 or 5 days of daily use, i get an OOM event and both chromium and firefox are killed. then, within the next day or so, the system becomes very unresponsive when i unlock my screensaver in the morning, forcing a manual power cycle.

one thing i've noticed is growing swap usage, but plenty of free and inactive memory, as well as a GB or so of memory in the Laundry state according to top. my understanding is that seeing swap usage grow over time is expected and doesn't necessarily indicate a problem.
but what concerns me is the system locking up while there is quite a bit of disk i/o (maybe from paging back in?).

in order to help chase this down i've set up prometheus_sysctl_exporter(8) to send data to a local prometheus instance. the goal is to examine memory utilization over time to help detect any issues.

so my question is this: what OIDs would be useful for diagnosing weird memory issues like this? i'm currently looking at:

sysctl_vm_domain_0_stats_laundry
sysctl_vm_domain_0_stats_active
sysctl_vm_domain_0_stats_free_count
sysctl_vm_domain_0_stats_inactive_pps

thanks in advance - and i'd be happy to share my data if anyone is interested :)

-pete

--
Pete Wright
pete@nomadlogic.org
@nomadlogicLA
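
for anyone who wants to poke at the same numbers outside of prometheus, here is a minimal sketch (not a definitive tool) that filters Prometheus text-format output down to the four metrics listed above and converts the page counts to MiB. assumptions: the metrics appear without labels, the page size is 4096 bytes (check `sysctl hw.pagesize`), and laundry, active and free_count are page counts. inactive_pps is left unconverted because it may be a scan rate rather than a page count (check `sysctl -d vm.domain.0.stats.inactive_pps`). the function names and the sample values are made up for illustration.

```python
# Assumption: 4096-byte pages (amd64 default); verify with `sysctl hw.pagesize`.
PAGE_SIZE = 4096

# The four metric names from the message above.
WATCHED = (
    "sysctl_vm_domain_0_stats_laundry",
    "sysctl_vm_domain_0_stats_active",
    "sysctl_vm_domain_0_stats_free_count",
    "sysctl_vm_domain_0_stats_inactive_pps",
)

# Assumption: these three are page counts; inactive_pps is deliberately
# excluded because it may be a rate rather than a count.
PAGE_COUNT_METRICS = WATCHED[:3]


def parse_metrics(text, watched=WATCHED):
    """Return {name: value} for watched, unlabelled metrics in Prometheus text format."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.split()
        if len(parts) < 2 or parts[0] not in watched:
            continue
        try:
            values[parts[0]] = float(parts[1])
        except ValueError:
            continue  # ignore lines whose value is not numeric
    return values


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count to MiB."""
    return pages * page_size / (1024 * 1024)


# Made-up sample input, for illustration only (not real measurements).
SAMPLE = """\
# HELP sysctl_vm_domain_0_stats_laundry laundry pages
sysctl_vm_domain_0_stats_laundry 262144
sysctl_vm_domain_0_stats_active 524288
sysctl_vm_domain_0_stats_free_count 1048576
sysctl_vm_domain_0_stats_inactive_pps 12
sysctl_kern_ostype_info 1
"""

metrics = parse_metrics(SAMPLE)
for name in PAGE_COUNT_METRICS:
    print(f"{name}: {pages_to_mib(metrics[name]):.1f} MiB")
```

the same parse_metrics() call should work on text captured from the exporter, as long as the metric names match the ones above.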