From nobody Mon May 12 18:04:50 2025
Date: Mon, 12 May 2025 11:04:50 -0700
List-Id: FreeBSD on cloud platforms (EC2, GCE, Azure, etc.)
List-Archive: https://lists.freebsd.org/archives/freebsd-cloud
To: freebsd-cloud@FreeBSD.org
From: Pete Wright
Subject: ena(4) tx timeout messages in dmesg

hey there - i have an ec2 instance that i'm using as a nfs server and
have noticed the following messages in my dmesg buffer:

ena0: Found a Tx that wasn't completed on time, qid 2, index 593. 10 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
ena0: Found a Tx that wasn't completed on time, qid 2, index 220. 1 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
ena0: Found a Tx that wasn't completed on time, qid 3, index 240. 1 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
ena0: Found a Tx that wasn't completed on time, qid 3, index 974. 1 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
ena0: Found a Tx that wasn't completed on time, qid 2, index 730. 1 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
ena0: Found a Tx that wasn't completed on time, qid 2, index 864. 10 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
ena0: Found a Tx that wasn't completed on time, qid 3, index 998. 1 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.

the system is not overly loaded, but does have a steady 25% CPU usage
and averages around 2MB/sec network throughput (the system serves a
python virtual-environment to a cluster of data processing systems).

The man page states: "Packet was pushed to the NIC but not sent within
given time limit. It may be caused by hang of the IO queue."

I was curious if anyone had any idea if these messages indicate a
poorly tuned system, or are they just informational. Looking at the
basics like mbufs and other base metrics, the system looks OK from
that perspective.

thanks!
-pete

--
Pete Wright
pete@nomadlogic.org

From nobody Mon May 12 18:25:43 2025
Message-ID: <527aa929-4083-4935-8147-e59b6416c3bf@nomadlogic.org>
Date: Mon, 12 May 2025 11:25:43 -0700
Subject: Re: ena(4) tx timeout messages in dmesg
To: Colin Percival, freebsd-cloud@FreeBSD.org
Cc: Arthur Kiyanovski
References: <01000196c5b6fa5f-b8ed430e-23ca-47fd-9dd9-374a1de9c67c-000000@email.amazonses.com>
From: Pete Wright
In-Reply-To: <01000196c5b6fa5f-b8ed430e-23ca-47fd-9dd9-374a1de9c67c-000000@email.amazonses.com>

On 5/12/25 11:17, Colin Percival wrote:
> [+ akiyano, maintainer of the ena(4) driver]
>
> On 5/12/25 11:04, Pete Wright wrote:
>> hey there - i have an ec2 instance that i'm using as a nfs server and
>> have noticed the following messages in my dmesg buffer:
>>
>> ena0: Found a Tx that wasn't completed on time, qid 2, index 593. 10
>> msecs have passed since last cleanup. Missing Tx timeout value 5000
>> msecs.
>> ena0: Found a Tx that wasn't completed on time, qid 2, index 220. 1
>> msecs have passed since last cleanup. Missing Tx timeout value 5000
>> msecs.
>> ena0: Found a Tx that wasn't completed on time, qid 3, index 240. 1
>> msecs have passed since last cleanup. Missing Tx timeout value 5000
>> msecs.
>> ena0: Found a Tx that wasn't completed on time, qid 3, index 974. 1
>> msecs have passed since last cleanup. Missing Tx timeout value 5000
>> msecs.
>> ena0: Found a Tx that wasn't completed on time, qid 2, index 730. 1
>> msecs have passed since last cleanup. Missing Tx timeout value 5000
>> msecs.
>> ena0: Found a Tx that wasn't completed on time, qid 2, index 864. 10
>> msecs have passed since last cleanup. Missing Tx timeout value 5000
>> msecs.
>> ena0: Found a Tx that wasn't completed on time, qid 3, index 998. 1
>> msecs have passed since last cleanup. Missing Tx timeout value 5000
>> msecs.
>>
>> the system is not overly loaded, but does have a steady 25% CPU usage
>> and averages around 2MB/sec network throughput (the system serves a
>> python virtual-environment to a cluster of data processing systems).
>>
>> The man page states: "Packet was pushed to the NIC but not sent within
>> given time limit.  It may be caused by hang of the IO queue."
>>
>> I was curious if anyone had any idea if these messages indicate a
>> poorly tuned system, or are they just informational.  Looking at the
>> basics like mbufs and other base metrics, the system looks OK from
>> that perspective.
>
> I've heard that this can be caused by a thread being starved for CPU,
> possibly due to FreeBSD kernel scheduler issues, but that was on a far
> more heavily loaded system.  What instance type are you running on?
>

oh of course, forgot to provide useful info:

# uname -ar
FreeBSD airflow-nfs.q0.ringdna.net 14.2-RELEASE-p1 FreeBSD 14.2-RELEASE-p1 GENERIC amd64

Instance type:
t3a.xlarge

I also verified I have plenty of "burstable credit" available, since
this is a t class system (current balance is steady at 2,300 credits).

the exported filesystem resides on a dedicated zfs pool, and the
dataset itself can reside fully in memory; as such there is basically
zero disk i/o happening while serving 99% reads from nfsd.  the
virtual-env is ~500MB

thanks!
-pete

--
Pete Wright
pete@nomadlogic.org

From nobody Mon May 12 19:29:57 2025
Date: Mon, 12 May 2025 12:29:57 -0700
Subject: Re: ena(4) tx timeout messages in dmesg
To: Colin Percival, freebsd-cloud@FreeBSD.org
Cc: Arthur Kiyanovski
References: <01000196c5b6fa5f-b8ed430e-23ca-47fd-9dd9-374a1de9c67c-000000@email.amazonses.com> <527aa929-4083-4935-8147-e59b6416c3bf@nomadlogic.org> <01000196c5db82dc-cfa5bf54-9758-4125-bdca-f1794b76ac9f-000000@email.amazonses.com>
From: Pete Wright
In-Reply-To: <01000196c5db82dc-cfa5bf54-9758-4125-bdca-f1794b76ac9f-000000@email.amazonses.com>

On 5/12/25 11:56, Colin Percival wrote:
> On 5/12/25 11:25, Pete Wright wrote:
>> On 5/12/25 11:17, Colin Percival wrote:
>>> On 5/12/25 11:04, Pete Wright wrote:
>>>> hey there - i have an ec2 instance that i'm using as a nfs server
>>>> and have noticed the following messages in my dmesg buffer:
>>>> [...]
>>>> ena0: Found a Tx that wasn't completed on time, qid 3, index 998. 1
>>>> msecs have passed since last cleanup. Missing Tx timeout value 5000
>>>> msecs.
>>>>
>>> I've heard that this can be caused by a thread being starved for
>>> CPU, possibly due to FreeBSD kernel scheduler issues, but that was
>>> on a far more heavily loaded system.  What instance type are you
>>> running on?
>>
>> oh of course, forgot to provide useful info:
>>
>> # uname -ar
>> FreeBSD airflow-nfs.q0.ringdna.net 14.2-RELEASE-p1 FreeBSD 14.2-RELEASE-p1 GENERIC amd64
>>
>> Instance type:
>> t3a.xlarge
>>
>> I also verified I have plenty of "burstable credit" available, since
>> this is a t class system (current balance is steady at 2,300
>> credits).
>
> Ah, this won't necessarily help you -- T family instances are on shared
> hardware so even if you have burstable credits it's possible that you'll
> be unlucky with "noisy neighbours" and the sibling instances will all want
> CPU at the same time as you.  But I think there's probably something else
> going on as well.
>

oh that's a good point. since this is a pre-prod system that is less
of a concern, as we want to limit spend when possible. i'll be spinning
up production systems in the following week or so that will be on a "c"
class system; i'll keep an eye out to see if i see similar messages in
that environment.
-pete

--
Pete Wright
pete@nomadlogic.org

From nobody Tue May 13 14:43:07 2025
Date: Tue, 13 May 2025 07:43:07 -0700
Subject: Re: ena(4) tx timeout messages in dmesg
To: "Kiyanovski, Arthur", Colin Percival, "freebsd-cloud@freebsd.org"
Cc: "Arinzon, David"
References: <01000196c5b6fa5f-b8ed430e-23ca-47fd-9dd9-374a1de9c67c-000000@email.amazonses.com> <527aa929-4083-4935-8147-e59b6416c3bf@nomadlogic.org> <01000196c5db82dc-cfa5bf54-9758-4125-bdca-f1794b76ac9f-000000@email.amazonses.com> <1c8e7c62067845ab9cd5fca6198a78e8@amazon.com>
From: Pete Wright
In-Reply-To: <1c8e7c62067845ab9cd5fca6198a78e8@amazon.com>

On 5/12/25 19:52, Kiyanovski, Arthur wrote:
>> ---------- Forwarded message ---------
>> From: Pete Wright
>> Date: Mon, 12 May 2025 at 12:30
>> Subject: Re: ena(4) tx timeout messages in dmesg
>> To: Colin Percival,
>> Cc: Arthur Kiyanovski
>>
>> On 5/12/25 11:56, Colin Percival wrote:
>>> On 5/12/25 11:25, Pete Wright wrote:
>>>> On 5/12/25 11:17, Colin Percival wrote:
>>>>> On 5/12/25 11:04, Pete Wright wrote:
>>>>>> hey there - i have an ec2 instance that i'm using as a nfs server
>>>>>> and have noticed the following messages in my dmesg buffer:
>>>>>> [...]
>>>>>> ena0: Found a Tx that wasn't completed on time, qid 3, index 998. 1
>>>>>> msecs have passed since last cleanup. Missing Tx timeout value 5000
>>>>>> msecs.
>>>>>>
>>>>> I've heard that this can be caused by a thread being starved for
>>>>> CPU, possibly due to FreeBSD kernel scheduler issues, but that was
>>>>> on a far more heavily loaded system. What instance type are you
>>>>> running on?
>>>>
>>>> oh of course, forgot to provide useful info:
>>>>
>>>> # uname -ar
>>>> FreeBSD airflow-nfs.q0.ringdna.net 14.2-RELEASE-p1 FreeBSD 14.2-RELEASE-p1 GENERIC amd64
>>>>
>>>> Instance type:
>>>> t3a.xlarge
>>>>
>>>> I also verified I have plenty of "burstable credit" available,
>>>> since this is a t class system (current balance is steady at
>>>> 2,300 credits).
>>>
>>> Ah, this won't necessarily help you -- T family instances are on
>>> shared hardware so even if you have burstable credits it's possible
>>> that you'll be unlucky with "noisy neighbours" and the sibling
>>> instances will all want CPU at the same time as you. But I think
>>> there's probably something else going on as well.
>>>
>>
>> oh that's a good point. since this is a pre-prod system that is less
>> of a concern, as we want to limit spend when possible. i'll be
>> spinning up production systems in the following week or so that will
>> be on a "c" class system; i'll keep an eye out to see if i see
>> similar messages in that environment.
>>
>> -pete
>>
>> --
>> Pete Wright
>> pete@nomadlogic.org
>
> Hi Colin, Pete,
>
> Your analysis regarding CPU being occupied is the classic explanation
> for this kind of prints.
>
> The prints are consistent with cpu not being available to the
> interrupt handler to run.
> Although you say you have burstable credits available, the fact that
> you are using T instance types does make you more susceptible to such
> issues.
>
> Also when you say you have 25% CPU usage, how did you check that?
> Are you using tools that give you an average over some time? so you
> may have 75% of the time 0 cpu usage and 25% of the time 100% cpu
> usage.
>
> As you already suggested, the first thing we would like to eliminate
> is the T instance type.
> If all works - great!
>
> If not you may want to look into the spreading of interrupts over the
> different cpus using
> https://github.com/amzn/amzn-drivers/tree/master/kernel/fbsd/ena#io-irq-affinity
> And also make sure that the cpu heavy processes you have are run on
> different cpus than the ones you handle the interrupts on.
>
> Hope this helps,
> Arthur
>

thanks for the context Arthur, I'll take a look at that sysctl knob.

as i said the box is only serving a python virtual environment to a
pool of ec2 compute nodes, and the dataset resides in memory. so
nothing too crazy. the load does have spikes but they are pretty brief
and rarely over 70%. i'm collecting metrics via telegraph, and also
observe load via the usual suspects like top, systat etc.

it sounds like ena(4) is particularly sensitive to cpu spikes though -
at least with this vm configuration. if i continue to see these
messages in dmesg i'll test out distributing IRQs, otherwise i think i
can chalk this up to a noisy neighbor or something similar.

thanks!
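
Arthur's question about averaged CPU figures can be made concrete with a minimal Python sketch. The per-second samples below are invented for illustration (they are not measurements from this instance), and `summarize` and `busy_threshold` are hypothetical names. It only shows how a 25% average can hide a sustained stretch at 100%:

```python
# Hypothetical per-second CPU utilization samples (percent) over one minute.
# These values are invented for illustration; they are not measurements
# from the instance discussed in this thread.
samples = [100.0] * 15 + [0.0] * 45  # busy 25% of the time, idle 75%


def summarize(samples, busy_threshold=90.0):
    """Return (mean, peak, longest run of consecutive busy samples)."""
    mean = sum(samples) / len(samples)
    peak = max(samples)
    longest = current = 0
    for value in samples:
        # Extend the current busy run, or reset it on a non-busy sample.
        current = current + 1 if value >= busy_threshold else 0
        longest = max(longest, current)
    return mean, peak, longest


mean, peak, longest_busy_run = summarize(samples)
print(f"mean={mean:.1f}% peak={peak:.1f}% longest busy run={longest_busy_run}s")
```

With these made-up samples the mean is 25% even though the CPU is saturated for 15 consecutive seconds, which is the kind of window in which an interrupt handler could plausibly be kept off the CPU. Comparing per-interval peaks against the averaged figure is one way to check whether that is happening.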
-pete

--
Pete Wright
pete@nomadlogic.org

From nobody Tue May 13 17:22:18 2025
Message-ID: <2fe4e22b-acde-4a43-9359-bd6a4e028a37@nomadlogic.org>
Date: Tue, 13 May 2025 10:22:18 -0700
Subject: Re: ena(4) tx timeout messages in dmesg
From: Pete Wright
To: freebsd-cloud@FreeBSD.org

On 5/12/25 11:04, Pete Wright wrote:
> hey there - i have an ec2 instance that i'm using as a nfs server and
> have noticed the following messages in my dmesg buffer:
>
> ena0: Found a Tx that wasn't completed on time, qid 2, index 593. 10
> msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
> ena0: Found a Tx that wasn't completed on time, qid 2, index 220. 1
> msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
> ena0: Found a Tx that wasn't completed on time, qid 3, index 240. 1
> msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
> ena0: Found a Tx that wasn't completed on time, qid 3, index 974. 1
> msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
> ena0: Found a Tx that wasn't completed on time, qid 2, index 730. 1
> msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
> ena0: Found a Tx that wasn't completed on time, qid 2, index 864. 10
> msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
> ena0: Found a Tx that wasn't completed on time, qid 3, index 998. 1
> msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
>
>

So I've found an interesting pattern: the above messages apparently get
printed to /var/log/messages and the dmesg buffer when i "su" to root:

May 9 19:19:23 airflow-nfs su[66523]: ec2-user to root on /dev/pts/3
May 9 19:19:23 airflow-nfs kernel: Found a Tx that wasn't completed on time, qid 2, index 593. 10 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
May 9 19:19:23 airflow-nfs kernel: Found a Tx that wasn't completed on time, qid 2, index 220. 1 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
M
May 12 17:55:25 airflow-nfs su[29272]: ec2-user to root on /dev/pts/0
May 12 17:55:25 airflow-nfs kernel: Found a Tx that wasn't completed on time, qid 3, index 998. 1 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
May 12 17:55:25 airflow-nfs kernel: Found a Tx that wasn't completed on time, qid 1, index 975. 1 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
May 12 17:55:25 airflow-nfs kernel: Found a Tx that wasn't completed on time, qid 1, index 428. 1 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
May 13 17:17:14 airflow-nfs su[16099]: ec2-user to root on /dev/pts/0
May 13 17:17:14 airflow-nfs kernel: Found a Tx that wasn't completed on time, qid 1, index 289. 1 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
May 13 17:17:14 airflow-nfs kernel: Found a Tx that wasn't completed on time, qid 1, index 159. 1 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.

I have no idea what that means, but it certainly feels like an
interesting data-point. i'm ssh'ing as the ec2-user, then "su -" to
become root, and as you can see from the timestamps something triggers
those log events. i'm not seeing any other occurrences of these log
messages outside of su'ing either. this is a very vanilla system; no
krb auth or other network interactions should happen when i become
root.

-pete

--
Pete Wright
pete@nomadlogic.org

From nobody Tue May 13 20:20:56 2025
Date: Tue, 13 May 2025 16:20:56 -0400
From: "Dan Langille"
To: "Application Certification Support via freebsd-cloud"
Message-Id: <707702db-2eb9-475b-9170-bed740efd0c2@app.fastmail.com>
In-Reply-To: <2fe4e22b-acde-4a43-9359-bd6a4e028a37@nomadlogic.org>
References: <2fe4e22b-acde-4a43-9359-bd6a4e028a37@nomadlogic.org>
Subject: Re: ena(4) tx timeout messages in dmesg

On Tue, May 13, 2025, at 1:22 PM, Pete Wright wrote:
> On 5/12/25 11:04, Pete Wright wrote:
>> hey
there - i have an ec2 instance that i'm using as a nfs server and
>> have noticed the following messages in my dmesg buffer:
>>
>> ena0: Found a Tx that wasn't completed on time, qid 2, index 593. 10
>> msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
>> ena0: Found a Tx that wasn't completed on time, qid 2, index 220. 1
>> msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
>> ena0: Found a Tx that wasn't completed on time, qid 3, index 240. 1
>> msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
>> ena0: Found a Tx that wasn't completed on time, qid 3, index 974. 1
>> msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
>> ena0: Found a Tx that wasn't completed on time, qid 2, index 730. 1
>> msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
>> ena0: Found a Tx that wasn't completed on time, qid 2, index 864. 10
>> msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
>> ena0: Found a Tx that wasn't completed on time, qid 3, index 998. 1
>> msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
>>
>>
>
>
> So I've found an interesting pattern, the above messages get printed to
> /var/log/messages and the dmesg buffer when i "su" to root apparently:
>
> May  9 19:19:23 airflow-nfs su[66523]: ec2-user to root on /dev/pts/3
> May  9 19:19:23 airflow-nfs kernel: Found a Tx that wasn't completed on
> time, qid 2, index 593. 10 msecs have passed since last cleanup. Missing
> Tx timeout value 5000 msecs.
> May  9 19:19:23 airflow-nfs kernel: Found a Tx that wasn't completed on
> time, qid 2, index 220. 1 msecs have passed since last cleanup. Missing
> Tx timeout value 5000 msecs.
> M
>
>
> May 12 17:55:25 airflow-nfs su[29272]: ec2-user to root on /dev/pts/0
> May 12 17:55:25 airflow-nfs kernel: Found a Tx that wasn't completed on
> time, qid 3, index 998. 1 msecs have passed since last cleanup. Missing
> Tx timeout value 5000 msecs.
> May 12 17:55:25 airflow-nfs kernel: Found a Tx that wasn't completed on
> time, qid 1, index 975. 1 msecs have passed since last cleanup. Missing
> Tx timeout value 5000 msecs.
> May 12 17:55:25 airflow-nfs kernel: Found a Tx that wasn't completed on
> time, qid 1, index 428. 1 msecs have passed since last cleanup. Missing
> Tx timeout value 5000 msecs.
>
>
> May 13 17:17:14 airflow-nfs su[16099]: ec2-user to root on /dev/pts/0
> May 13 17:17:14 airflow-nfs kernel: Found a Tx that wasn't completed on
> time, qid 1, index 289. 1 msecs have passed since last cleanup. Missing
> Tx timeout value 5000 msecs.
> May 13 17:17:14 airflow-nfs kernel: Found a Tx that wasn't completed on
> time, qid 1, index 159. 1 msecs have passed since last cleanup. Missing
> Tx timeout value 5000 msecs.
>
>
> I have no idea what that means, but certainly feels like an interesting
> data-point. i'm ssh'ing as the ec2-user, then "su -" to become root and
> as you can see from the timestamps something triggers those log events.
> i'm not seeing any other occurrences of these log messages outside of
> su'ing too. this is a very vanilla system, no krb auth or other
> network interactions should happen when i become root.

I get them too:

May  9 21:20:50 aws-1 kernel: ena0: Found a Tx that wasn't completed on time, qid 0, index 105. 1 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
--
Dan Langille
dan@langille.org

From nobody Tue May 13 20:26:22 2025
Message-ID: <8bc8c246-52bc-4fad-81d3-54f777893754@nomadlogic.org>
Date: Tue, 13 May 2025 13:26:22 -0700
Subject: Re: ena(4) tx timeout messages in dmesg
To: Colin Percival, freebsd-cloud@FreeBSD.org
References: <2fe4e22b-acde-4a43-9359-bd6a4e028a37@nomadlogic.org>
 <01000196cb23eca0-d4b771c7-f4a9-4406-bc20-4f8b7dff09d3-000000@email.amazonses.com>
From: Pete Wright
In-Reply-To: <01000196cb23eca0-d4b771c7-f4a9-4406-bc20-4f8b7dff09d3-000000@email.amazonses.com>

On 5/13/25 12:34, Colin Percival wrote:
> On 5/13/25 10:22, Pete Wright wrote:
>> So I've found an interesting pattern, the above messages get printed
>> to /var/log/messages and the dmesg buffer when i "su" to root
>> apparently:
>>
>> May  9 19:19:23 airflow-nfs su[66523]: ec2-user to root on /dev/pts/3
>> May  9 19:19:23 airflow-nfs kernel: Found a Tx that wasn't completed
>> on time, qid 2, index 593. 10 msecs have passed since last cleanup.
>> Missing Tx timeout value 5000 msecs.
>> May  9 19:19:23 airflow-nfs kernel: Found a Tx that wasn't completed
>> on time, qid 2, index 220. 1 msecs have passed since last cleanup.
>> Missing Tx timeout value 5000 msecs.
>> [...]
>>
>> I have no idea what that means, but certainly feels like an
>> interesting data-point.  i'm ssh'ing as the ec2-user, then "su -" to
>> become root and as you can see from the timestamps something triggers
>> those log events. i'm not seeing any other occurrences of these log
>> messages outside of su'ing too.  this is a very vanilla system, no
>> krb auth or other network interactions should happen when i become root.
>
> Ooh, very interesting, and points to something I had wondered about earlier.
> There should be a line 'hw.broken_txfifo="1"' in /boot/loader.conf; can you
> try removing that and see if the problem goes away?  (In fact, it's a sysctl
> so you can flip it on and off without taking the system down.)
>
> If the system reproducibly prints that warning with broken_txfifo=1 and does
> not print the warning with broken_txfifo=0, we have the culprit.  And I can
> just remove that from EC2 images; it's a workaround for an old emulation bug
> which *should* be long since fixed in all EC2 instance types.
>

oh interesting!  cool i've toggled that sysctl knob:

# sysctl hw.broken_txfifo=0
hw.broken_txfifo: 1 -> 0
#

i did an initial test and it looks good so far, i'll let it soak for the
rest of the day today and check-in tomorrow.

thanks Colin!
-pete

--
Pete Wright
pete@nomadlogic.org
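
For anyone who wants to repeat the before/after comparison suggested above, here is a rough sketch of one way to check whether the warnings line up with su(1) logins. It reads syslog-format text on stdin and, for each timestamp that has an su line, prints how many "Found a Tx that wasn't completed on time" lines share that same second. The function name tx_warnings_per_su is made up for this example, and it assumes one log message per line in the usual "Mon DD HH:MM:SS host tag: text" layout, as in /var/log/messages.

```shell
#!/bin/sh
# Sketch only: tx_warnings_per_su is a hypothetical helper, not an existing tool.
# Reads syslog-format lines on stdin and reports, per su(1) login timestamp,
# how many ena(4) missing-Tx warnings were logged in that same second.
tx_warnings_per_su() {
    awk '
        # Key every line on its "Mon DD HH:MM:SS" prefix (first three fields).
        { ts = $1 " " $2 " " $3 }

        # Remember each su login timestamp once, in order of first appearance.
        / su\[[0-9]+\]: / { if (!(ts in su)) order[++n] = ts; su[ts] = 1 }

        # Count the driver warning; "." stands in for the apostrophe.
        /Found a Tx that wasn.t completed on time/ { warn[ts]++ }

        END {
            for (i = 1; i <= n; i++)
                printf "%s su=1 tx_warnings=%d\n", order[i], warn[order[i]]
        }
    '
}
```

Typical use would be `tx_warnings_per_su < /var/log/messages`, once with hw.broken_txfifo=1 and again after `sysctl hw.broken_txfifo=0` plus a fresh "su -". If the tx_warnings count drops to 0 for the new logins, that is consistent with the workaround being the trigger, though it only shows correlation within one-second granularity.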