Date: Mon, 12 May 2025 11:25:43 -0700
From: Pete Wright <pete@nomadlogic.org>
To: Colin Percival <cperciva@tarsnap.com>, freebsd-cloud@FreeBSD.org
Cc: Arthur Kiyanovski <akiyano@FreeBSD.org>
Subject: Re: ena(4) tx timeout messages in dmesg
Message-ID: <527aa929-4083-4935-8147-e59b6416c3bf@nomadlogic.org>
In-Reply-To: <01000196c5b6fa5f-b8ed430e-23ca-47fd-9dd9-374a1de9c67c-000000@email.amazonses.com>
References: <fec4cb4f-2a36-4a3d-bf02-539fd1a1273c@nomadlogic.org> <01000196c5b6fa5f-b8ed430e-23ca-47fd-9dd9-374a1de9c67c-000000@email.amazonses.com>
On 5/12/25 11:17, Colin Percival wrote:
> [+ akiyano, maintainer of the ena(4) driver]
>
> On 5/12/25 11:04, Pete Wright wrote:
>> hey there - i have an ec2 instance that i'm using as an nfs server and
>> have noticed the following messages in my dmesg buffer:
>>
>> ena0: Found a Tx that wasn't completed on time, qid 2, index 593. 10 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
>> ena0: Found a Tx that wasn't completed on time, qid 2, index 220. 1 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
>> ena0: Found a Tx that wasn't completed on time, qid 3, index 240. 1 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
>> ena0: Found a Tx that wasn't completed on time, qid 3, index 974. 1 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
>> ena0: Found a Tx that wasn't completed on time, qid 2, index 730. 1 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
>> ena0: Found a Tx that wasn't completed on time, qid 2, index 864. 10 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
>> ena0: Found a Tx that wasn't completed on time, qid 3, index 998. 1 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
>>
>> the system is not overly loaded, but does have a steady 25% CPU usage
>> and averages around 2MB/sec of network throughput (the system serves a
>> python virtual-environment to a cluster of data processing systems).
>>
>> The man page states: "Packet was pushed to the NIC but not sent within
>> given time limit. It may be caused by hang of the IO queue."
>>
>> I was curious whether these messages indicate a poorly tuned system or
>> are just informational. Looking at basics like mbufs and other base
>> metrics, the system looks OK from that perspective.
>
> I've heard that this can be caused by a thread being starved for CPU,
> possibly due to FreeBSD kernel scheduler issues, but that was on a far
> more heavily loaded system. What instance type are you running on?
>

oh of course, forgot to provide useful info:

# uname -ar
FreeBSD airflow-nfs.q0.ringdna.net 14.2-RELEASE-p1 FreeBSD 14.2-RELEASE-p1 GENERIC amd64

Instance type: t3a.xlarge

I also verified I have plenty of "burstable credit" available since this is a t-class instance (the current balance is steady at 2,300 credits).

the exported filesystem resides on a dedicated zfs pool, and the dataset itself can reside fully in memory, so there is basically zero disk i/o happening while serving 99% reads from nfsd. the virtual-env is ~500MB.

thanks!
-pete

--
Pete Wright
pete@nomadlogic.org
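
For reference, a minimal sketch of how the "basics" mentioned above (mbuf usage, interrupt load, per-CPU scheduling, driver counters) might be checked on a FreeBSD 14 host. The dev.ena.0 sysctl node is present for an ena interface, but the individual counter names differ across driver versions, so the grep below is illustrative rather than a documented OID list:

# netstat -m                      (mbuf/cluster usage and any denials)
# vmstat -i                       (interrupt rates per device queue)
# top -SHP                        (per-CPU load and kernel threads; watch for a starved ena cleanup thread)
# sysctl dev.ena.0 | grep -i tx   (per-queue driver counters; names vary by driver version)
# netstat -I ena0 -w 1            (per-second interface packet/error/drop counts)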
