Date: Thu, 6 Nov 2025 10:00:19 -0800
From: Mark Millard <marklmi@yahoo.com>
To: bob prohaska <fbsd@www.zefox.net>
Cc: Ronald Klop <ronald-lists@klop.ws>, freebsd-current@freebsd.org, freebsd-arm@freebsd.org
Subject: Re: Arm v7 RPi2 -current unresponsive to debugger escape during buildworld
Message-ID: <05ADBC62-E111-42F9-ACF2-3A92F2781870@yahoo.com>
In-Reply-To: <aQzPBMjvuCoY-kY-@www.zefox.net>
References: <aQyrBArxXq-JSaqu@www.zefox.net> <475995705.6919.1762440301455@localhost> <aQzPBMjvuCoY-kY-@www.zefox.net>

On Nov 6, 2025, at 08:38, bob prohaska <fbsd@www.zefox.net> wrote:

> On Thu, Nov 06, 2025 at 03:45:01PM +0100, Ronald Klop wrote:
>> Hi,
>>
>> To me it sounds like your machine is overwhelmed by swapping.
>>
>> Try -j1 buildworld.
>
> In most cases of stoppage the swap use is low, 50 MB or sometimes less.
> Up to about 6-700MB the machines slow their progress, but keep going and
> there are no complaints on the console about swap taking too long or
> insufficient. If there's a connection to swap use, it isn't obvious.
>
> It seems to be related more to hours of runtime than swap use.
>
> More to the point of my question, if the machine is swap-bound,
> shouldn't the debugger escape still work?

Do your descriptions of failing to gain control refer to the serial
console? Do you also have ssh or similar access? Do all such sessions
also appear hung or crashed?

Do you get notices about loss of network connections to the RPi2 v1.1
in question? Do any of those happen automatically? If so, the time of
such a message could put an upper bound on when the RPi2 v1.1 hung up,
deadlocked, or crashed, since the message about failing communication
would have occurred after the problem started on the RPi2 v1.1.

I'll note that your prior reporting of the end-of-log content gives
evidence of things that completed, including being flushed to the disk.
But there likely was more that was not flushed to the disk, some of
which may have otherwise completed. Also, what was actually active at
the time of the potential deadlock (or other form of crash) is unlikely
to show in the logs with such a known status.

The file system I/O tries to keep the on-media content from being
corrupted, but does not necessarily keep it up to date. (Fully
attempting both leads to either a contradiction or horrible
performance. UFS has different tradeoffs than ZFS for such issues but
the same general goal applies to both. At least that is how I'd
summarize it.)

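One way to narrow down the failure time despite unflushed log data is
a small heartbeat that forces each timestamp to the media. The sketch
below is only an illustration under assumptions: the function names,
the 10 second interval, and the log path in the usage note are all
hypothetical, and os.fsync() only asks the kernel to push the data
out, so whether it truly reaches stable storage still depends on the
device. The last timestamp found after a reboot would give a lower
bound on how long the system stayed alive.

```python
import os
import time


def write_heartbeat(path, now=None):
    """Append one UTC timestamp line to path and fsync it."""
    stamp = time.time() if now is None else now
    line = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(stamp)) + " UTC\n"
    # O_APPEND keeps earlier lines; mode 0o644 is an arbitrary choice.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, line.encode("ascii"))
        os.fsync(fd)  # request that the data be written to the media now
    finally:
        os.close(fd)
    return line


def last_heartbeat(path):
    """Return the last non-empty line of the heartbeat file, or None."""
    with open(path, "rb") as f:
        lines = [entry for entry in f.read().split(b"\n") if entry]
    return lines[-1].decode("ascii") if lines else None


def run(path, interval=10.0):
    """Write one heartbeat per interval until the process is killed."""
    while True:
        write_heartbeat(path)
        time.sleep(interval)
```

Started before buildworld with something like
run("/var/log/heartbeat.log") (a made-up path), this adds a small
periodic write load, which may itself matter on a machine that is
already short on resources.
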
Knowing where the logs stop can give some idea of what might have
followed or been active, but that takes additional analysis.

I do not know if tail -f reports buffered information vs. only data
that makes it to media. It might be that tail -f on a log file, run in
an ssh session, reports closer to the failure time, showing
information that does not make it to the media. That need not be the
same as showing the actual failure time: just possibly closer.

As for debugger use, there are thousands of processes. If you mean gdb
or lldb, there is no uniquely relevant process to attach to and
monitor that survives across all the activity.

Are your kernel builds debug/invariants/witness builds? Is world a
debug build? (I do not mean just having symbols and such as a debug
build.) I wonder what the behavior would be if the resource overhead
of having and using the debug code were avoided. (But if a non-debug
build does fail, extracting information is normally a problem.)

===
Mark Millard
marklmi at yahoo.com
