FreeBSD Mail Archives

Date:      Thu, 6 Nov 2025 10:00:19 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        bob prohaska <fbsd@www.zefox.net>
Cc:        Ronald Klop <ronald-lists@klop.ws>, freebsd-current@freebsd.org, freebsd-arm@freebsd.org
Subject:   Re: Arm v7 RPi2 -current unresponsive to debugger escape during buildworld
Message-ID:  <05ADBC62-E111-42F9-ACF2-3A92F2781870@yahoo.com>
In-Reply-To: <aQzPBMjvuCoY-kY-@www.zefox.net>
References:  <aQyrBArxXq-JSaqu@www.zefox.net> <475995705.6919.1762440301455@localhost> <aQzPBMjvuCoY-kY-@www.zefox.net>

On Nov 6, 2025, at 08:38, bob prohaska <fbsd@www.zefox.net> wrote:

> On Thu, Nov 06, 2025 at 03:45:01PM +0100, Ronald Klop wrote:
>> Hi,
>> 
>> To me it sounds like your machine is overwhelmed by swapping.
>> 
>> Try -j1 buildworld.
> 
> In most cases of stoppage the swap use is low, 50 MB or sometimes less.
> Up to about 6-700MB the machines slow their progress, but keep going and
> there are no complaints on the console about swap taking too long or
> insufficient. If there's a connection to swap use, it isn't obvious.
> 
> It seems to be related more to hours of runtime than swap use. 
> 
> More to the point of my question, if the machine is swap-bound,
> shouldn't the debugger escape still work?

Do your descriptions of being unable to gain control refer
to the serial console? Do you also have ssh or such access?
Do all such connections show the hang as hung-up/crashed?
Do you get notices about loss of network connections to the
RPi2 v1.1 in question? Do any of those happen automatically?
If so, the time of such a notice could put a bound on when
the RPi2 v1.1 hung up/deadlocked/crashed, since the notice
about failing communication would occur after the problem
starts on the RPi2 v1.1.
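
One way to get such a bound automatically, as a minimal sketch
only: run a small poller on some other machine that records
the time at which the RPi2 stops accepting TCP connections.
The host name, port, and polling interval below are
assumptions for illustration, not anything from this thread.
A successful TCP connect only shows that the kernel's network
stack still answers, not that sshd or the build is making
progress.

```python
import socket
import time
from datetime import datetime, timezone


def probe(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def transitions(samples):
    """Reduce (timestamp, reachable) pairs to just the state changes."""
    changes = []
    previous = None
    for stamp, reachable in samples:
        if reachable != previous:
            changes.append((stamp, reachable))
            previous = reachable
    return changes


def watch(host, interval=10.0):
    """Poll forever; print a UTC timestamp whenever reachability changes."""
    previous = None
    while True:
        reachable = probe(host)
        if reachable != previous:
            now = datetime.now(timezone.utc).isoformat(timespec="seconds")
            state = "reachable" if reachable else "UNREACHABLE"
            print(f"{now} {host} {state}", flush=True)
            previous = reachable
        time.sleep(interval)
```

Calling watch("rpi2.example.net") (a hypothetical name) from
another machine would print one line when the board first
answers and another when it stops. The failure happened no
later than that second timestamp, and up to one polling
interval plus the timeout before it.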

I'll note that your prior reporting of the end-of-log
content gives evidence of things that completed, including
being flushed to the disk. But there likely was more that
was not flushed to the disk, some of which may have
otherwise completed. Also, whatever was actually active at
the time of the potential deadlock (or other form of crash)
is unlikely to show up in the logs identified as such.

The file system I/O tries to keep the on-media content from
being corrupted, but does not necessarily keep it up to date.
(Fully attempting both leads to either a contradiction or
horrible performance. UFS has different tradeoffs than ZFS for
such issues but the same general goal applies to both. At
least that is how I'd summarize it.)

Knowing where the logs stop can give some idea what might
follow or have been active, but it involves other analysis.

I do not know if tail -f reports buffered information vs.
only data that makes it to media. It might be that tail -f
on a log file, run in an ssh session, reports closer to the
failure time, showing information that does not make it to
the media. That need not be the same as showing the actual
failure time: just possibly closer.
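
If the tail -f output is captured on the machine running ssh
rather than on the RPi2, whatever arrives survives the hang.
A minimal sketch of a filter for the receiving side follows.
It prefixes each line with the local receive time and flushes
at once. The function names are illustrative, and whether
tail -f sees buffered data at all remains the open question
above.

```python
import sys
from datetime import datetime, timezone


def stamp(line, now=None):
    """Prefix one log line with a UTC receive time."""
    now = now or datetime.now(timezone.utc)
    text = line.rstrip("\n")
    return f"{now.isoformat(timespec='seconds')} {text}\n"


def copy_stamped(source, sink):
    """Copy lines from source to sink, stamping and flushing each one.

    Returns the number of lines copied.
    """
    count = 0
    for line in source:
        sink.write(stamp(line))
        sink.flush()  # do not let the local side buffer either
        count += 1
    return count
```

Saved as a script ending with the call
copy_stamped(sys.stdin, sys.stdout), it could sit at the end
of a pipeline fed by ssh running tail -f on the log file, with
its output redirected to a local file. The last stamped line
then gives a lower bound on how long the RPi2 was still
producing output.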

As for debugger use, there are thousands of processes.
If you mean gdb or lldb, there is no uniquely relevant
process to attach to and monitor that survives across all
the activity.

Are your kernel builds debug/invariants/witness builds?
Is world a debug build? (I do not mean a build that just has
symbols and such.) I wonder what the behavior would be when
avoiding the resource overhead involved in having and using
the debug code. (But, if it then still fails, extracting
information is normally a problem.)
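
For the kernel half of that question, a sketch of one possible
check, under assumptions: the running kernel was built with
INCLUDE_CONFIG_FILE, so that sysctl kern.conftxt returns the
configuration text, and that text lists options one per line
as "options NAME". The set of option names treated as debug
options here is my choice for illustration. This says nothing
about whether world is a debug build.

```python
import subprocess

# Option names treated as "debug" here; an illustrative selection.
DEBUG_OPTIONS = ("INVARIANTS", "INVARIANT_SUPPORT", "WITNESS")


def kernel_options(conftxt):
    """Collect option names from kernel configuration text."""
    names = set()
    for line in conftxt.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "options":
            # "options NAME=value" counts as NAME
            names.add(fields[1].split("=", 1)[0])
    return names


def debug_options_present(conftxt):
    """Return the sorted debug options found in the configuration text."""
    return sorted(kernel_options(conftxt) & set(DEBUG_OPTIONS))


def read_conftxt():
    """Ask the running kernel for its embedded configuration text."""
    result = subprocess.run(
        ["sysctl", "-n", "kern.conftxt"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On the RPi2, debug_options_present(read_conftxt()) returning
an empty list would suggest a non-debug kernel, provided the
assumptions above hold.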

===
Mark Millard
marklmi at yahoo.com

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?05ADBC62-E111-42F9-ACF2-3A92F2781870>
