Date:      Thu, 11 Jun 2026 11:57:07 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        bob prohaska <fbsd@www.zefox.net>
Cc:        freebsd-current@freebsd.org
Subject:   Re: Buildworld finishes despite swap exhaustion
Message-ID:  <cedb9383-0b9f-4ad9-9953-e426d78f0483@yahoo.com>
In-Reply-To: <airZdW54ZsQ8tjPR@www.zefox.net>
References:  <aieNCJnCU3QfyJDV@www.zefox.net> <040df279-5f61-4f4f-ae4a-79bd44797b53@yahoo.com> <airZdW54ZsQ8tjPR@www.zefox.net>


On 6/11/26 08:51, bob prohaska wrote:
> On Tue, Jun 09, 2026 at 08:22:02AM -0700, Mark Millard wrote:
>> On 6/8/26 20:48, bob prohaska wrote:
>>> Lately a Pi2B running buildworld reported an
>>> exhaustion of swap, but buildworld kept running
>>> and seemingly finished successfully.
>>>
>>> The report came on the serial console, I didn't
>>> find anything in the buildworld log. 
>>>
>>> This seems a very great improvement. Swap exhaustion
>>> differs from other sorts of failure, in that one can 
>>> simply re-try the job with some hope of success when 
>>> the workload is lighter.
>>>
>>> Am I interpreting this correctly?
>>
>> [Because the actual messages are not reported, I'm making some
>> assumptions about the exact messages that you got.]
>>
>>
>> Remember vm.pageout_oom_seq ?
>>
> 
> Yes, /boot/loader.conf contains:
> vm.pageout_oom_seq="4096"
> vm.pfault_oom_attempts="3"
> #vm.pfault_oom_attempts="120"
> vm.pfault_oom_wait="20"
> 
> I'll admit to not remembering how 4096 was chosen....
> probably just a wild guess.
> 
>> The larger the value used, the longer the system operates with the
>> amount of free RAM below the target threshold: in other words, it makes
>> more tries at getting to the threshold before giving up and starting to
>> kill processes to get the free RAM.
>>
>> Running out of swap by itself just means that SWAP cannot be used to
>> gain free RAM when such is not essential. RAM+SWAP can still be
>> (marginally) sufficient over such a time if no memory allocations
>> actually fail. If sufficient RAM/SWAP ends up being freed before
>> vm.pageout_oom_seq related kills happen, no overall failure happens.
>>
>>
>> As for the messages as I understand them:
>>
>> kernel: swap_pager: out of swap space
>>
>> does not report a failure, just a limiting condition.
>>
>> By contrast:
>>
>> kernel: swp_pager_getswapspace(2): failed
>>
>> reports a failure: the swap space allocation was necessary. It normally
>> leads to the likes of:
>>
>> kernel: pid ??? (???), jid ???, uid ???, was killed: failed to reclaim
>> memory
> 
> A more recent incident reported in /var/log/messages:
> 
> Jun  4 12:34:39 www kernel: swap_pager: out of swap space
> Jun  4 12:34:39 www kernel: swp_pager_getswapspace(12): failed
> 
> but wasn't followed by a "...was killed..." message.

Interesting. I've not had that combination as far as I know. Now I know
it is possible. Thanks.
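
For anyone wanting to scan /var/log/messages for these cases, the
distinction between the three message types quoted above can be sketched
in a few lines of Python. This is only a rough illustration: the patterns
are copied from the messages in this thread, while the category labels
("limit", "failure", "kill") and the function names are my own invention,
not kernel terminology.

```python
import re

# Patterns copied from the messages quoted in this thread. The labels are
# hypothetical names for this sketch, not anything the kernel uses.
PATTERNS = [
    ("kill", re.compile(r"was killed: failed to reclaim memory")),
    ("failure", re.compile(r"swp_pager_getswapspace\(\d+\): failed")),
    ("limit", re.compile(r"swap_pager: out of swap space")),
]


def classify(line):
    """Return the label of the first matching pattern, or None."""
    for label, pattern in PATTERNS:
        if pattern.search(line):
            return label
    return None


def summarize(lines):
    """Count how many lines fall into each category."""
    counts = {"limit": 0, "failure": 0, "kill": 0}
    for line in lines:
        label = classify(line)
        if label is not None:
            counts[label] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jun  4 12:34:39 www kernel: swap_pager: out of swap space",
        "Jun  4 12:34:39 www kernel: swp_pager_getswapspace(12): failed",
    ]
    print(summarize(sample))
```

A nonzero "failure" count with a zero "kill" count would correspond to
the combination reported above.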

> 
> Eventually there appeared what look like repeated disk errors, ending with:
> 
> Jun 11 02:04:59 www kernel: (da0:umass-sim0:0:0:0): Info: 0
> Jun 11 02:04:59 www kernel: (da0:umass-sim0:0:0:0): Retrying command (per sense data)
> Jun 11 02:04:59 www kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 04 63 34 50 00 00 18 00
> Jun 11 02:04:59 www kernel: (da0:umass-sim0:0:0:0): CAM status: SCSI Status Error
> Jun 11 02:04:59 www kernel: (da0:umass-sim0:0:0:0): SCSI status: Check Condition
> Jun 11 02:04:59 www kernel: (da0:umass-sim0:0:0:0): SCSI sense: MEDIUM ERROR asc:10,0 (ID CRC or ECC error)

The above looks like reporting of a drive problem. Getting to be time
for a replacement?
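
To see which part of the drive the failing read touched, the CDB bytes in
the READ(10) line can be decoded by hand or with a short script. The
sketch below assumes the standard READ(10) layout (opcode 0x28, a 4-byte
big-endian logical block address in bytes 2-5, a 2-byte transfer length
in bytes 7-8); the function name is just something I made up for the
illustration.

```python
def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as a hex string.

    Returns (lba, block_count). Assumes the standard 10-byte layout:
    byte 0 opcode (0x28), bytes 2-5 LBA, bytes 7-8 transfer length.
    """
    cdb = bytes.fromhex(cdb_hex)  # whitespace between bytes is ignored
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")
    block_count = int.from_bytes(cdb[7:9], "big")
    return lba, block_count


if __name__ == "__main__":
    # CDB bytes from the log line quoted above, with the wrapped line rejoined.
    lba, blocks = decode_read10("28 00 04 63 34 50 00 00 18 00")
    print(lba, blocks)  # prints: 73610320 24
```

So the failing command was a 24-block read starting at logical block
73610320. If the same block range keeps showing up in later errors, that
would point at a localized bad spot on the media.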

> Jun 11 02:04:59 www kernel: (da0:umass-sim0:0:0:0): Info: 0
> Jun 11 02:04:59 www kernel: (da0:umass-sim0:0:0:0): Retrying command (per sense data)
> 
> which ended in a debugger prompt on the console. 
> 
> There was considerable network
> activity around the same time 
> which resembled an ssh attack.

I'd guess that such was not likely to contribute to a false "MEDIUM
ERROR" with "(ID CRC or ECC error)".

> 
> The machine rebooted without incident, buildworld has been resumed with -j3.
> 
> If it happens again I'll save a backtrace if it'll be of interest.
> 



-- 
===
Mark Millard
marklmi at yahoo.com



Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?cedb9383-0b9f-4ad9-9953-e426d78f0483>