Date:      Wed, 15 Aug 2018 18:51:12 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        bob prohaska <fbsd@www.zefox.net>
Cc:        Mark Johnston <markj@FreeBSD.org>, John Kennedy <warlock@phouka.net>, freebsd-arm <freebsd-arm@freebsd.org>
Subject:   Re: RPI3 swap experiments ["was killed: out of swap space" with: "v_free_count: 5439, v_inactive_count: 1"]
Message-ID:  <9EA5D75D-A03F-4B25-B65E-03E93DE30130@yahoo.com>
In-Reply-To: <20180815221728.GA59074@www.zefox.net>
References:  <20180808153800.GF26133@www.zefox.net> <20180808204841.GA19379@raichu> <2DC1A479-92A0-48E6-9245-3FF5CFD89DEF@yahoo.com> <20180809033735.GJ30738@phouka1.phouka.net> <20180809175802.GA32974@www.zefox.net> <20180812173248.GA81324@phouka1.phouka.net> <20180812224021.GA46372@www.zefox.net> <B81E53A9-459E-4489-883B-24175B87D049@yahoo.com> <20180813021226.GA46750@www.zefox.net> <0D8B9A29-DD95-4FA3-8F7D-4B85A3BB54D7@yahoo.com> <20180815221728.GA59074@www.zefox.net>

On 2018-Aug-15, at 3:17 PM, bob prohaska <fbsd at www.zefox.net> wrote:

> On Sun, Aug 12, 2018 at 08:36:01PM -0700, Mark Millard wrote:
> [snip]
>>
>> I'd keep multiplying by 10 until it works (or fails some
>> other way), then back off by smaller factors if you want
>> a narrower range to be known between failing and working
>> (or failing differently).
>>
> [snip]
>>
>> The factor of 10 rule makes the number of tests
>> logarithmic to find a sufficient upper bound (if
>> there is an upper bound). After that, with high/low
>> bounds, binary searching is a possibility.
>>
> Updated to r337688 with 2 GB of swap divided between USB and microSD.
> Using vm.pageout_oom_seq=1024, a -j4 buildworld failed with da0 errors
> reported, panic'd and rebooted. A -j3 buildworld ran to completion with
> several dozen "indefinite wait..." messages but no other complaints.
>
> IIRC, this swap configuration wouldn't even run -j2 to completion with
> vm.pageout_oom_seq=12, so the increase to 1024 clearly helps and the
> da0 errors suggest something else gives up if taxed to -j4.
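
The search strategy quoted above (multiply by 10 until it works,
then narrow the range between the last failure and the first
success) can be sketched as follows. This is only an illustration:
works() is a hypothetical stand-in for "set vm.pageout_oom_seq to
this value and see if buildworld completes", and the sketch assumes
that once a value works every larger value also works, which a
noisy real test may not honor.

```python
from typing import Callable, Optional, Tuple


def find_threshold(works: Callable[[int], bool], start: int = 12,
                   factor: int = 10, limit: int = 10**9,
                   max_gap: int = 1) -> Optional[Tuple[int, int]]:
    """Return (highest known-failing, lowest known-working), or None.

    `start` is assumed to fail already (as 12 did in the tests above).
    `limit` and `max_gap` are illustrative parameters, not sysctl limits.
    """
    low = start
    high = start * factor
    # Phase 1: grow geometrically until some value works.
    while not works(high):
        low = high
        high *= factor
        if high > limit:
            return None  # no working value found below the limit
    # Phase 2: bisect between the last failure and the first success.
    while high - low > max_gap:
        mid = (low + high) // 2
        if works(mid):
            high = mid
        else:
            low = mid
    return low, high


if __name__ == "__main__":
    # Pretend, purely for illustration, that values >= 700 work.
    print(find_threshold(lambda value: value >= 700))
```

Since each real test is a multi-hour buildworld, a large max_gap
(stopping as soon as the range is narrow enough to be useful) is
likely the practical choice.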

You are likely subject to da0 failing sometimes for any -jN. But
4 from-scratch buildworlds with -j1 might be less likely to have
a failure than one from-scratch buildworld with -j4, depending on
why da0 fails.

> During the -j3 buildworld, at about the 26 MB point in the log file
> top reported ld.lld having "size" of 1037 MB, but only 60 MB of swap
> were in use. With only 1 GB of main memory, is that to be believed?

I've no clue if FreeBSD gets into the concepts of reserved space
vs. allocated space, some "space" being reserved but not allocated.
In such a context, some of the space that is not allocated but is
reserved might not need RAM allocated nor swap space allocated.

top's figure seems to trace back to struct vm_map's size field,
which is commented as /* virtual size */. The map entries are
described as forming a binary search tree and a doubly-linked
list.

For all I know a map entry might always span a power of 2 bytes
(for example) even when what is contained need not use all of
it. There might be some binning sizes such that the minimum
power of 2 that would hold the content need not be the power
of 2 used (in the example). That extra might be a form of
a reserved area in virtual space that is not allocated (yet).

If something like that is going on then top's SIZE will tend to
be bigger than what is allocated (summation of RAM and swap).

(The power of 2 criterion is just an illustration. I've no clue
of the actual criteria involved.)
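
As a general illustration of virtual size exceeding what is
actually backed (not a statement about how FreeBSD's vm_map
accounts for space), the sketch below maps a large anonymous
region and writes to only a few pages of it. On typical
demand-paged systems only the touched pages need RAM or swap,
while the whole mapping counts toward the process's virtual size.
The 64 MiB and 16-page figures are arbitrary choices for the
example.

```python
import mmap


def reserve_and_touch(reserve_bytes: int, touch_pages: int):
    """Map `reserve_bytes` anonymously and write to the first pages only.

    Returns (mapped size, bytes in the pages that were written to).
    """
    page = mmap.PAGESIZE
    region = mmap.mmap(-1, reserve_bytes)  # -1: anonymous, not file-backed
    try:
        for i in range(touch_pages):
            offset = i * page
            # The first write to a page is what forces it to be backed.
            region[offset:offset + 1] = b"\x01"
        return len(region), touch_pages * page
    finally:
        region.close()


virtual, touched = reserve_and_touch(64 * 1024 * 1024, 16)
print(f"virtual size: {virtual} bytes, touched: {touched} bytes")
```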

> The backtrace emitted by the -j4 panic is somewhat longer than
> usual, it's in the file named console at
> http://www.zefox.net/~fbsd/rpi3/swaptests/r337688/1gbsd_1gbusb/
> in case it's of interest.

I greatly doubt that FreeBSD is designed to survive the
likes of:

. . .
(da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 00 30 42 b8 00 00 20 00
(da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
(da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 00 30 42 b8 00 00 20 00
(da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da0:umass-sim0:0:0:0): Error 5, Retries exhausted
swap_pager: I/O error - pageout failed; blkno 133211,size 8192, error 5
swap_pager: I/O error - pageout failed; blkno 133213,size 8192, error 5
swap_pager: I/O error - pageout failed; blkno 133215,size 131072, error 5
g_vfs_done():da0d[READ(offset=11492274176, length=4096)]error = 5
g_vfs_done():da0d[READ(offset=17284657152, length=4096)]error = 5
g_vfs_done():da0d[READ(offset=37422759936, length=32768)]error = 5
g_vfs_done():da0a[WRITE(offset=805568512, length=32768)]error = 5
g_vfs_done():da0a[WRITE(offset=827752448, length=32768)]error = 5
g_vfs_done():da0a[READ(offset=827916288, length=32768)]error = 5
swap_pager: I/O error - pagein failed; blkno 3780,size 8192, error 5
swap_pager: I/O error - pagein failed; blkno 11127,size 4096, error 5
swap_pager: I/O error - pagein failed; blkno 12162,size 4096, error 5
swap_pager: I/O error - pagein failed; blkno 128272,size 4096, error 5
vm_fault: pager read error, pid 1 (init)
vm_fault: pager read error, pid 13373 (c++)
vm_fault: pager read error, pid 13733 (c++)
vm_fault: pager read error, pid 324 (devd)
swap_pager: I/O error - pageout failed; blkno 133207,size 16384, error 5
(da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 05 25 50 50 00 00 08 00
(da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
(da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 05 25 50 50 00 00 08 00
(da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
(da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 05 25 50 50 00 00 08 00
(da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 200056, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 3797, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 173323, size: 36864
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 3767, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 3793, size: 4096
(da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 05 25 50 50 00 00 08 00
(da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
(da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 05 25 50 50 00 00 08 00
(da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da0:umass-sim0:0:0:0): Error 5, Retries exhausted
. . .

(There is a lot more, not quoted.)

As I understand it the messages with "(da0:umass-sim0:0:0:0)"
with retries exhausted are indicating that da0 simply failed
to work, even with FreeBSD retrying the operation several times.

As I understand it, only the device reporting its own errors leads
to those kinds of messages from FreeBSD. (If Warner cares to, he
should be able to correct any misimpression I might make here.
There are others around that could as well.)
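
For reading the CDB lines in the log above: in the standard SCSI
READ(10)/WRITE(10) layout, byte 0 is the opcode (0x28 read, 0x2a
write), bytes 2-5 are the big-endian logical block address and
bytes 7-8 are the big-endian transfer length in blocks. A minimal
decoder sketch (decode_rw10 is just a name made up for this
example):

```python
def decode_rw10(cdb_hex: str) -> dict:
    """Decode a READ(10)/WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] not in (0x28, 0x2A):
        raise ValueError("not a READ(10)/WRITE(10) CDB")
    return {
        "op": "READ(10)" if cdb[0] == 0x28 else "WRITE(10)",
        "lba": int.from_bytes(cdb[2:6], "big"),     # starting block address
        "blocks": int.from_bytes(cdb[7:9], "big"),  # transfer length in blocks
    }


# The two distinct CDBs that appear in the quoted console output.
print(decode_rw10("2a 00 00 30 42 b8 00 00 20 00"))
print(decode_rw10("28 00 05 25 50 50 00 00 08 00"))
```

So the failing write is 32 blocks starting at block 3162808 and
the failing read is 8 blocks starting at block 86331472: the same
requests being retried, not a spread of different ones.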

So: then getting a panic such as . . .

panic: vm_page_assert_unbusied: page 0xfffffd002e60ec40 busy @ /usr/src/sys/vm/vm_object.c:736

may not be all that surprising.

Your disk (or its power or connections or some such that is required)
is not reliable overall: it is failing to operate correctly. (If I'm
correct above.)

If it is a connection problem, it could be on the rpi3/rpi2 side
of things. If no powered hub is used, the same could be true for
power.

Expect that, no matter what you do (even -j1), you will likely
see at least occasional failures in the environment that you
have. Pushing the I/O system harder over the same time period
would likely increase the observed failure rate over that
duration. But I doubt there is a positive level of I/O load
below which there would be no failures for buildworld (on a
time scale for the whole build that you would be willing to
wait for).
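
A rough way to see why: under the simplifying assumption (mine,
not something measured) that each I/O operation fails
independently with some small fixed probability, the chance of at
least one failure over a whole build approaches certainty as the
number of operations grows. The probabilities and operation
counts below are made-up illustrative values, not data from da0.

```python
def p_any_failure(p_per_op: float, n_ops: int) -> float:
    """Probability of at least one failure in n_ops independent operations."""
    return 1.0 - (1.0 - p_per_op) ** n_ops


# Hypothetical per-operation failure probabilities and operation counts.
for p in (1e-8, 1e-7, 1e-6):
    for n in (10**5, 10**6, 10**7):
        print(f"p={p:g}  n={n:>8}  P(any failure)={p_any_failure(p, n):.4f}")
```

In this model only the total number of operations matters, so it
cannot by itself distinguish four -j1 builds from one -j4 build.
If the failures depend on load (power draw, for example), the
per-operation probability would itself rise with -jN, which is
the "depending on why da0 fails" caveat above.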


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



