Date:      Thu, 9 Aug 2018 09:44:21 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        bob prohaska <fbsd@www.zefox.net>
Cc:        Mark Millard via freebsd-arm <freebsd-arm@freebsd.org>, Mark Johnston <markj@FreeBSD.org>
Subject:   Re: RPI3 swap experiments ["was killed: out of swap space" with: "v_free_count: 5439, v_inactive_count: 1"]
Message-ID:  <F61322C1-9B93-48B8-8FAB-269C256F5244@yahoo.com>
In-Reply-To: <20180809152152.GC68459@raichu>
References:  <20180801034511.GA96616@www.zefox.net> <201808010405.w7145RS6086730@donotpassgo.dyslexicfish.net> <6BFE7B77-A0E2-4FAF-9C68-81951D2F6627@yahoo.com> <20180802002841.GB99523@www.zefox.net> <20180802015135.GC99523@www.zefox.net> <EC74A5A6-0DF4-48EB-88DA-543FD70FEA07@yahoo.com> <20180806155837.GA6277@raichu> <20180808153800.GF26133@www.zefox.net> <20180808204841.GA19379@raichu> <20180809065648.GB30347@www.zefox.net> <20180809152152.GC68459@raichu>

On 2018-Aug-9, at 8:21 AM, Mark Johnston <markj at FreeBSD.org> wrote:

> On Wed, Aug 08, 2018 at 11:56:48PM -0700, bob prohaska wrote:
>> On Wed, Aug 08, 2018 at 04:48:41PM -0400, Mark Johnston wrote:
>>> On Wed, Aug 08, 2018 at 08:38:00AM -0700, bob prohaska wrote:
>>>> The patched kernel ran longer than default but OOMA still halted buildworld around
>>>> 13 MB. That's considerably farther than a default buildworld would have run, but less than
>>>> observed when setting vm.pageout_oom_seq=120 alone. Log files are at
>>>> http://www.zefox.net/~fbsd/rpi3/swaptests/r337226M/1gbsdflash_1gbusbflash/batchqueue/
>>>>
>>>> Both changes are now in place and -j4 buildworld has been restarted.
>>>
>>> Looking through the gstat output, I'm seeing some pretty abysmal average
>>> write latencies for da0, the flash drive.  I also realized that my
>>> reference to r329882 lowering the pagedaemon sleep period was wrong -
>>> things have been this way for much longer than that.  Moreover, as you
>>> pointed out, bumping oom_seq to a much larger value wasn't quite
>>> sufficient.
>>>
>>> I'm curious as to what the worst-case swap I/O latencies are in your
>>> test, since the average latencies reported in your logs are high enough
>>> to trigger OOM kills even with the increased oom_seq value.  When the
>>> current test finishes, could you try repeating it with this patch
>>> applied on top?  https://people.freebsd.org/~markj/patches/slow_swap.diff
>>> That is, keep the non-default oom_seq setting and modification to
>>> VM_BATCHQUEUE_SIZE, and apply this patch on top.  It'll cause the kernel
>>> to print messages to the console under certain conditions, so a log of
>>> console output will be interesting.
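
[For reference, the suggested setup might look something like the sketch
below. The source tree location (/usr/src) and the -p0 patch level are
assumptions, not details confirmed in this thread.]

```shell
# Sketch of the suggested test setup (locations and patch level assumed):
cd /usr/src
fetch https://people.freebsd.org/~markj/patches/slow_swap.diff
patch -p0 < slow_swap.diff      # apply on top of the VM_BATCHQUEUE_SIZE change

# Keep the non-default OOM threshold used in these tests:
sysctl vm.pageout_oom_seq=120                    # take effect now
echo 'vm.pageout_oom_seq=120' >> /etc/sysctl.conf  # persist across reboots
```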
>>
>> The run finished with a panic; I've collected the logs and terminal output at
>> http://www.zefox.net/~fbsd/rpi3/swaptests/r337226M/1gbsdflash_1gbusbflash/batchqueue/pageout120/slow_swap/
>>
>> There seems to be a considerable discrepancy between the wait times reported
>> by the patch and the wait times reported by gstat in the first couple of
>> occurrences. The fun begins at timestamp Wed Aug  8 21:26:03 PDT 2018 in
>> swapscript.log.
>
> The reports of "waited for swap buffer" are especially bad: during those
> periods, the laundry thread is blocked waiting for in-flight swap writes
> to finish before sending any more.  Because the system is generally
> quite starved for clean pages that it can reuse, it's relying on swap
> I/O to clean more.  If that fails, the system eventually has no choice
> but to start killing processes (where the time period corresponding to
> "eventually" is determined by vm.pageout_oom_seq).
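
[As a toy model of the behavior described above, not kernel code: the
pagedaemon counts consecutive passes that fail to reclaim pages and only
resorts to killing once that count passes vm.pageout_oom_seq. The default
value of 12 and the exact reset/compare details are assumptions here.]

```shell
# Toy model of the vm.pageout_oom_seq gate (assumed default: 12).
oom_seq=12
failed=0
for pass in 1 2 3 4 5 6 7 8 9 10 11 12 13; do
    reclaimed=0                  # pretend every pass fails (swap I/O too slow)
    if [ "$reclaimed" -eq 0 ]; then
        failed=$((failed + 1))   # another consecutive failed pass
    else
        failed=0                 # any successful pass resets the count
    fi
    if [ "$failed" -gt "$oom_seq" ]; then
        echo "pass $pass: OOM kill triggered"
    fi
done
```

Raising the sysctl only stretches "eventually"; it does not make the swap
writes complete any faster.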

Just for reference, from

http://www.zefox.net/~fbsd/rpi3/swaptests/r337226M/1gbsdflash_1gbusbflash/batchqueue/pageout120/slow_swap/swapscript.log

here is some context for two of the swap buffer messages, giving some
information on the timing of the messages and on other activity:

Wed Aug  8 22:01:14 PDT 2018
Device          1K-blocks     Used    Avail Capacity
/dev/da0b         1048576    23912  1024664     2%
/dev/mmcsd0s3b    1048576    23892  1024684     2%
Total             2097152    47804  2049348     2%
Aug  8 21:36:13 www syslogd: last message repeated 1 times
Aug  8 21:36:13 www kernel: waited 3s for async swap write
dT: 10.004s  w: 10.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
    2     20      1      4   25.5     19   1216   41.4      0      0    0.0   29.3  mmcsd0
    3     29      1     11   10.8     28   1244   23.9      0      0    0.0   18.7  da0
    0      0      0      0    0.0      0      3   42.9      0      0    0.0    0.4  mmcsd0s2
    2     20      1      4   25.6     19   1213   41.5      0      0    0.0   29.4  mmcsd0s3
    0      0      0      0    0.0      0      3   43.0      0      0    0.0    0.4  ufs/rootfs
    2     20      1      4   25.6     19   1213   41.6      0      0    0.0   29.4  mmcsd0s3b
    2     20      1     11   10.9     19   1238   20.6      0      0    0.0   18.6  da0b
Wed Aug  8 22:01:24 PDT 2018
Device          1K-blocks     Used    Avail Capacity
/dev/da0b         1048576    37284  1011292     4%
/dev/mmcsd0s3b    1048576    36920  1011656     4%
Total             2097152    74204  2022948     4%
Aug  8 21:36:13 www syslogd: last message repeated 1 times
Aug  8 21:36:13 www kernel: waited 3s for async swap write
dT: 10.002s  w: 10.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
    0      8      0      0    4.7      8    693   41.7      0      0    0.0   12.2  mmcsd0
   13     13      0      0    0.0     13    682  132.7      0      0    0.0   78.3  da0
    0      8      0      0    4.7      8    693   41.8      0      0    0.0   12.2  mmcsd0s3
    0      8      0      0    4.8      8    693   41.9      0      0    0.0   12.3  mmcsd0s3b
    5      7      0      0    0.0      7    676   79.5      0      0    0.0   40.1  da0b
Wed Aug  8 22:01:34 PDT 2018
Device          1K-blocks     Used    Avail Capacity
/dev/da0b         1048576    44680  1003896     4%
/dev/mmcsd0s3b    1048576    44132  1004444     4%
Total             2097152    88812  2008340     4%
Aug  8 21:36:13 www kernel: waited 3s for async swap write
Aug  8 22:01:30 www syslogd: last message repeated 1 times
Aug  8 22:01:41 www kernel: waited 10s for async swap write
Aug  8 22:01:41 www kernel: waited 6s for swap buffer
Aug  8 22:01:41 www kernel: waited 13s for async swap write
Aug  8 22:01:41 www kernel: waited 3s for swap buffer
Aug  8 22:01:41 www kernel: waited 12s for async swap write
Aug  8 22:01:41 www kernel: waited 10s for async swap write
Aug  8 22:01:41 www kernel: waited 3s for async swap write
dT: 10.001s  w: 10.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
    0      2      1     26    3.4      1     17    3.8      0      0    0.0    0.9  mmcsd0
    0      4      3     61    3.8      2     70    5.7      0      0    0.0    1.7  da0
    0      2      1     26    3.5      1     17    3.9      0      0    0.0    0.9  mmcsd0s3
    0      1      0      0    0.0      1     17    4.0      0      0    0.0    0.4  mmcsd0s3a
    0      1      1     26    3.5      0      0    0.0      0      0    0.0    0.5  mmcsd0s3b
    0      2      2     54    3.8      0      0    0.0      0      0    0.0    0.9  da0b
    0      2      0      7    4.3      1     70    5.8      0      0    0.0    0.8  da0d
Wed Aug  8 22:01:51 PDT 2018
Device          1K-blocks     Used    Avail Capacity
/dev/da0b         1048576    46156  1002420     4%
/dev/mmcsd0s3b    1048576    45480  1003096     4%
Total             2097152    91636  2005516     4%
Aug  8 22:01:41 www kernel: waited 10s for async swap write
Aug  8 22:01:41 www kernel: waited 3s for async swap write
dT: 10.002s  w: 10.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
    0      8      7    251    3.4      0      8    2.6      0      0    0.0    2.5  mmcsd0
    0      9      7    186    4.2      2     77    7.4      0      0    0.0    3.6  da0
    0      8      7    251    3.5      0      8    2.6      0      0    0.0    2.6  mmcsd0s3
    0      0      0      0    0.0      0      8    2.7      0      0    0.0    0.1  mmcsd0s3a
    0      7      7    251    3.5      0      0    0.0      0      0    0.0    2.5  mmcsd0s3b
    0      7      7    184    4.2      0      0    0.0      0      0    0.0    2.7  da0b
    0      2      0      2    4.6      2     76    7.5      0      0    0.0    0.9  da0d
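
[Given gstat logs like the ones above, a quick way to pull the worst
per-interval write latency for the two swap devices out of a capture might
be the one-liner below. The ms/w figure is the 8th field of the device
rows, as in the header above; this helper itself is hypothetical, and two
sample rows are piped in here in place of reading swapscript.log.]

```shell
# Hypothetical post-processing: report the worst per-interval write latency
# (ms/w, 8th column of the gstat rows) seen for the swap devices.
printf '%s\n' \
'    2     20      1      4   25.6     19   1213   41.6      0      0    0.0   29.4  mmcsd0s3b' \
'    5      7      0      0    0.0      7    676   79.5      0      0    0.0   40.1  da0b' |
awk '/mmcsd0s3b|da0b/ { if ($8 + 0 > max) max = $8 } END { print "worst ms/w:", max }'
```

Against a real capture, the same awk program would be run on the log file
instead of the printf sample.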



> Based on these latencies, I think the system is behaving more or less as
> expected from the VM's perspective.  I do think the default oom_seq value
> is too low and will get that addressed in 12.0.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)



