Date:      Wed, 7 Mar 2018 14:39:13 +0400
From:      Roman Bogorodskiy <novel@FreeBSD.org>
To:        "Danilo G. Baio" <dbaio@FreeBSD.org>
Cc:        "Rodney W. Grimes" <freebsd-rwg@pdx.rh.CN85.dnsmgr.net>, Trond Endrest?l <Trond.Endrestol@fagskolen.gjovik.no>, FreeBSD current <freebsd-current@freebsd.org>, Kurt Jaeger <lists@opsec.eu>
Subject:   Re: Strange ARC/Swap/CPU on yesterday's -CURRENT
Message-ID:  <20180307103911.GA72239@kloomba>
In-Reply-To: <20180306221554.uyshbzbboai62rdf@dx240.localdomain>
References:  <20180306173455.oacyqlbib4sbafqd@ler-imac.lerctr.org> <201803061816.w26IGaW5050053@pdx.rh.CN85.dnsmgr.net> <20180306193645.vv3ogqrhauivf2tr@ler-imac.lerctr.org> <20180306221554.uyshbzbboai62rdf@dx240.localdomain>



  Danilo G. Baio wrote:

> On Tue, Mar 06, 2018 at 01:36:45PM -0600, Larry Rosenman wrote:
> > On Tue, Mar 06, 2018 at 10:16:36AM -0800, Rodney W. Grimes wrote:
> > > > On Tue, Mar 06, 2018 at 08:40:10AM -0800, Rodney W. Grimes wrote:
> > > > > > On Mon, 5 Mar 2018 14:39-0600, Larry Rosenman wrote:
> > > > > >
> > > > > > > Upgraded to:
> > > > > > >
> > > > > > > FreeBSD borg.lerctr.org 12.0-CURRENT FreeBSD 12.0-CURRENT #11 r330385: Sun Mar  4 12:48:52 CST 2018     root@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/VT-LER  amd64
> > > > > > > +1200060 1200060
> > > > > > >
> > > > > > > Yesterday, and I'm seeing really strange slowness, ARC use, and SWAP use and swapping.
> > > > > > >
> > > > > > > See http://www.lerctr.org/~ler/FreeBSD/Swapuse.png
> > > > > >
> > > > > > I see these symptoms on stable/11. One of my servers has 32 GiB of
> > > > > > RAM. After a reboot all is well. ARC starts to fill up, and I still
> > > > > > have more than half of the memory available for user processes.
> > > > > >
> > > > > > After running the periodic jobs at night, the amount of wired memory
> > > > > > goes sky high. /etc/periodic/weekly/310.locate is a particularly
> > > > > > nasty one.
> > > > >
> > > > > I would like to find out if this is the same person I have
> > > > > reporting this problem from another source, or if this is
> > > > > a confirmation of a bug I was helping someone else with.
> > > > >
> > > > > Have you been in contact with Michael Dexter about this
> > > > > issue, or any other forum/mailing list/etc?
> > > > Just IRC/Slack, with no response.
> > > > >
> > > > > If not then we have at least 2 reports of this unbound
> > > > > wired memory growth, if so hopefully someone here can
> > > > > take you further in the debug than we have been able
> > > > > to get.
> > > > What can I provide?  The system is still in this state as the full backup is slow.
> > >
> > > One place to look is to see if this is the recently fixed:
> > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=222288
> > > g_bio leak.
> > >
> > > vmstat -z | egrep 'ITEM|g_bio|UMA'
> > >
> > > would be a good first look
> > >
> > borg.lerctr.org /home/ler $ vmstat -z | egrep 'ITEM|g_bio|UMA'
> > ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
> > UMA Kegs:               280,      0,     346,       5,     560,   0,   0
> > UMA Zones:             1928,      0,     363,       1,     577,   0,   0
> > UMA Slabs:              112,      0,25384098,  977762,102033225,   0,   0
> > UMA Hash:               256,      0,      59,      16,     105,   0,   0
> > g_bio:                  384,      0,      33,    1627,542482056,   0,   0
> > borg.lerctr.org /home/ler $
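
(That g_bio USED count of 33 looks healthy; the leak from that PR shows up
as unbounded growth of the USED column over time. If anyone wants to rule
it out properly, a minimal sketch for sampling it periodically would be
something like the following; the log path is just an example:)

while :; do
    date
    vmstat -z | egrep 'ITEM|g_bio'
    sleep 60
done >> /var/tmp/gbio-growth.log
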
> > > > > > Limiting the ARC to, say, 16 GiB, has no effect on the high amount of
> > > > > > wired memory. After a few more days, the kernel consumes virtually all
> > > > > > memory, forcing processes in and out of the swap device.
> > > > >
> > > > > Our experience as well.
> > > > >
> > > > > ...
> > > > >
> > > > > Thanks,
> > > > > Rod Grimes                                                 rgrimes@freebsd.org
> > > > Larry Rosenman                     http://www.lerctr.org/~ler
> > >
> > > --
> > > Rod Grimes                                                 rgrimes@freebsd.org
> >
> > --
> > Larry Rosenman                     http://www.lerctr.org/~ler
> > Phone: +1 214-642-9640                 E-Mail: ler@lerctr.org
> > US Mail: 5708 Sabbia Drive, Round Rock, TX 78665-2106
>
>
> Hi.
>
> I noticed this behavior as well and changed vfs.zfs.arc_max for a smaller size.
>
> For me it started when I upgraded to 1200058; on this box I'm only using
> poudriere for test builds.

I've noticed that as well.

I have 16G of RAM and two disks: the first one is UFS with the system
installation, and the second one is ZFS, which I use to store media and
data files and for poudriere.

I don't recall the exact date, but it started fairly recently. The system
would swap like crazy to the point where I could not even ssh to it, and
could hardly log in through a tty: it might take 10-15 minutes to see a
command typed in the shell.
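
(To see where the memory actually goes when a box gets into that state,
the standard counters should be enough; a minimal sketch, using stock
FreeBSD tool and sysctl names:)

swapinfo -h                        # swap devices and how full they are
sysctl vm.stats.vm.v_wire_count    # wired pages
sysctl vm.stats.vm.v_free_count    # free pages
top -b -o res | head -20           # largest resident processes
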

I've updated loader.conf to have the following:

vfs.zfs.arc_max="4G"
vfs.zfs.prefetch_disable="1"
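
(After a reboot, whether the cap took effect can be verified with sysctl;
a minimal sketch, assuming the stock arcstats names:)

sysctl vfs.zfs.arc_max                  # should print 4294967296 (4G, in bytes)
sysctl kstat.zfs.misc.arcstats.size     # current ARC size; should stay below the cap
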

It fixed the problem, but introduced a new one: when I'm building stuff
with poudriere and ccache enabled, it takes hours to build even small
projects like curl or gnutls.

For example, current build:

[10i386-default] [2018-03-07_07h44m45s] [parallel_build:] Queued: 3  Built: 1  Failed: 0  Skipped: 0  Ignored: 0  Tobuild: 2   Time: 06:48:35
        [02]: security/gnutls           | gnutls-3.5.18             build           (06:47:51)

Almost 7 hours already and still going!
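
(One thing worth checking in this situation is whether ccache is actually
getting hits, since every miss means a full compile; a minimal sketch,
assuming the cache lives at the path set by CCACHE_DIR in poudriere.conf,
with /var/cache/ccache here only as a hypothetical example:)

env CCACHE_DIR=/var/cache/ccache ccache -s    # hit/miss counters for the cache
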

gstat output looks like this:

dT: 1.002s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0      0      0      0    0.0      0      0    0.0    0.0  da0
    0      1      0      0    0.0      1    128    0.7    0.1  ada0
    1    106    106    439   64.6      0      0    0.0   98.8  ada1
    0      1      0      0    0.0      1    128    0.7    0.1  ada0s1
    0      0      0      0    0.0      0      0    0.0    0.0  ada0s1a
    0      0      0      0    0.0      0      0    0.0    0.0  ada0s1b
    0      1      0      0    0.0      1    128    0.7    0.1  ada0s1d

ada0 here is the UFS drive, and ada1 is the ZFS one.
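
(To watch just the ZFS disk while a build runs, gstat can filter on the
provider name; a small sketch:)

gstat -I 1s -f '^ada1$'    # refresh every second, show only ada1
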

> Regards.
> --
> Danilo G. Baio (dbaio)



Roman Bogorodskiy
