Date: Sun, 11 Mar 2018 00:47:10 +0100
From: "O. Hartmann"
To: Roman Bogorodskiy
Cc: "Danilo G. Baio", "Rodney W. Grimes", Trond Endrestøl, FreeBSD current, Kurt Jaeger
Subject: Re: Strange ARC/Swap/CPU on yesterday's -CURRENT
Message-ID: <20180311004737.3441dbf9@thor.intern.walstatt.dynvpn.de>
In-Reply-To: <20180307103911.GA72239@kloomba>

On Wed, 7 Mar 2018 14:39:13 +0400, Roman Bogorodskiy wrote:

> Danilo G. Baio wrote:
>
> > On Tue, Mar 06, 2018 at 01:36:45PM -0600, Larry Rosenman wrote:
> > > On Tue, Mar 06, 2018 at 10:16:36AM -0800, Rodney W. Grimes wrote:
> > > > > On Tue, Mar 06, 2018 at 08:40:10AM -0800, Rodney W. Grimes wrote:
> > > > > > > On Mon, 5 Mar 2018 14:39-0600, Larry Rosenman wrote:
> > > > > > >
> > > > > > > > Upgraded to:
> > > > > > > >
> > > > > > > > FreeBSD borg.lerctr.org 12.0-CURRENT FreeBSD 12.0-CURRENT #11 r330385:
> > > > > > > > Sun Mar 4 12:48:52 CST 2018
> > > > > > > > root@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/VT-LER amd64
> > > > > > > > +1200060 1200060
> > > > > > > >
> > > > > > > > Yesterday, and I'm seeing really strange slowness, ARC use, and SWAP use
> > > > > > > > and swapping.
> > > > > > > >
> > > > > > > > See http://www.lerctr.org/~ler/FreeBSD/Swapuse.png
> > > > > > >
> > > > > > > I see these symptoms on stable/11. One of my servers has 32 GiB of
> > > > > > > RAM. After a reboot all is well. ARC starts to fill up, and I still
> > > > > > > have more than half of the memory available for user processes.
> > > > > > >
> > > > > > > After running the periodic jobs at night, the amount of wired memory
> > > > > > > goes sky high. /etc/periodic/weekly/310.locate is a particular nasty
> > > > > > > one.
> > > > > >
> > > > > > I would like to find out if this is the same person I have
> > > > > > reporting this problem from another source, or if this is
> > > > > > a confirmation of a bug I was helping someone else with.
> > > > > >
> > > > > > Have you been in contact with Michael Dexter about this
> > > > > > issue, or any other forum/mailing list/etc?
> > > > > Just IRC/Slack, with no response.
> > > > > >
> > > > > > If not then we have at least 2 reports of this unbound
> > > > > > wired memory growth, if so hopefully someone here can
> > > > > > take you further in the debug than we have been able
> > > > > > to get.
> > > > > What can I provide? The system is still in this state as the full backup is
> > > > > slow.
> > > >
> > > > One place to look is to see if this is the recently fixed:
> > > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=222288
> > > > g_bio leak.
> > > >
> > > > vmstat -z | egrep 'ITEM|g_bio|UMA'
> > > >
> > > > would be a good first look
> > > >
> > > borg.lerctr.org /home/ler $ vmstat -z | egrep 'ITEM|g_bio|UMA'
> > > ITEM          SIZE  LIMIT      USED     FREE        REQ  FAIL  SLEEP
> > > UMA Kegs:      280,     0,      346,       5,       560,    0,     0
> > > UMA Zones:    1928,     0,      363,       1,       577,    0,     0
> > > UMA Slabs:     112,     0, 25384098,  977762, 102033225,    0,     0
> > > UMA Hash:      256,     0,       59,      16,       105,    0,     0
> > > g_bio:         384,     0,       33,    1627, 542482056,    0,     0
> > > borg.lerctr.org /home/ler $
> > > > > > > Limiting the ARC to, say, 16 GiB, has no effect of the high amount of
> > > > > > > wired memory. After a few more days, the kernel consumes virtually all
> > > > > > > memory, forcing processes in and out of the swap device.
> > > > > >
> > > > > > Our experience as well.
> > > > > >
> > > > > > ...
> > > > > >
> > > > > > Thanks,
> > > > > > Rod Grimes
> > > > > > rgrimes@freebsd.org
> > > > > Larry Rosenman http://www.lerctr.org/~ler
> > > >
> > > > --
> > > > Rod Grimes rgrimes@freebsd.org
> > >
> > > --
> > > Larry Rosenman http://www.lerctr.org/~ler
> > > Phone: +1 214-642-9640 E-Mail: ler@lerctr.org
> > > US Mail: 5708 Sabbia Drive, Round Rock, TX 78665-2106
> >
> >
> > Hi.
> >
> > I noticed this behavior as well and changed vfs.zfs.arc_max for a smaller size.
> >
> > For me it started when I upgraded to 1200058, in this box I'm only using
> > poudriere for building tests.
>
> I've noticed that as well.
>
> I have 16G of RAM and two disks, the first one is UFS with the system
> installation and the second one is ZFS which I use to store media and
> data files and for poudreire.
>
> I don't recall the exact date, but it started fairly recently.
> System would swap like crazy to a point when I cannot even ssh to it, and
> can hardly login through tty: it might take 10-15 minutes to see a command
> typed in the shell.
>
> I've updated loader.conf to have the following:
>
> vfs.zfs.arc_max="4G"
> vfs.zfs.prefetch_disable="1"
>
> It fixed the problem, but introduced a new one. When I'm building stuff
> with poudriere with ccache enabled, it takes hours to build even small
> projects like curl or gnutls.
>
> For example, current build:
>
> [10i386-default] [2018-03-07_07h44m45s] [parallel_build:] Queued: 3 Built: 1
>   Failed: 0 Skipped: 0 Ignored: 0 Tobuild: 2 Time: 06:48:35
>   [02]: security/gnutls | gnutls-3.5.18 build (06:47:51)
>
> Almost 7 hours already and still going!
>
> gstat output looks like this:
>
> dT: 1.002s  w: 1.000s
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     0      0      0      0    0.0      0      0    0.0     0.0 da0
>     0      1      0      0    0.0      1    128    0.7     0.1 ada0
>     1    106    106    439   64.6      0      0    0.0    98.8 ada1
>     0      1      0      0    0.0      1    128    0.7     0.1 ada0s1
>     0      0      0      0    0.0      0      0    0.0     0.0 ada0s1a
>     0      0      0      0    0.0      0      0    0.0     0.0 ada0s1b
>     0      1      0      0    0.0      1    128    0.7     0.1 ada0s1d
>
> ada0 here is UFS driver, and ada1 is ZFS.
>
> > Regards.
> > --
> > Danilo G. Baio (dbaio)
>
>
>
> Roman Bogorodskiy

This is from an APU (PC Engines) with no ZFS, only UFS on a small mSATA device;
the APU works as a firewall, router and PBX:

last pid:  9665;  load averages:  0.13,  0.13,  0.11   up 3+06:53:55  00:26:26
19 processes:  1 running, 18 sleeping
CPU:  0.3% user,  0.0% nice,  0.2% system,  0.0% interrupt, 99.5% idle
Mem: 27M Active, 6200K Inact, 83M Laundry, 185M Wired, 128K Buf, 675M Free
Swap: 7808M Total, 2856K Used, 7805M Free
[...]

The APU is running CURRENT (FreeBSD 12.0-CURRENT #42 r330608: Wed Mar 7 16:55:59
CET 2018 amd64). Usually, the APU never(!) uses swap; for the past couple of days
it has been swapping like hell and I have to reboot it fairly often.

Another box (16 GB RAM, ZFS, poudriere, the packaging box) is unresponsive right
now: after hours of building packages, I tried to copy the repository from one
location on the same ZFS volume to another. Usually this task takes a couple of
minutes for ~2200 ports. Now it has taken 2 1/2 hours and the box got stuck;
Ctrl-T on the console delivers:

load: 0.00 cmd: make 91199 [pfault] 7239.56r 0.03u 0.04s 0% 740k

No response from the box anymore.

The problem of swapping like hell and performing slowly is not just an issue of
the past few days; it has been present for at least 1 1/2 weeks, probably longer.
Since I build ports fairly often, I can say that the time taken on that specific
box has increased from 2 to 3 days for all ~2200 ports. The system has 16 GB of
RAM and an Ivy Bridge 4-core Xeon at 3.4 GHz, if this information matters.

The box is consuming swap really fast.

Today is the first time the machine became unresponsive (no ssh, no console login
so far). It needs a cold start. The OS is CURRENT as well.

Regards,

O. Hartmann

--
O. Hartmann

I object to the use or transfer of my data for advertising purposes or for
market or opinion research (§ 28 (4) BDSG).