Date: Sun, 11 Mar 2018 01:40:58 +0100
From: Michael Gmelin <freebsd@grem.de>
To: "O. Hartmann" <ohartmann@walstatt.org>
Cc: Roman Bogorodskiy <novel@FreeBSD.org>, "Danilo G. Baio" <dbaio@FreeBSD.org>, "Rodney W. Grimes" <freebsd-rwg@pdx.rh.CN85.dnsmgr.net>, Trond Endrestøl <Trond.Endrestol@fagskolen.gjovik.no>, FreeBSD current <freebsd-current@freebsd.org>, Kurt Jaeger <lists@opsec.eu>
Subject: Re: Strange ARC/Swap/CPU on yesterday's -CURRENT
Message-ID: <CF9EA28E-965E-4BDA-9093-C13E70793338@grem.de>
In-Reply-To: <20180311004737.3441dbf9@thor.intern.walstatt.dynvpn.de>
References: <20180306173455.oacyqlbib4sbafqd@ler-imac.lerctr.org> <201803061816.w26IGaW5050053@pdx.rh.CN85.dnsmgr.net> <20180306193645.vv3ogqrhauivf2tr@ler-imac.lerctr.org> <20180306221554.uyshbzbboai62rdf@dx240.localdomain> <20180307103911.GA72239@kloomba> <20180311004737.3441dbf9@thor.intern.walstatt.dynvpn.de>

> On 11. Mar 2018, at 00:47, O. Hartmann <ohartmann@walstatt.org> wrote:
>
> On Wed, 7 Mar 2018 14:39:13 +0400
> Roman Bogorodskiy <novel@FreeBSD.org> wrote:
>
>> Danilo G. Baio wrote:
>>
>>>> On Tue, Mar 06, 2018 at 01:36:45PM -0600, Larry Rosenman wrote:
>>>> On Tue, Mar 06, 2018 at 10:16:36AM -0800, Rodney W. Grimes wrote:
>>>>>> On Tue, Mar 06, 2018 at 08:40:10AM -0800, Rodney W. Grimes wrote:
>>>>>>>> On Mon, 5 Mar 2018 14:39-0600, Larry Rosenman wrote:
>>>>>>>>
>>>>>>>>> Upgraded to:
>>>>>>>>>
>>>>>>>>> FreeBSD borg.lerctr.org 12.0-CURRENT FreeBSD 12.0-CURRENT #11 r330385:
>>>>>>>>> Sun Mar 4 12:48:52 CST 2018
>>>>>>>>> root@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/VT-LER amd64
>>>>>>>>> +1200060 1200060
>>>>>>>>>
>>>>>>>>> Yesterday, and I'm seeing really strange slowness, ARC use, and SWAP use
>>>>>>>>> and swapping.
>>>>>>>>>
>>>>>>>>> See http://www.lerctr.org/~ler/FreeBSD/Swapuse.png
>>>>>>>>
>>>>>>>> I see these symptoms on stable/11. One of my servers has 32 GiB of
>>>>>>>> RAM. After a reboot all is well. ARC starts to fill up, and I still
>>>>>>>> have more than half of the memory available for user processes.
>>>>>>>>
>>>>>>>> After running the periodic jobs at night, the amount of wired memory
>>>>>>>> goes sky high. /etc/periodic/weekly/310.locate is a particularly nasty
>>>>>>>> one.
>>>>>>>
>>>>>>> I would like to find out if this is the same person I have
>>>>>>> reporting this problem from another source, or if this is
>>>>>>> a confirmation of a bug I was helping someone else with.
>>>>>>>
>>>>>>> Have you been in contact with Michael Dexter about this
>>>>>>> issue, or any other forum/mailing list/etc?
>>>>>> Just IRC/Slack, with no response.
>>>>>>>
>>>>>>> If not then we have at least 2 reports of this unbound
>>>>>>> wired memory growth, if so hopefully someone here can
>>>>>>> take you further in the debug than we have been able
>>>>>>> to get.
>>>>>> What can I provide? The system is still in this state as the full backup is
>>>>>> slow.
>>>>>
>>>>> One place to look is to see if this is the recently fixed:
>>>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=222288
>>>>> g_bio leak.
>>>>>
>>>>> vmstat -z | egrep 'ITEM|g_bio|UMA'
>>>>>
>>>>> would be a good first look
>>>>>
>>>> borg.lerctr.org /home/ler $ vmstat -z | egrep 'ITEM|g_bio|UMA'
>>>> ITEM                   SIZE   LIMIT      USED     FREE        REQ FAIL SLEEP
>>>> UMA Kegs:               280,      0,      346,       5,       560,   0,   0
>>>> UMA Zones:             1928,      0,      363,       1,       577,   0,   0
>>>> UMA Slabs:              112,      0, 25384098,  977762, 102033225,   0,   0
>>>> UMA Hash:               256,      0,       59,      16,       105,   0,   0
>>>> g_bio:                  384,      0,       33,    1627, 542482056,   0,   0
>>>> borg.lerctr.org /home/ler $
>>>>>>>> Limiting the ARC to, say, 16 GiB, has no effect on the high amount of
>>>>>>>> wired memory. After a few more days, the kernel consumes virtually all
>>>>>>>> memory, forcing processes in and out of the swap device.
>>>>>>>
>>>>>>> Our experience as well.
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Rod Grimes
>>>>>>> rgrimes@freebsd.org
>>>>>> Larry Rosenman http://www.lerctr.org/~ler
>>>>>
>>>>> --
>>>>> Rod Grimes rgrimes@freebsd.org
>>>>
>>>> --
>>>> Larry Rosenman http://www.lerctr.org/~ler
>>>> Phone: +1 214-642-9640 E-Mail: ler@lerctr.org
>>>> US Mail: 5708 Sabbia Drive, Round Rock, TX 78665-2106
>>>
>>>
>>> Hi.
>>>
>>> I noticed this behavior as well and changed vfs.zfs.arc_max to a smaller size.
>>>
>>> For me it started when I upgraded to 1200058; on this box I'm only using
>>> poudriere for building tests.
>>
>> I've noticed that as well.
>>
>> I have 16G of RAM and two disks, the first one is UFS with the system
>> installation and the second one is ZFS which I use to store media and
>> data files and for poudriere.
>>
>> I don't recall the exact date, but it started fairly recently. System would
>> swap like crazy to a point when I cannot even ssh to it, and can hardly
>> login through tty: it might take 10-15 minutes to see a command typed in
>> the shell.
>>
>> I've updated loader.conf to have the following:
>>
>> vfs.zfs.arc_max="4G"
>> vfs.zfs.prefetch_disable="1"
>>
>> It fixed the problem, but introduced a new one. When I'm building stuff
>> with poudriere with ccache enabled, it takes hours to build even small
>> projects like curl or gnutls.
>>
>> For example, current build:
>>
>> [10i386-default] [2018-03-07_07h44m45s] [parallel_build:] Queued: 3 Built: 1 Failed:
>> 0 Skipped: 0 Ignored: 0 Tobuild: 2 Time: 06:48:35 [02]: security/gnutls
>> | gnutls-3.5.18 build (06:47:51)
>>
>> Almost 7 hours already and still going!
>>
>> gstat output looks like this:
>>
>> dT: 1.002s  w: 1.000s
>>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>>     0      0      0      0    0.0      0      0    0.0     0.0 da0
>>     0      1      0      0    0.0      1    128    0.7     0.1 ada0
>>     1    106    106    439   64.6      0      0    0.0    98.8 ada1
>>     0      1      0      0    0.0      1    128    0.7     0.1 ada0s1
>>     0      0      0      0    0.0      0      0    0.0     0.0 ada0s1a
>>     0      0      0      0    0.0      0      0    0.0     0.0 ada0s1b
>>     0      1      0      0    0.0      1    128    0.7     0.1 ada0s1d
>>
>> ada0 here is the UFS drive, and ada1 is ZFS.
>>
>>> Regards.
>>> --
>>> Danilo G. Baio (dbaio)
>>
>>
>>
>> Roman Bogorodskiy
>
>
> This is from an APU (PC Engines) with no ZFS, just UFS on a small mSATA device; the APU
> works as a firewall, router, and PBX:
>
> last pid: 9665; load averages: 0.13, 0.13, 0.11    up 3+06:53:55 00:26:26
> 19 processes: 1 running, 18 sleeping
> CPU: 0.3% user, 0.0% nice, 0.2% system, 0.0% interrupt, 99.5% idle
> Mem: 27M Active, 6200K Inact, 83M Laundry, 185M Wired, 128K Buf, 675M Free
> Swap: 7808M Total, 2856K Used, 7805M Free
> [...]
>
> The APU is running CURRENT (FreeBSD 12.0-CURRENT #42 r330608: Wed Mar 7 16:55:59 CET
> 2018 amd64). Usually, the APU never(!) uses swap; now it has been swapping like hell
> for a couple of days and I have to reboot it fairly often.
>
> Another box, 16 GB RAM, ZFS, poudriere, the packaging box, is right now unresponsive:
> after hours of building packages, I tried to copy the repository from one location on
> the same ZFS volume to another - usually this task takes a couple of minutes for ~2200
> ports. Now it has taken 2 1/2 hours and the box got stuck. Ctrl-T on the console
> delivers:
> load: 0.00 cmd: make 91199 [pfault] 7239.56r 0.03u 0.04s 0% 740k
>
> No response from the box anymore.
>
>
> The problem of swapping like hell and performing slowly isn't limited to the past few days; it
> has been present for at least 1 1/2 weeks now, maybe more. Since I build ports fairly
> often, time taken on that specific box has increased from 2 to 3 days for all ~2200
> ports. The system has 16 GB of RAM, IvyBridge 4-core XEON at 3.4 GHz, if this information
> matters. The box is consuming swap really fast.
>
> Today is the first time the machine became unresponsive (no ssh, no console login so far).
> Need to coldstart. OS is CURRENT as well.
>

Any chance this is related to meltdown/spectre mitigation patches?

Best,
Michael

> Regards,
>
> O. Hartmann
>
>
> --
> O. Hartmann
>
> I object to the use or transfer of my data for advertising purposes
> or for market or opinion research (§ 28 Abs. 4 BDSG).