From owner-freebsd-current@freebsd.org  Sun Mar 11 20:44:53 2018
Date: Sun, 11 Mar 2018 10:43:58 -1000 (HST)
From: Jeff Roberson
To: "O. Hartmann"
Cc: Roman Bogorodskiy, "Danilo G. Baio", "Rodney W. Grimes",
    Trond Endrestøl, FreeBSD current, Kurt Jaeger
Grimes" , Trond Endrest?l , FreeBSD current , Kurt Jaeger Subject: Re: Strange ARC/Swap/CPU on yesterday's -CURRENT In-Reply-To: <20180311004737.3441dbf9@thor.intern.walstatt.dynvpn.de> Message-ID: References: <20180306173455.oacyqlbib4sbafqd@ler-imac.lerctr.org> <201803061816.w26IGaW5050053@pdx.rh.CN85.dnsmgr.net> <20180306193645.vv3ogqrhauivf2tr@ler-imac.lerctr.org> <20180306221554.uyshbzbboai62rdf@dx240.localdomain> <20180307103911.GA72239@kloomba> <20180311004737.3441dbf9@thor.intern.walstatt.dynvpn.de> User-Agent: Alpine 2.21 (BSF 202 2017-01-01) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8BIT X-Content-Filtered-By: Mailman/MimeDel 2.1.25 X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 11 Mar 2018 20:44:53 -0000 On Sun, 11 Mar 2018, O. Hartmann wrote: > Am Wed, 7 Mar 2018 14:39:13 +0400 > Roman Bogorodskiy schrieb: > >> Danilo G. Baio wrote: >> >>> On Tue, Mar 06, 2018 at 01:36:45PM -0600, Larry Rosenman wrote: >>>> On Tue, Mar 06, 2018 at 10:16:36AM -0800, Rodney W. Grimes wrote: >>>>>> On Tue, Mar 06, 2018 at 08:40:10AM -0800, Rodney W. Grimes wrote: >>>>>>>> On Mon, 5 Mar 2018 14:39-0600, Larry Rosenman wrote: >>>>>>>> >>>>>>>>> Upgraded to: >>>>>>>>> >>>>>>>>> FreeBSD borg.lerctr.org 12.0-CURRENT FreeBSD 12.0-CURRENT #11 r330385: >>>>>>>>> Sun Mar 4 12:48:52 CST 2018 >>>>>>>>> root@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/VT-LER amd64 >>>>>>>>> +1200060 1200060 >>>>>>>>> >>>>>>>>> Yesterday, and I'm seeing really strange slowness, ARC use, and SWAP use >>>>>>>>> and swapping. >>>>>>>>> >>>>>>>>> See http://www.lerctr.org/~ler/FreeBSD/Swapuse.png >>>>>>>> >>>>>>>> I see these symptoms on stable/11. One of my servers has 32 GiB of >>>>>>>> RAM. After a reboot all is well. ARC starts to fill up, and I still >>>>>>>> have more than half of the memory available for user processes. >>>>>>>> >>>>>>>> After running the periodic jobs at night, the amount of wired memory >>>>>>>> goes sky high. /etc/periodic/weekly/310.locate is a particular nasty >>>>>>>> one. >>>>>>> >>>>>>> I would like to find out if this is the same person I have >>>>>>> reporting this problem from another source, or if this is >>>>>>> a confirmation of a bug I was helping someone else with. >>>>>>> >>>>>>> Have you been in contact with Michael Dexter about this >>>>>>> issue, or any other forum/mailing list/etc? >>>>>> Just IRC/Slack, with no response. >>>>>>> >>>>>>> If not then we have at least 2 reports of this unbound >>>>>>> wired memory growth, if so hopefully someone here can >>>>>>> take you further in the debug than we have been able >>>>>>> to get. >>>>>> What can I provide? The system is still in this state as the full backup is >>>>>> slow. >>>>> >>>>> One place to look is to see if this is the recently fixed: >>>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=222288 >>>>> g_bio leak. 
>>>>>>>> Limiting the ARC to, say, 16 GiB has no effect on the high amount of
>>>>>>>> wired memory. After a few more days, the kernel consumes virtually
>>>>>>>> all memory, forcing processes in and out of the swap device.
>>>>>>>
>>>>>>> Our experience as well.
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Rod Grimes                          rgrimes@freebsd.org
>>>>>>
>>>>>> Larry Rosenman                       http://www.lerctr.org/~ler
>>>>>
>>>>> --
>>>>> Rod Grimes                            rgrimes@freebsd.org
>>>>
>>>> --
>>>> Larry Rosenman                        http://www.lerctr.org/~ler
>>>> Phone: +1 214-642-9640                E-Mail: ler@lerctr.org
>>>> US Mail: 5708 Sabbia Drive, Round Rock, TX 78665-2106
>>>
>>>
>>> Hi.
>>>
>>> I noticed this behavior as well and changed vfs.zfs.arc_max to a smaller
>>> size.
>>>
>>> For me it started when I upgraded to 1200058; on this box I'm only using
>>> poudriere for build tests.
>>
>> I've noticed that as well.
>>
>> I have 16G of RAM and two disks: the first one is UFS with the system
>> installation, and the second one is ZFS, which I use to store media and
>> data files and for poudriere.
>>
>> I don't recall the exact date, but it started fairly recently. The system
>> would swap like crazy to the point where I cannot even ssh to it and can
>> hardly log in through a tty: it might take 10-15 minutes to see a command
>> typed in the shell.
>>
>> I've updated loader.conf to have the following:
>>
>> vfs.zfs.arc_max="4G"
>> vfs.zfs.prefetch_disable="1"
>>
>> It fixed the problem, but introduced a new one. When I'm building stuff
>> with poudriere with ccache enabled, it takes hours to build even small
>> projects like curl or gnutls.
>>
>> For example, the current build:
>>
>> [10i386-default] [2018-03-07_07h44m45s] [parallel_build:] Queued: 3
>> Built: 1 Failed: 0 Skipped: 0 Ignored: 0 Tobuild: 2 Time: 06:48:35
>> [02]: security/gnutls | gnutls-3.5.18 build (06:47:51)
>>
>> Almost 7 hours already and still going!
>>
>> gstat output looks like this:
>>
>> dT: 1.002s  w: 1.000s
>>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>>     0      0      0      0    0.0      0      0    0.0     0.0  da0
>>     0      1      0      0    0.0      1    128    0.7     0.1  ada0
>>     1    106    106    439   64.6      0      0    0.0    98.8  ada1
>>     0      1      0      0    0.0      1    128    0.7     0.1  ada0s1
>>     0      0      0      0    0.0      0      0    0.0     0.0  ada0s1a
>>     0      0      0      0    0.0      0      0    0.0     0.0  ada0s1b
>>     0      1      0      0    0.0      1    128    0.7     0.1  ada0s1d
>>
>> ada0 here is the UFS drive, and ada1 is ZFS.
>>
>>> Regards.
>>> --
>>> Danilo G. Baio (dbaio)
>>
>> Roman Bogorodskiy
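Since several of these reports involve capping vfs.zfs.arc_max, it helps to record what the ARC thinks its size and target are alongside the machine's wired total. A minimal sketch (sysctl names as on a stock FreeBSD system with ZFS loaded):

    # current ARC size, its adaptive target, and the configured ceiling
    sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c vfs.zfs.arc_max
    # total wired memory in bytes (page count times page size)
    echo "$(sysctl -n vm.stats.vm.v_wire_count) * $(sysctl -n hw.pagesize)" | bc

If wired memory keeps growing while arcstats.size stays at or below the cap, the growth is happening outside the ARC, which matches what these reports describe.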
>
> This is from an APU (no ZFS, UFS on a small mSATA device); the APU
> (PC Engines) works as a firewall, router, and PBX:
>
> last pid:  9665;  load averages:  0.13,  0.13,  0.11   up 3+06:53:55  00:26:26
> 19 processes:  1 running, 18 sleeping
> CPU:  0.3% user,  0.0% nice,  0.2% system,  0.0% interrupt, 99.5% idle
> Mem: 27M Active, 6200K Inact, 83M Laundry, 185M Wired, 128K Buf, 675M Free
> Swap: 7808M Total, 2856K Used, 7805M Free
> [...]
>
> The APU is running CURRENT (FreeBSD 12.0-CURRENT #42 r330608: Wed Mar 7
> 16:55:59 CET 2018 amd64). Usually the APU never(!) uses swap; now it has
> been swapping like hell for a couple of days and I have to reboot it
> fairly often.
>
> Another box (16 GB RAM, ZFS, poudriere, the packaging box) is right now
> unresponsive: after hours of building packages, I tried to copy the
> repository from one location on the same ZFS volume to another; usually
> this task takes a couple of minutes for ~2200 ports. Now it has taken
> 2 1/2 hours and the box got stuck. Ctrl-T on the console delivers:
>
> load: 0.00  cmd: make 91199 [pfault] 7239.56r 0.03u 0.04s 0% 740k
>
> No response from the box anymore.
>
> The problem of swapping like hell and performing slowly isn't an issue of
> just the past few days; it has been present for at least one and a half
> weeks now, maybe more. Since I build ports fairly often, the time taken
> on that specific box has increased from 2 to 3 days for all ~2200 ports.
> The system has 16 GB of RAM and a 4-core IvyBridge Xeon at 3.4 GHz, if
> this information matters. The box is consuming swap really fast.
>
> Today is the first time the machine has become unresponsive (no ssh, no
> console login so far). It needs a cold start. The OS is CURRENT as well.
>
> Regards,
>
> O. Hartmann

Hi Folks,

This could be my fault from recent NUMA and concurrency related work; I did
touch some of the ARC back-pressure mechanisms.

First, I would like to identify whether the wired memory is in the buffer
cache. Can those of you that have a repro look at sysctl vfs.bufspace and
tell me if that accounts for the bulk of your wired memory usage? I'm
wondering if a job ran that pulled in all of the bufs from your root disk
and filled up the buffer cache, which doesn't have a back-pressure
mechanism, and the ARC then didn't respond appropriately by lowering its
usage.

Also, if you could try going back to r328953 or r326346 and let me know
whether the problem exists in either, that would be very helpful.

If anyone is willing to debug this with me, contact me directly and I will
send some test patches or debugging info after you have done the above
steps.

Thank you for the reports.

Jeff

>
> --
> O. Hartmann
>
> I object to the use or transmission of my data for advertising purposes
> or for market or opinion research (§ 28 Abs. 4 BDSG).
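For anyone who wants to run the check Jeff asks for above, a minimal sketch (sysctl names as on stock FreeBSD; vfs.bufspace is reported in bytes):

    # buffer-cache usage and its ceiling
    sysctl vfs.bufspace vfs.maxbufspace
    # the wired total as top reports it, for comparison
    top -b | grep '^Mem:'

If vfs.bufspace accounts for most of the wired figure, the buffer cache is the suspect; if it is small, the wired pages live elsewhere (the ARC, UMA zones, or a leak).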
Fuller" To: Jeff Roberson Cc: FreeBSD current Subject: Re: Strange ARC/Swap/CPU on yesterday's -CURRENT Message-ID: <20180311221313.GF42539@over-yonder.net> References: <20180306173455.oacyqlbib4sbafqd@ler-imac.lerctr.org> <201803061816.w26IGaW5050053@pdx.rh.CN85.dnsmgr.net> <20180306193645.vv3ogqrhauivf2tr@ler-imac.lerctr.org> <20180306221554.uyshbzbboai62rdf@dx240.localdomain> <20180307103911.GA72239@kloomba> <20180311004737.3441dbf9@thor.intern.walstatt.dynvpn.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Editor: vi X-OS: FreeBSD User-Agent: Mutt/1.9.4 (2018-02-28) X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 11 Mar 2018 22:13:23 -0000 On Sun, Mar 11, 2018 at 10:43:58AM -1000 I heard the voice of Jeff Roberson, and lo! it spake thus: > > First, I would like to identify whether the wired memory is in the > buffer cache. Can those of you that have a repro look at sysctl > vfs.bufspace and tell me if that accounts for the bulk of your wired > memory usage? I'm wondering if a job ran that pulled in all of the > bufs from your root disk and filled up the buffer cache which > doesn't have a back-pressure mechanism. If by "root disk", you mean the one that isn't ZFS, that wouldn't touch anything here; apart from a md-backed UFS /tmp and some NFS mounts, everything on my system is ZFS. I believe vfs.bufspace is what shows up as "Buf" on top? I don't recall it looking particularly interesting when things were madly swapping. I'll uncork arc_max again for a bit and see if anything odd shows up in it, but it's only a dozen megs or so now. -- Matthew Fuller (MF4839) | fullermd@over-yonder.net Systems/Network Administrator | http://www.over-yonder.net/~fullermd/ On the Internet, nobody can hear you scream.