From: Cy Schubert <cy.schubert@komquats.com>
Reply-To: Cy Schubert
To: "O. Hartmann"
Cc: Cy Schubert, Michael Butler, "K. Macy", FreeBSD CURRENT
Subject: Re: CURRENT slow and shaky network stability
Date: Mon, 04 Apr 2016 23:46:08 -0700
Message-Id: <201604050646.u356k850078565@slippy.cwsent.com>
In-Reply-To: <20160405082047.670d7241@freyja.zeit4.iv.bundesimmobilien.de>

In message <20160405082047.670d7241@freyja.zeit4.iv.bundesimmobilien.de>,
"O. Hartmann" writes:
> On Sat, 02 Apr 2016 16:14:57 -0700
> Cy Schubert wrote:
>
> > In message <20160402231955.41b05526.ohartman@zedat.fu-berlin.de>,
> > "O. Hartmann" writes:
> > > On Sat, 2 Apr 2016 11:39:10 +0200
> > > "O. Hartmann" wrote:
> > >
> > > > On Sat, 2 Apr 2016 10:55:03 +0200
> > > > "O. Hartmann" wrote:
> > > >
> > > > > On Sat, 02 Apr 2016 01:07:55 -0700
> > > > > Cy Schubert wrote:
> > > > >
> > > > > > In message <56F6C6B0.6010103@protected-networks.net>,
> > > > > > Michael Butler writes:
> > > > > > > -current is not great for interactive use at all. The strategy
> > > > > > > of pre-emptively dropping idle processes to swap is hurting ..
> > > > > > > big time.
> > > > > >
> > > > > > FreeBSD doesn't "preemptively" or arbitrarily push pages out to
> > > > > > disk. LRU doesn't do this.
> > > > > >
> > > > > > > Compare inactive memory to swap in this example ..
> > > > > > >
> > > > > > > 110 processes: 1 running, 108 sleeping, 1 zombie
> > > > > > > CPU:  1.2% user,  0.0% nice,  4.3% system,  0.0% interrupt, 94.5% idle
> > > > > > > Mem: 474M Active, 1609M Inact, 764M Wired, 281M Buf, 119M Free
> > > > > > > Swap: 4096M Total, 917M Used, 3178M Free, 22% Inuse
> > > > > >
> > > > > > To analyze this you need to capture vmstat output. You'll see the
> > > > > > free pool dip below a threshold and pages go out to disk in
> > > > > > response. If you have daemons with small working sets, pages that
> > > > > > are not part of the working sets of those daemons or applications
> > > > > > will eventually be paged out. This is not a bad thing. In your
> > > > > > example above, the 281 MB of UFS buffers are more active than the
> > > > > > 917 MB paged out. If a page is paged out and never used again, it
> > > > > > doesn't hurt, whereas the 281 MB of buffers saves you I/O. The
> > > > > > inactive pages are part of your free pool; they were active at one
> > > > > > time but now are not. They may be reclaimed, and if they are,
> > > > > > you've just saved more I/O.
> > > > > >
> > > > > > Top is a poor tool to analyze memory use; vmstat is the better
> > > > > > tool to help understand it. Inactive memory isn't a bad thing per
> > > > > > se. Monitor page outs, the scan rate and page reclaims.
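If you want to watch that happen, something along these lines should do
(a sketch from memory; see vmstat(8) and systat(1) for the details):

    # Sample the VM system every 5 seconds; pi/po are pages paged in/out,
    # fr is pages freed and sr is the page daemon's scan rate.
    vmstat 5

    # Cumulative paging counters since boot, page reclaims included:
    vmstat -s | egrep -i 'page|swap'

    # The same numbers as a continuously updating display:
    systat -vmstat 5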
> > > > > I give up! I tried to check via ssh/vmstat what is going on. These
> > > > > are the last lines before the pipe broke:
> > > > >
> > > > > [...]
> > > > > procs    memory       page                     disks     faults        cpu
> > > > >  r b  w   avm   fre    flt re  pi po     fr    sr ad0 ad1  in     sy     cs us sy id
> > > > > 22 0 22  5.8G  1.0G  46319  0   0  0  55721  1297   0   4 219  23907   5400 95  5  0
> > > > > 22 0 22  5.4G  1.3G  51733  0   0  0  72436  1162   0   0 108  40869   3459 93  7  0
> > > > > 15 0 22   12G  1.2G  54400  0  27  0  52188  1160   0  42 148  52192   4366 91  9  0
> > > > > 14 0 22   12G  1.0G  44954  0  37  0  37550  1179   0  39 141  86209   4368 88 12  0
> > > > > 26 0 22   12G  1.1G  60258  0  81  0  69459  1119   0  27 123 779569 704359 87 13  0
> > > > > 29 3 22   13G  774M  50576  0  68  0  32204  1304   0   2 102 507337 484861 93  7  0
> > > > > 27 0 22   13G  937M  47477  0  48  0  59458  1264   3   2 112  68131  44407 95  5  0
> > > > > 36 0 22   13G  829M  83164  0   2  0  82575  1225   1   0 126  99366  38060 89 11  0
> > > > > 35 0 22  6.2G  1.1G  98803  0  13  0 121375  1217   2   8 112  99371   4999 85 15  0
> > > > > 34 0 22   13G  723M  54436  0  20  0  36952  1276   0  17 153  29142   4431 95  5  0
> > > > >
> > > > > Fssh_packet_write_wait: Connection to 192.168.0.1 port 22: Broken pipe
> > > > >
> > > > > This makes this crap system completely unusable. The server in
> > > > > question (FreeBSD 11.0-CURRENT #20 r297503: Sat Apr 2 09:02:41 CEST
> > > > > 2016, amd64) was doing a poudriere bulk job. I can not even
> > > > > determine which terminal goes down first - another one, idle for
> > > > > much longer than the one showing the "vmstat 5" output, is still
> > > > > alive!
> > > > >
> > > > > I consider this a serious bug, and nothing that has happened since
> > > > > this "fancy" update has been a benefit. :-(
> > > >
> > > > By the way - this might be of interest and provide some hint. One of
> > > > my boxes acts as server and gateway; it uses NAT and IPFW. When it is
> > > > under high load, as it was today, passing traffic from the ISP to the
> > > > clients on the internal network is sometimes extremely slow. I do not
> > > > consider this the reason for the collapsing ssh sessions, since that
> > > > incident also happens under no load, but in the overall view of the
> > > > problem it could be a hint - I hope.
> > >
> > > I just checked on one box that "broke the pipe" very quickly after I
> > > started poudriere, having behaved well for a couple of hours before the
> > > pipe broke. It seems to be load dependent when the ssh session gets
> > > wrecked. More importantly: after the long-haul poudriere run I rebooted
> > > the box and tried again, with the mentioned broken pipe a couple of
> > > minutes after poudriere started. Then I left the box alone for several
> > > hours, logged in again and checked the swap. Although there had been no
> > > load or other pressure for hours, 31% of swap was still in use (the box
> > > has 16 GB of RAM and is propelled by a XEON E3-1245 V2).
> >
> > 31%! Is it *actively* paging, or was the 31% paged out earlier with no
> > paging *currently* being experienced? And 31% of how much swap space in
> > total?
> >
> > Also, what does ps aumx or ps aumxww say? Pipe it to head -40 or similar.
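To be concrete, this is roughly what I'm after - a sketch, assuming only the
stock base-system tools (see swapinfo(8), ps(1) and vmstat(8)):

    # How much swap is configured and how much of it is in use right now:
    swapinfo -h

    # The 40 processes using the most memory (the m flag sorts by memory):
    ps aumxww | head -40

    # Whether anything is being paged out right now; run it twice a few
    # minutes apart and compare the "pages paged out" counters:
    vmstat -s | grep -i 'paged out'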
>
> On FreeBSD 11.0-CURRENT #4 r297573: Tue Apr 5 07:01:19 CEST 2016, amd64,
> local network, no NAT: a stuck ssh session in the middle of administering,
> after leaving the console/ssh session alone for a couple of minutes:
>
> root      2064  0.0  0.1 91416 8492  -  Is  07:18  0:00.03 sshd: hartmann [priv] (sshd)
> hartmann  2108  0.0  0.1 91416 8664  -  I   07:18  0:07.33 sshd: hartmann@pts/0 (sshd)
> root     72961  0.0  0.1 91416 8496  -  Is  08:11  0:00.03 sshd: hartmann [priv] (sshd)
> hartmann 72970  0.0  0.1 91416 8564  -  S   08:11  0:00.02 sshd: hartmann@pts/1 (sshd)
>
> The situation is getting worse, and I consider this a serious bug.

There's not a lot to go on here. Do you have physical access to the machine
to pop into DDB and take a look?

You did say you're using a lot of swap - IIRC 30% - but you didn't answer
how much that 30% is of. Without more data I can't help you; at best I can
take wild guesses, and that won't help you either. Try to answer the
questions I asked last week and we can go further. Until then all we can do
is guess wildly.


-- 
Cheers,
Cy Schubert
FreeBSD UNIX:    Web:  http://www.FreeBSD.org

	The need of the many outweighs the greed of the few.