Date: Wed, 21 Sep 2016 11:38:09 +0200
From: Ståle Kristoffersen
To: Anton Yuzhaninov
Cc: "freebsd-questions@freebsd.org"
Subject: Re: Server gets a high load, but no CPU use, and then later stops respond on the network
Message-ID: <20160921093809.GB13386@putsch.kolbu.ws>
In-Reply-To: <68f553b9-8546-7707-df86-88851b3283f8@citrin.ru>
On 2016-09-20 at 16:57, Anton Yuzhaninov wrote:
> On 2016-09-13 19:23, Ståle Bordal Kristoffersen wrote:
> > About once a day, but not in any pattern, it starts getting a load of 5-10
> > and usually stops responding over the network before I notice it.
>
> Does it stop responding completely (including ping), or do only some
> services stop responding and ssh not respond?

It just gets more and more sluggish. It usually still responds to ping, but
ssh can start to time out. Already-open ssh sessions can stay alive for
quite long, but running anything in them becomes a problem after some time.

> > From googling a bit, I have tried disabling MSI-X on the igb network
> > interface and increasing nmbclusters, with no apparent change in behaviour.
> > (kern.ipc.nmbclusters="1000000" and hw.igb.enable_msix=0 in loader.conf)
>
> On modern FreeBSD versions kern.ipc.nmbclusters is autotuned to a very
> large value, so increasing it manually is rarely needed.
>
> Disabling MSI-X on igb is also unlikely to be needed.

This was more of a "grasping at straws" move; I only included it for
completeness.

> > All I see is that the igb0 taskq pid is almost always in the RUN state
> > when the machine is having trouble.
>
> There is no igb0 taskq in the top output below.
>
> To see what the top output looks like when the machine stops responding,
> it is useful to run top from cron and log its output.
>
> Example script for top logging:
> https://bitbucket.org/snippets/citrin/BpeXb
>
> In the top output you should look at WCPU and STATE for kernel threads and
> for the unresponsive network daemons.
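Once the cron job is producing logs, the WCPU/STATE check can be automated. Below is a minimal Python sketch, assuming each log holds the batch-mode output of FreeBSD's top run with -S (system processes) and -H (individual threads), whose table header starts with PID and contains STATE, WCPU and COMMAND columns with COMMAND last. It is not the linked script; the function names parse_top and hot_threads and the 10% threshold are hypothetical choices.

```python
"""Flag busy threads in a logged FreeBSD `top -b -S -H` snapshot.

Hypothetical helper: assumes top's table header begins with PID and
includes STATE, WCPU and COMMAND columns, with COMMAND as the last column.
"""
from typing import Dict, List, Tuple


def parse_top(text: str) -> List[Dict[str, str]]:
    """Return the rows of the last process table in `text` as dicts."""
    columns: List[str] = []
    rows: List[Dict[str, str]] = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "PID":
            # Each header starts a new table; rows from older snapshots are dropped.
            columns = fields
            rows = []
            continue
        if not columns:
            continue  # summary lines before the first header
        # COMMAND is last and may contain spaces, e.g. "kernel{igb0 que}",
        # so split at most len(columns) - 1 times.
        parts = line.strip().split(None, len(columns) - 1)
        if len(parts) == len(columns) and parts[0].isdigit():
            rows.append(dict(zip(columns, parts)))
    return rows


def hot_threads(text: str, min_wcpu: float = 10.0) -> List[Tuple[str, str, float, str]]:
    """List (pid, state, wcpu, command) for threads that are busy or runnable.

    A thread is reported if its WCPU is at least `min_wcpu` percent or its
    STATE is RUN. Per-CPU idle threads are skipped because they show high
    WCPU whenever a CPU has nothing else to do.
    """
    hits: List[Tuple[str, str, float, str]] = []
    for row in parse_top(text):
        command = row.get("COMMAND", "")
        if command.startswith("idle"):
            continue
        wcpu_text = row.get("WCPU", "")
        if not wcpu_text.endswith("%"):
            continue  # not a real process row, or no WCPU column
        wcpu = float(wcpu_text[:-1])
        state = row.get("STATE", "")
        if wcpu >= min_wcpu or state == "RUN":
            hits.append((row["PID"], state, wcpu, command))
    # Busiest threads first.
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits
```

Given the text of one logged snapshot, hot_threads returns the busiest non-idle threads first. If the igb0 kernel thread really is stuck, it should show up near the top of that list in the snapshots taken while the server is stalling.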
I've now configured that script to run, and I'll share the results the
next time the server has issues.

> Also, do you have a network load graph (bytes and packets per second) for
> this host? (I saw munin in the process list.) Maybe the load is too high
> at the moments when the host is not responding.

When this happens, network traffic slows to a crawl. I've also checked
that there isn't any other traffic on the network port causing problems.
I also tried running 'ifconfig igb0 down' on the interface, just to see
whether the server would unclog itself.

> Do you use firewalls or netgraph?

No, nothing is configured.

> What is the primary function of this server?

It's a file server, sharing files via Samba and FTP.

-- 
Ståle Kristoffersen