Date: Wed, 21 Sep 2016 11:38:09 +0200
From: Ståle Kristoffersen <chiller@driftfun.com>
To: Anton Yuzhaninov <citrin+bsd@citrin.ru>
Cc: "freebsd-questions@freebsd.org" <freebsd-questions@freebsd.org>
Subject: Re: Server gets a high load, but no CPU use, and then later stops respond on the network
Message-ID: <20160921093809.GB13386@putsch.kolbu.ws>
In-Reply-To: <68f553b9-8546-7707-df86-88851b3283f8@citrin.ru>
References: <20160913232351.GA36091@putsch.kolbu.ws> <68f553b9-8546-7707-df86-88851b3283f8@citrin.ru>

On 2016-09-20 at 16:57, Anton Yuzhaninov wrote:
> On 2016-09-13 19:23, Ståle Bordal Kristoffersen wrote:
> > about once a day, but not in any pattern, it starts getting a load of 5-10
> > and usually stops responding over the network before I notice it.
>
> Does it stop responding completely (including ping) or only some
> services and ssh doesn't respond?

It just starts getting more and more lagged. It usually responds to ping, but
ssh can start to time out. Already opened ssh sessions can live quite long, but
running stuff can be a problem after a while.

> > From googling a bit, I have tried to disable msix on the igb network
> > interface, and increased the nmbclusters with no apparent change in behaviour.
> > (kern.ipc.nmbclusters="1000000" and hw.igb.enable_msix=0 in loader.conf)
>
> kern.ipc.nmbclusters on modern FreeBSD version autotuned to very big
> value and manual increasing is rarely need.
>
> Disabling msix on igb is also unlikely need.

This was more of a "grasping at straws" move, and I only included it for
completeness.

> > All I see is that the igb0 taskq pid is almost always in the RUN state when
> > the machine is having trouble.
>
> There is no igb0 taskq in top output below.
>
> To see and inspect how top output looks when machine stops responding it
> is useful to run top from cron and log output.
>
> Example script for top logging:
> https://bitbucket.org/snippets/citrin/BpeXb
>
> In top output you should look at WCPU and STATE for kernel threads and
> for unresponding network daemons.

I've now configured that script to run, and I'll share the results the next
time the server has issues.

> Also do you have network load graph (bytes and packets per second) for
> this host (I saw munin in process list) - may be load is too high in
> moments when host not responding.

When this happens, network traffic crawls to a stop. I've also checked that
there isn't any other traffic on the network port causing problems.
I also tried doing 'ifconfig igb0 down' on the interface just to see if the
server would unclog itself.

> Do you use firewalls or netgraph?

No, nothing configured.

> Which is the primary function of this server?

It's a fileserver, sharing files via samba and FTP.

-- 
Ståle Kristoffersen