Date: Fri, 23 Sep 2016 22:55:55 +0200
From: "lokadamus@gmx.de" <lokadamus@gmx.de>
To: Ståle Kristoffersen <chiller@driftfun.com>, Anton Yuzhaninov <citrin+bsd@citrin.ru>
Cc: "freebsd-questions@freebsd.org" <freebsd-questions@freebsd.org>
Subject: Re: Server gets a high load, but no CPU use, and then later stops respond on the network
Message-ID: <94870139-26b7-ef0f-dbd9-df599642bac3@gmx.de>
In-Reply-To: <20160921093809.GB13386@putsch.kolbu.ws>
References: <20160913232351.GA36091@putsch.kolbu.ws> <68f553b9-8546-7707-df86-88851b3283f8@citrin.ru> <20160921093809.GB13386@putsch.kolbu.ws>
On 09/21/16 11:38, Ståle Kristoffersen wrote:
> On 2016-09-20 at 16:57, Anton Yuzhaninov wrote:
>> On 2016-09-13 19:23, Ståle Bordal Kristoffersen wrote:
>>> about once a day, but not in any pattern, it starts getting a load of 5-10
>>> and usually stops responding over the network before I notice it.
>>
>> Does it stop responding completely (including ping) or only some
>> services and ssh doesn't respond?
>
> It just starts getting more and more lagged. It usually responds to ping,
> but ssh can start to time out. Already opened ssh sessions can live quite
> long, but running stuff can be a problem after a while.
>
>>
>>> From googling a bit, I have tried to disable msix on the igb network
>>> interface, and increased the nmbclusters with no apparent change in behaviour.
>>> (kern.ipc.nmbclusters="1000000" and hw.igb.enable_msix=0 in loader.conf)
>>
>> kern.ipc.nmbclusters on modern FreeBSD version autotuned to very big
>> value and manual increasing is rarely need.
>>
>> Disabling msix on igb is also unlikely need.
>
> This was more of a "grasping at straws"-move, and only included that for
> completeness.
>
>>> All I see is that the igb0 taskq pid is almost always in the RUN state when
>>> the machine is having trouble.
>>
>> There is no igb0 taskq in top output below.
>>
>> To see and inspect how top output looks when machine stops responding it
>> is useful to run top from cron and log output.
>>
>> Example script for top logging:
>> https://bitbucket.org/snippets/citrin/BpeXb
>>
>> In top output you should look at WCPU and STATE for kernel threads and
>> for unresponding network daemons.
>
> I've now configured that script to run, and I'll share the results the next
> time the server has issues.
>
>> Also do you have network load graph (bytes and packets per second) for
>> this host (I saw munin in process list) - may be load is too high in
>> moments when host not responding.
>
> When this happens network traffic crawls to a stop.
> I've also checked that
> there isn't any other traffic on the network port causing problems. I also
> tried doing 'ifconfig igb0 down' on the interface just to see if the server
> would unclog itself.
>
>> Do you use firewalls or netgraph?
>
> No, nothing configured.
>
>> Which is the primary function of this server?
>
> Its a fileserver, sharing files via samba and FTP.
>

I have no idea. Can you tell me what dmesg says? It looks like the system
is being overrun, but it is difficult to understand why.

Greetings