Date:      Fri, 23 Sep 2016 22:55:55 +0200
From:      "lokadamus@gmx.de" <lokadamus@gmx.de>
To:        Ståle Kristoffersen <chiller@driftfun.com>, Anton Yuzhaninov <citrin+bsd@citrin.ru>
Cc:        "freebsd-questions@freebsd.org" <freebsd-questions@freebsd.org>
Subject:   Re: Server gets a high load, but no CPU use, and then later stops respond on the network
Message-ID:  <94870139-26b7-ef0f-dbd9-df599642bac3@gmx.de>
In-Reply-To: <20160921093809.GB13386@putsch.kolbu.ws>
References:  <20160913232351.GA36091@putsch.kolbu.ws> <68f553b9-8546-7707-df86-88851b3283f8@citrin.ru> <20160921093809.GB13386@putsch.kolbu.ws>

On 09/21/16 11:38, Ståle Kristoffersen wrote:
> On 2016-09-20 at 16:57, Anton Yuzhaninov wrote:
>> On 2016-09-13 19:23, Ståle Bordal Kristoffersen wrote:
>>> about once a day, but not in any pattern, it starts getting a load of 5-10
>>> and usually stops responding over the network before I notice it.
>>
>> Does it stop responding completely (including ping), or do only some
>> services and ssh stop responding?
> 
> It just starts getting more and more lagged. It usually responds to ping,
> but ssh can start to time out. Already opened ssh sessions can live quite
> long, but running stuff can be a problem after a while.
> 
>>
>>> From googling a bit, I have tried to disable msix on the igb network
>>> interface, and increased the nmbclusters with no apparent change in behaviour.
>>> (kern.ipc.nmbclusters="1000000" and hw.igb.enable_msix=0 in loader.conf)
>>
>> On modern FreeBSD versions kern.ipc.nmbclusters is autotuned to a very
>> large value, and increasing it manually is rarely needed.
>>
>> Disabling msix on igb is also unlikely to be needed.
> 
> This was more of a "grasping at straws" move, and I only included it for
> completeness.
> 
>>> All I see is that the igb0 taskq pid is almost always in the RUN state when
>>> the machine is having trouble.
>>
>> There is no igb0 taskq in top output below.
>>
>> To inspect what the top output looks like when the machine stops
>> responding, it is useful to run top from cron and log the output.
>>
>> Example script for top logging:
>> https://bitbucket.org/snippets/citrin/BpeXb
>>
>> In the top output you should look at WCPU and STATE for kernel threads
>> and for unresponsive network daemons.
> 
> I've now configured that script to run, and I'll share the results the next
> time the server has issues.
> 
>> Also, do you have a network load graph (bytes and packets per second) for
>> this host (I saw munin in the process list)? Maybe the load is too high
>> at the moments when the host is not responding.
> 
> When this happens network traffic crawls to a stop. I've also checked that
> there isn't any other traffic on the network port causing problems. I also
> tried doing 'ifconfig igb0 down' on the interface just to see if the server
> would unclog itself.
> 
>> Do you use firewalls or netgraph?
> 
> No, nothing configured.
> 
>> What is the primary function of this server?
> 
> It's a file server, sharing files via samba and FTP.
> 
I have no idea. Can you tell me what dmesg shows? It looks like the
system is being overrun, but it is difficult to understand why.
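
If it helps, here is a minimal sketch of the kind of periodic logging
suggested above, extended to also save dmesg. It is not the script from
the bitbucket link. The function names, the log format and the top flags
(-b batch mode, -S system processes, -H threads, -d 1 for one display)
are my assumptions, so please check them against top(1) on your system.

```python
#!/usr/bin/env python3
"""Append timestamped output of diagnostic commands to a log file."""
import subprocess
import sys
import time

# Assumed commands and flags; adjust them for your system.
COMMANDS = [
    ["top", "-b", "-S", "-H", "-d", "1"],
    ["dmesg"],
]


def snapshot(cmd, log_path, timeout=30):
    """Run cmd and append its output to log_path under a timestamp header."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout,
        )
        output = result.stdout
    except (OSError, subprocess.TimeoutExpired) as exc:
        # A missing or hung command is recorded instead of stopping the run.
        output = "command failed: {}\n".format(exc)
    with open(log_path, "a") as log:
        log.write("=== {} {} ===\n".format(stamp, " ".join(cmd)))
        log.write(output)
        if not output.endswith("\n"):
            log.write("\n")
    return output


def log_all(log_path):
    """Take one snapshot of every command in COMMANDS."""
    for cmd in COMMANDS:
        snapshot(cmd, log_path)


# Runs only when a log file path is given as the first argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    log_all(sys.argv[1])
```

Run it from cron once a minute with a log file path as the only
argument. The last entries written before the machine stops responding
are the interesting ones.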

Greetings


