From: Anton Yuzhaninov
Date: Tue, 20 Sep 2016 16:57:12 -0400
To: "freebsd-questions@freebsd.org"
Subject: Re: Server gets a high load, but no CPU use, and then later stops respond on the network
Message-ID: <68f553b9-8546-7707-df86-88851b3283f8@citrin.ru>
In-Reply-To: <20160913232351.GA36091@putsch.kolbu.ws>

On 2016-09-13 19:23, Ståle Bordal Kristoffersen wrote:
> about once a day, but not in any pattern, it starts getting a load of 5-10
> and usually stops responding over the network before I notice it.

Does it stop responding completely (including ping), or is it only ssh and
some other services that stop responding?

> From googling a bit, I have tried to disable msix on the igb network
> interface, and increased the nmbclusters with no apparent change in behaviour.
> (kern.ipc.nmbclusters="1000000" and hw.igb.enable_msix=0 in loader.conf)

On modern FreeBSD versions kern.ipc.nmbclusters is autotuned to a very large
value, so increasing it manually is rarely needed. Disabling MSI-X on igb is
also unlikely to be needed.

> All I see is that the igb0 taskq pid is almost always in the RUN state when
> the machine is having trouble.

There is no igb0 taskq in the top output below. To see what the top output
looks like when the machine stops responding, it is useful to run top from
cron and log the output. Example script for top logging:
https://bitbucket.org/snippets/citrin/BpeXb

In the top output, look at WCPU and STATE for kernel threads and for the
unresponsive network daemons.

Also, do you have a network load graph (bytes and packets per second) for
this host (I saw munin in the process list)? Maybe the load is too high at
the moments when the host is not responding.

Do you use firewalls or netgraph? What is the primary function of this
server?
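
In case the snippet link is unavailable, here is a rough sketch of the "run top
from cron and log the output" idea. It is not the script from the link above,
just a minimal illustration. The script name, the log path and the number of
displayed threads are assumptions; the top flags used (-b batch mode, -S system
processes, -H threads, -d display count) are the ones documented for FreeBSD
top.

```python
#!/usr/bin/env python3
"""Append one timestamped batch-mode top snapshot to a log file."""
import subprocess
import sys
import time

# Assumed invocation: one batch-mode display including kernel/system
# processes (-S) and individual threads (-H), limited to 40 entries.
TOP_CMD = ["top", "-b", "-S", "-H", "-d", "1", "40"]


def log_snapshot(cmd, log_path):
    """Run cmd once and append its output, with a timestamp header, to log_path."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=30,  # do not pile up processes if the host is struggling
        )
        output = result.stdout
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Record the failure instead of silently losing the sample.
        output = "failed to run {}: {}\n".format(cmd[0], exc)
    with open(log_path, "a") as fh:
        fh.write("===== {} =====\n{}\n".format(stamp, output))
    return output


if __name__ == "__main__":
    if len(sys.argv) == 2:
        log_snapshot(TOP_CMD, sys.argv[1])
    else:
        print("usage: toplog.py LOGFILE", file=sys.stderr)
```

A crontab entry along these lines (paths are hypothetical) would take one
snapshot per minute:

    * * * * * /usr/local/bin/python3 /root/toplog.py /var/log/toplog.log

After the next hang, the last few snapshots in the log show which threads
were in the RUN state and what their WCPU was just before the host stopped
responding.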