From: Zbigniew Szalbot
To: User Questions
Date: Wed, 4 Mar 2009 10:53:36 +0100
Message-ID: <94136a2c0903040153l7844c353k81769342c424f62@mail.gmail.com>
Subject: tool to determine server stability issues
List-Id: User questions

Hello,

I am not sure whether the upgrade to perl 5.8.9 started my problem, but I am seeing strange server behaviour. It usually lasts about 5 minutes, during which the system becomes unresponsive. Top tells me there are two perl processes run by user www, each using 100% of a CPU. The server has four CPUs, so that by itself is ok.

What is strange, though, is that during such a storm all of the outgoing bandwidth is taken up, and this is the reason the server becomes unresponsive. Normally it does happen that the bandwidth is taken almost completely by a remote backup job, but I have priority queueing with pf and it has never been a problem: a site is served fast even though the bandwidth is taken up, because httpd traffic has higher priority. Also, in this particular case the backup job is not involved (especially since the perl processes are run by user www), so it must be something else.

I have looked through apache's logs but I cannot find anything strange (normal traffic, without any type of DoS activity, etc.). I have turned on debugging in HotSanic, which I use for traffic/system measurement, but it would not generate outgoing traffic itself.

I guess I am looking for advice on how to debug this. I often spot the problem when it is about to end, so I do not have enough time to start more detailed monitoring (also, I am not sure which tool would be best to use). I'd appreciate any advice on how to troubleshoot and find the source of the problem.

Today I managed to run netstat during the outage (an ssh session was already open, so I was able to continue; otherwise I would not have been able to get to the server). I can provide its output if it is of any use.

I have never had anything like this before, so I am in the dark here. I use FreeBSD 7.0-RELEASE-p9 #3.

Many thanks in advance!

-- 
Zbigniew Szalbot
www.slowo.pl
www.fairtrade.net.pl