Date: Thu, 3 Jan 2008 11:05:16 +1030 (CST)
From: Jarrod Sayers
To: Tom Judge
Cc: "Marc G. Fournier", freebsd-stable@freebsd.org,
    freebsd-questions@freebsd.org
Subject: Re: Nagios + 6.3-RELEASE == Hung Process
Message-ID: <20080103104129.T36551@wallace.netleader.com.au>
In-Reply-To: <477C1629.1030604@tomjudge.com>
References: <59DD6CCE263ECD75A7283A7B@ganymede.hub.org>
    <477A72B8.8010307@protected-networks.net>
    <477BAD2B.8070603@tomjudge.com>
    <1DB78354-EBA2-43D0-A2D6-EFDA4950135B@netleader.com.au>
    <477C1629.1030604@tomjudge.com>

On Wed, 2 Jan 2008, Tom Judge wrote:

> Jarrod Sayers wrote:
>> I hope I can confirm your frustrations. There is a threading issue
>> with Nagios when its binaries are linked against the libpthread(3)
>> threading library, the default on recent FreeBSD 5.x releases and all
>> 6.x releases. The issue is random and extremely difficult to track down
>> with the symptoms being a second Nagios process sitting on the system
>> hanging a CPU. Rest assured that I have been working on it, and
>> have seen it on one system of mine.
>
> Not sure if this is related at all but out of the 3 nagios deployments
> we have here I have only ever seen it on one (It currently has 2 nagios
> threads spinning CPU time atm).
>
> The differences on that server are:
>
> * It is amd64 compared to i386
> * It also runs ndo2db from ndoutils 1.4b7
>
> All the systems run 6.2-RELEASE-p5 and nagios-2.9_1, they are also all
> patched with gnu libltdl patch below.
>
> Don't know if that info is of any use to you.

That's actually good to know, as you're now (unless I am mistaken) the
first user to contact me about this problem on a non-i386 system. One
other user and I have also seen the issue under Nagios 3.x, though both
on i386 systems. I also have a net-mgmt/ndoutils port in the works (less
the database support for now) which has the same issue, so using broker
modules doesn't seem to affect the outcome. My gut feeling is that it's
not an architecture issue but rather an interoperability issue between
the Nagios threading code and the libpthread threading library.

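For anyone wanting to check whether a box is showing the symptom described above (an extra Nagios process pinning a CPU), a rough sketch along these lines might help. It only parses text in the shape that `ps -axo pid,pcpu,comm` prints; the function name `find_spinning`, the process name "nagios" and the 90% threshold are all assumptions made for illustration, not anything taken from the port or from Nagios itself.

```python
def find_spinning(ps_output, name="nagios", cpu_threshold=90.0):
    """Return (pid, pcpu) pairs for processes called `name` whose CPU
    usage is at or above `cpu_threshold`.

    `ps_output` is assumed to be the text printed by
    `ps -axo pid,pcpu,comm` (three whitespace-separated columns).
    The default name and threshold are illustrative guesses.
    """
    hits = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue  # blank or malformed line
        pid, pcpu, comm = fields
        if not pid.isdigit():
            continue  # header row
        try:
            cpu = float(pcpu)
        except ValueError:
            continue  # unexpected value in the CPU column
        if comm.strip() == name and cpu >= cpu_threshold:
            hits.append((int(pid), cpu))
    return hits
```

Feeding it the captured output of the `ps` command above (for example via `subprocess.run(..., capture_output=True, text=True).stdout`) gives a list that is empty on a healthy system. A busy but healthy Nagios can briefly exceed the threshold, so a single hit is a hint rather than proof of the hang.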
[yoink]

>> I did receive that email and the changes went in with the last commit
>> of net-mgmt/nagios-devel to test. No issues have arisen so I'll be
>> back-porting it to net-mgmt/nagios soon for you. There also has been a
>> rather large ports freeze which delayed the upgrade to Nagios 2.10,
>> that PR was submitted on the 1st of November and committed on the 13th
>> of December. Unfortunately your email fell somewhere in the middle,
>> apologies for not letting you know.
>
> Thanks for this, I currently maintain the patch on our build servers.

No worries, I will look at bundling the change in with the libthr fix
over the next few days. Thanks for pointing that out too; it was a bug
rather than a feature request, since on systems where the library was
available the build process would link to it. Hmm...

Jarrod.