From: Stefan Bethke
To: FreeBSD Current
Cc: Wesley Shields, jarrod@netleader.com.au
Date: Tue, 28 Jul 2009 11:37:00 +0200
Subject: Re: nagios dies with signal 10

On 26.05.2009 at 23:19, Stefan Bethke wrote:

> On 26.05.2009 at 22:49, Wesley Shields wrote:
>
>> On Tue, May 26, 2009 at 06:37:24AM +0200, Stefan Bethke wrote:
>>> I just noticed that my nagios keeps dying about five minutes after
>>> startup with signal 10. Up-to-date current from May 21.
>>>
>>> I've tried portupgrade -fR nagios, but that alone does not seem to
>>> be sufficient to fix it.
>>> I've tried nagios both with and without embedded perl.
>>
>> I'm assuming you've got the latest nagios port when you do this? I
>> committed a fix for this and AFAIK the problem has been resolved
>> since then.
>
> $ pkg_info|grep nagio
> nagios-3.0.6_1           Extremely powerful network monitoring system
> nagios-plugins-1.4.13,1  Plugins for Nagios
>
> $ head /usr/ports/net-mgmt/nagios/Makefile
> # New ports collection makefile for: nagios
> # Date created: 19 May 2002
> # Whom: Blaz Zupan
> #
> # $FreeBSD: ports/net-mgmt/nagios/Makefile,v 1.79 2009/05/04 15:36:05 wxs Exp $
>
> As I said, this only started after I updated to -current on May 21.
> With the earlier current (from around April) it was working fine.
>
> I will try to ktrace nagios on the weekend.

I finally got round to looking into this again; nagios is still broken
for me. I'm on:

FreeBSD krokodil.zs64.net 8.0-CURRENT FreeBSD 8.0-CURRENT #12: Fri Jun 12 06:29:20 UTC 2009 root@lokschuppen.zs64.net:/usr/obj/usr/src/sys/EISENBOOT amd64

with sources from June 11.

I've tried net-mgmt/nagios and net-mgmt/nagios-devel with and without
embedded perl, to no avail. Nagios starts up, runs a few (almost all?)
checks, then crashes with a bus error. The debug log does not contain
anything useful.

I've compiled nagios with CFLAGS=-g, and I get this in gdb:

# gdb base/nagios
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
(gdb) r /usr/local/etc/nagios/nagios.cfg
Starting program: /var/ports/work/net-mgmt/nagios-devel/nagios-3.1.2/base/nagios /usr/local/etc/nagios/nagios.cfg
[New LWP 100227]
[New Thread 800c021c0 (LWP 100227)]

Nagios 3.1.2
Copyright (c) 2009 Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 06-23-2009
License: GPL

Website: http://www.nagios.org
Nagios 3.1.2 starting... (PID=55280)
Local time is Tue Jul 28 09:21:56 UTC 2009
Warning: Host 'tivo' has no services associated with it!
[New Thread 800c511c0 (LWP 100449)]

Program received signal SIGBUS, Bus error.
[Switching to Thread 800c021c0 (LWP 100227)]
0x000000000044869a in get_next_comment_by_host (host_name=0x800c6f960 "slingbox", start=0x800c488e0) at ../common/comments.c:632
632     for(;temp_comment && compare_hashdata(temp_comment->host_name,NULL,host_name,NULL)<0;temp_comment=temp_comment->nexthash);
(gdb) bt
#0  0x000000000044869a in get_next_comment_by_host (host_name=0x800c6f960 "slingbox", start=0x800c488e0) at ../common/comments.c:632
#1  0x0000000000447ad1 in delete_host_acknowledgement_comments (hst=0x800c16800) at ../common/comments.c:301
#2  0x00000000004362e6 in handle_host_state (hst=0x800c16800) at sehandlers.c:731
#3  0x0000000000412955 in process_host_check_result_3x (hst=0x800c16800, new_state=0, old_plugin_output=0x800c486a0 "CRITICAL - slingbox.lassitu.de: Host unreachable @ 44.128.127.15. rta nan, lost 100%", check_options=0, reschedule_check=1, use_cached_result=1, check_timestamp_horizon=15) at checks.c:3744
#4  0x00000000004117ac in handle_async_host_check_result_3x (temp_host=0x800c16800, queued_check_result=0x800c6b0c0) at checks.c:3380
#5  0x000000000040a7c7 in reap_check_results () at checks.c:206
#6  0x000000000042574d in handle_timed_event (event=0x800c86fe0) at events.c:1307
#7  0x0000000000424cc8 in event_execution_loop () at events.c:1002
#8  0x000000000040a3c2 in main (argc=2, argv=0x7fffffffea58) at nagios.c:833
(gdb) p temp_comment
$1 = (comment *) 0x5a5a5a5a5a5a5a5a

I've been told that nagios simply isn't 64-bit clean, and that it's a
lost cause. I can't believe nobody is running nagios on amd64, though.

Thanks,
Stefan

--
Stefan Bethke   Fon +49 151 14070811