From: Damien Fleuriot
To: freebsd-hackers@freebsd.org
Date: Thu, 26 May 2011 11:29:12 +0200
Subject: Re: DEBUG - analysing core dumps
In-Reply-To: <4DDE067B.7080605@my.gd>
References: <4DDD3021.1000109@my.gd> <4DDE067B.7080605@my.gd>

On 26 May 2011 09:51, Damien Fleuriot wrote:
>
>
> On 5/25/11 7:10 PM, Garrett Cooper wrote:
>> On Wed, May 25, 2011 at 9:36 AM, Damien Fleuriot wrote:
>>> Hello list,
>>>
>>> We've got these boxes at work running FreeBSD 8.1-STABLE amd64 and
>>> serving as firewalls and openvpn gateways.
>>>
>>> We use CARP interfaces to provide an active-passive fault-tolerant system.
>>>
>>> Today, we received a nagios alert from the master box saying its
>>> rsyslogd process had crashed.
>>>
>>> I logged on to it and tried to relaunch it, to no avail:
>>> pid 2303 (rsyslogd), uid 0: exited on signal 11 (core dumped)
>>>
>>> I would like advice on how to debug the output from the core dump.
>>>
>>> This is what I get from gdb:
>>>
>>> # gdb
>>> GNU gdb 6.1.1 [FreeBSD]
>>> Copyright 2004 Free Software Foundation, Inc.
>>> GDB is free software, covered by the GNU General Public License, and you are
>>> welcome to change it and/or distribute copies of it under certain
>>> conditions.
>>> Type "show copying" to see the conditions.
>>> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>>> This GDB was configured as "amd64-marcel-freebsd".
>>> (gdb) core rsyslogd.core
>>> Core was generated by `rsyslogd'.
>>> Program terminated with signal 11, Segmentation fault.
>>> #0  0x00000000004258ec in ?? ()
>>>
>>> Sadly, getting a backtrace with "bt" gives me more lines with "??",
>>> which is totally not helpful:
>>> [SNIP]
>>> #13 0x00007fffff1f9d70 in ?? ()
>>> #14 0x0000000000000000 in ?? ()
>>> #15 0x6f70732f7261762f in ?? ()
>>> #16 0x6c737973722f6c6f in ?? ()
>>> #17 0x5f6e70766f2f676f in ?? ()
>>> #18 0x746174732e676f6c in ?? ()
>>> #19 0x0000000000000065 in ?? ()
>>> #20 0x0000000000000000 in ?? ()
>>> [SNIP]
>>>
>>> I am not sure what steps I should follow to get more information?
>>>
>>> Also, I believe that often, core dumps with signal 11 = RAM problems and
>>> I would like a confirmation here.
>>>
>>> I am concerned because rsyslogd is the only process that crashes in this
>>> way, even after I rebooted the firewall.
>>
>>     Rebuild and reinstall rsyslogd with debug symbols and see if you
>> can get a reasonable stack trace. Something else to try before that to
>> narrow down the problem section of code is ktrace/kdump it, or truss
>> it, and see if it's trying to open/read from a file and failing.
>> Thanks,
>> -Garrett
>
> Thanks everyone for your answers, I'll recompile with DEBUG and obtain a
> new core dump.
>
> I'll also investigate the possibility of corrupted spool files and post
> the resolution here :)
>
> --
> dfl
>

Turns out that after rebuilding rsyslog4-relp with -DWITH_DEBUG, the new
daemon works just fine and doesn't sig11 anymore.

Odd, but well, solves my problem.
I will upgrade it on all the other boxes then.

Thanks for the help guys

--
dfl
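
A side note on the quoted backtrace: frames #15 to #19 do not look like
return addresses. Every byte in them falls in the printable ASCII range, which
suggests gdb, lacking symbols, was walking over string data on the stack.
Since amd64 is little-endian, the words can be unpacked byte by byte to read
the text. The sketch below does that in Python; the function name
decode_stack_words is my own and purely illustrative, and the only input is
the five values copied from the backtrace above.

```python
import struct

# Frames #15 to #19, copied verbatim from the quoted gdb backtrace.
FRAMES = [
    0x6f70732f7261762f,
    0x6c737973722f6c6f,
    0x5f6e70766f2f676f,
    0x746174732e676f6c,
    0x0000000000000065,
]


def decode_stack_words(words):
    """Treat each value as a little-endian 64-bit word and return the
    ASCII text that precedes the first NUL byte.

    Hypothetical helper for illustration; not part of gdb or rsyslog.
    """
    # "<Q" packs an unsigned 64-bit integer in little-endian byte order,
    # which matches how amd64 lays the word out in memory.
    raw = b"".join(struct.pack("<Q", word) for word in words)
    # Stop at the first NUL, as a C string would.
    text = raw.split(b"\x00", 1)[0]
    return text.decode("ascii", errors="replace")


if __name__ == "__main__":
    print(decode_stack_words(FRAMES))
```

The decoded text is a path under the rsyslog spool directory, which fits the
suspicion of corrupted spool files raised earlier in the thread. It does not
prove that file caused the crash, only that rsyslogd had the name on its
stack when it died, so checking it is a reasonable step alongside the
ktrace/truss suggestion.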