From: Laszlo Nagy
Date: Tue, 23 Dec 2008 13:42:21 +0100
To: Dan Nelson
Cc: freebsd-questions@freebsd.org
Subject: Re: "truss" is buggy?

> It looks like the ptrace() syscall is the problem:
>
> DESCRIPTION
>      The ptrace() system call provides tracing and debugging
>      facilities.  It allows one process (the tracing process) to
>      control another (the traced process).  The tracing process must
>      first attach to the traced process, and then issue a series of
>      ptrace() system calls to control the execution of the process, as
>      well as access process memory and register state.
>      For the duration of the tracing session, the traced process will be
>      ``re-parented'', with its parent process ID (and resulting
>      behavior) changed to the tracing process.
>
> I imagine that also explains why a truss'ed program will die if you
> kill -9 the truss process.  It looks like the "reset parent when
> trussing" behaviour appeared back in 1996 (sys_process.s r1.21).  The
> fix would probably be to store the pid of the tracing process somewhere
> other than p_ppid...
>

My problem is that there is a process (namely, the postgresql stats
collector) that may have a bug. I was asked on the devel list to send in
some traces so they can figure out why it is stuck in an infinite loop,
eating 100% CPU time.

However, when I start truss-ing this process, the return value of its
getppid() call changes. The postgresql stats collector periodically
checks whether the postmaster (its parent process) is still alive, and
exits unconditionally if the postmaster has died. So as soon as I start
truss-ing, the stats collector exits, making it impossible to debug the
problem.

I'm not able to change the stats collector's source code, because I'm
not a C programmer, and because this is a production server and it would
be too risky. I also tried to install strace, but it is not available on
my platform (amd64). I cannot move to i386, because (apparently) the
problem exists only on amd64.

Is this a hopeless situation?

BTW I'm not an expert, but I believe that a process being debugged
should not see any difference, and should not be able to tell whether it
is being debugged or not. I think this is indeed a bug.
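
To make the failure mode concrete, here is a minimal sketch of the kind of
parent check described above. It is not PostgreSQL's actual code: the names
parent_is_alive and watch_parent are hypothetical, and the only assumption is
that the daemon records its parent pid once at startup and later compares it
with getppid(). Under the re-parenting behaviour quoted from the ptrace man
page, getppid() would start reporting the tracer's pid, so a check like this
would fail even though the original parent is still running.

```python
import os
import time


def parent_is_alive(expected_ppid, getppid=os.getppid):
    """True while the reported parent pid equals the one recorded at startup."""
    return getppid() == expected_ppid


def watch_parent(expected_ppid, poll_interval=1.0, max_polls=None,
                 getppid=os.getppid):
    """Poll the parent pid; return False as soon as it no longer matches.

    Returns True if max_polls checks pass without a mismatch.
    With max_polls=None the loop runs until the parent pid changes.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        if not parent_is_alive(expected_ppid, getppid):
            return False  # a real daemon would exit at this point
        time.sleep(poll_interval)
        polls += 1
    return True


if __name__ == "__main__":
    # Recorded once, the way a forked daemon might remember its parent.
    startup_ppid = os.getppid()
    if not watch_parent(startup_ppid, poll_interval=0.1, max_polls=3):
        print("parent pid changed, exiting")
```

The getppid parameter exists only so the check can be exercised with a fake
pid source. The check cannot distinguish "my parent died" from "my reported
parent changed for some other reason", which is why a tracer that takes over
the parent pid is enough to trigger the exit.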