Date: Thu, 29 Sep 2005 19:09:24 +0100 (BST)
From: Robert Watson
To: Rob Watt
Cc: freebsd-hackers@FreeBSD.org, mikep@hudson-trading.com, freebsd-amd64@FreeBSD.org, Jason Carroll
Subject: Re: freebsd-5.4-stable panics

On Wed, 28 Sep 2005, Rob Watt wrote:

> We re-compiled the kernel with 'options KDB_STOP_NMI', and were able to
> get a much more full analysis of what was happening on the 6-BETA5
> crash.

Great.

> We crashed in top again, and it does look like we may have hit a
> kern_proc bug.

This sounds good, or at least, promising.

> in the attached file type3-core.txt you can see that it hits an
> exception in:
>
> 0xffffffff802b897a is in fill_kinfo_thread
> (/usr/src/sys/kern/kern_proc.c:736).
> 731         }
> 732
> 733         kg = td->td_ksegrp;
> 734
> 735         /* things in the KSE GROUP */
> 736         kp->ki_estcpu = kg->kg_estcpu;
> 737         kp->ki_slptime = kg->kg_slptime;
> 738         kp->ki_pri.pri_user = kg->kg_user_pri;
> 739         kp->ki_pri.pri_class = kg->kg_pri_class;
> 740
> (kgdb) frame 8
> #8  0xffffffff802b897a in fill_kinfo_thread (td=0xffffff0063311260,
> kp=0xffffffffb62d8510)
>     at /usr/src/sys/kern/kern_proc.c:733
> 733         kg = td->td_ksegrp;
> (kgdb) p kg->kg_estcpu
> Cannot access memory at address 0x173
> (kgdb) p td->td_ksegrp
> $1 = (struct ksegrp *) 0x0
> (kgdb) p kp->ki_estcpu
> $2 = 0
> (kgdb) p kg
> $4 = (struct ksegrp *) 0x12b
>
> it seems that kg is an invalid pointer.

Could you dump the contents of *td and *td->td_proc for me?  I'm quite
interested to know what the value in td->td_proc->p_state is, among other
things.  Could I also have you generate a dump of the KSE group structures
in td->td_proc->p_ksegrps and of the threads in td->td_proc->p_threads?

Could you tell me if the program named by p->p_comm is linked against a
threading library?  If it's a custom app, you may already know; if not,
you can run ldd on the application to see what it is linked against.

Depending on how much time you have available, it might make sense for me
to grab from you a copy of your source tree, compiled kernel with debug
symbols, and core dump.

> We have started our tests again without running top.
>
> Hope you have a great vacation.

It was brief but very enjoyable, and quite disconnected :-).

Thanks,

Robert
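
A note on reading the quoted kgdb output: the failing address 0x173 and the
printed value kg = 0x12b differ by 0x48, which is presumably the offset of
kg_estcpu within struct ksegrp as kgdb sees it.  The sketch below only
illustrates that arithmetic.  The struct is a hypothetical stand-in whose
padding is chosen to reproduce the 0x48 offset; it is not the real layout
from sys/proc.h, and member_address() is a name made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * HYPOTHETICAL stand-in for struct ksegrp.  The padding exists only so
 * that kg_estcpu lands at offset 0x48, the offset implied by the trace
 * (0x173 - 0x12b).  The real structure has other fields here.
 */
struct fake_ksegrp {
	unsigned char	pad[0x48];	/* placeholder for preceding fields */
	unsigned int	kg_estcpu;
};

static_assert(offsetof(struct fake_ksegrp, kg_estcpu) == 0x48,
    "illustrative layout must match the offset implied by the trace");

/* Address that is touched when kg->kg_estcpu is read for a given kg. */
static uintptr_t
member_address(uintptr_t kg)
{
	return (kg + offsetof(struct fake_ksegrp, kg_estcpu));
}
```

Under that assumed layout, member_address(0x12b) gives 0x173, matching the
"Cannot access memory" address, while a NULL kg would give 0x48.  The
fault address alone therefore indicates which base pointer value was
actually dereferenced.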