Message-ID: <4294DA1D.1030202@math.missouri.edu>
Date: Wed, 25 May 2005 15:03:41 -0500
From: Stephen Montgomery-Smith
To: freebsd-stable@FreeBSD.ORG
Subject: releng 5 panic (again)

Please help me! I know that I am getting few responses to my emails; I am guessing that my situation is a difficult one. I would appreciate any ideas on how to gather further diagnostics.

I am regularly getting panics with the instruction pointer equal to 0xc0611c69. I am not able to get any dumps: the dumpon directive is simply ignored. (I did get one dump, for some reason, but that was with a kernel that was not built with config -g, and kernels built afterwards seem significantly different, despite having exactly the same size.)

The code at this instruction pointer is

    (kgdb) list *0xc0611c69
    0xc0611c69 is in fill_kinfo_thread (../../../kern/kern_proc.c:748).
    743             }
    744
    745             kg = td->td_ksegrp;
    746
    747             /* things in the KSE GROUP */
    748             kp->ki_estcpu = kg->kg_estcpu;
    749             kp->ki_slptime = kg->kg_slptime;
    750             kp->ki_pri.pri_user = kg->kg_user_pri;
    751             kp->ki_pri.pri_class = kg->kg_pri_class;
    752

so I'm guessing that kp is not correct.

Because the instruction pointer value is the same from panic to panic, I really think that this is not a hardware issue. I will try any reasonable test you guys have for me. Right now I am switching off HTT to see if that is the issue. This is a dual Xeon system.

I am willing to provide a copy of the program that I'm guessing is causing the problem. It is a multithreaded program that is very CPU intensive, although most of the internals of the code are from the fftw3 port.

One interesting thing about this program is that when I run it, top says that about 45% CPU is being used (which with 4 logical CPUs means that almost 2 CPUs are being used), but the actual program is registered as running with about 80% CPU time (which I am guessing means 0.8 of one CPU is being used). It seems to me that there is some disparity in the accounting. Maybe it is a problem with the math/fftw3 code. But it still shouldn't cause crashes.

Please help me. I am sure that this is a difficult problem, but I just don't know how to provide you with any further decent diagnostic information.

Thanks, Stephen