From: "Benjie Chen"
To: "Kris Kennaway"
Cc: freebsd-hackers@freebsd.org
Date: Tue, 25 Sep 2007 10:58:22 -0400
Subject: Re: Kernel panic on PowerEdge 1950 under certain stress load
In-Reply-To: <46F8D12E.7060202@FreeBSD.org>

You are right, they may not be the same. At first glance they seemed similar based on the descriptions of the problems -- the system is stable, then under network-related load it panics after varying intervals of time.
I just assumed that the kernel is typically stable enough that this kind of panic is rare (I have been using FreeBSD for 7 or 8 years now, under heavy loads as well, and have never had kernel panics to deal with).

Upon closer look at the trace and the problem, they may not be the same, since the one on those web pages was about the route code and mine breaks in only one place - waiting for a lock. Again, I will see if I can get a dump when I return to the office.

I did reboot the system and set mpsafenet to 0, and I have not had a crash since then (almost a day) running the same load, so that's positive: at least that may be the workaround, and I don't need Dell to send me new memory modules to try...

Kris or Ivan: could you briefly explain what you guess the problem might be? It seems like a race condition, but I am curious to know more of the details...

Thanks,
Benjie

On 9/25/07, Kris Kennaway wrote:
> Benjie Chen wrote:
> > Ivan and Kris,
> >
> > I will try to get a kernel trace -- it may not happen for awhile since I am
> > not in the office and working remotely for awhile so it may not be easy to
> > get a trace... but I will check.
> >
> > It looks like the problem reported by that link, and some of the links from
> > there though...
>
> Does it really? i.e. did you compare the function names in detail and
> find that they match precisely, or do you just mean "they are both
> panics of some description and I dunno what it all means"? :) I ask
> because the linked trace does not involve a spinlock, which means it
> cannot be precisely the same trace.
>
> Kris

--
Benjie Chen, Ph.D.
Addgene, a better way to share plasmids
www.addgene.org