From: fenix@ramb.com.ua
Date: Wed, 22 Jun 2005 19:10:46 +0300
To: Matt Juszczak
Cc: freebsd-questions@freebsd.org
Subject: Re[2]: FreeBSD Machines dieing, we've tried so much....
Message-ID: <945588776.20050622191046@ramb.com.ua>
In-Reply-To: <42B98AD0.7080508@atopia.net>

Hello, Matt.

>>The vast majority of panics are hardware-related. It is rare nowadays
>>for a usermode program to make the system panic.
>>In particular you said the problem happens more under load. That really
>>points even more to a hardware problem - bad CPU cache ram, bad ram,
>>scsi termination, that sort of thing.
>>
>>Ted
>>
>>
> This is kind of going to be a blanket post to all the recent suggestions
> to me. I appreciate suggestions :) Ted, sorry, my other posts had
> dmesg and hardware specs, etc. I just couldn't remember the subject line
> of that thread. I'll be more descriptive here.
>
> We have two different servers crashing. Both are SMP, but on different
> hardware. We have five freeBSD servers in total, and only two are
> affected. That is why I do not believe this is a hardware problem.
>
> In any case, the machines are in a cold room where the temperature is
> constantly maintained. 20 other servers in there are perfectly stable,
> with no probs.
>
> This particular machine that crashed last night while running portsdb
> -uU is a Super Micro machine, with hyperthreading disabled in the bios,
> dual CPU 3.06 ghz, with 4 gigs memory. We ran mem test on orion (the
> machine that crashed last night) a week or so ago, and it found 70,000
> ECC errors. Those were fixed and that machine has been stable until
> last night. I've now disabled SMP support, we'll see if that keeps it
> stable or not. Portsdb -uU ran without problems after I disabled SMP.
>
> As far as uranus, the other box (we keep a planet scheme for a certain
> set of servers), we ran memtest86 and found no errors at all. That box
> crashed about two days ago but has been stable since. It has not lasted
> more than a week without doing a kernel trap and freezing.
>
> It seems that both these servers have this problem. Out of the five
> FreeBSD servers we have, these two are the ones with the highest load.
> Maybe a higher load on the other three servers would cause the same
> problem.
> I agree with you that this is a hardware problem, but on more
> than one server with two different architectures and our highest load
> makes me re-consider.
>
> If this is truly a bug in FreeBSD 5.4-RELEASE, maybe this is something
> that has been fixed in -stable? I will compile a debug kernel today and
> try to provide a trace to the problem. I'll do it on which ever server
> crashes next.

I had the same situation with two different heavily loaded servers (both
SMP, with 8 GB of RAM and HT enabled) running 5.4-RELEASE. After
disabling HT and updating the OS to 5.4-STABLE via cvsup, both have been
working fine without any problems; the last reboot was 28 days ago.

-- 
Best regards,
Sergey S. Ropchan