From: Chuck Swiger
Date: Thu, 19 Apr 2007 10:43:18 -0700
To: Dimitris Zilaskos
Cc: freebsd-questions@freebsd.org
Subject: Re: random hangs/reboots with Dell servers

On Apr 19, 2007, at 3:54 AM, Dimitris Zilaskos wrote:
> Over the last 3 years we have installed FreeBSD 5.x and 6.x, with
> currently deployed version being 6.1, to a variety of Dell rack
> mounted systems.
>
> The Dell systems used so far are PowerEdge 1750, 2950 (both SCSI),
> and SC1425 (SATA). All of them are dual CPU Xeon systems.

I've got a large number of Dell PowerEdge 1750, 1850, 2900, and 2950
machines deployed in various production environments, whereas some
other clients are using HP ProLiant 360/370 boxen. Both seem to be
rock solid under either 5.4/5.5 or 6.1/6.2. I've even got a pair of
firewall boxes running nothing but NAT and SSHd, which are at 600+
days of uptime:

FreeBSD 5.4-STABLE (FW) #0: Tue Jul 12 11:10:14 EDT 2005

Welcome to FreeBSD!

12:24PM up 636 days, 19:26, 3 users, load averages: 0.25, 0.14, 0.04

(Machines running more services get OS or service related updates more
frequently-- typically every month to every 3 months-- but I don't
like to make changes to a running machine unless I expect the change
to make an improvement which justifies the disruption. For a non-SMP
firewall, where updating would mean losing external network
connectivity, nothing in 6.x is worth the cost of updating as yet,
IMHO.)

> All these systems serve as mail/web servers, with 2 to 15 jails.
>
> Installation has always proceeded normally without problems.
> However, after a few months of operation, all of these systems,
> purchased at different moments during the last 3 years, will begin
> rebooting randomly or freezing completely.
>
> These reboots/freezes will at first occur once per 6 months, then
> gradually will move to once per month, to normally stabilize
> around once per week, but in the case of the 1750 system once it
> even happened twice a day.
>
> Load does not seem to matter, since even after shutting down all
> services in the servers, still random reboots occurred.

That sounds like something hardware-related, such as a power-supply
problem, if the interval between failures is gradually getting shorter
and is not correlated with load at all.

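One rough way to check the "gradually getting shorter" part is to write
down the time of each reboot or freeze and compare the gaps. The sketch
below is only an illustration: the function names and the sample dates
are made up for the example and are not taken from your systems, and
comparing the mean of the later gaps against the earlier ones is a
crude heuristic rather than a proper statistical test.

```python
from datetime import datetime


def intervals_in_days(timestamps):
    """Return the gaps, in days, between consecutive events in time order."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() / 86400.0
            for a, b in zip(ordered, ordered[1:])]


def is_shrinking(gaps):
    """True if the later half of the gaps averages less than the earlier half."""
    if len(gaps) < 2:
        return False  # not enough data to compare anything
    mid = len(gaps) // 2
    first, second = gaps[:mid], gaps[mid:]
    return sum(second) / len(second) < sum(first) / len(first)


# Hypothetical reboot times, invented purely for illustration.
sample = [
    datetime(2006, 1, 1),
    datetime(2006, 7, 1),
    datetime(2006, 8, 1),
    datetime(2006, 8, 8),
]

gaps = intervals_in_days(sample)
print(gaps, is_shrinking(gaps))
```

If the gaps keep shrinking while the machine is idle as well as under
load, that is at least consistent with a component wearing out rather
than a software trigger.
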
> So far we tried various tricks dug from the archives, like
> disabling ACPI, HT, but nothing changed.
>
> We have migrated some systems that had these issues to RHEL
> compatible OS, and they run rock solid under heavy load.

Hmm. Well, you might have to wait a few weeks or months to get a
reasonable comparison of longer-term stability, but this at least
implies that something like cooling or a failed fan isn't a likely
cause.

> Right now I have enabled kernel crash dumps and I am waiting for
> the next crash. But I understand a lot of people use FreeBSD with
> Dell servers, and I would like to hear how to tackle this
> situation we are facing.

Try to get a crash dump. Also, you might find that reviewing the BIOS
options and disabling everything which is not needed, hopefully
including USB, will help.

--
-Chuck