Date: Thu, 19 Apr 2007 10:43:18 -0700
From: Chuck Swiger <cswiger@mac.com>
To: Dimitris Zilaskos <dzila@tassadar.physics.auth.gr>
Cc: freebsd-questions@freebsd.org
Subject: Re: random hangs/reboots with Dell servers
Message-ID: <AD32146C-44E0-486D-909D-862C89095619@mac.com>
In-Reply-To: <Pine.LNX.4.64.0704191333000.7897@tassadar.physics.auth.gr>
References: <Pine.LNX.4.64.0704191333000.7897@tassadar.physics.auth.gr>

On Apr 19, 2007, at 3:54 AM, Dimitris Zilaskos wrote:
> Over the last 3 years we have installed freebsd 5.x and 6.x, with
> currently deployed version being 6.1, to a variety of Dell rack
> mounted systems.
>
> The Dell systems used so far are Poweredge 1750, 2950 (both scsi),
> and sc1425 (sata). All of them are dual CPU Xeon systems.

I've got a large number of Dell PowerEdge 1750, 1850, 2900, 2950
deployed in various production environments, whereas some other
clients are using HP ProLiant 360/370 boxen. Both seem to be rock
solid under either 5.4/5.5, or 6.1/6.2. I've even got a pair of
firewall boxes running nothing but NAT and SSHd, which are at 600+
days of uptime:

  FreeBSD 5.4-STABLE (FW) #0: Tue Jul 12 11:10:14 EDT 2005

  Welcome to FreeBSD!

  12:24PM  up 636 days, 19:26, 3 users, load averages: 0.25, 0.14, 0.04

(Machines running more services get OS or service related updates
more frequently-- typically every month to every 3 months-- but I
don't like to make changes to a running machine unless I expect the
change to make an improvement which justifies the disruption. For a
non-SMP firewall which would involve loss of external network
connectivity to update, nothing in 6.x is worth the cost to update to
as yet, IMHO.)

> All these systems serve as mail/web servers, with 2 to 15 jails.
>
> Installation has always proceeded normally without problems.
> However, after a few months of operation, all of these systems,
> purchased at different moments during the last 3 years, will begin
> rebooting randomly or freezing completely.
>
> These reboots/freezes will at first occur once per 6 months, then
> gradually will move to once per month, to normally stabilize
> around once per week, but in the case of the 1750 system once it
> even happened twice a day.
>
> Load does not seem to matter, since even after shutting down all
> services in the servers, still random reboots occurred.

Sounds like something hardware-related, such as a power-supply
problem, if the time between failures is gradually getting shorter
and is not correlated with load at all.

> So far we tried various tricks dug from the archives, like
> disabling ACPI, HT, but nothing changed.
>
> We have migrated some systems that had these issues to RHEL
> compatible OS, and they run rock solid under heavy load.

Hmm. Well, you might have to wait for a few weeks or months to be
able to get a reasonable comparison of longer-term stability, but
this at least implies that something like cooling or a failed fan
aren't likely causes.

> Right now I have enabled kernel crash dumps and I am waiting for
> the next crash. But I understand a lot of people use FreeBSD with
> Dell servers, and I would like to listen on how to tackle this
> situation we are facing.

Try to get a crash dump. Also, you might find reviewing the BIOS
options and disabling everything which is not needed, hopefully
including USB, will help.

-- 
-Chuck
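
Editorial sketch (not part of the original message): the pattern described above, failures arriving at ever shorter intervals regardless of load, can be checked from a list of reboot dates. The Python below is a minimal illustration only; the function names and the sample dates are hypothetical, and in practice the dates would have to be collected by hand from whatever reboot records each machine keeps.

```python
from datetime import datetime


def failure_intervals(dates):
    """Return the gaps, in days, between consecutive failure dates.

    `dates` is a list of "YYYY-MM-DD" strings (hypothetical input format).
    """
    times = sorted(datetime.strptime(d, "%Y-%m-%d") for d in dates)
    return [(later - earlier).days for earlier, later in zip(times, times[1:])]


def is_shrinking(intervals):
    """True if no gap is longer than the previous one and the last gap is
    strictly shorter than the first, i.e. failures are becoming more frequent."""
    if len(intervals) < 2:
        return False
    never_grows = all(b <= a for a, b in zip(intervals, intervals[1:]))
    return never_grows and intervals[-1] < intervals[0]


# Hypothetical reboot dates for one server, for illustration only.
reboots = ["2006-01-10", "2006-07-09", "2006-08-08", "2006-08-15"]
gaps = failure_intervals(reboots)
print(gaps, is_shrinking(gaps))
```

A steadily shrinking gap that does not track load is the kind of evidence that points toward degrading hardware rather than a software fault, though a handful of data points cannot prove either.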