Date: Wed, 28 Feb 2007 16:39:09 -0500
From: DAve <dave.list@pixelhammer.com>
To: freebsd-questions@freebsd.org
Subject: Re: Stability Issues on 5.4-RELEASE Box
Message-ID: <45E5F67D.3060701@pixelhammer.com>
In-Reply-To: <20070228122108.bhd56o5wn4ss8c4g@mail.schnarff.com>
References: <20070228122108.bhd56o5wn4ss8c4g@mail.schnarff.com>

alex@schnarff.com wrote:
> Hello All,
>
> I've recently fallen into the task of administering a FreeBSD
> 5.4-RELEASE box that acts as the web server for a small non-profit that
> I volunteer for. Unfortunately, the system has been having some
> extremely vexing stability issues over the last month or so, which even
> my 6+ years of experience as an OpenBSD admin have not helped me track
> down.
>
> First things first, let me say explicitly that I'm not trying to say
> "FreeBSD sucks, it's not stable" or anything like that. It's a fine OS,
> and I'm sure that it's either faulty hardware or a misconfiguration of
> some sort causing these problems. :-)
>
> That said, here are some of the symptoms the box has been experiencing:
>
> * Occasional random reboots. I've only personally witnessed one, and
> they don't happen often, but any time a *NIX box just reboots for no
> apparent reason (there was no indication of a problem in any of the
> logs, at least that I could see), something really bad is going on.
>
> * Random extreme slowness when logging in via SSH, with the time to get
> a shell ranging from a second or two all the way up to 80 seconds. The
> box isn't busy enough that it's just slow due to load (especially since,
> once you're in, things fly), and it's not just a reverse DNS issue like
> I've seen on OpenBSD (this occurs even when logging in from locations
> listed in /etc/hosts that resolve properly out of that file). Until I
> upgraded to the current version of OpenSSL/OpenSSH, the box would
> occasionally just become unresponsive altogether over SSH, not allowing
> logins for 15+ minutes at a time.
>
> * Issues with files that are not found on startup sometimes, but are
> other times. Prime example: the Zope CMS system that's been installed
> failed to find libmysqlclient.so after a planned soft reboot, but found
> it with no trouble on a subsequent boot a few minutes later, with no
> config changes in between.
>
> * A warning in /var/log/messages that the root filesystem was full, when
> it was at 60% capacity (and something like 2% inode capacity); the
> problem has yet to repeat, though no files have been cleared off of that
> filesystem.
>
> * Random crashes of the Zope/Plone system that's running the main part
> of the web site. While I realize that, in and of itself, this means
> nothing about the stability of the underlying OS, in the context of all
> of the other things going on (as well as the fact that the Zope list has
> been unable to help figure out why it's crashing), it seems like it
> might be further evidence of a larger problem.
>
> Thus far, besides simply scanning log files, constantly watching "top"
> and "ps", etc., I've not been able to do much with the box. As I said, I
> upgraded OpenSSL/OpenSSH to current versions, and I installed pf as the
> firewall (there was none before I arrived...don't even get me started on
> that). This weekend the guy who was the previous admin will be running a
> Memtest for me and disabling hyperthreading (which there's no
> performance justification for, and which has caused me stability issues
> at least on Linux in the past), since the server is in Oregon and I'm in
> the DC area. That's about the extent of what I've been able to do to
> date, since this is a production box.
>
> What I'd like to know from you guys is:
>
> * Am I justified in suspecting hyperthreading as a potential cause of
> instability?
>
> * Does 5.4-RELEASE have any known bugs that might cause stability issues
> like the ones I've described here? More importantly, would an upgrade to
> 6.2-RELEASE be worthwhile (as is my instinct), in terms of being
> generally more stable and/or having better hardware support? Would such
> an upgrade be possible/relatively painless to perform without being
> physically at a console, as has been the case with OpenBSD over the years?
>
> * Given my dmesg below, do you see any specific problems?
>
> * Do you have any other suggestions for debugging this problem?
>
> Thanks in advance for any help you can provide. :-)
>
> Alex Kirk

I would certainly think hardware is the place to look.

Just so you know, we still run a server on FBSD 4.8, and it runs very
well. We have 4.8, 4.11, 5.2.1, 5.4, 6.1, and 6.2. Oh, and a couple of
Linux, NetBSD, and Solaris boxen too. I prefer not to chase versions on
high-load production equipment, certainly not as a problem resolution
strategy. For the record, I have never had a blind upgrade fix an
unidentified problem, and if it did I would be very worried.

I would guess memory; at least that is where I would look first. I would
also wonder what environment the server runs in. Heat is a killer, and
so is vibration. Loose racks and humming floors can and will cause
connections to slip. I have fixed servers that ran for months and then
suddenly showed odd behavior simply by powering down, removing all
cards/RAM/cables, and reattaching everything.

Mysterious failures, 3000 miles to the console, I don't envy you ;^)

DAve

-- 
Three years now I've asked Google why they don't have a logo change
for Memorial Day. Why do they choose to do logos for other
non-international holidays, but nothing for Veterans?

Maybe they forgot who made that choice possible.