Date: Mon, 27 Jun 2005 01:01:09 -0400
From: Matt Juszczak <matt@atopia.net>
To: freebsd-stable@freebsd.org
Subject: FreeBSD -STABLE servers repeatedly crashing.
Message-ID: <42BF8815.6090909@atopia.net>
Hello all,

About three weeks ago, I upgraded my 5.3-RELEASE boxes to 5.4-RELEASE. I also turned on procmail globally on our mail server. Here is our current FreeBSD server setup:

URANUS - primary ldap
CALIBAN - secondary ldap
ORION - primary mail

Orion was the first one to crash, about three weeks ago. Orion is constantly talking to Uranus, because Uranus is our primary ldap server (we have a planet scheme), and Caliban is our secondary ldap server. I ran an email flood test on Orion to see if I could crash it again. This time, the high request load on Uranus caused Uranus to crash.

With two different servers on two different hardware setups crashing, I had to start thinking about what could be causing the problem. Memory tests on both servers came back OK. Orion had some ECC errors which it was able to fix. I wasn't able to catch Orion's first crash, but I was able to catch Uranus's first crash:

http://paste.atopia.net/126

I have the other crashes written down in pencil at work. They all say mostly the same thing. I assume Caliban would also experience this behavior, but because it does not receive much load at all (it only does anything when Uranus dies), I am not able to confirm this.

The only thing similar between the boxes is that all three have two processors in them and are running SMP. Orion had hyperthreading turned on, but I disabled this in the BIOS, to no avail.

Someone with similar experiences running SMP advised me to upgrade to -STABLE as of last week. For almost a week, Orion ran fine. This evening, however, Orion once again crashed, its fourth time in three weeks. Uranus has been stable for a few days, but I am expecting it to crash again any day now (they usually take 4-6 days).

So now I am stuck. I have two -STABLE machines which continue to cause kernel traps.
Tomorrow, I am going to compile a debugging kernel on orion and try to let it crash again to see what kind of errors it reports, but I was wondering if anyone else is experiencing these problems. Thanks in advance, Matt Juszczak