Date: Thu, 30 Sep 2004 10:03:00 -0400
From: Jim Durham <durham@jcdurham.com>
To: freebsd-hackers@freebsd.org
Subject: Sudden Reboots
Message-ID: <200409301003.00492.durham@jcdurham.com>

I have had this problem now with at least 3 FreeBSD servers over a period of about 2 years. I had put it down to some hardware problem, but 3 different machines doing the same thing seems like too much of a coincidence.

The first time was when I put 4.5-RELEASE on a brand new Dell PowerEdge 2650. I ran it on the bench for a week or so, then decided all was well, put it in the server rack and started running the company's email service on it. After a few weeks, it would suddenly 'reboot' for no apparent reason. No log entries, nothing at all except the usual stuff in /var/log/messages about '/ was not unmounted correctly', etc. Just as if you had pulled the power plug.

The 2nd instance was a server that I maintain for an ISP. It was a mirror image of their primary server, a 'hot spare' so to speak. The primary, running the same software, was solid, but the backup would reboot at about 5:20 every morning with the same syndrome: no log entries of any sort, just the usual entries in /var/log/messages saying that the / partition was not unmounted properly. The odd thing was that it was happening at virtually the same time every morning.

I upgraded both systems to the latest -RELEASE and it made no difference. Then they both just *stopped doing it by themselves*, with no apparent correlation to anything installed software-wise. Neither server has had any problem for over a year now.

The 3rd instance is happening now. Another server I maintain for my 'night job' is doing the same thing for a customer. It just 'stops' as if you had pulled the power plug. However, this time I thought to check using 'last' and found that I had accidentally left an ssh session open, and that entry said 'crash'. There are no other log entries I can find related to the 'reboot'.

I 'googled' this problem and found it mentioned dozens of times with no answer ever given. I'm beginning to think this is real, but so intermittent that I don't know how to begin to debug or find it.

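Since 'last' marks a session cut off by the reboot with 'crash', one way to check whether the reboots cluster at a particular time of day (as the 5:20 ones did) is to estimate each crash time from the login time plus the session duration. The following is only a rough sketch, not a tested tool: the sample lines, user name and address are made up for illustration, and it assumes the output has the general shape "HH:MM - crash  (HH:MM)", which may differ between releases.

```python
import re
from collections import Counter

# Hypothetical sample in the general shape of `last` output.
# These lines are invented for illustration, not taken from a real server.
SAMPLE = """\
durham   ttyp0    192.0.2.10       Tue Sep 28 23:41 - crash  (05:39)
reboot   ~                         Wed Sep 29 05:21
durham   ttyp1    192.0.2.10       Mon Sep 27 09:02 - 09:30  (00:28)
durham   ttyp0    192.0.2.10       Sun Sep 26 22:10 - crash  (07:11)
"""

# Login time, the literal "crash" marker, then a duration that may carry
# an optional "days+" prefix, e.g. (1+05:39).
LINE_RE = re.compile(
    r"(\d\d):(\d\d) - crash\s+\((?:(\d+)\+)?(\d\d):(\d\d)\)"
)


def crash_times(text):
    """Return the estimated time of day ('HH:MM') of each crashed session."""
    times = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue  # normal logout, reboot record, or unrecognized line
        start_h, start_m, _days, dur_h, dur_m = m.groups()
        start = int(start_h) * 60 + int(start_m)
        duration = int(dur_h) * 60 + int(dur_m)
        # Whole days do not change the time of day, so wrap at 24 hours.
        end = (start + duration) % (24 * 60)
        times.append(f"{end // 60:02d}:{end % 60:02d}")
    return times


def crashes_by_hour(text):
    """Count crashed sessions per hour of day."""
    return Counter(t[:2] for t in crash_times(text))


if __name__ == "__main__":
    print(crash_times(SAMPLE))
    print(crashes_by_hour(SAMPLE))
```

To try it on a real machine, the text of `last` output would be passed in place of SAMPLE. A strong peak in one hour would point toward something scheduled (a cron job, for example) rather than a random fault.
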
A wild guess would be something like an uninitialized pointer, where everything works until whatever it points to takes on some value that makes the system just die suddenly, without even a panic message. Part of the reason I suspect this is that the server that is doing this currently was running fine for a year. Then the floods we had recently caused it to be powered down for a day or so. Usually it is on a UPS and is never powered down, so that may have changed the 'garbage' in memory, whereas normally it would stay the same until the machine was powered down. I.e., if an uninitialized pointer was the culprit, maybe what it is pointing to, or where it is pointing, is critical, and powering the machine down changes where it points, so that the area gets overwritten by some system process and causes the reboot.

I'm posting this to 'hackers' because I thought it might be a kernel thing.

-- 
-Jim