Date: Tue, 25 Oct 2005 15:01:25 -0600
From: Dan Charrois <dan@syz.com>
To: freebsd-stable@freebsd.org
Subject: Strange crashing/rebooting problem
Message-ID: <616C3CA2-AC7C-4688-B97F-35911A6C270B@syz.com>
Hi all.

I'm wondering if anyone can shed some light on a strange crashing/rebooting problem I'm having. First, the specs:

Hardware: Dell PowerEdge 2850 rack-mounted server, dual 3.4 GHz Xeon, 5 GB memory
Hard drives: LSILogic PERC 4e/Di, configured as RAID 5 with 3 x 40 GB disks
OS: FreeBSD 5.4-RELEASE-p6 for amd64
Other related software: mysql Ver 14.7 Distrib 4.1.14, for portbld-freebsd5.4 (amd64) using 4.3

I currently have hyperthreading enabled. I'm not too concerned about the security of the system (it's on an internal-only network with no user accounts other than the administrator), and I figure that if the security issue associated with hyperthreading is the only problem, it wouldn't hurt to get a bit more speed. The machine is intended to be a single-purpose MySQL server for other client machines via TCP/IP, and it's supposed to be a high-reliability, as-fast-as-possible machine.

But here's the problem. I have it set to run mysqlhotcopy a couple of times during the day to back up the databases, and twice now in the last month or so, the server has gone down just as mysqlhotcopy starts to run. The odd thing is that it doesn't lock up indefinitely, or even reboot itself normally. Instead, it suddenly seems to quit as though someone unplugged it, and then it goes through the boot sequence.

It's at a remote location from me, so I haven't been able to see the console while it goes through its problems, but according to /var/log/messages, everything is running fine and then it suddenly starts writing its initial boot messages:

  sql syslogd: kernel boot file is /boot/kernel/kernel
  sql kernel: Copyright (c) 1992-2005 The FreeBSD Project.
  etc.

There are no log entries of any "shutting down" variety, and sure enough, a bit later in the boot sequence I get these messages:

  sql kernel: Mounting root from ufs:/dev/amrd0s1a
  sql kernel: WARNING: / was not properly dismounted
  sql kernel: WARNING: /usr was not properly dismounted
What gets me is that if the machine were "really" locking up due to a kernel panic or something, I would expect it to stay frozen rather than restart itself. But within a couple of minutes of going down hard, it has rebooted itself. There isn't some kind of watchdog timer I'm not aware of that reboots the machine after a lockup, is there? Because of this, I sometimes don't even realize it's happened until I find that the odd MySQL database needs to be repaired, and then I check the logs and see what happened.

According to the logs, it's almost as though it's getting physically unplugged midstream, then plugged back in and booted from there. But it's in a locked cabinet in a colocation centre alongside other machines of mine that aren't having the problem, and it has happened twice now at exactly the same time: right as mysqlhotcopy is about to run.

Since this machine is supposed to be highly available, being down for even a couple of minutes like this is a problem. Plus, I really don't like not understanding what's making it go down like this, and I'm obviously concerned about data corruption in the databases when something like this happens.

Does anyone have any advice on what may be wrong, or something to try? I really have no idea how even to begin troubleshooting this problem. If you need any more information at all, please let me know.

Thanks for your help!

Dan

--
Syzygy Research & Technology
Box 83, Legal, AB T0G 1L0 Canada
Phone: 780-961-2213