Date: Wed, 11 Jul 2007 18:23:00 +0100
From: Feargal Reilly <feargal@fbi.ie>
To: freebsd-questions@freebsd.org
Subject: Complete hang during boot at boot2 prompt
Message-ID: <20070711182300.4634e278@mablung.edhellond.fbi.ie>
Hi,

I have a server which went down overnight, and would not subsequently boot. A reboot was performed by facilities staff before I got to look at it, so I don't know what was showing on the console. The reason for the outage is unknown, and nothing showed in /var/log/messages other than routine ntpd time sync messages.

The server in question is an Intel SR1425BK1 running FreeBSD 6.2 amd64 GENERIC with a SATA RAID-1 array provided by an onboard LSILogic MegaRAID controller.

When booted, it would pass the various BIOS screens without problem, the RAID utility would say that the array was optimal, and then FreeBSD would start to boot, but it couldn't get past boot2:

    >> FreeBSD/amd64 BOOT
    Default: 0:ad(0,a)/boot/loader
    boot:

At this point, the server emitted a single continuous beep, and nothing else happened. Keyboard input did nothing, although Ctrl-Alt-Del still worked, and at one point a heart symbol appeared after I hit keys randomly for a while.

My question is, what could have caused this failure? My initial guesses were either a memory failure or a really badly corrupted boot sector, but I'm not convinced by either explanation, for reasons outlined below.

I urgently needed the data to be online again, so I yanked one disk out of the machine and inserted it into another host, and took the server back to the office. There, I yanked a memory module, and it booted fine, albeit complaining about the degraded RAID array. However, when I reinserted the memory, it continued to boot. I didn't have the foresight to try booting it before I fiddled with the disks, but I can't imagine that the memory had been seated incorrectly, as the server had been up for two months without problem. Also, the BIOS tests passed, although I know they aren't too in-depth. I'll run sysutils/memtest anyway, and see what that throws up.

Meanwhile, I inserted a replacement disk and rebuilt the RAID-1 array, and it is still booting fine, so my best guess now is a corrupted boot sector.
The disk that I removed to insert into another host was ad4, which I'm guessing is the disk that it would have been trying to boot from in the first place. So a bad sector could be responsible, but it would seem to be very convenient, as there does not appear to be any other data corruption on the disk. Also, I've run a short SMART test, and everything is okay as far as it is concerned. I'm in the process of running a long test, but that won't finish before I leave the office.

If it were a corrupted sector, would it be able to get to boot2?

Any other suggestions as to what caused the failure? I know I've changed the conditions and may never be able to reproduce it (nor do I want to), but if I have failing hardware, I'd like a best guess as to where it is.

Thanks for your time,
-fr.

-- 
Feargal Reilly, Chief Techie, FBI.
PGP Key: 0xBD252C01 (expires: 2006-11-30)
Web: http://www.fbi.ie/ | Tel: +353.14988588 | Fax: +353.14988489
Communications House, 11 Sallymount Avenue, Ranelagh, Dublin 6.