Date: Wed, 11 Jul 2007 07:34:02 -1000
From: NetOpsCenter <noc@hdk5.net>
To: Feargal Reilly <feargal@fbi.ie>, FreeBSD Mailing List <freebsd-questions@freebsd.org>
Subject: Re: Complete hang during boot at boot2 prompt
Message-ID: <4695148A.9010200@hdk5.net>
In-Reply-To: <20070711182300.4634e278@mablung.edhellond.fbi.ie>
References: <20070711182300.4634e278@mablung.edhellond.fbi.ie>
Feargal Reilly wrote:

>Hi,
>
>I have a server which went down overnight, and
>would not subsequently boot. A reboot was performed by
>facilities staff before I got to look at it so I don't know what
>was showing on the console. The reason for the outage is
>unknown, and nothing showed in /var/log/messages, other than
>routine ntpd time sync messages.
>
>The server in question is an Intel SR1425BK1 server running
>FreeBSD 6.2 amd64 GENERIC with a SATA RAID-1 array
>provided by an onboard LSILogic MegaRAID controller.
>
>When booted, it would pass the various BIOS screens without
>problem, the RAID utility would say that the array was optimal,
>and then FreeBSD would start to boot, but it couldn't get past
>boot2:
>
>>>FreeBSD/amd64 BOOT
>Default: 0:ad(0,a)/boot/loader
>boot:
>
>At this point, the server emitted a single continuous beep, and
>nothing else happened. Keyboard input did nothing, although
>Ctrl-Alt-Del still worked, and at one point a heart symbol
>appeared after I hit keys randomly for a while.
>
>My question is, what could have caused this failure?
>
>My initial guesses were either a memory failure or a really
>badly corrupted boot sector, but I'm not convinced by either
>explanation, for reasons outlined below.
>
>I urgently needed the data to be online again, so I yanked one
>disk out of the machine and inserted it into another host, and
>took the server back to the office.
>
>There, I yanked a memory module, and it booted fine, albeit
>complaining about the degraded RAID array. However, when I
>reinserted the memory, it continued to boot. I didn't have the
>foresight to try it before I fiddled with the disks, but I can't
>imagine that it had been seated incorrectly as the server had
>been up for two months without problem. Also, the BIOS tests
>passed, although I know they aren't too in depth. I'll run
>sysutils/memtest anyway, and see what that throws up.
>
>Meanwhile, I inserted a replacement disk and rebuilt the RAID-1
>array, and it is still booting fine, so my best guess now is a
>corrupted boot sector. The disk that I removed to insert into
>another host was ad4, which I'm guessing is the disk that it
>would have been trying to boot from in the first place. So a
>bad sector could be responsible, but it would seem to be very
>convenient, as there does not appear to be any other data
>corruption on the disk.
>
>Also, I've run a short SMART test, and everything is okay as far
>as it is concerned. I'm in the process of running a long test,
>but that won't finish before I leave the office. If it were a
>corrupted sector, would it be able to get to boot2?
>
>Any other suggestions as to what caused the failure? I know I've
>changed the conditions and may never be able to reproduce it
>(nor do I want to), but if I've failing hardware, I'd like a
>best guess as to where it is.
>
>Thanks for your time,
>
>-fr.
>

Aloha,

I have had memory chips walk out of the slots on several occasions.
Sometimes it's vibration, or in Hawaii we occasionally have humidity
issues that tend to cause this too. I have learned to spray the sockets
and card connections with contact cleaner about every 6 months to avoid
this problem, especially in areas where servers are not in a cool
environment.

~Al Plant - Honolulu, Hawaii - Phone: 808-284-2740
 + http://hawaiidakine.com + http://freebsdinfo.org + noc@hdk5.net +
 + http://internetohana.org - Supporting - FreeBSD 6.* - 7.* +
 "All that's really worth doing is what we do for others." - Lewis Carroll