Date: Sun, 24 Jan 2010 19:09:35 -0600
From: Billy Newsom <billy@nlcc.us>
To: freebsd-questions@freebsd.org
Subject: Re: How to troubleshoot a frozen boot sequence
Message-ID: <4B5CEF4F.3010305@nlcc.us>
In-Reply-To: <4B5A9BB9.2070801@nlcc.us>
References: <4B59E61B.3090504@nlcc.us> <795fc2b81001221030n321c994cv9fd3c76b981fead0@mail.gmail.com> <4B5A9BB9.2070801@nlcc.us>

I am not sure why, but here was my solution. After a lot of poking, I
determined that the problem was in the Master Boot Record of each drive.
Here is what I found out:

1. My backup drive (ad0) had the FreeBSD boot manager installed.
2. My main drive (twed0) had the FreeBSD MBR installed.

So, what is the problem? All I could figure was to install the boot
manager (installed with boot0cfg) onto my main drive. Silly, but it
worked. Why, I don't have a clue.

I do, by the way, remember purposely using this setup when I ran
sysinstall to configure this machine. I felt that the ad0 drive needed a
boot manager (just in case it was used someplace else) and that the main
drive would not need one. But nothing ever indicated to me that a
standard MBR on twed0 would not work if ad0 was missing.

Here is my partition table from twed0:

# /dev/twed0
g c60801 h255 s63
p 1 0xa5 63 976768002
a 1

Notice there is just one partition and it is active. But it wouldn't
boot until I ran:

boot0cfg -B twed0

which keeps the slice table the same. Now the server boots with or
without the ad0 drive.

To survive a backup drive failure, I also had to mess with fstab:

1. I had to add the "noauto" option, as someone suggested.
2. I had to set the fsck pass number to 0 (3 didn't work), or an fsck
   failure drops the boot into single-user mode.

My question now is: should the script that mounts the drive during boot
(too late, I already wrote it) also run fsck? I am not sure how fsck
should be run, but I assume it is kind of important.

My main challenge was determining when to mount the disk. Here is my
solution, and my script so far, which seems to work.

=====================
#!/bin/sh
# mounts my special drive
# TODO: Need to fsck it

# PROVIDE: mountbackup
# REQUIRE: mail
# KEYWORD: nojail

. /etc/rc.subr

name="mountbackup"
start_cmd="mountbackup_start"
stop_cmd=":"

THIS="/disk250"
HOSTNAME=`/bin/hostname`
MAILTO=root@${HOSTNAME}
TOD=`/bin/date`

mountbackup_start()
{
	local err

	# Mount "backup" filesystems.
	echo -n "Mounting $THIS Backup filesystems"
	mount $THIS
	err=$?
	echo '.'

	case ${err} in
	0)
		;;
	*)
		echo "Mounting $THIS filesystems failed," \
		    " but it's okay for now. Sending mail to $MAILTO"
		# Group the echos in a subshell so all three lines reach mail.
		(echo " Mounting $THIS filesystems failed on boot!"
		 echo " "
		 echo "Host: $HOSTNAME Date: $TOD") | \
		    mail -s "FAILURE to mount $THIS on $HOSTNAME" $MAILTO
		;;
	esac
}

load_rc_config $name
run_rc_command "$1"
=====================

Billy Newsom wrote:
> Nathan Vidican wrote:
> > To me, it sounds like you have two issues to deal with here:
> >
> > #1 - booting off of the twed0 disk, what is your systems' BIOS currently
> > set to boot from, from the way you describe it's almost as if the system
> > is booting from ad0 - in which case yes, you will have to put a valid
> > boot config onto twed0
>
> I feel that I have run across a common and old "SCSI v IDE" battle (The
> FreeBSD Handbook still talks about it). Even though I make the drive
> controller (the twe = 3Ware SATA controller) as my first boot drive in
> BIOS (effectively 0x80 as I understand it), FreeBSD does not ever pay
> attention to the BIOS's numerical order. (See my reason below*) It wants
> to find stuff on ad0 and boot that drive if it exists.
>
> My supposition is that since I had twe0 and ad0 running during my 7.2
> install, that the correct drive partition and MBR stuff were applied to
> get it to boot AS-IS, but...
>
> When it is not as it is now, It freezes at the boot loader, attempting
> to find ad0.
>
> It is either
>
> a. Finding ad0 in fstab and really wishing it was there
> or
> b. The boot strap code is physically on ad0 and not twed0 because the
> Sysinstall process never wrote it there.
>
> I think it is b. If b, the boot process may be:
>
> Stage 1: BIOS picks twe0 to be the first drive to attempt a boot.
> Stage 2: MBR (boot 0) -- located on twe0
> Stage 3: boot1 -- located on twed0 (BTX Boot Loader?)
> Stage 4: boot2 -- located on ad0 (FreeBSD/i386 bootstrap loader 1.1?)
> Stage 5: Boot Loader -- shows menu on twed0s1a
> Stage 6: Kernel boots up on twed0s1a
>
> And so when I remove ad0 to simulate a backup drive failure, the stage 4
> tries to run a missing bootstrap loader from twed0.
>
> Stage 4: boot2 -- missing on twed0, system hangs.
>
> I think this is happening because it is the BTX loader which may find
> and concatenate the BIOS drives, getting confused, and switching the
> boot to ad0 for just the one stage that finishes the bootstrap.
>
> I think one solution is to (next time) not install my backup drive until
> after Sysinstall is long done! I think it's a sysinstall bug, some of this.
>
> * My Reason for saying that is my guess that the sysinstall program saw
> the ad0 as something important, and included it in the chain of the
> boot. For example, when I was done SLICING my drives in Sysinstall, the
> silly thing then got the "w" write command and went out there and made
> some (wrong) decisions under the assumption that ad0 would NATURALLY
> (via BIOS) be part of the boot process. So the right code never got
> written to twe0 in the right places. Sure, it got all the kernel and I
> told it to put a standard FreeBSD MBR, but it must be missing something
> on track 0.
>
> > #2 - you could add the flag 'noauto' to ad0 from within fstab - this
> > will allow the system to boot without mounting the disk (alleviating the
> > dreaded single-user-mode). Use a startup script in /usr/local/etc/rc.d
> > to then mount the disk if available on bootup. I've done similar setups
> > to this before where we were using external USB drives for backup and
> > weren't 100% sure they'd always be connected in the case a server might
> > be rebooted - worst case, you'll end up with it not mounted, but the
> > system will still be up at least.
>
> I will give it a try. I need to do something to correct this second
> issue for certain. My ad0 is a good spare, but it's old.
>
> > --
> > Nathan Vidican
> > nathan@vidican.com <mailto:nathan@vidican.com>
> >
> >
> > On Fri, Jan 22, 2010 at 12:53 PM, Billy Newsom <billy@nlcc.us
> > <mailto:billy@nlcc.us>> wrote:
> >
> > I am doing a test run on a production server. It has 2 hard drives.
> >
> > ad0 (mounted on /disk250 in a single slice plus SWAP)
> > twed0 (mounted on / /var /usr and a SWAP)
> >
> > The twed0 is a hardware mirror and my main drive.
> > ad0 is just for backups.
> >
> > What the issue is, and you probably know where I'm heading. The boot
> > process freezes if I remove the ad0 (to test a drive failure
> condition)
> >
> > It freezes after saying:
> > BTX boot loader.... etc.
> >
> > FreeBSD/i386 bootstrap loader 1.1
> > It spins for a second, then stops... unless I have ad0 in the
> computer.
> > /boot/kernel/kernel text=0x7b03a0 data=0xcdee0 /
> >
> > And it never gets to the boot menu.
> >
> > So:
> >
> > 1. Should I put a new boot0config on the twed0 drive? If so do I
> > boot from a CD to do that?
> >
> > I need to potentially do something also to my disk labels and my
> > fstab so that I don't boot to single user mode if drive ad0 fails. I
> > haven't done this exact type of thing before, so I am looking for a
> > little help.
> >
> > my fstab:
> > /dev/ad0s1b     none      swap    sw         0  0
> > /dev/twed0s1b   none      swap    sw         0  0
> > /dev/twed0s1a   /         ufs     rw         1  1
> > /dev/ad0s1d     /disk250  ufs     rw         2  2
> > /dev/twed0s1e   /tmp      ufs     rw         2  2
> > /dev/twed0s1f   /usr      ufs     rw         2  2
> > /dev/twed0s1d   /var      ufs     rw         2  2
> > /dev/acd0       /cdrom    cd9660  ro,noauto  0  0
> >
> >
> > I tried to read the MBR from the twed0 drive, and the program
> > couldn't read it. The one from the ad0 drive is readable and I saved
> > a copy of it.
>
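
On the open TODO in the script (running fsck before the mount), here is
a minimal sketch of one way it might be done. It is an untested
assumption, not something taken from this thread: the function name
check_and_mount and the FSCK and MOUNT override variables are
hypothetical, and it assumes that "fsck -p" (preen mode) on the device
is acceptable for this disk. Preen mode repairs only safe
inconsistencies and exits non-zero on anything worse, in which case the
sketch skips the mount instead of stalling the boot.

```shell
#!/bin/sh
# Hypothetical helper: preen a filesystem, then mount it.
# FSCK and MOUNT are assumed override points (handy for dry runs);
# they default to the usual FreeBSD locations.
: "${FSCK:=/sbin/fsck}"
: "${MOUNT:=/sbin/mount}"

# usage: check_and_mount <device> <mountpoint>
check_and_mount() {
    dev="$1"
    mnt="$2"

    # A failed or removed backup drive has no device node at all.
    if [ ! -e "$dev" ]; then
        echo "$dev not present; skipping fsck and mount" >&2
        return 1
    fi

    # Preen mode: fix only safe inconsistencies, fail on anything else.
    if ! "$FSCK" -p "$dev"; then
        echo "fsck -p $dev failed; not mounting $mnt" >&2
        return 1
    fi

    # The fstab entry (with noauto) supplies the type and options.
    "$MOUNT" "$mnt"
}
```

In mountbackup_start, the bare "mount $THIS" could then become
something like check_and_mount /dev/ad0s1d "$THIS", with the existing
err=$? and mail handling left as it is. The device name comes from the
fstab quoted above.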