Message-ID: <4B5A9BB9.2070801@nlcc.us>
Date: Sat, 23 Jan 2010 00:48:25 -0600
From: Billy Newsom
To: freebsd-questions@freebsd.org
References: <4B59E61B.3090504@nlcc.us> <795fc2b81001221030n321c994cv9fd3c76b981fead0@mail.gmail.com>
In-Reply-To: <795fc2b81001221030n321c994cv9fd3c76b981fead0@mail.gmail.com>
Subject: Re: How to troubleshoot a frozen boot sequence

Nathan Vidican wrote:
> To me, it sounds like you have two issues to deal with here:
>
> #1 - booting off of the twed0 disk, what is your systems' BIOS currently
> set to boot from, from the way you describe it's almost as if the system
> is booting from ad0 - in which case yes, you will have to put a valid
> boot config onto twed0

I feel that I have run across a
common and old "SCSI v IDE" battle (the FreeBSD Handbook still talks about
it). Even though I set the drive controller (twe = 3Ware SATA controller)
as my first boot drive in the BIOS (effectively 0x80, as I understand it),
FreeBSD never pays attention to the BIOS's numerical order (see my reason
below*). It wants to find stuff on ad0 and boot that drive if it exists.

My supposition is that since I had twe0 and ad0 running during my 7.2
install, the correct drive partition and MBR stuff were applied to get it
to boot AS-IS, but... when the hardware is not as it is now, it freezes at
the boot loader, attempting to find ad0. It is either

a. finding ad0 in fstab and really wishing it was there, or
b. the bootstrap code is physically on ad0 and not twed0, because the
   Sysinstall process never wrote it there.

I think it is b. If b, the boot process may be:

Stage 1: BIOS picks twe0 to be the first drive to attempt a boot.
Stage 2: MBR (boot0) -- located on twe0
Stage 3: boot1 -- located on twed0 (BTX Boot Loader?)
Stage 4: boot2 -- located on ad0 (FreeBSD/i386 bootstrap loader 1.1?)
Stage 5: Boot Loader -- shows menu on twed0s1a
Stage 6: Kernel boots up on twed0s1a

And so when I remove ad0 to simulate a backup drive failure, stage 4 tries
to run a missing bootstrap loader from twed0.

Stage 4: boot2 -- missing on twed0, system hangs.

I think this is happening because it is the BTX loader which may find and
concatenate the BIOS drives, getting confused, and switching the boot to
ad0 for just the one stage that finishes the bootstrap.

I think one solution is to (next time) not install my backup drive until
after Sysinstall is long done! I think some of this is a sysinstall bug.

* My reason for saying that is my guess that the sysinstall program saw
ad0 as something important, and included it in the chain of the boot.
For example, when I was done SLICING my drives in Sysinstall, the silly
thing then got the "w" write command and went out there and made some
(wrong) decisions under the assumption that ad0 would NATURALLY (via BIOS)
be part of the boot process. So the right code never got written to twe0
in the right places. Sure, it got all the kernel, and I told it to put a
standard FreeBSD MBR, but it must be missing something on track 0.

> #2 - you could add the flag 'noauto' to ad0 from within fstab - this
> will allow the system to boot without mounting the disk (alleviating the
> dreaded single-user-mode). Use a startup script in /usr/local/etc/rc.d
> to then mount the disk if available on bootup. I've done similar setups
> to this before where we were using external USB drives for backup and
> weren't 100% sure they'd always be connected in the case a server might
> be rebooted - worst case, you'll end up with it not mounted, but the
> system will still be up at least.

I will give it a try. I need to do something to correct this second issue
for certain. My ad0 is a good spare, but it's old.

> --
> Nathan Vidican
> nathan@vidican.com
>
>
> On Fri, Jan 22, 2010 at 12:53 PM, Billy Newsom wrote:
>
> I am doing a test run on a production server. It has 2 hard drives.
>
> ad0 (mounted on /disk250 in a single slice plus SWAP)
> twed0 (mounted on / /var /usr and a SWAP)
>
> The twed0 is a hardware mirror and my main drive.
> ad0 is just for backups.
>
> What the issue is, and you probably know where I'm heading. The boot
> process freezes if I remove the ad0 (to test a drive failure condition)
>
> It freezes after saying:
> BTX boot loader.... etc.
>
> FreeBSD/i386 bootstrap loader 1.1
> It spins for a second, then stops... unless I have ad0 in the computer.
> /boot/kernel/kernel text=0x7b03a0 data=0xcdee0 /
>
> And it never gets to the boot menu.
>
> So:
>
> 1. Should I put a new boot0config on the twed0 drive? If so do I
> boot from a CD to do that?
>
> I need to potentially do something also to my disk labels and my
> fstab so that I don't boot to single user mode if drive ad0 fails. I
> haven't done this exact type of thing before, so I am looking for a
> little help.
>
> my fstab:
> /dev/ad0s1b      none      swap    sw         0  0
> /dev/twed0s1b    none      swap    sw         0  0
> /dev/twed0s1a    /         ufs     rw         1  1
> /dev/ad0s1d      /disk250  ufs     rw         2  2
> /dev/twed0s1e    /tmp      ufs     rw         2  2
> /dev/twed0s1f    /usr      ufs     rw         2  2
> /dev/twed0s1d    /var      ufs     rw         2  2
> /dev/acd0        /cdrom    cd9660  ro,noauto  0  0
>
>
> I tried to read the MBR from the twed0 drive, and the program
> couldn't read it. The one from the ad0 drive is readable and I saved
> a copy of it.
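
For what it's worth, Nathan's #2 suggestion above (noauto in fstab plus a startup script that mounts the disk only if it is there) could be sketched roughly like this. This is only the core logic, not a full rc.subr-style script, and the function name, the variables and the exact fstab options are my assumptions, not something taken from the thread or from FreeBSD itself.

```shell
#!/bin/sh
# Sketch only: mount the backup disk at boot if it is attached.
# Assumes the fstab entry for it was changed to something like
#   /dev/ad0s1d  /disk250  ufs  rw,noauto  0  0
# (noauto keeps "mount -a" from touching it; pass number 0 tells the
# boot-time fsck not to check it). The function name and the variables
# below are hypothetical, not part of any FreeBSD rc framework.

BACKUP_DEV="${BACKUP_DEV:-/dev/ad0s1d}"
BACKUP_MNT="${BACKUP_MNT:-/disk250}"

mount_backup_if_present() {
    dev="$1"
    mnt="$2"
    # If the device node does not exist, the disk is absent: do nothing.
    if [ ! -e "$dev" ]; then
        echo "backup disk $dev not present, skipping mount"
        return 0
    fi
    # Check the filesystem here, since fstab no longer asks fsck to.
    fsck -p "$dev" || return 1
    # mount(8) looks up the device and options for $mnt in fstab.
    mount "$mnt"
}

mount_backup_if_present "$BACKUP_DEV" "$BACKUP_MNT"
```

Saved as an executable file under /usr/local/etc/rc.d, the worst case with something like this should be that /disk250 is simply not mounted. The swap line for ad0s1b is a separate question that this sketch does not address.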
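
On the MBR question, one low-risk check is to save the first sector of each drive to a file and see whether it ends in the 0x55 0xAA boot signature. The helper below is a hypothetical sketch (the function name is mine); it only inspects a saved copy and never writes to a disk. A present signature says nothing about whether boot1/boot2 inside the slice are intact, so treat it as a first sanity check only.

```shell
#!/bin/sh
# Sketch only: report whether a saved 512-byte sector image ends with the
# 0x55 0xAA boot signature. has_boot_signature is a hypothetical helper.

has_boot_signature() {
    # Read bytes 510 and 511 of the file and print them as hex digits.
    sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \t\n')
    [ "$sig" = "55aa" ]
}

# Example use on the real machine (as root; device name from the thread):
#   dd if=/dev/twed0 of=/tmp/twed0.mbr bs=512 count=1
#   has_boot_signature /tmp/twed0.mbr && echo "signature present"
```

If the sector turns out to be empty or unreadable, boot0cfg(8) and bsdlabel(8) with -B are the usual tools for rewriting boot code, but check the man pages for your release before pointing them at a production mirror.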