Date: Tue, 11 Jan 2011 00:33:30 +0100
From: Luigi Rizzo
To: Tom Judge
Cc: freebsd-hackers@freebsd.org, luigi@freebsd.org
Subject: Boot0cfg bug redux (Re: sys/boot/boot0/boot0.S - r186598)

To help understand the bug discussed in the recent thread (original message attached at the end), Tom Judge sent me dumps of the boot sector taken at each stage of the problem.
The system giving trouble has the following configuration (fresh transcript):

file1: ORIGINAL BOOT SECTOR

  # boot0cfg -v ad0
  #   flag     start chs   type       end chs       offset         size
  1   0x80      0:  1: 1   0xa5    494: 15:63           63       498897
  2   0x00    495:  1: 1   0xa5    989: 15:63       499023       498897
  3   0x00    990:  0: 1   0xa5    992: 15:63       997920         3024

file2: boot sector after running 'boot0cfg -s 2 -v ad0'

  > cmp -x file1 file2
  000001b5 00 01     _OPT, default option

No big surprises here: the default selection changes from 0 to 1.

HOWEVER, boot0cfg does not alter the 'active' flag in the partition table. If I remember correctly, this triggers a 'feature' in the boot1/boot2 code, which does not know about (or honor) the selected partition and instead boots the first partition marked as 'active' or, failing that, the first FreeBSD partition.

As a consequence, if we reboot without pressing an F-key, the system boots from slice s1 even though the boot loader indicates F2.

file3: boot sector after the above reboot

  > cmp -x file1 file3
  000001b5 00 01

Next, reboot, this time pressing F2. After the boot we start from s2, and the boot sector has now changed:

file4: boot sector after pressing F2

  > cmp -x file1 file4
  000001b4 00 b1     _NXTDRV
  000001b5 00 01     _OPT, default option
  000001be 80 00     active flag, slice 1
  000001ce 00 80     active flag, slice 2

As expected, the 'active' flag is updated as a result of booting from the selected partition. 'boot0cfg -s ...' could do the same thing to achieve the desired behaviour.

The only "surprise" here is that _NXTDRV has changed. I am not sure whether this was caused by an accidental F5 keypress; more likely 0xb1 is the correct initial value of the byte at 0x1b4, and I/we forgot to initialize the field.

So, to summarize, I think a possible fix (one that does not involve using gpart or, even worse, modifying boot0.S, which probably has no spare space left) is to modify boot0cfg so that it also sets the 'active' flag on the partition corresponding to the default entry.

What do people think?
cheers
luigi

On Sun, Jan 09, 2011 at 12:39:28AM -0600, Tom Judge wrote:
> Hi,
>
> Today I ran into an issue where setting the default slice with boot0cfg
> -s is broken.
>
> This is related to a section of this revision:
>
> + commit Warner's patch "orb $NOUPDATE,_FLAGS(%bp)"
>   to avoid writing to disk in case of a timeout/default choice;
>
> This issue is quite well documented in bin/134907 which has been open
> since May 2009.
>
> Reproduced with a fresh nanobsd build:
>
> Boot 1 - Slice 1 active as set by nanobsd image builder:
>
> ===
> # boot0cfg -v ad0
> #   flag     start chs   type       end chs       offset         size
> 1   0x80      0:  1: 1   0xa5    494: 15:63           63       498897
> 2   0x00    495:  1: 1   0xa5    989: 15:63       499023       498897
> 3   0x00    990:  0: 1   0xa5    992: 15:63       997920         3024
>
> version=2.0 drive=0x80 mask=0x3 ticks=182 bell=# (0x23)
> options=packet,update,nosetdrv
> volume serial ID 9090-9090
> default_selection=F1 (Slice 1)
> ===
>
> Update the active slice to 2:
> ===
> # boot0cfg -s 2 -v ad0
> #   flag     start chs   type       end chs       offset         size
> 1   0x80      0:  1: 1   0xa5    494: 15:63           63       498897
> 2   0x00    495:  1: 1   0xa5    989: 15:63       499023       498897
> 3   0x00    990:  0: 1   0xa5    992: 15:63       997920         3024
>
> version=2.0 drive=0x80 mask=0x3 ticks=182 bell=# (0x23)
> options=packet,update,nosetdrv
> volume serial ID 9090-9090
> default_selection=F2 (Slice 2)
> ===
>
> Reboot and let boot0 time out and boot default slice 2:
> ===
> # boot0cfg -v ad0
> #   flag     start chs   type       end chs       offset         size
> 1   0x80      0:  1: 1   0xa5    494: 15:63           63       498897
> 2   0x00    495:  1: 1   0xa5    989: 15:63       499023       498897
> 3   0x00    990:  0: 1   0xa5    992: 15:63       997920         3024
>
> version=2.0 drive=0x80 mask=0x3 ticks=182 bell=# (0x23)
> options=packet,update,nosetdrv
> volume serial ID 9090-9090
> default_selection=F2 (Slice 2)
> ===
> The system actually booted into slice 1 here.
> This was verified by dropping to the loader prompt and using show to grab:
> loaddev=disk0s1a:
>
> Reboot and hit 2 at the boot0 prompt:
> ===
> # boot0cfg -v ad0
> #   flag     start chs   type       end chs       offset         size
> 1   0x00      0:  1: 1   0xa5    494: 15:63           63       498897
> 2   0x80    495:  1: 1   0xa5    989: 15:63       499023       498897
> 3   0x00    990:  0: 1   0xa5    992: 15:63       997920         3024
>
> version=2.0 drive=0x80 mask=0x3 ticks=182 bell=# (0x23)
> options=packet,update,nosetdrv
> volume serial ID 9090-9090
> default_selection=F2 (Slice 2)
> ===
>
> This time we really boot into slice 2.
>
> The attached patch backs out the relevant part of r186598.
>
> There was a post on the embedded list that suggested this workaround:
>   echo 'a 2' | fdisk -f /dev/stdin ad0
>   boot0cfg -s 2 ad0
>
> There are 2 issues with this:
> 1) It can't be done without setting kern.geom.debugflags to 0x10.
> 2) It caused most or all subsequent commands, including the second
>    command and 'shutdown -r now', to fail with the error message
>    "Device not configured".
>
> Both of these leave this workaround fairly broken.
>
>
> Tom
>
> Index: boot0.S
> ===================================================================
> --- boot0.S	(revision 213760)
> +++ boot0.S	(working copy)
> @@ -373,7 +373,6 @@
>   	 * Timed out or default selection
>   	 */
>   use_default:	movb _OPT(%bp),%al		# Load default
> -		orb $NOUPDATE,_FLAGS(%bp)	# Disable updates
>   		jmp check_selection		# Join common code
>  
>   /*