Date: Mon, 28 Apr 2014 22:25:53 -0700
From: Doug Hardie <bc979@lafn.org>
To: dteske@FreeBSD.org
Cc: freebsd-stable@freebsd.org, 'Chris H' <bsd-lists@bsdforge.com>
Subject: Re: 9.2 Boot Problem
Message-ID: <A5176856-EF74-40CD-8F77-C05260D9F722@lafn.org>
In-Reply-To: <117a01cf56eb$6f989e50$4ec9daf0$@FreeBSD.org>
References: <175D3755-BB9B-4EAD-BDAD-06E9670E06AB@lafn.org>
 <186472F9-A97B-4863-81BC-67BE788D5E9A@lafn.org>
 <a865b8f2ccb9ad4918544bad3d49554d.authenticated@ultimatedns.net>
 <791C8200-023A-4ACB-9B6F-F5A8B0E170F4@lafn.org>
 <5bfb4fb619954c3dfbd3499aafa98917.authenticated@ultimatedns.net>
 <4F983E6A-0A7D-403C-AFAA-9CCCCB05716F@lafn.org>
 <feeca307c8da9ca3b385cf47d75904a7.authenticated@ultimatedns.net>
 <0f3f01cf5439$13cf8570$3b6e9050$@FreeBSD.org>
 <981CAA9F-1E67-4E56-A119-BA6D1D29F383@lafn.org>
 <89290759-E5C2-4991-B644-A82648BEDD52@lafn.org>
 <1D50A38D-8919-4034-A4E5-EEF8E78E638D@lafn.org>
 <117a01cf56eb$6f989e50$4ec9daf0$@FreeBSD.org>

On 13 April 2014, at 00:38, dteske@FreeBSD.org wrote:

>
>> -----Original Message-----
>> From: Doug Hardie [mailto:bc979@lafn.org]
>> Sent: Saturday, April 12, 2014 7:08 PM
>> To: freebsd-stable@freebsd.org
>> Cc: dteske@FreeBSD.org Teske; Chris H
>> Subject: Re: 9.2 Boot Problem
>>
>> On 10 April 2014, at 14:23, Doug Hardie <bc979@lafn.org> wrote:
>>
>>> On 9 April 2014, at 16:53, Doug Hardie <bc979@lafn.org> wrote:
>>>
>>>> On 9 April 2014, at 14:17, dteske@FreeBSD.org wrote:
>>>>
>>>>>> -----Original Message-----
>>>>>> From: Chris H [mailto:bsd-lists@bsdforge.com]
>>>>>> Sent: Wednesday, April 9, 2014 2:03 PM
>>>>>> To: Doug Hardie
>>>>>> Cc: freebsd-stable@freebsd.org List
>>>>>> Subject: Re: 9.2 Boot Problem
>>>>>>
>>>>>>> On 9 April 2014, at 13:49, "Chris H" <bsd-lists@bsdforge.com> wrote:
>>>>>>>
>>>>>>>>> On 9 April 2014, at 11:29, "Chris H" <bsd-lists@bsdforge.com> wrote:
>>>>>>>>>
>>>>>>>>>>> On 4 April 2014, at 21:08, Doug Hardie <bc979@lafn.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I put this out on Questions, but got no responses. Hopefully
>>>>>>>>>>>> someone here has some ideas.
>>>>>>>>>>>>
>>>>>>>>>>>> FreeBSD 9.2. All of my systems are hanging during boot right
>>>>>>>>>>>> after the screen that has the picture. Its as if someone hit
>>>>>>>>>>>> a space on the keyboard. However, these systems have no
>>>>>>>>>>>> keyboard. If I plug one in, or use the serial console, and
>>>>>>>>>>>> enter a return, the boot continues properly.
>>>>>>>>>>>>
>>>>>>>>>>>> The boot menu is displayed along with Beastie. However, the
>>>>>>>>>>>> line that says Autoboot in n seconds. never appears. It just
>>>>>>>>>>>> stops there. These are all new installs from CD systems.
>>>>>>>>>>>> I just used freebsd-update to take a toy server from 9.1 to
>>>>>>>>>>>> 9.2 and it doesn't exhibit this behavior. It boots properly.
>>>>>>>>>>>> I have updated one of the production servers with the latest
>>>>>>>>>>>> 9.2 changes and it still has the issue. I first thought that
>>>>>>>>>>>> some config file did not get updated properly on the CD. I
>>>>>>>>>>>> have dug around through the 4th files and don't see anything
>>>>>>>>>>>> obvious that would cause this. I have now verified that all
>>>>>>>>>>>> the 4th files in boot are identical (except for the version
>>>>>>>>>>>> number. They are slightly different). I don't believe this
>>>>>>>>>>>> is a BIOS setting issue as FreeBSD 7.2 didn't exhibit this
>>>>>>>>>>>> behavior. All 4 systems are on totally different
>>>>>>>>>>>> motherboards.
>>>>>>>>>>>>
>>>>>>>>>>>> I tried setting loader_logo="none" in /boot/config.rc and
>>>>>>>>>>>> that eliminated the menu and Beastie. I think the system
>>>>>>>>>>>> completed booting, but the serial console was then dead. It
>>>>>>>>>>>> did not respond or output anything. I had to remove that and
>>>>>>>>>>>> reboot to get the console back again.
>>>>>>>>>>>>
>>>>>>>>>>>> I need to get this fixed as these are production servers that
>>>>>>>>>>>> are essentially unmanned so its difficult to get them back up
>>>>>>>>>>>> again.
>>>>>>>>>>>
>>>>>>>>>>> No response here either. Surely someone must know the loader.
>>>>>>>>>>> I have been digging through the code, and can't find any
>>>>>>>>>>> differences between the systems that work and those that
>>>>>>>>>>> don't. Is there any way to debug this? Is there a way to find
>>>>>>>>>>> out where the loader is sitting waiting on input from the
>>>>>>>>>>> terminal. That might give a clue as to why it didn't autoboot.
>>>>>>>>>>>
>>>>>>>>>> OK. This is the first I've seen of your post. I'm not going to
>>>>>>>>>> profess being an expert. But I might suggest adding the
>>>>>>>>>> following to loader.conf(5)
>>>>>>>>>>
>>>>>>>>>> verbose_loading="YES"
>>>>>>>>>> boot_verbose="YES"
>>>>>>>>>>
>>>>>>>>>> This raises the "noise level". Maybe that will help to provide
>>>>>>>>>> you with a bit more information, as to what, or if, your
>>>>>>>>>> booting. DO have a look through /boot/defaults/loader.conf for
>>>>>>>>>> more hints, as to what, and how you can control the boot
>>>>>>>>>> process. As well as /etc/defaults/rc.conf.
>>>>>>>>>> In fact, you can pre-decide what, and how, to boot. Even
>>>>>>>>>> passing by the boot menu entirely.
>>>>>>>>>
>>>>>>>>> Thanks Chris. I did that and here is what I get:
>>>>>>>>>
>>>>>>>>> Rebooting...
>>>>>>>>> cpu_reset: Stopping other CPUs
>>>>>>>>> /boot.config: -Dh
>>>>>>>>> Consoles: internal video/keyboard serial port
>>>>>>>>> BIOS drive A: is disk0
>>>>>>>>> BIOS drive C: is disk1
>>>>>>>>> BIOS 640kB/2087360kB available memory
>>>>>>>>>
>>>>>>>>> FreeBSD/x86 bootstrap loader, Revision 1.1
>>>>>>>>> (doug@zool.lafn.org, Tue Apr 8 20:30:20 PDT 2014)
>>>>>>>>> Loading /boot/defaults/loader.conf
>>>>>>>>> Warning: unable to open file /boot/loader.conf.local
>>>>>>>>> /boot/kernel/kernel text=0xdb3171 data=0xf3c04+0xbb770 syms=[0x4+0xeda80+0x4+0x1b8ebf]
>>>>>>>>> zpool_cache...failed!
>>>>>>>>> \
>>>>>>>>> H[Esc]ape to loader prompt_ _____ _____
>>>>>>>>> | ____| | _ \ / ____| __ \
>>>>>>>>> | |___ _ __ ___ ___ | |_) | (___ | | | |
>>>>>>>>> | ___| '__/ _ \/ _ \| _ < \___ \| | | |
>>>>>>>>> | | | | | __/ __/| |_) |____) | |__| |
>>>>>>>>> | | | | | | || | | |
>>>>>>>>> |_| |_| \___|\___||____/|_____/|_____/ ``` `
>>>>>>>>> s` `.....---.......--.``` -/
>>>>>>>>> + Welcome to FreeBSD + +o .--` /y:` +.
>>>>>>>>> | | yo`:. :o `+-
>>>>>>>>> | 1. Boot Multi User [Enter] | y/ 3;46H /
>>>>>>>>> | 2.-- / |
>>>>>>>>> | |
>>>>>>>>> | 4. Reboot | `: :`
>>>>>>>>> | | `: :`
>>>>>>>>> | Options: / /
>>>>>>>>> | 5. Configure Boot [O]ptions... .- -.
>>>>>>>>> | -- -.
>>>>>>>>> | `:` `:`
>>>>>>>>> | .-- `--.
>>>>>>>>> | .---.....----.
>>>>>>>>> +-----------------------------------------+
>>>>>>>>>
>>>>>>>>> FreeBSD `Nakatomi Socrates' 9.2
>>>>>>>>>
>>>>>>>>> Now it waits for a return. I have tried changing the logo,
>>>>>>>>> setting the autoboot timeout and a couple others. The only
>>>>>>>>> thing that did anything different was setting the logo to an
>>>>>>>>> invalid value. Basically the console was dead after that, but
>>>>>>>>> the system did boot. I never see the Auto Boot in n seconds
>>>>>>>>> message. Its also interesting that the list of options above
>>>>>>>>> appears incomplete. On the working system, items 1 through 5
>>>>>>>>> are all present. I have now checked all the cksum's for all
>>>>>>>>> the files in /boot and they are all the same.
>>>>>>>>>
>>>>>>>> Hmmm. Looks like you're going to make me do all your research,
>>>>>>>> for you. ;)
>>>>>>>> You /did/ read the contents of /boot/defaults/loader.conf. Yes?
>>>>>>>> I'm guessing that you've also already read loader.4th(8), and
>>>>>>>> the other related info.
>>>>>>>> Now this is pure supposition; as it appears that you're looking
>>>>>>>> for a serial console. I'd /speculate/ that you want to turn all
>>>>>>>> that NASTY ANSI stuff OFF
>>>>>>>> That's why your not seeing the complete menu -- hear that Devin!
>>>>>>>> I'm going to post just this much for now, just to get you
>>>>>>>> started. I know what else you need/are looking for. But need to
>>>>>>>> find the /correct/ syntax -- paraphrasing, just won't get it. :)\
>>>>>>>
>>>>>>> Setting loader_color="NO" (from man page) does give back the full
>>>>>>> menu. Still waits for return after the version name. I haven't
>>>>>>> found in the forth where it is reading the keyboard. Yes, I have
>>>>>>> to use a serial console. These machines are about 100 miles away.
>>>>>>> Something is stopping the autoboot from even starting.
>>>>>>
>>>>>> See my reply to this. I think I've given you the hints you need --
>>>>>> fingers crossed. :)
>>>>>>
>>>>>
>>>>> He's using console=comconsole (serial boot).
>>>>> When that is the case, loader_color is automatically set to NO.
>>>>> There's no reason to set both loader_color=NO and
>>>>> console=comconsole. The code that does this is here:
>>>>>
>>>>> http://svnweb.freebsd.org/base/release/9.2.0/sys/boot/forth/color.4th?revision=255898&view=markup
>>>>> Line 48 within the loader_color? function:
>>>>> boot_serial? if FALSE else TRUE then
>>>>>
>>>>> As for answering the quandary of where the keyboard is polled during
>>>>> the timeout countdown, that's the getkey function in here:
>>>>>
>>>>> http://svnweb.freebsd.org/base/release/9.2.0/sys/boot/forth/menu.4th?revision=255898&view=markup
>>>>> --
>>>>
>>>> I commented out the 3 cursor positions in menu-timeout-update. It
>>>> does not appear that word is being used. The Autoboot message never
>>>> appeared. Obviously getkey is being used as it does respond properly
>>>> to a return. I am beginning to suspect that menu_timeout_enabled is
>>>> zero. I believe adding a line after getkey's begin with
>>>>
>>>> s"menu_timeout_enabled = " type menu_timeout_enabled @ . 10 spaces
>>>>
>>>> will tell me.
>>>
>>> There is a missing space after the first " above. However, that does
>>> confirm my suspicion that menu_timeout_enabled is set to 0. It is
>>> only displayed once. On a working system the value is 1 and that
>>> message is output numerous times until the 10 seconds expires and
>>> then the boot begins.
>>>
>>> Now to figure out how that value is getting set incorrectly.
>>>
>>
>> After much digging, I now know what it going on, but not why. When
>> getkey is called the first time, menu_timeout_enable is set to one.
>> However, it is set to zero on every check after that. In getkey after
>> the comment "Was a key pressed" is a check of key to see if a key was
>> pressed. It is returning a decimal 7 (BEL). That then clears
>> menu_timeout_enable and it then sits there waiting for a valid key
>> input. There is no keyboard plugged into the system. I have no idea
>> how that BEL is being generated or even how to prevent it. Could it
>> be possible that it comes from the serial console? I tend to doubt
>> thats the case since the system hangs during boot when the serial
>> console is not connected. I suppose that I could put in a test for a
>> key value that is not a control character, but that would only work
>> until the next system update. I'd have to remember to put it back in
>> each time. Thats not likely to happen. My memory is not that good.
>> Whats interesting is that I have 4 systems (i386) doing this and
>> 1 system (i386) and 2 systems (amd64) not doing it. The only common
>> thread is the 4 systems doing it are about 100 miles from me and the
>> working ones are here.
>>
>
> Based on that feedback, I've developed the attached patch.txt.
> Can you give it a whirl and let me know how it works?

The patch works properly. However, in the process of testing it, I
discovered that the cause of the "bell" is actually the terminal
emulator echoing that character back from something earlier in the
reboot process. Why that character appears is not understood. Hence,
the real problem lies in a hardware "failure" outside the motherboard.
So I don't know if you want to put that patch into the system or not.
It seems like a good idea to ignore anything that's a control
character, or to clear out the input at the start of the process
anyway.

In my case, I need the patch and will keep it in my systems. Thanks
for all the help.

-- Doug
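
The two ideas in the last paragraph (ignore control characters, or clear
out pending input before the countdown starts) can be sketched outside
the loader. What follows is a minimal illustration in Python only; it is
not the attached patch.txt and it is not loader Forth. The names
is_menu_key, drain_pending and first_menu_key are hypothetical, and the
set of control keys let through (Enter, Esc, Backspace) is an assumption
based on the keys the menu advertises.

```python
from collections import deque

BEL = 0x07  # the stray byte that getkey was seeing in this thread

# Assumption: the only control codes the menu needs are CR, LF, ESC and BS.
ALLOWED_CONTROL = {0x0D, 0x0A, 0x1B, 0x08}


def is_menu_key(code: int) -> bool:
    """True for printable ASCII or an allowed control key, False for stray bytes such as BEL."""
    return code in ALLOWED_CONTROL or 0x20 <= code <= 0x7E


def drain_pending(pending: deque) -> int:
    """Throw away everything queued before the countdown starts; return how many bytes were dropped."""
    dropped = len(pending)
    pending.clear()
    return dropped


def first_menu_key(pending: deque):
    """Return the first acceptable key code, skipping stray control bytes, or None if nothing usable is queued."""
    while pending:
        code = pending.popleft()
        if is_menu_key(code):
            return code
    return None
```

With either approach a lone BEL echoed back by the terminal would not
count as a keypress, so the timeout would keep running; a real Enter
would still be seen.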