Date: Sat, 4 Nov 2017 23:35:47 -0600
From: Warner Losh <imp@bsdimp.com>
To: Peter Wemm <peter@wemm.org>
Cc: "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>, Warner Losh <imp@freebsd.org>, src-committers <src-committers@freebsd.org>, "svn-src-head@freebsd.org" <svn-src-head@freebsd.org>
Subject: Re: svn commit: r325378 - head/sys/dev/ipmi
Message-ID: <CANCZdfpkwhcVefhr1bp7oAVvq_uy1ASLot6XdV=zCTYnY3cx7g@mail.gmail.com>
In-Reply-To: <1595776.mmy5sTxHyV@overcee.wemm.org>
References: <201711040301.vA431wdY002757@repo.freebsd.org> <2932858.xKWtPkGhRe@overcee.wemm.org> <CANCZdfq8jnuO8_=5PFFbXeEu_V14LM4_zYxjF2EBsmk9g-srMQ@mail.gmail.com> <1595776.mmy5sTxHyV@overcee.wemm.org>

On Sat, Nov 4, 2017 at 11:19 PM, Peter Wemm <peter@wemm.org> wrote:

> On Saturday, November 04, 2017 11:03:55 PM Warner Losh wrote:
> > On Sat, Nov 4, 2017 at 10:50 PM, Peter Wemm <peter@wemm.org> wrote:
> > > On Saturday, November 04, 2017 03:01:58 AM Warner Losh wrote:
> > > > Author: imp
> > > > Date: Sat Nov 4 03:01:58 2017
> > > > New Revision: 325378
> > > > URL: https://svnweb.freebsd.org/changeset/base/325378
> > > >
> > > > Log:
> > > >   Make the startup timeout 0 seconds by default rather than 420s. This
> > > >   makes the default fail safe when watchdogd is disabled (which is also
> > > >   the default).
> > >
> > > We're still getting unanticipated reboots.
> > >
> > > I think what is happening is:
> > > 1) Orderly reboot initiated.
> > > 2) By default, the watchdog code sets a 420 second timer, even with no
> > >    watchdogd.
> > > 3) Reboot completes, system comes up.
> > > 4) A few minutes later, the pre-reboot 420 second timer expires and
> > >    *another* reboot happens.
> > >
> > > Setting hw.ipmi.on="0" in loader.conf stops this...
> > >
> > > e.g.: reboot at 4:41:47.. system comes back up, and later:
> > > ...
> > > Uptime: 322 Sun Nov 5 04:48:45 UTC 2017
> > > Uptime: 323 Sun Nov 5 04:48:46 UTC 2017
> > > Uptime: 324 Sun Nov 5 04:48:47 UTC 2017
> > > Stopping cron.
> > > Waiting for PIDS: 1004.
> > > Stopping sshd.
> > > Waiting for PIDS: 994.
> > > Stopping nginx.
> > > ...
> > > That's exactly 420 seconds after the original reboot, which matches the
> > > wd_shutdown_countdown timer that is still enabled.
> >
> > Good detective work. I suspect this will need to be opt-in as well...
> > Though the other option is to disable the watchdog on attach if we're
> > not enabling the early watchdog, which would give us a watchdog when we
> > hang on shutdown... I need to think this through.... Fix it early with
> > less protection by setting this to 0, or fix it later with more
> > protection, but perhaps odd behavior for some edge cases like downgrade.
> >
> > In the meantime hw.ipmi.wd_shutdown_countdown=0 should also fix it. Can
> > you confirm that?
> >
> > Warner
>
> We have a number of obnoxious machines that take 5+ minutes in POST. The
> 7 minute timer is cutting it awfully close.
>
> However, what I'm more worried about: what if you're going to boot
> something other than FreeBSD? Or going into the BIOS to tweak something?
> If I break into the loader to pause booting, it'll just silently reboot
> out from under me a few minutes later. I don't see how this can be
> anything but opt-in by default. As it's a timer initiated by an orderly
> shutdown/reboot, there should be plenty of time for an appropriate value
> to be safely set.
>
> Yes, setting the sysctl after boot did prevent the spurious reboot after
> the next boot-up.

OK. Given the edge cases aren't so edgy as I was originally thinking, I'm
inclined to agree here: both features have to be opt-in. Attempts at being
clever only work in a monoculture of FreeBSD where one is always moving
forward in versions and never back. There are problems with both of these
assumptions...

Sorry for what sounds like a lot of hassle to diagnose this.

Warner
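
The timing reported in the thread can be checked with a little arithmetic. The following is a minimal sketch, not code from the ipmi driver: the function name watchdog_expiry is hypothetical, the timestamps are taken from the console log quoted above, and treating a countdown of 0 as "no timer armed" is an assumption based on the report that setting hw.ipmi.wd_shutdown_countdown=0 stopped the spurious reboot.

```python
from datetime import datetime, timedelta

# Shutdown countdown discussed in the thread, in seconds (7 minutes).
WD_SHUTDOWN_COUNTDOWN = 420


def watchdog_expiry(reboot_started, countdown_s=WD_SHUTDOWN_COUNTDOWN):
    """Return when a countdown armed at reboot_started would fire.

    Hypothetical helper for illustration only. A countdown of 0 or less is
    treated as "no timer armed" (an assumption, see the lead-in).
    """
    if countdown_s <= 0:
        return None
    return reboot_started + timedelta(seconds=countdown_s)


# Reboot time from the quoted report (4:41:47 on Nov 5, 2017).
reboot_started = datetime(2017, 11, 5, 4, 41, 47)

expiry = watchdog_expiry(reboot_started)
print("timer fires at:", expiry)

# A machine that spends 5 minutes in POST has only 2 minutes of margin left.
post_duration = timedelta(minutes=5)
margin = expiry - (reboot_started + post_duration)
print("margin after a 5 minute POST:", margin)
```

The computed expiry of 04:48:47 lines up with the "Uptime: 324 Sun Nov 5 04:48:47 UTC 2017" line just before services start stopping, and the two-minute margin illustrates why a 7 minute timer is "cutting it awfully close" on slow-POST hardware.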