Date: Mon, 24 Jul 2017 12:25:34 -0400
From: Ken Merry <ken@freebsd.org>
To: Steven Hartland <killing@multiplay.co.uk>
Cc: freebsd-stable@freebsd.org, "re@freebsd.org" <re@freebsd.org>, Mark.Martinec+freebsd@ijs.si, Stephen Mcconnell <stephen.mcconnell@broadcom.com>
Subject: Re: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching
Message-ID: <D466218B-9C65-418E-B9C1-5AE904EA72CA@freebsd.org>
In-Reply-To: <c9b444f1-cb74-8402-4033-0d6161739e8f@multiplay.co.uk>
References: <e4acc16980fe65751325333870bf2b68@ijs.si>
 <20170717232434.GB21048@wkstn-mjohnston.west.isilon.com>
 <c8140f430fb2af93a6bc70a3df8cdadc@ijs.si>
 <9b3563aae75aa954d7fe31ffe25e1d29@ijs.si>
 <20170720000325.GB9198@wkstn-mjohnston.west.isilon.com>
 <81295bcacd7c44813de8d346c88cbb65@ijs.si>
 <20170724021504.GA97170@raichu>
 <10649c9070bc419d93ae2a87a511d2ba@ijs.si>
 <c9b444f1-cb74-8402-4033-0d6161739e8f@multiplay.co.uk>

It is possible that the change I MFCed today (r321207 in head, r321415
in stable/11) is related, but Mark will have to boot his machine with
the fix to see if it makes any difference.

What happened in my case on one particular machine (not on most machines
in our lab running the same code) was that mps_wait_command() /
mpr_wait_command() would not wait the full 60 seconds for a write to the
DPM (Driver Persistent Mapping) table in the controller. So it reported
a timeout prematurely.

There is a secondary bug that is still in the mps(4) / mpr(4) drivers
when a timeout does happen: the error recovery code in the
wait_command() routine reinitializes the controller, which clears out
all the commands. When the wait_command() routine returns, the command
passed in has been freed, but the caller doesn't know that. So the
caller (it happens in a number of places) dereferences a pointer to
freed memory and the kernel panics.

I'm planning to fix that bug, too, if slm@ doesn't get to it first;
I've just had other bugs to fix first.

Eliminating bogus timeouts will eliminate most of the sources of those
panics anyway.

Ken
--
Ken Merry
ken@FreeBSD.ORG

> On Jul 24, 2017, at 12:10 PM, Steven Hartland <killing@multiplay.co.uk> wrote:
>
> Based on your boot info you're using mps, so this could be related to
> the mps fix committed to stable/11 today by ken@:
> https://svnweb.freebsd.org/changeset/base/321415
>
> re@ cc'ed as this could cause hangs for others too on 11.1-RELEASE if
> this is the case.
>
> Regards
> Steve
>
> On 24/07/2017 15:55, Mark Martinec wrote:
>>> Thanks! Tried it, and the message (or a backtrace) does not show
>>> during a boot of a generic (patched) kernel, at least not in
>>> the last 40-lines screen before the hang occurs.
>>> (It also does not show during a "Safe mode" successful boot.)
>>
>> Btw (may or may not be relevant): after the above experiment
>> I have rebooted the machine in "Safe mode" (generic kernel,
>> EARLY_AP_STARTUP enabled by default) - and spent some time
>> doing non-intensive interactive work on this host (web browsing,
>> editor, shell, all under KDE) - and after about an hour the
>> machine froze (clock display not updating, keyboard unresponsive,
>> console virtual terminals inaccessible) - so had to reboot.
>> According to the fan speed the machine was idle.
>> The /var/log/messages does not show anything of interest
>> before the freeze. All disks are under ZFS.
>>
>> Can EARLY_AP_STARTUP have an effect also _after_ booting?
>> This host never hung during normal work when EARLY_AP_STARTUP
>> was disabled (or with 11.0 and earlier).
>>
>> Mark
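
The secondary bug described in the reply above (wait_command() reinitializes the controller on timeout, the command is freed, and the caller keeps using its pointer) can be illustrated with a small user-space sketch. Everything here is hypothetical: the sketch_* names, the structures, and the pointer-to-pointer convention are stand-ins chosen for illustration, not the real mps(4) / mpr(4) code and not necessarily how the drivers are or will be fixed. It only shows one way a wait routine could tell its caller that the command no longer exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; NOT the real mps(4)/mpr(4) structures. */
struct sketch_cmd {
    int reply;                      /* filled in when the command completes */
};

struct sketch_ctrl {
    struct sketch_cmd *inflight;    /* one outstanding command, for brevity */
    int reinit_count;               /* how many times recovery has run */
};

/* Models error recovery: reinitializing frees every outstanding command. */
static void sketch_reinit(struct sketch_ctrl *sc)
{
    free(sc->inflight);
    sc->inflight = NULL;
    sc->reinit_count++;
}

/*
 * Wait for *cmp. `completes` simulates whether the hardware answers in time.
 * On timeout the controller is reinitialized, which frees the command, so
 * *cmp is set to NULL to make that visible to the caller.
 * Returns 0 on completion, -1 on timeout.
 */
static int sketch_wait_command(struct sketch_ctrl *sc,
                               struct sketch_cmd **cmp, bool completes)
{
    assert(cmp != NULL && *cmp != NULL && *cmp == sc->inflight);
    if (completes) {
        (*cmp)->reply = 1;
        return 0;
    }
    sketch_reinit(sc);
    *cmp = NULL;                    /* the caller's pointer is now invalid */
    return -1;
}

/* A caller that checks the pointer before touching the command. */
static int sketch_caller(struct sketch_ctrl *sc, bool completes)
{
    struct sketch_cmd *cm = calloc(1, sizeof(*cm));
    int reply = -1;

    if (cm == NULL)
        return -1;
    sc->inflight = cm;

    if (sketch_wait_command(sc, &cm, completes) == 0 && cm != NULL) {
        reply = cm->reply;          /* safe: the command is still allocated */
        sc->inflight = NULL;
        free(cm);
    }
    /* On timeout cm is NULL: nothing left to read and nothing to free. */
    return reply;
}
```

In this sketch, calling sketch_caller() with completes set to false takes the timeout path and returns -1 without dereferencing the freed command. The panic described in the message corresponds to a caller that reads cm->reply unconditionally after the wait returns.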