Date:      Mon, 24 Jul 2017 12:25:34 -0400
From:      Ken Merry <ken@freebsd.org>
To:        Steven Hartland <killing@multiplay.co.uk>
Cc:        freebsd-stable@freebsd.org, "re@freebsd.org" <re@freebsd.org>, Mark.Martinec+freebsd@ijs.si, Stephen Mcconnell <stephen.mcconnell@broadcom.com>
Subject:   Re: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching
Message-ID:  <D466218B-9C65-418E-B9C1-5AE904EA72CA@freebsd.org>
In-Reply-To: <c9b444f1-cb74-8402-4033-0d6161739e8f@multiplay.co.uk>
References:  <e4acc16980fe65751325333870bf2b68@ijs.si> <20170717232434.GB21048@wkstn-mjohnston.west.isilon.com> <c8140f430fb2af93a6bc70a3df8cdadc@ijs.si> <9b3563aae75aa954d7fe31ffe25e1d29@ijs.si> <20170720000325.GB9198@wkstn-mjohnston.west.isilon.com> <81295bcacd7c44813de8d346c88cbb65@ijs.si> <20170724021504.GA97170@raichu> <10649c9070bc419d93ae2a87a511d2ba@ijs.si> <c9b444f1-cb74-8402-4033-0d6161739e8f@multiplay.co.uk>

It is possible that the change I MFCed today (r321207 in head, r321415
in stable/11) is related, but Mark will have to boot his machine with
the fix to see if it makes any difference.

What happened in my case, on one particular machine (not on most machines
in our lab running the same code), was that mps_wait_command() /
mpr_wait_command() would not wait the full 60 seconds for a write to the
DPM (Driver Persistent Mapping) table in the controller.  So it reported
a timeout.
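The message does not say what made the wait return early. The following is therefore only a generic userland illustration of the property described: a wait should report a timeout only after the full interval has really elapsed. It is not the mps(4)/mpr(4) code and not the r321207 fix. It assumes POSIX threads (`pthread_cond_timedwait()`). The names `struct completion`, `complete()` and `wait_for_completion()` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical completion object; not the mps(4)/mpr(4) command structure. */
struct completion {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    bool            done;
};

/* Mark the completion done and wake any waiter. */
void complete(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_signal(&c->cv);
    pthread_mutex_unlock(&c->lock);
}

/*
 * Wait until c->done is set or the full timeout has elapsed.
 * The absolute deadline is computed once up front, so an early or
 * spurious wakeup just loops and waits for the remaining time instead
 * of being reported as a timeout.
 * Returns 0 on completion, ETIMEDOUT only once the deadline has passed.
 */
int wait_for_completion(struct completion *c, int timeout_sec)
{
    struct timespec deadline;
    int rc = 0;

    assert(timeout_sec >= 0);
    clock_gettime(CLOCK_REALTIME, &deadline); /* default condvar clock */
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&c->lock);
    while (!c->done && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&c->cv, &c->lock, &deadline);
    /* A completion that races with the deadline still counts as success. */
    rc = c->done ? 0 : ETIMEDOUT;
    pthread_mutex_unlock(&c->lock);
    return rc;
}
```

The design choice is to compute one absolute deadline rather than sleep for a fixed interval and treat any return as expiry. That is what keeps a premature wakeup from turning into a bogus timeout.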

There is a secondary bug that is still in the mps(4) / mpr(4) drivers
when a timeout does happen: the error recovery code in the
wait_command() routine reinitializes the controller, which clears out
all the commands.  When the wait_command() routine returns, the command
passed in has been freed, but the caller doesn't know that.  So the
caller (this happens in a number of places) dereferences a pointer to
freed memory and the kernel panics.
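One possible way to close that hole is sketched below under stated assumptions. The wait routine takes a pointer to the caller's command pointer and clears it when recovery frees the command, so callers check for NULL before touching it. This is not necessarily how mps(4)/mpr(4) will be fixed. `struct command`, `struct controller`, `reinit_controller()`, `wait_command()` and `issue_and_wait()` are hypothetical userland stand-ins, and `simulate_timeout` replaces the real wait.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the real mps(4)/mpr(4) structures. */
struct command {
    int reply_status;
};

struct controller {
    struct command *outstanding; /* single in-flight command, for brevity */
};

/* Hypothetical recovery path: reinitializing the controller frees
 * every outstanding command. */
static void reinit_controller(struct controller *sc)
{
    free(sc->outstanding);
    sc->outstanding = NULL;
}

/*
 * Wait for *cmp to finish (simulated). On timeout, recovery frees the
 * command, so *cmp is set to NULL to tell the caller it is gone.
 */
int wait_command(struct controller *sc, struct command **cmp, bool simulate_timeout)
{
    assert(cmp != NULL && *cmp != NULL);
    if (simulate_timeout) {
        reinit_controller(sc);
        *cmp = NULL;
        return ETIMEDOUT;
    }
    return 0;
}

/* Caller: only reads and frees the command if it still exists. */
int issue_and_wait(struct controller *sc, bool simulate_timeout)
{
    struct command *cm = malloc(sizeof(*cm));
    int error;

    if (cm == NULL)
        return ENOMEM;
    cm->reply_status = 0;
    sc->outstanding = cm;

    error = wait_command(sc, &cm, simulate_timeout);
    if (cm != NULL) {
        if (error == 0)
            error = cm->reply_status; /* safe: command was not freed */
        sc->outstanding = NULL;
        free(cm);
    }
    return error;
}
```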

I'm planning to fix that bug too, if slm@ doesn't get to it first; I've
just had other bugs to fix first.

Eliminating the bogus timeouts will eliminate most of the sources of
those panics anyway.

Ken
-- 
Ken Merry
ken@FreeBSD.ORG



> On Jul 24, 2017, at 12:10 PM, Steven Hartland <killing@multiplay.co.uk> wrote:
>
> Based on your boot info you're using mps, so this could be related to
> the mps fix committed to stable/11 today by ken@
> https://svnweb.freebsd.org/changeset/base/321415
>
> re@ cc'ed, as this could cause hangs for others on 11.1-RELEASE too if
> this is the case.
>
>     Regards
>     Steve
>
> On 24/07/2017 15:55, Mark Martinec wrote:
>>> Thanks! Tried it, and the message (or a backtrace) does not show
>>> during a boot of a generic (patched) kernel, at least not in
>>> the last 40-line screen before the hang occurs.
>>> (It also does not show during a successful "Safe mode" boot.)
>>
>> BTW (may or may not be relevant): after the above experiment
>> I rebooted the machine in "Safe mode" (generic kernel,
>> EARLY_AP_STARTUP enabled by default) and spent some time
>> doing non-intensive interactive work on this host (web browsing,
>> editor, shell, all under KDE). After about an hour the
>> machine froze (clock display not updating, keyboard unresponsive,
>> console virtual terminals inaccessible), so I had to reboot.
>> Judging by the fan speed, the machine was idle.
>> /var/log/messages does not show anything of interest
>> before the freeze. All disks are under ZFS.
>>
>> Can EARLY_AP_STARTUP also have an effect _after_ booting?
>> This host never hung during normal work when EARLY_AP_STARTUP
>> was disabled (or with 11.0 and earlier).
>>
>>   Mark