Date:      Mon, 24 Jul 2017 19:45:01 +0200
From:      Mark Martinec <Mark.Martinec+freebsd@ijs.si>
To:        freebsd-stable@freebsd.org
Cc:        re@freebsd.org
Subject:   Re: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching
Message-ID:  <42cc3fffe99f5b7d5deb7d7bf8d071cd@ijs.si>
In-Reply-To: <D466218B-9C65-418E-B9C1-5AE904EA72CA@freebsd.org>
References:  <e4acc16980fe65751325333870bf2b68@ijs.si> <20170717232434.GB21048@wkstn-mjohnston.west.isilon.com> <c8140f430fb2af93a6bc70a3df8cdadc@ijs.si> <9b3563aae75aa954d7fe31ffe25e1d29@ijs.si> <20170720000325.GB9198@wkstn-mjohnston.west.isilon.com> <81295bcacd7c44813de8d346c88cbb65@ijs.si> <20170724021504.GA97170@raichu> <10649c9070bc419d93ae2a87a511d2ba@ijs.si> <c9b444f1-cb74-8402-4033-0d6161739e8f@multiplay.co.uk> <D466218B-9C65-418E-B9C1-5AE904EA72CA@freebsd.org>

2017-07-24 18:25, Ken Merry wrote:
> It is possible that the change I MFCed today (r321207 in head, r321415
> in stable/11) is related, but Mark will have to boot his machine with
> the fix to see if it makes any difference.
> 
> What happened in my case on one particular machine (not on most
> machines in our lab running the same code) was that mps_wait_command()
> / mpr_wait_command() would not wait the full 60 seconds for a write to
> the DPM (Driver Persistent Mapping) table in the controller.
> So, it reported that there was a timeout.
> [...]
> Eliminating bogus timeouts will eliminate most all of the sources of
> those panics anyway.

I took r321415 from stable/11 and applied it to 11.1-RC3, and it makes
no difference to booting: the machine still hangs while attempting to
attach da0, with a CPU apparently spinning (judging by fan speed).
Booting in safe mode, or with EARLY_AP_STARTUP disabled, avoids the
problem.

> There is a secondary bug that is still in the mps(4) / mpr(4) drivers
> when a timeout does happen — the error recovery code in the
> wait_command() routine reinitializes the controller, which clears out
> all the commands.  When the wait_command() routine returns, the
> command passed in has been freed, but the caller doesn’t know that.
> So the caller (it happens in a number of places) dereferences a
> pointer to freed memory and the kernel panics.
> 
> I’m planning to fix that bug, too, if slm@ doesn’t get to it first,
> I’ve just had other bugs to fix first.

No panics in my case, just hangs.
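
For reference, the use-after-free pattern Ken describes above can be
sketched in user-space C. This is only an illustration under stated
assumptions: the structures and function names below (struct command,
struct controller, controller_reinit, wait_command, issue_command) are
hypothetical stand-ins, not the real mps(4) / mpr(4) code, and passing a
pointer-to-pointer is just one possible way to let the caller learn that
its command was freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; NOT the real mps(4)/mpr(4) structures. */
struct command {
	int status;
};

struct controller {
	struct command *inflight;	/* the one outstanding command, if any */
	bool will_time_out;		/* simulate a timeout on the next wait */
};

/* Simulated reinitialization: frees every outstanding command. */
static void
controller_reinit(struct controller *sc)
{
	free(sc->inflight);
	sc->inflight = NULL;
}

/*
 * Wait for *cmp to complete.  On a timeout the recovery path
 * reinitializes the controller, which frees the command, so *cmp is
 * set to NULL to keep the caller from touching freed memory.
 * Returns 0 on success, -1 on timeout.
 */
static int
wait_command(struct controller *sc, struct command **cmp)
{
	if (sc->will_time_out) {
		controller_reinit(sc);
		*cmp = NULL;
		return (-1);
	}
	(*cmp)->status = 1;
	return (0);
}

/* Caller: returns the command status, or -1 if the command was lost. */
static int
issue_command(struct controller *sc)
{
	struct command *cm = calloc(1, sizeof(*cm));
	int status;

	assert(cm != NULL);
	sc->inflight = cm;
	if (wait_command(sc, &cm) != 0 || cm == NULL)
		return (-1);	/* cm was freed by the reinit; do not use it */
	status = cm->status;
	sc->inflight = NULL;
	free(cm);
	return (status);
}
```

If wait_command() took a plain struct command * instead, the caller's
copy of the pointer would still look valid after the reinit, and reading
cm->status would be exactly the dereference of freed memory described
in the quoted text.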

   Mark


