Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 27 Aug 2019 15:27:15 -0600
From:      Scott Long <scottl@samsco.org>
To:        Alexander Motin <mav@FreeBSD.org>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r351550 - head/sys/cam/scsi
Message-ID:  <CCC6D9B5-A664-4CBE-9138-11E48F8B7082@samsco.org>
In-Reply-To: <3c2aa0be-3d42-881e-87e1-675499a7bc5f@FreeBSD.org>
References:  <201908271641.x7RGf6LC075849@repo.freebsd.org> <99271565-F168-48C8-90E0-749417C7C974@samsco.org> <3c2aa0be-3d42-881e-87e1-675499a7bc5f@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
Excellent work, thank you!

Scott


> On Aug 27, 2019, at 2:57 PM, Alexander Motin <mav@FreeBSD.org> wrote:
>=20
> Some FreeNAS user reported panic after updating to newer version.  On
> the screenshot provided were several BUSY statuses for SATA disk on
> mps(4), followed by panic "Attempt to remove out-of-bounds index -1 =
from
> queue ...".  In his case I blame ancient LSI firmware or some broken
> hardware, but I was able to reproduce the panic on FreeBSD head debug
> kernel by hacking mps(4) driver to always report BUSY (appeared except
> IDENTIFY and REPORT LUNS).  To diagnose it I inserted assertion into
> xpt_free_ccb(), checking ccb->ccb_h.pinfo.index for values used for
> requests still in send queue.  Not sure it is to be persistent, but in
> this case it lead me directly to this place.
>=20
> On 27.08.2019 16:23, Scott Long wrote:
>> This is very concerning, and I wonder if it=E2=80=99s the cause of =
the mystery use-after-free / double-complete that I=E2=80=99ve seen for =
years and have never been able to catch.  Can you say more about how you =
found it?
>>=20
>> Scott
>>=20
>>=20
>>> On Aug 27, 2019, at 10:41 AM, Alexander Motin <mav@FreeBSD.org> =
wrote:
>>>=20
>>> Author: mav
>>> Date: Tue Aug 27 16:41:06 2019
>>> New Revision: 351550
>>> URL: https://svnweb.freebsd.org/changeset/base/351550
>>>=20
>>> Log:
>>> Always check cam_periph_error() status for ERESTART.
>>>=20
>>> Even if we do not expect retries, we better be sure, since otherwise =
it
>>> may result in use after free kernel panic.  I've noticed that it =
retries
>>> SCSI_STATUS_BUSY even with SF_NO_RECOVERY | SF_NO_RETRY.
>>>=20
>>> MFC after:	1 week
>>> Sponsored by:	iXsystems, Inc.
>>>=20
>>> Modified:
>>> head/sys/cam/scsi/scsi_xpt.c
>>>=20
>>> Modified: head/sys/cam/scsi/scsi_xpt.c
>>> =
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D
>>> --- head/sys/cam/scsi/scsi_xpt.c	Tue Aug 27 15:42:08 2019	=
(r351549)
>>> +++ head/sys/cam/scsi/scsi_xpt.c	Tue Aug 27 16:41:06 2019	=
(r351550)
>>> @@ -1684,8 +1684,9 @@ probe_device_check:
>>> 	case PROBE_TUR_FOR_NEGOTIATION:
>>> 	case PROBE_DV_EXIT:
>>> 		if (cam_ccb_status(done_ccb) !=3D CAM_REQ_CMP) {
>>> -			cam_periph_error(done_ccb, 0,
>>> -			    SF_NO_PRINT | SF_NO_RECOVERY | SF_NO_RETRY);
>>> +			if (cam_periph_error(done_ccb, 0, SF_NO_PRINT |
>>> +			    SF_NO_RECOVERY | SF_NO_RETRY) =3D=3D =
ERESTART)
>>> +				goto outr;
>>> 		}
>>> 		if ((done_ccb->ccb_h.status & CAM_DEV_QFRZN) !=3D 0) {
>>> 			/* Don't wedge the queue */
>>> @@ -1735,8 +1736,9 @@ probe_device_check:
>>> 		struct ccb_scsiio *csio;
>>>=20
>>> 		if (cam_ccb_status(done_ccb) !=3D CAM_REQ_CMP) {
>>> -			cam_periph_error(done_ccb, 0,
>>> -			    SF_NO_PRINT | SF_NO_RECOVERY | SF_NO_RETRY);
>>> +			if (cam_periph_error(done_ccb, 0, SF_NO_PRINT |
>>> +			    SF_NO_RECOVERY | SF_NO_RETRY) =3D=3D =
ERESTART)
>>> +				goto outr;
>>> 		}
>>> 		if ((done_ccb->ccb_h.status & CAM_DEV_QFRZN) !=3D 0) {
>>> 			/* Don't wedge the queue */
>>>=20
>>=20
>=20
> --=20
> Alexander Motin
>=20




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CCC6D9B5-A664-4CBE-9138-11E48F8B7082>