Date: Tue, 27 Aug 2019 15:27:15 -0600
From: Scott Long <scottl@samsco.org>
To: Alexander Motin <mav@FreeBSD.org>
Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r351550 - head/sys/cam/scsi
Message-ID: <CCC6D9B5-A664-4CBE-9138-11E48F8B7082@samsco.org>
In-Reply-To: <3c2aa0be-3d42-881e-87e1-675499a7bc5f@FreeBSD.org>
References: <201908271641.x7RGf6LC075849@repo.freebsd.org> <99271565-F168-48C8-90E0-749417C7C974@samsco.org> <3c2aa0be-3d42-881e-87e1-675499a7bc5f@FreeBSD.org>

Excellent work, thank you!

Scott

> On Aug 27, 2019, at 2:57 PM, Alexander Motin <mav@FreeBSD.org> wrote:
>
> Some FreeNAS user reported panic after updating to newer version.  On
> the screenshot provided were several BUSY statuses for SATA disk on
> mps(4), followed by panic "Attempt to remove out-of-bounds index -1 from
> queue ...".  In his case I blame ancient LSI firmware or some broken
> hardware, but I was able to reproduce the panic on FreeBSD head debug
> kernel by hacking mps(4) driver to always report BUSY (appeared except
> IDENTIFY and REPORT LUNS).  To diagnose it I inserted assertion into
> xpt_free_ccb(), checking ccb->ccb_h.pinfo.index for values used for
> requests still in send queue.  Not sure it is to be persistent, but in
> this case it led me directly to this place.
>
> On 27.08.2019 16:23, Scott Long wrote:
>> This is very concerning, and I wonder if it’s the cause of the
>> mystery use-after-free / double-complete that I’ve seen for years
>> and have never been able to catch.  Can you say more about how you
>> found it?
>>
>> Scott
>>
>>
>>> On Aug 27, 2019, at 10:41 AM, Alexander Motin <mav@FreeBSD.org> wrote:
>>>
>>> Author: mav
>>> Date: Tue Aug 27 16:41:06 2019
>>> New Revision: 351550
>>> URL: https://svnweb.freebsd.org/changeset/base/351550
>>>
>>> Log:
>>>   Always check cam_periph_error() status for ERESTART.
>>>
>>>   Even if we do not expect retries, we better be sure, since otherwise it
>>>   may result in use after free kernel panic.  I've noticed that it retries
>>>   SCSI_STATUS_BUSY even with SF_NO_RECOVERY | SF_NO_RETRY.
>>>
>>>   MFC after:    1 week
>>>   Sponsored by: iXsystems, Inc.
>>>
>>> Modified:
>>>   head/sys/cam/scsi/scsi_xpt.c
>>>
>>> Modified: head/sys/cam/scsi/scsi_xpt.c
>>> ==============================================================================
>>> --- head/sys/cam/scsi/scsi_xpt.c    Tue Aug 27 15:42:08 2019    (r351549)
>>> +++ head/sys/cam/scsi/scsi_xpt.c    Tue Aug 27 16:41:06 2019    (r351550)
>>> @@ -1684,8 +1684,9 @@ probe_device_check:
>>>      case PROBE_TUR_FOR_NEGOTIATION:
>>>      case PROBE_DV_EXIT:
>>>          if (cam_ccb_status(done_ccb) != CAM_REQ_CMP) {
>>> -            cam_periph_error(done_ccb, 0,
>>> -                SF_NO_PRINT | SF_NO_RECOVERY | SF_NO_RETRY);
>>> +            if (cam_periph_error(done_ccb, 0, SF_NO_PRINT |
>>> +                SF_NO_RECOVERY | SF_NO_RETRY) == ERESTART)
>>> +                goto outr;
>>>          }
>>>          if ((done_ccb->ccb_h.status & CAM_DEV_QFRZN) != 0) {
>>>              /* Don't wedge the queue */
>>> @@ -1735,8 +1736,9 @@ probe_device_check:
>>>          struct ccb_scsiio *csio;
>>>
>>>          if (cam_ccb_status(done_ccb) != CAM_REQ_CMP) {
>>> -            cam_periph_error(done_ccb, 0,
>>> -                SF_NO_PRINT | SF_NO_RECOVERY | SF_NO_RETRY);
>>> +            if (cam_periph_error(done_ccb, 0, SF_NO_PRINT |
>>> +                SF_NO_RECOVERY | SF_NO_RETRY) == ERESTART)
>>> +                goto outr;
>>>          }
>>>          if ((done_ccb->ccb_h.status & CAM_DEV_QFRZN) != 0) {
>>>              /* Don't wedge the queue */
>>>
>>
>
> --
> Alexander Motin
>
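
For readers following the thread, the pattern the commit enforces can be sketched outside the kernel. This is only a userland illustration, not the CAM code: every name below (mock_ccb, mock_periph_error, mock_free_ccb, mock_probe_done, the MOCK_* constants) is a hypothetical stand-in, and the structure layout is assumed for the example. It models two things described above: an error handler that may requeue a request and return a restart code even when the caller did not ask for retries, and an assertion in the free routine that catches a request still sitting in the send queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical constants; they only mimic the roles of the real ones. */
enum { MOCK_STATUS_OK = 0, MOCK_STATUS_BUSY = 1, MOCK_STATUS_FAIL = 2 };
enum { MOCK_EIO = 5, MOCK_ERESTART = -1 };

/* Hypothetical request object standing in for a CCB. */
struct mock_ccb {
	int  status;       /* one of the MOCK_STATUS_* values */
	int  queue_index;  /* >= 0 while queued for (re)send, -1 when idle */
	bool freed;        /* set once the request has been released */
};

/*
 * Stand-in for the error handler.  A BUSY status is requeued and
 * reported as a restart, regardless of what the caller expected.
 */
int
mock_periph_error(struct mock_ccb *ccb)
{
	if (ccb->status == MOCK_STATUS_BUSY) {
		ccb->queue_index = 0;	/* request is back in the send queue */
		return (MOCK_ERESTART);
	}
	return (MOCK_EIO);
}

/*
 * Stand-in for the free routine, with the kind of diagnostic
 * assertion mentioned in the message: never free a queued request.
 */
void
mock_free_ccb(struct mock_ccb *ccb)
{
	assert(ccb->queue_index < 0);
	ccb->freed = true;
}

/*
 * Completion handler.  Returns true if the request was released.
 * Checking for the restart code plays the role of "goto outr":
 * the request now belongs to the queue and must not be touched.
 */
bool
mock_probe_done(struct mock_ccb *ccb)
{
	if (ccb->status != MOCK_STATUS_OK) {
		if (mock_periph_error(ccb) == MOCK_ERESTART)
			return (false);
	}
	mock_free_ccb(ccb);
	return (true);
}
```

Calling mock_probe_done() on a request whose status is MOCK_STATUS_BUSY returns false and leaves it queued and unfreed. If the restart check is removed, the same call reaches mock_free_ccb() with queue_index set to 0 and trips the assertion, which is the userland analogue of the failure described in the thread.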