Date:      Mon, 14 Dec 2009 17:09:08 -0500
From:      Alexander Sack <pisymbol@gmail.com>
To:        freebsd-current@freebsd.org
Cc:        freebsd-scsi@freebsd.org
Subject:   Re: aac(4) handling of probe when no devices are there
Message-ID:  <3c0b01820912141409t74a3554ctd224db485ceeb80c@mail.gmail.com>
In-Reply-To: <3c0b01820912141347y366a7252y5d9711b1141b9b70@mail.gmail.com>
References:  <3c0b01820912141347y366a7252y5d9711b1141b9b70@mail.gmail.com>

On Mon, Dec 14, 2009 at 4:47 PM, Alexander Sack <pisymbol@gmail.com> wrote:
> Hello Again:
>
> I have a technical question/concern that I would like some
> feedback on. During the probe sequence, aac(4) handles INQUIRY
> completions differently depending on the target LUN:
>
> aac_cam.c/aac_cam_complete():
> if (command == INQUIRY) {
>     if (ccb->ccb_h.status == CAM_REQ_CMP) {
>         device = ccb->csio.data_ptr[0] & 0x1f;
>         /*
>          * We want DASD and PROC devices to only be
>          * visible through the pass device.
>          */
>         if ((device == T_DIRECT) ||
>             (device == T_PROCESSOR) ||
>             (sc->flags & AAC_FLAGS_CAM_PASSONLY))
>             ccb->csio.data_ptr[0] =
>                 ((device & 0xe0) | T_NODEVICE);
>     } else if (ccb->ccb_h.status == CAM_SEL_TIMEOUT &&
>                ccb->ccb_h.target_lun != 0) {
>         /* fix for INQUIRYs on Lun>0 */
>         ccb->ccb_h.status = CAM_DEV_NOT_THERE;
>     }
> }
>
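To make the LUN dependence concrete, the status rewrite on a failed INQUIRY can be isolated in a small user-space model. This is a hypothetical sketch, not driver code: the enum values are placeholders rather than the real cam_status constants, and only the failure branch of aac_cam_complete() is modelled.

```c
#include <assert.h>

/*
 * Placeholder status codes.  The names mirror CAM's cam_status values
 * (CAM_REQ_CMP, CAM_SEL_TIMEOUT, CAM_DEV_NOT_THERE), but the numbers are
 * arbitrary and do not match <cam/cam.h>.
 */
enum sim_status {
	SIM_REQ_CMP = 1,
	SIM_SEL_TIMEOUT,
	SIM_DEV_NOT_THERE
};

/*
 * Hypothetical helper modelling only the status rewrite in
 * aac_cam_complete(): a selection timeout becomes "device not there"
 * for LUN != 0, LUN 0 keeps the selection timeout, and any other
 * status passes through unchanged.
 */
enum sim_status
aac_inquiry_status(enum sim_status status, unsigned int lun)
{
	assert(status >= SIM_REQ_CMP && status <= SIM_DEV_NOT_THERE);

	if (status == SIM_SEL_TIMEOUT && lun != 0)
		return (SIM_DEV_NOT_THERE);
	return (status);
}
```

With this model, a selection timeout on LUN 0 stays SIM_SEL_TIMEOUT, while the same status on LUN 1 becomes SIM_DEV_NOT_THERE; that asymmetry is what the question that follows is about.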
> Why is CAM_DEV_NOT_THERE skipped on LUN 0? This is true on my target
> 6.1-amd64 machine as well as on CURRENT. The reason I ask is that,
> now that aac(4) is scanned sequentially, a lot of CAM software
> interrupts come in on my 6.x machine, where the interrupt storm
> threshold is only 500, and I get the interrupt storm warning for swi2
> pretty quickly:
>
> Interrupt storm detected on "swi2:"; throttling interrupt source
>
> Obviously this depends on the number of adapters in the system.
> On CURRENT I didn't see it because the threshold is twice as high
> (I think it's 1000 by default).
>
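The dependence on the number of adapters is plain multiplication against a threshold. Below is a minimal sketch with hypothetical function names, assuming each lost-device notification costs one software interrupt, that all of them land in one detection window, and that the warning fires when the count reaches the threshold. The kernel's real detector counts consecutive interrupts and differs in detail; its threshold is the hw.intr_storm_threshold sysctl.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model, not the kernel's detector: total software
 * interrupts in one window, assuming one per lost-device notification.
 */
unsigned long
swi_per_window(unsigned int adapters, unsigned int events_per_adapter)
{
	return ((unsigned long)adapters * events_per_adapter);
}

/* Assumed rule: warn once the count in the window reaches the threshold. */
bool
storm_would_trigger(unsigned long swi_count, unsigned int threshold)
{
	assert(threshold > 0);
	return (swi_count >= threshold);
}
```

For example, two adapters producing 300 notifications each give 600 interrupts, which trips a threshold of 500 but not one of 1000, mirroring the 6.x versus CURRENT difference described above.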
> The issue is the number of xpt_async(AC_LOST_DEVICE, ...) calls during
> the scan. The probe sequence in CURRENT as well as 6.1 handles
> CAM_SEL_TIMEOUT a little differently depending on context:
>
> scsi_xpt.c/probedone():
>     } else if (cam_periph_error(done_ccb, 0,
>                                 done_ccb->ccb_h.target_lun > 0
>                                 ? SF_RETRY_UA|SF_QUIET_IR
>                                 : SF_RETRY_UA,
>                                 &softc->saved_ccb) == ERESTART) {
>         return;
>     } else if ((done_ccb->ccb_h.status & CAM_DEV_QFRZN) != 0) {
>         /* Don't wedge the queue */
>         xpt_release_devq(done_ccb->ccb_h.path, /*count*/1,
>                          /*run_queue*/TRUE);
>     }
>     /*
>      * If we get to this point, we got an error status back
>      * from the inquiry and the error status doesn't require
>      * automatically retrying the command.  Therefore, the
>      * inquiry failed.  If we had inquiry information before
>      * for this device, but this latest inquiry command failed,
>      * the device has probably gone away.  If this device isn't
>      * already marked unconfigured, notify the peripheral
>      * drivers that this device is no more.
>      */
>     if ((path->device->flags & CAM_DEV_UNCONFIGURED) == 0)
>         /* Send the async notification. */
>         xpt_async(AC_LOST_DEVICE, path, NULL);
>
>     xpt_release_ccb(done_ccb);
>     break;
> }
>
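The failure branch of probedone() reduces to a few booleans. The following hypothetical model (placeholder flag values, invented function and struct names) records which side effects that branch has, and shows that for a device still marked CAM_DEV_UNCONFIGURED, probedone() does not post AC_LOST_DEVICE itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bits; the names mirror CAM's SF_* flags, the values do not. */
#define SIM_SF_RETRY_UA	0x01
#define SIM_SF_QUIET_IR	0x02

/* Flags handed to cam_periph_error(): LUNs above 0 also get SF_QUIET_IR. */
unsigned int
probe_error_flags(unsigned int lun)
{
	return (lun > 0 ? (SIM_SF_RETRY_UA | SIM_SF_QUIET_IR) : SIM_SF_RETRY_UA);
}

/* Hypothetical record of what the failure branch of probedone() did. */
struct probe_fail_outcome {
	bool	retried;	/* cam_periph_error() returned ERESTART */
	bool	released_devq;	/* a frozen device queue was released */
	bool	lost_device;	/* probedone() itself posted AC_LOST_DEVICE */
};

struct probe_fail_outcome
probedone_fail_path(bool periph_error_restart, bool devq_frozen,
    bool dev_unconfigured)
{
	struct probe_fail_outcome o = { false, false, false };

	if (periph_error_restart) {
		o.retried = true;	/* INQUIRY is re-sent; return early */
		return (o);
	}
	if (devq_frozen)
		o.released_devq = true;	/* "Don't wedge the queue" */
	if (!dev_unconfigured)
		o.lost_device = true;	/* only for already-configured devices */
	return (o);
}

/* Expected behaviour of the model, written as assertions. */
void
probedone_fail_path_check(void)
{
	struct probe_fail_outcome o;

	/* Initial scan: device still unconfigured, so no notification here. */
	o = probedone_fail_path(false, true, true);
	assert(o.released_devq && !o.lost_device);

	/* A configured device whose INQUIRY now fails is reported lost. */
	o = probedone_fail_path(false, false, false);
	assert(o.lost_device);

	assert(probe_error_flags(0) == SIM_SF_RETRY_UA);
}
```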
> But cam_periph_error() will issue an xpt_async(AC_LOST_DEVICE,
> path, NULL) regardless of whether the device has been seen before
> (the condition the comment above describes), i.e. on every initial
> bus scan you end up in (on an aac(4) card with LUN > 0):
>
> cam_periph.c/cam_periph_error():
> case CAM_SEL_TIMEOUT:
> {
>     ...
>     /*
>      * Let peripheral drivers know that this device has gone
>      * away.
>      */
>     xpt_async(AC_LOST_DEVICE, newpath, NULL);
>     xpt_free_path(newpath);
>     break;
>
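Putting the two excerpts together, here is a sketch of how many AC_LOST_DEVICE notifications one failed INQUIRY produces. It assumes the INQUIRY reaches the CAM_SEL_TIMEOUT case above without being retried, and that devices probed during the initial scan are still flagged CAM_DEV_UNCONFIGURED; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical count of AC_LOST_DEVICE notifications for one failed
 * INQUIRY that takes the CAM_SEL_TIMEOUT path without a retry.
 */
unsigned int
lost_device_events(bool dev_unconfigured)
{
	unsigned int n = 1;	/* cam_periph_error(): posted unconditionally */

	if (!dev_unconfigured)
		n++;		/* probedone(): only if already configured */
	return (n);
}

/* Assumed: during the initial scan, no probed device is configured yet. */
unsigned long
initial_scan_events(unsigned int failed_inquiries)
{
	return ((unsigned long)failed_inquiries * lost_device_events(true));
}

/* Expected behaviour of the model, written as assertions. */
void
lost_device_events_check(void)
{
	assert(lost_device_events(true) == 1);	/* nothing there: still one */
	assert(lost_device_events(false) == 2);	/* device really went away */
	assert(initial_scan_events(0) == 0);
}
```

The first case is the one the thread is about: even when nothing was ever attached, each failed INQUIRY still costs one notification, so the total grows linearly with the number of failed probes.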
> Is this really right? This generates a lot of interrupt noise during
> the initial scan when no devices are attached, i.e. we treat INQUIRY
> commands that fail during the initial scan of the SCSI bus as if we
> had really lost a device to a selection timeout (we even create a
> path just to issue the async event).

I should have titled the thread a little better, but basically we
always generate a ton of software CAM interrupts during a LUN scan of
aac(4) targets that do not exist (i.e. nothing is actually there).
This happens because we treat a failed initial INQUIRY as equivalent
to a selection timeout instead of as "device not there". There seems
to be a historical workaround for part of this issue, but I am trying
to dig deeper in order to do the *right thing* for our 6.1 deployments
(as well as 7.x and CURRENT).

-aps


