Date:      Wed, 16 Dec 2009 12:10:59 -0500
From:      Alexander Sack <pisymbol@gmail.com>
To:        Scott Long <scottl@samsco.org>
Cc:        freebsd-scsi@freebsd.org, freebsd-current@freebsd.org
Subject:   Re: aac(4) handling of probe when no devices are there
Message-ID:  <3c0b01820912160910i35e12112s4d6412d6cb174f3b@mail.gmail.com>
In-Reply-To: <978BBD51-222D-42F0-9D3A-FFACCBCC886D@samsco.org>
References:  <3c0b01820912141347y366a7252y5d9711b1141b9b70@mail.gmail.com> <978BBD51-222D-42F0-9D3A-FFACCBCC886D@samsco.org>

On Tue, Dec 15, 2009 at 4:54 AM, Scott Long <scottl@samsco.org> wrote:
> On Dec 14, 2009, at 2:47 PM, Alexander Sack wrote:
>>
>> Hello Again:
>>
>> I guess I have a technical question/concern that I was looking for
>> feedback.  During the probe sequence, aac(4) conditionally responds
>> to INQUIRY commands depending on target LUN:
>>
>> aac_cam.c/aac_cam_complete():
>>     if (command == INQUIRY) {
>>         if (ccb->ccb_h.status == CAM_REQ_CMP) {
>>             device = ccb->csio.data_ptr[0] & 0x1f;
>>             /*
>>              * We want DASD and PROC devices to only be
>>              * visible through the pass device.
>>              */
>>             if ((device == T_DIRECT) ||
>>                 (device == T_PROCESSOR) ||
>>                 (sc->flags & AAC_FLAGS_CAM_PASSONLY))
>>                 ccb->csio.data_ptr[0] =
>>                     ((device & 0xe0) | T_NODEVICE);
>>         } else if (ccb->ccb_h.status == CAM_SEL_TIMEOUT &&
>>             ccb->ccb_h.target_lun != 0) {
>>             /* fix for INQUIRYs on Lun>0 */
>>             ccb->ccb_h.status = CAM_DEV_NOT_THERE;
>>         }
>>     }
>>
>> Why is CAM_DEV_NOT_THERE skipped on LUN 0?
>
> In the parallel scsi world, a selection timeout means that all LUNs within
> the entire target do not (or no longer) exist.  So returning
> CAM_SEL_TIMEOUT for LUN 1 would tell CAM to invalidate LUN 0 as well.
>
> If you look higher up in this function, you'll see a note about the
> error/status codes from the AAC firmware coincidentally matching CAM's
> status codes.  My guess is that somewhere along the line, someone at Adaptec
> stopped reading the SCSI spec and starting returning CAM_SEL_TIMEOUT for
> LUNs greater than 0, which is why this work-around is now in the driver.

Interesting.  Learn something every day.  I did not know that a
selection timeout on a non-zero LUN meant no other LUN was available.
As a colleague noted, "Has Adaptec ever read the SCSI spec?"  Just
kidding (somewhat)....
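
To check my own understanding of the work-around, here is a minimal
standalone sketch of just the status remap.  It is not the driver code:
the enum, the struct, and the function name are hypothetical stand-ins I
made up so it compiles outside the kernel, and the enum values are
placeholders rather than CAM's real status values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder status codes for illustration; the real ones come from CAM. */
enum sketch_status {
    SK_REQ_CMP,
    SK_SEL_TIMEOUT,
    SK_DEV_NOT_THERE
};

/* Hypothetical, trimmed-down stand-in for the CCB fields used here. */
struct sketch_ccb {
    enum sketch_status status;
    uint32_t target_lun;
};

/*
 * Same shape as the else-branch quoted above: a selection timeout on a
 * non-zero LUN is rewritten to "device not there", so it is not read as
 * the whole target going away.  LUN 0 is deliberately left untouched.
 */
void
sketch_fixup_inquiry_status(struct sketch_ccb *ccb)
{
    assert(ccb != NULL);
    if (ccb->status == SK_SEL_TIMEOUT && ccb->target_lun != 0)
        ccb->status = SK_DEV_NOT_THERE;
}
```

If I read Scott's explanation correctly, leaving LUN 0 alone is the
point: a selection timeout there keeps its normal "nothing at this
target" meaning.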

>> This is true on my target
>> 6.1-amd64 machine as well as CURRENT.  The reason why I ask this is
>> because now that aac(4) is sequential scanned, there are a lot of cam
>> interrupts that come in on my 6.x machine where the threshold is only
>> 500 and I get the interrupt storm threshold warning for swi2 pretty
>> quickly:
>>
>> Interrupt storm detected on "swi2:"; throttling interrupt source
>>
>> Obviously its contingent on the number of adapters you have on your
>> system.  On CURRENT I didn't see this because the threshold is double
>> (I think its a 1000 by default).
>>
>> The issue is the number of xpt_async(AC_LOST_DEVICE, ..) calls during
>> the scan.  The probe sequence in CURRENT as well as 6.1 handles
>> CAM_SEL_TIMEOUT a little differently depending on context.

Yeah, I spoke too soon.  I think that is a red herring though, and a
misinterpretation of what that code was really doing (in this case just
seeing the device as unconfigured and moving on).

But I STILL don't understand why it's treated as an AC_LOST_DEVICE event
at scan time.  It seems like more overhead than really necessary, but
perhaps I am not thinking of all the possibilities down this code path:
why create a path, then call xpt_async(), all to just set the flag as
unconfigured?  Perhaps it's more aligned with the model than anything
else and I'm reading too much into it.

> It's not at all clear to me what is going on here.  Can you instrument the
> code to record the status of everything that is being issued to the aac_cam
> module?

Yes, surely.  I think what might be happening is that after the
INQUIRY fails, xpt_release_ccb() is called, which I think will also
check whether any more CCBs should be sent to the device and send them.
Basically, the boot -v output shows I am getting a CAM_SEL_TIMEOUT for
each target and just running into the default interrupt storm
threshold of 500 on 6.1.
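
As a first cut at the instrumentation, I am thinking of something along
the lines of the standalone sketch below: a small table of counters
keyed by completion status, plus a LUN 0 versus LUN > 0 split.  All the
names here are hypothetical, the 16-bucket bound is arbitrary, and the
real thing would have to hook into the completion path and use the
kernel's printf rather than stdio.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SK_MAX_STATUS 16    /* arbitrary bound chosen for this sketch */

/* Hypothetical counters; nothing like this exists in the driver today. */
struct sketch_stats {
    unsigned long by_status[SK_MAX_STATUS];
    unsigned long lun0;
    unsigned long lun_nonzero;
};

void
sketch_stats_init(struct sketch_stats *st)
{
    assert(st != NULL);
    memset(st, 0, sizeof(*st));
}

/* Record one completed command; out-of-range statuses share the last bucket. */
void
sketch_stats_record(struct sketch_stats *st, unsigned status, uint32_t lun)
{
    assert(st != NULL);
    if (status >= SK_MAX_STATUS)
        status = SK_MAX_STATUS - 1;
    st->by_status[status]++;
    if (lun == 0)
        st->lun0++;
    else
        st->lun_nonzero++;
}

/* Print only the buckets that were actually hit. */
void
sketch_stats_dump(const struct sketch_stats *st, FILE *out)
{
    unsigned i;

    for (i = 0; i < SK_MAX_STATUS; i++)
        if (st->by_status[i] != 0)
            fprintf(out, "status 0x%02x: %lu\n", i, st->by_status[i]);
    fprintf(out, "lun 0: %lu, lun > 0: %lu\n", st->lun0, st->lun_nonzero);
}
```

Counting rather than printing every completion should keep the boot
output readable while still showing how many timeouts come back per
scan.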

Let me investigate further.  I'm on the right track, but I need to
instrument more.  Scott, it's my first time playing with CAM (be
gentle).  :D

-aps


