Date: Wed, 16 Dec 2009 12:10:59 -0500
From: Alexander Sack <pisymbol@gmail.com>
To: Scott Long <scottl@samsco.org>
Cc: freebsd-scsi@freebsd.org, freebsd-current@freebsd.org
Subject: Re: aac(4) handling of probe when no devices are there
Message-ID: <3c0b01820912160910i35e12112s4d6412d6cb174f3b@mail.gmail.com>
In-Reply-To: <978BBD51-222D-42F0-9D3A-FFACCBCC886D@samsco.org>
References: <3c0b01820912141347y366a7252y5d9711b1141b9b70@mail.gmail.com> <978BBD51-222D-42F0-9D3A-FFACCBCC886D@samsco.org>
On Tue, Dec 15, 2009 at 4:54 AM, Scott Long <scottl@samsco.org> wrote:
> On Dec 14, 2009, at 2:47 PM, Alexander Sack wrote:
>>
>> Hello Again:
>>
>> I guess I have a technical question/concern that I was looking for
>> feedback on.  During the probe sequence, aac(4) conditionally responds
>> to INQUIRY commands depending on target LUN:
>>
>> aac_cam.c/aac_cam_complete():
>>         if (command == INQUIRY) {
>>                 if (ccb->ccb_h.status == CAM_REQ_CMP) {
>>                         device = ccb->csio.data_ptr[0] & 0x1f;
>>                         /*
>>                          * We want DASD and PROC devices to only be
>>                          * visible through the pass device.
>>                          */
>>                         if ((device == T_DIRECT) ||
>>                             (device == T_PROCESSOR) ||
>>                             (sc->flags & AAC_FLAGS_CAM_PASSONLY))
>>                                 ccb->csio.data_ptr[0] =
>>                                     ((device & 0xe0) | T_NODEVICE);
>>                 } else if (ccb->ccb_h.status == CAM_SEL_TIMEOUT &&
>>                     ccb->ccb_h.target_lun != 0) {
>>                         /* fix for INQUIRYs on Lun>0 */
>>                         ccb->ccb_h.status = CAM_DEV_NOT_THERE;
>>                 }
>>         }
>>
>> Why is CAM_DEV_NOT_THERE skipped on LUN 0?
>
> In the parallel scsi world, a selection timeout means that all LUNs within
> the entire target do not (or no longer) exist.  So returning
> CAM_SEL_TIMEOUT for LUN 1 would tell CAM to invalidate LUN 0 as well.
>
> If you look higher up in this function, you'll see a note about the
> error/status codes from the AAC firmware coincidentally matching CAM's
> status codes.  My guess is that somewhere along the line, someone at Adaptec
> stopped reading the SCSI spec and started returning CAM_SEL_TIMEOUT for
> LUNs greater than 0, which is why this work-around is now in the driver.

Interesting.  Learn something every day.  I did not know that a
selection timeout on a non-zero LUN meant no other LUN was available.
As a colleague noted, "Has Adaptec ever read the SCSI spec?"  Just
kidding (somewhat)....

>> This is true on my target
>> 6.1-amd64 machine as well as CURRENT.  The reason why I ask this is
>> because now that aac(4) is sequentially scanned, there are a lot of cam
>> interrupts that come in on my 6.x machine where the threshold is only
>> 500 and I get the interrupt storm threshold warning for swi2 pretty
>> quickly:
>>
>> Interrupt storm detected on "swi2:"; throttling interrupt source
>>
>> Obviously it's contingent on the number of adapters you have on your
>> system.  On CURRENT I didn't see this because the threshold is double
>> (I think it's 1000 by default).
>>
>> The issue is the number of xpt_async(AC_LOST_DEVICE, ..) calls during
>> the scan.  The probe sequence in CURRENT as well as 6.1 handles
>> CAM_SEL_TIMEOUT a little differently depending on context.

Yeah, I spoke too soon.
I think that is a red herring, though, and a misinterpretation of what
that code was really doing (in this case, just seeing the device as
unconfigured and moving on).  But I STILL don't understand why it's
treated as an AC_LOST_DEVICE event at scan time.  It looks like more
overhead than really necessary: why create a path, then call
xpt_async(), all just to set the flag as unconfigured?  Perhaps I am
not thinking of all the possibilities down this code path, or perhaps
it's more aligned with the model than anything else and I'm reading too
much into it.

> It's not at all clear to me what is going on here.  Can you instrument the
> code to record the status of everything that is being issued to the aac_cam
> module?

Yes, surely.  I think what might be happening is that after the INQUIRY
fails, xpt_release_ccb() is called, which I think also checks whether
any more CCBs should be sent to the device and sends them.  Basically,
the boot -v output shows that I am getting a CAM_SEL_TIMEOUT for each
target and simply hitting the default interrupt storm threshold of 500
on 6.1.

Let me investigate further... I'm on the right track, but I need to
instrument more... Scott, it's my first time playing with CAM (be
gentle). :D

-aps
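
For anyone following the thread without the source handy, the status fix-up discussed above can be sketched as a standalone function. This is only an illustration of the quoted logic, not the driver code: every name prefixed with sk_ or SK_ is hypothetical, the status values are arbitrary stand-ins rather than CAM's real constants, and the peripheral type values (0x00 direct access, 0x03 processor, 0x1f no device) are assumed from the SCSI INQUIRY peripheral device type field.

```c
#include <stdint.h>

/* Stand-in status codes; NOT the real CAM values. */
enum sk_status { SK_REQ_CMP = 1, SK_SEL_TIMEOUT, SK_DEV_NOT_THERE };

/* Assumed SCSI peripheral device types (low 5 bits of INQUIRY byte 0). */
#define SK_T_DIRECT    0x00
#define SK_T_PROCESSOR 0x03
#define SK_T_NODEVICE  0x1f

/*
 * Mirror of the quoted INQUIRY completion logic.
 * Returns the (possibly rewritten) status and may rewrite *inq_byte0.
 */
static int
sk_fixup_inquiry(int status, unsigned lun, int passonly, uint8_t *inq_byte0)
{
	if (status == SK_REQ_CMP) {
		uint8_t device = *inq_byte0 & 0x1f;

		/* Hide DASD and PROC devices behind "no device". */
		if (device == SK_T_DIRECT || device == SK_T_PROCESSOR ||
		    passonly) {
			/*
			 * As quoted, device was already masked with 0x1f,
			 * so (device & 0xe0) is always 0 here.
			 */
			*inq_byte0 = (uint8_t)((device & 0xe0) | SK_T_NODEVICE);
		}
	} else if (status == SK_SEL_TIMEOUT && lun != 0) {
		/* A timeout on LUN > 0 must not invalidate the target. */
		status = SK_DEV_NOT_THERE;
	}
	return (status);
}
```

With this sketch, a selection timeout on LUN 0 passes through unchanged, while the same timeout on LUN 1 is rewritten to the "device not there" stand-in, which matches Scott's explanation of why LUN 0 is skipped.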
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3c0b01820912160910i35e12112s4d6412d6cb174f3b>