From: Alexander Sack
To: freebsd-current@freebsd.org
Cc: freebsd-scsi@freebsd.org
Date: Mon, 14 Dec 2009 17:09:08 -0500
Subject: Re: aac(4) handling of probe when no devices are there

On Mon, Dec 14, 2009 at 4:47 PM, Alexander Sack wrote:
> Hello again,
>
> I have a technical question/concern that I was looking for feedback
> on. During the probe sequence, aac(4) conditionally responds to
> INQUIRY commands depending on the target LUN:
>
> aac_cam.c/aac_cam_complete():
>     if (command == INQUIRY) {
>             if (ccb->ccb_h.status == CAM_REQ_CMP) {
>                     device = ccb->csio.data_ptr[0] & 0x1f;
>                     /*
>                      * We want DASD and PROC devices to only be
>                      * visible through the pass device.
>                      */
>                     if ((device == T_DIRECT) ||
>                         (device == T_PROCESSOR) ||
>                         (sc->flags & AAC_FLAGS_CAM_PASSONLY))
>                             ccb->csio.data_ptr[0] =
>                                 ((device & 0xe0) | T_NODEVICE);
>             } else if (ccb->ccb_h.status == CAM_SEL_TIMEOUT &&
>                 ccb->ccb_h.target_lun != 0) {
>                     /* fix for INQUIRYs on Lun>0 */
>                     ccb->ccb_h.status = CAM_DEV_NOT_THERE;
>             }
>     }
>
> Why is CAM_DEV_NOT_THERE skipped on LUN 0? This is true on my target
> 6.1-amd64 machine as well as on CURRENT. The reason I ask is that now
> that aac(4) is scanned sequentially, a lot of CAM software interrupts
> come in on my 6.x machine, where the interrupt storm threshold
> (hw.intr_storm_threshold) is only 500, and I get the interrupt storm
> warning for swi2 pretty quickly:
>
>     Interrupt storm detected on "swi2:"; throttling interrupt source
>
> Obviously it's contingent on the number of adapters you have in your
> system. On CURRENT I didn't see this because the threshold is double
> (I think it's 1000 by default).
>
> The issue is the number of xpt_async(AC_LOST_DEVICE, ...) calls during
> the scan.
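A minimal C model of the INQUIRY branch quoted above makes the LUN dependence explicit. It is a sketch, not driver code: `enum model_status` and `aac_inquiry_fixup()` are hypothetical stand-ins (the enum values are arbitrary, not the ones in cam_ccb.h), while T_DIRECT, T_PROCESSOR and T_NODEVICE use the standard SCSI peripheral device type codes. Under this model a selection timeout on LUN 0 reaches the CAM layer unchanged, while the same failure on LUN > 0 is rewritten to "device not there".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SCSI peripheral device types (low 5 bits of INQUIRY data byte 0). */
#define T_DIRECT    0x00
#define T_PROCESSOR 0x03
#define T_NODEVICE  0x1f

/* Hypothetical stand-ins for CAM status codes; the values are arbitrary. */
enum model_status {
	MODEL_REQ_CMP,
	MODEL_SEL_TIMEOUT,
	MODEL_DEV_NOT_THERE
};

/*
 * Model of the INQUIRY branch of aac_cam_complete() as quoted above.
 * inq0 is INQUIRY data byte 0 (only inspected on success); pass_only
 * mirrors AAC_FLAGS_CAM_PASSONLY.  Returns the status the CAM layer
 * would see afterwards and may rewrite *inq0.
 */
static enum model_status
aac_inquiry_fixup(enum model_status status, unsigned int lun,
    uint8_t *inq0, int pass_only)
{
	assert(inq0 != NULL);

	if (status == MODEL_REQ_CMP) {
		uint8_t device = *inq0 & 0x1f;

		/* Hide DASD and PROC devices from everything but pass(4). */
		if (device == T_DIRECT || device == T_PROCESSOR || pass_only)
			/*
			 * As quoted, device is already masked with 0x1f, so
			 * (device & 0xe0) is 0: the qualifier bits are dropped.
			 */
			*inq0 = (uint8_t)((device & 0xe0) | T_NODEVICE);
	} else if (status == MODEL_SEL_TIMEOUT && lun != 0) {
		/* "fix for INQUIRYs on Lun>0": LUN 0 is left untouched. */
		status = MODEL_DEV_NOT_THERE;
	}
	return status;
}
```

For example, with `uint8_t b = T_DIRECT;`, `aac_inquiry_fixup(MODEL_SEL_TIMEOUT, 0, &b, 0)` returns MODEL_SEL_TIMEOUT, whereas the same call with LUN 3 returns MODEL_DEV_NOT_THERE.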
>
> The probe sequence in CURRENT as well as in 6.1 handles
> CAM_SEL_TIMEOUT a little differently depending on context.
>
> scsi_xpt.c/probedone():
>             } else if (cam_periph_error(done_ccb, 0,
>                                         done_ccb->ccb_h.target_lun > 0
>                                         ? SF_RETRY_UA|SF_QUIET_IR
>                                         : SF_RETRY_UA,
>                                         &softc->saved_ccb) == ERESTART) {
>                     return;
>             } else if ((done_ccb->ccb_h.status & CAM_DEV_QFRZN) != 0) {
>                     /* Don't wedge the queue */
>                     xpt_release_devq(done_ccb->ccb_h.path, /*count*/1,
>                                      /*run_queue*/TRUE);
>             }
>             /*
>              * If we get to this point, we got an error status back
>              * from the inquiry and the error status doesn't require
>              * automatically retrying the command.  Therefore, the
>              * inquiry failed.  If we had inquiry information before
>              * for this device, but this latest inquiry command failed,
>              * the device has probably gone away.
>              * If this device isn't already marked unconfigured, notify
>              * the peripheral drivers that this device is no more.
>              */
>             if ((path->device->flags & CAM_DEV_UNCONFIGURED) == 0)
>                     /* Send the async notification. */
>                     xpt_async(AC_LOST_DEVICE, path, NULL);
>
>             xpt_release_ccb(done_ccb);
>             break;
>     }
>
> But cam_periph_error() issues an xpt_async(AC_LOST_DEVICE, newpath,
> NULL) regardless of whether or not the device has been seen before
> (the distinction the comment above draws), i.e. on every initial bus
> scan you will end up here (on an aac(4) card with LUN > 0):
>
> cam_periph.c/cam_periph_error():
>     case CAM_SEL_TIMEOUT:
>     {
>     ...
>             /*
>              * Let peripheral drivers know that this device has gone
>              * away.
>              */
>             xpt_async(AC_LOST_DEVICE, newpath, NULL);
>             xpt_free_path(newpath);
>             break;
>
> Is this really right? This generates A LOT of interrupt noise when no
> devices are attached during the initial scan, i.e. we treat the failed
> INQUIRY commands of the initial SCSI bus scan as if we had really lost
> a device to a selection timeout (we even create a new path just to
> issue the async event).
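The two AC_LOST_DEVICE call sites quoted above can be summarised in a small C model that counts notifications per failed INQUIRY. It is a sketch under stated assumptions: each entry is a final failure (cam_periph_error() did not return ERESTART), any cam_periph_error() cases not quoted here are ignored, and `enum probe_status`, `struct failed_inquiry` and `count_lost_device_events()` are hypothetical names. Assuming a device that has never been probed successfully is still flagged CAM_DEV_UNCONFIGURED, the guard in probedone() stays quiet on an initial scan, but every selection timeout still produces one notification from cam_periph_error().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for CAM status codes; the values are arbitrary. */
enum probe_status {
	PROBE_SEL_TIMEOUT,
	PROBE_OTHER_ERROR
};

/* One final (non-retried) INQUIRY failure seen by probedone(). */
struct failed_inquiry {
	enum probe_status status;
	bool unconfigured;	/* mirrors CAM_DEV_UNCONFIGURED on the device */
};

/*
 * Count AC_LOST_DEVICE notifications from the two call sites quoted
 * above; other cam_periph_error() cases are not modelled.
 */
static size_t
count_lost_device_events(const struct failed_inquiry *f, size_t n)
{
	size_t events = 0;

	assert(f != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* cam_periph_error(): CAM_SEL_TIMEOUT always notifies. */
		if (f[i].status == PROBE_SEL_TIMEOUT)
			events++;
		/* probedone(): only devices not marked unconfigured. */
		if (!f[i].unconfigured)
			events++;
	}
	return events;
}
```

With 15 unconfigured targets that all time out (an arbitrary, illustrative number) the model counts 15 notifications, all from cam_periph_error(); a previously configured device that times out counts 2.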
I should have titled the thread better, but basically we always generate
a ton of CAM software interrupts during a LUN scan of aac(4) targets that
do not exist (i.e. nothing is truly there). We do this because a
selection timeout on the initial INQUIRY is treated as a lost device
instead of as a device that is simply not there. There seems to be a
historical workaround for part of this issue (the "fix for INQUIRYs on
Lun>0" in aac_cam_complete()), but I am trying to dig deeper in order to
do the *right thing* for our 6.1 deployments (as well as 7.x and
CURRENT).

-aps