From: Alexander Sack
Date: Mon, 14 Dec 2009 16:47:04 -0500
Message-ID: <3c0b01820912141347y366a7252y5d9711b1141b9b70@mail.gmail.com>
To: freebsd-current@freebsd.org
Cc: freebsd-scsi@freebsd.org
Subject: aac(4) handling of probe when no devices are there

Hello again:

I guess I have a technical
question/concern that I am looking for feedback on.

During the probe sequence, aac(4) conditionally responds to INQUIRY
commands depending on the target LUN:

aac_cam.c/aac_cam_complete():

    if (command == INQUIRY) {
        if (ccb->ccb_h.status == CAM_REQ_CMP) {
            device = ccb->csio.data_ptr[0] & 0x1f;
            /*
             * We want DASD and PROC devices to only be
             * visible through the pass device.
             */
            if ((device == T_DIRECT) ||
                (device == T_PROCESSOR) ||
                (sc->flags & AAC_FLAGS_CAM_PASSONLY))
                ccb->csio.data_ptr[0] =
                    ((device & 0xe0) | T_NODEVICE);
        } else if (ccb->ccb_h.status == CAM_SEL_TIMEOUT &&
            ccb->ccb_h.target_lun != 0) {
            /* fix for INQUIRYs on Lun>0 */
            ccb->ccb_h.status = CAM_DEV_NOT_THERE;
        }
    }

Why is CAM_DEV_NOT_THERE skipped on LUN 0? This is true on my target
6.1-amd64 machine as well as on CURRENT.

The reason I ask is that now that aac(4) is scanned sequentially, a lot
of CAM interrupts come in on my 6.x machine, where the threshold is only
500, and I get the interrupt storm threshold warning for swi2 pretty
quickly:

    Interrupt storm detected on "swi2:"; throttling interrupt source

Obviously it's contingent on the number of adapters you have in your
system. On CURRENT I didn't see this because the threshold is double (I
think it's 1000 by default).

The issue is the number of xpt_async(AC_LOST_DEVICE, ..) calls during
the scan. The probe sequence in CURRENT as well as in 6.1 handles
CAM_SEL_TIMEOUT a little differently depending on context.

scsi_xpt.c/probedone():

        } else if (cam_periph_error(done_ccb, 0,
                                    done_ccb->ccb_h.target_lun > 0
                                    ? SF_RETRY_UA|SF_QUIET_IR
                                    : SF_RETRY_UA,
                                    &softc->saved_ccb) == ERESTART) {
            return;
        } else if ((done_ccb->ccb_h.status & CAM_DEV_QFRZN) != 0) {
            /* Don't wedge the queue */
            xpt_release_devq(done_ccb->ccb_h.path, /*count*/1,
                             /*run_queue*/TRUE);
        }
        /*
         * If we get to this point, we got an error status back
         * from the inquiry and the error status doesn't require
         * automatically retrying the command.  Therefore, the
         * inquiry failed.  If we had inquiry information before
         * for this device, but this latest inquiry command failed,
         * the device has probably gone away.  If this device isn't
         * already marked unconfigured, notify the peripheral
         * drivers that this device is no more.
         */
        if ((path->device->flags & CAM_DEV_UNCONFIGURED) == 0)
            /* Send the async notification. */
            xpt_async(AC_LOST_DEVICE, path, NULL);

        xpt_release_ccb(done_ccb);
        break;
    }

But cam_periph_error() will issue an xpt_async(AC_LOST_DEVICE, path,
NULL) regardless of whether or not the device has been seen already
(the condition the comment above describes), i.e. on every initial bus
scan you will get into the following (on an aac(4) card with LUN > 0):

cam_periph.c/cam_periph_error():

    case CAM_SEL_TIMEOUT:
    {
    .
    .
        /*
         * Let peripheral drivers know that this device has gone
         * away.
         */
        xpt_async(AC_LOST_DEVICE, newpath, NULL);
        xpt_free_path(newpath);
        break;

Is this really right? This generates A LOT of interrupt noise when no
devices are attached during the initial scan, i.e. we are treating
failed INQUIRY commands on the SCSI bus during the initial scan as if we
had really lost a device during a selection timeout (we even generate a
path to issue the async event).

Obviously if aac(4) returned CAM_DEV_NOT_THERE you would avoid this, but
there is some history here and I've yet to fully grasp the intent of the
original fix for LUNs greater than zero. What was the problem?
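To make the question concrete, here is a minimal sketch in plain C of the status rewrite as I read it from the aac_cam_complete() excerpt. It is a toy model, not driver code: the enum values, the model_* function names and the rewrite_lun0 knob are all hypothetical, and it assumes that every INQUIRY on an empty bus times out, that every target/LUN pair is probed, and that only a status still reading as a selection timeout ends up costing one AC_LOST_DEVICE event.

```c
#include <assert.h>

/* Hypothetical stand-ins for the CAM status codes; not the real values. */
enum model_status {
	MODEL_REQ_CMP,
	MODEL_SEL_TIMEOUT,
	MODEL_DEV_NOT_THERE
};

/*
 * Status rewrite modeled on the aac_cam_complete() excerpt: a selection
 * timeout becomes "device not there" only for LUN > 0. Setting
 * rewrite_lun0 (a hypothetical knob) extends the rewrite to LUN 0.
 */
static enum model_status
model_aac_inquiry_status(enum model_status status, unsigned lun,
    int rewrite_lun0)
{
	if (status == MODEL_SEL_TIMEOUT && (lun != 0 || rewrite_lun0))
		return (MODEL_DEV_NOT_THERE);
	return (status);
}

/*
 * Count modeled AC_LOST_DEVICE events for a bus with nothing attached.
 * Assumptions: every INQUIRY times out, every target/LUN pair is probed,
 * and only a status still equal to MODEL_SEL_TIMEOUT costs one event.
 */
static unsigned
model_count_lost_device(unsigned ntargets, unsigned nluns, int rewrite_lun0)
{
	unsigned target, lun, events = 0;

	assert(nluns > 0);
	for (target = 0; target < ntargets; target++) {
		for (lun = 0; lun < nluns; lun++) {
			if (model_aac_inquiry_status(MODEL_SEL_TIMEOUT, lun,
			    rewrite_lun0) == MODEL_SEL_TIMEOUT)
				events++;
		}
	}
	return (events);
}
```

Under those assumptions, model_count_lost_device(16, 8, 0) counts one event per empty target (all from LUN 0), while passing a non-zero rewrite_lun0 counts none. Whether the real scan behaves this way is exactly what I am asking.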
Comments/thoughts appreciated.

Thanks!

-aps