From owner-svn-src-all@FreeBSD.ORG Mon May 21 22:54:34 2012 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9D4CA1065670; Mon, 21 May 2012 22:54:34 +0000 (UTC) (envelope-from jimharris@FreeBSD.org) Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:4f8:fff6::2c]) by mx1.freebsd.org (Postfix) with ESMTP id 873928FC19; Mon, 21 May 2012 22:54:34 +0000 (UTC) Received: from svn.freebsd.org (localhost [127.0.0.1]) by svn.freebsd.org (8.14.4/8.14.4) with ESMTP id q4LMsYdq077180; Mon, 21 May 2012 22:54:34 GMT (envelope-from jimharris@svn.freebsd.org) Received: (from jimharris@localhost) by svn.freebsd.org (8.14.4/8.14.4/Submit) id q4LMsYhg077174; Mon, 21 May 2012 22:54:34 GMT (envelope-from jimharris@svn.freebsd.org) Message-Id: <201205212254.q4LMsYhg077174@svn.freebsd.org> From: Jim Harris Date: Mon, 21 May 2012 22:54:34 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org X-SVN-Group: head MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cc: Subject: svn commit: r235751 - head/sys/dev/isci X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 21 May 2012 22:54:34 -0000 Author: jimharris Date: Mon May 21 22:54:33 2012 New Revision: 235751 URL: http://svn.freebsd.org/changeset/base/235751 Log: Wait until completion context unwinds before retrying CCBs that have been queued internally. This works around issue in the isci HAL where it cannot accept new I/O to a device after a resetting->ready state transition until the completion context has unwound. This issue was found by submitting non-tagged CCBs through pass(4) interface to a SATA disk with an extremely small timeout value (5ms). This would trigger internal resets with I/O in the isci(4) internal queues. The small timeout value had not been intentional (and original reporter has since changed his test to use 5sec instead), but it did uncover this corner case that would result in a hung disk. Sponsored by: Intel Reported and tested by: Ravi Pokala Reviewed by: scottl (earlier version) MFC after: 1 week Modified: head/sys/dev/isci/isci.h head/sys/dev/isci/isci_controller.c head/sys/dev/isci/isci_interrupt.c head/sys/dev/isci/isci_io_request.c head/sys/dev/isci/isci_remote_device.c Modified: head/sys/dev/isci/isci.h ============================================================================== --- head/sys/dev/isci/isci.h Mon May 21 22:43:38 2012 (r235750) +++ head/sys/dev/isci/isci.h Mon May 21 22:54:33 2012 (r235751) @@ -30,6 +30,9 @@ * $FreeBSD$ */ +#ifndef _ISCI_H +#define _ISCI_H + #include #include #include @@ -86,7 +89,31 @@ struct ISCI_REMOTE_DEVICE { BOOL is_resetting; uint32_t frozen_lun_mask; SCI_FAST_LIST_ELEMENT_T pending_device_reset_element; + + /* + * This queue maintains CCBs that have been returned with + * SCI_IO_FAILURE_INVALID_STATE from the SCI layer. These CCBs + * need to be retried, but we cannot return CAM_REQUEUE_REQ because + * this status gets passed all the way back up to users of the pass(4) + * interface and breaks things like smartctl. So instead, we queue + * these CCBs internally. + */ TAILQ_HEAD(,ccb_hdr) queued_ccbs; + + /* + * Marker denoting this remote device needs its first queued ccb to + * be retried. + */ + BOOL release_queued_ccb; + + /* + * Points to a CCB in the queue that is currently being processed by + * SCIL. This allows us to keep in flight CCBs in the queue so as to + * maintain ordering (i.e. in case we retry an I/O and then find out + * it needs to be retried again - it just keeps its same place in the + * queue. + */ + union ccb * queued_ccb_in_progress; }; struct ISCI_DOMAIN { @@ -126,6 +153,7 @@ struct ISCI_CONTROLLER BOOL has_been_scanned; uint32_t initial_discovery_mask; BOOL is_frozen; + BOOL release_queued_ccbs; uint8_t *remote_device_memory; struct ISCI_MEMORY cached_controller_memory; struct ISCI_MEMORY uncached_controller_memory; @@ -291,6 +319,8 @@ int isci_controller_attach_to_cam(struct void isci_controller_start(void *controller); +void isci_controller_release_queued_ccbs(struct ISCI_CONTROLLER *controller); + void isci_domain_construct(struct ISCI_DOMAIN *domain, uint32_t domain_index, struct ISCI_CONTROLLER *controller); @@ -301,3 +331,5 @@ void isci_log_message(uint32_t verbosity char *log_message, ...); extern uint32_t g_isci_debug_level; + +#endif /* #ifndef _ISCI_H */ Modified: head/sys/dev/isci/isci_controller.c ============================================================================== --- head/sys/dev/isci/isci_controller.c Mon May 21 22:43:38 2012 (r235750) +++ head/sys/dev/isci/isci_controller.c Mon May 21 22:54:33 2012 (r235751) @@ -201,6 +201,7 @@ void isci_controller_construct(struct IS controller->is_started = FALSE; controller->is_frozen = FALSE; + controller->release_queued_ccbs = FALSE; controller->sim = NULL; controller->initial_discovery_mask = 0; @@ -431,6 +432,8 @@ int isci_controller_allocate_memory(stru sci_fast_list_element_init(remote_device, &remote_device->pending_device_reset_element); TAILQ_INIT(&remote_device->queued_ccbs); + remote_device->release_queued_ccb = FALSE; + remote_device->queued_ccb_in_progress = NULL; /* * For the first SCI_MAX_DOMAINS device objects, do not put @@ -694,3 +697,47 @@ void isci_action(struct cam_sim *sim, un } } +/* + * Unfortunately, SCIL doesn't cleanly handle retry conditions. + * CAM_REQUEUE_REQ works only when no one is using the pass(4) interface. So + * when SCIL denotes an I/O needs to be retried (typically because of mixing + * tagged/non-tagged ATA commands, or running out of NCQ slots), we queue + * these I/O internally. Once SCIL completes an I/O to this device, or we get + * a ready notification, we will retry the first I/O on the queue. + * Unfortunately, SCIL also doesn't cleanly handle starting the new I/O within + * the context of the completion handler, so we need to retry these I/O after + * the completion handler is done executing. + */ +void +isci_controller_release_queued_ccbs(struct ISCI_CONTROLLER *controller) +{ + struct ISCI_REMOTE_DEVICE *dev; + struct ccb_hdr *ccb_h; + int dev_idx; + + KASSERT(mtx_owned(&controller->lock), ("controller lock not owned")); + + controller->release_queued_ccbs = FALSE; + for (dev_idx = 0; + dev_idx < SCI_MAX_REMOTE_DEVICES; + dev_idx++) { + + dev = controller->remote_device[dev_idx]; + if (dev != NULL && + dev->release_queued_ccb == TRUE && + dev->queued_ccb_in_progress == NULL) { + dev->release_queued_ccb = FALSE; + ccb_h = TAILQ_FIRST(&dev->queued_ccbs); + + if (ccb_h == NULL) + continue; + + isci_log_message(1, "ISCI", "release %p %x\n", ccb_h, + ((union ccb *)ccb_h)->csio.cdb_io.cdb_bytes[0]); + + dev->queued_ccb_in_progress = (union ccb *)ccb_h; + isci_io_request_execute_scsi_io( + (union ccb *)ccb_h, controller); + } + } +} Modified: head/sys/dev/isci/isci_interrupt.c ============================================================================== --- head/sys/dev/isci/isci_interrupt.c Mon May 21 22:43:38 2012 (r235750) +++ head/sys/dev/isci/isci_interrupt.c Mon May 21 22:54:33 2012 (r235751) @@ -177,6 +177,9 @@ isci_interrupt_legacy_handler(void *arg) if (interrupt_handler(scic_controller_handle)) { mtx_lock(&controller->lock); completion_handler(scic_controller_handle); + if (controller->release_queued_ccbs == TRUE) + isci_controller_release_queued_ccbs( + controller); mtx_unlock(&controller->lock); } } @@ -204,6 +207,13 @@ isci_interrupt_msix_handler(void *arg) if (interrupt_handler(scic_controller_handle)) { mtx_lock(&controller->lock); completion_handler(scic_controller_handle); + /* + * isci_controller_release_queued_ccb() is a relatively + * expensive routine, so we don't call it until the controller + * level flag is set to TRUE. + */ + if (controller->release_queued_ccbs == TRUE) + isci_controller_release_queued_ccbs(controller); mtx_unlock(&controller->lock); } } Modified: head/sys/dev/isci/isci_io_request.c ============================================================================== --- head/sys/dev/isci/isci_io_request.c Mon May 21 22:43:38 2012 (r235750) +++ head/sys/dev/isci/isci_io_request.c Mon May 21 22:54:33 2012 (r235751) @@ -223,7 +223,7 @@ isci_io_request_complete(SCI_CONTROLLER_ (struct ISCI_REQUEST *)isci_request); if (complete_ccb) { - if (ccb->ccb_h.status != CAM_REQ_CMP) { + if ((ccb->ccb_h.status & CAM_STATUS_MASK) != CAM_REQ_CMP) { /* ccb will be completed with some type of non-success * status. So temporarily freeze the queue until the * upper layers can act on the status. The @@ -234,6 +234,26 @@ isci_io_request_complete(SCI_CONTROLLER_ xpt_freeze_devq(ccb->ccb_h.path, 1); } + if (ccb->ccb_h.status & CAM_SIM_QUEUED) { + + KASSERT(ccb == isci_remote_device->queued_ccb_in_progress, + ("multiple internally queued ccbs in flight")); + + TAILQ_REMOVE(&isci_remote_device->queued_ccbs, + &ccb->ccb_h, sim_links.tqe); + ccb->ccb_h.status &= ~CAM_SIM_QUEUED; + + /* + * This CCB that was in the queue was completed, so + * set the in_progress pointer to NULL denoting that + * we can retry another CCB from the queue. We only + * allow one CCB at a time from the queue to be + * in progress so that we can effectively maintain + * ordering. + */ + isci_remote_device->queued_ccb_in_progress = NULL; + } + if (isci_remote_device->frozen_lun_mask != 0) { isci_remote_device_release_device_queue(isci_remote_device); } @@ -248,11 +268,30 @@ isci_io_request_complete(SCI_CONTROLLER_ isci_remote_device_freeze_lun_queue(isci_remote_device, ccb->ccb_h.target_lun); - isci_log_message(1, "ISCI", "queue %p %x\n", ccb, - ccb->csio.cdb_io.cdb_bytes[0]); - ccb->ccb_h.status |= CAM_SIM_QUEUED; - TAILQ_INSERT_TAIL(&isci_remote_device->queued_ccbs, - &ccb->ccb_h, sim_links.tqe); + if (ccb->ccb_h.status & CAM_SIM_QUEUED) { + + KASSERT(ccb == isci_remote_device->queued_ccb_in_progress, + ("multiple internally queued ccbs in flight")); + + /* + * Do nothing, CCB is already on the device's queue. + * We leave it on the queue, to be retried again + * next time a CCB on this device completes, or we + * get a ready notification for this device. + */ + isci_log_message(1, "ISCI", "already queued %p %x\n", + ccb, ccb->csio.cdb_io.cdb_bytes[0]); + + isci_remote_device->queued_ccb_in_progress = NULL; + + } else { + isci_log_message(1, "ISCI", "queue %p %x\n", ccb, + ccb->csio.cdb_io.cdb_bytes[0]); + ccb->ccb_h.status |= CAM_SIM_QUEUED; + + TAILQ_INSERT_TAIL(&isci_remote_device->queued_ccbs, + &ccb->ccb_h, sim_links.tqe); + } } } Modified: head/sys/dev/isci/isci_remote_device.c ============================================================================== --- head/sys/dev/isci/isci_remote_device.c Mon May 21 22:43:38 2012 (r235750) +++ head/sys/dev/isci/isci_remote_device.c Mon May 21 22:54:33 2012 (r235751) @@ -297,14 +297,18 @@ isci_remote_device_release_device_queue( for (lun = 0; lun < ISCI_MAX_LUN; lun++) isci_remote_device_release_lun_queue(device, lun); } else { - struct ccb_hdr *ccb_h; + /* + * We cannot unfreeze the devq, because there are still + * CCBs in our internal queue that need to be processed + * first. Mark this device, and the controller, so that + * the first CCB in this device's internal queue will be + * resubmitted after the current completion context + * unwinds. + */ + device->release_queued_ccb = TRUE; + device->domain->controller->release_queued_ccbs = TRUE; - ccb_h = TAILQ_FIRST(&device->queued_ccbs); - TAILQ_REMOVE(&device->queued_ccbs, ccb_h, sim_links.tqe); - ccb_h->status &= ~CAM_SIM_QUEUED; - isci_log_message(1, "ISCI", "release %p %x\n", ccb_h, - ((union ccb*)ccb_h)->csio.cdb_io.cdb_bytes[0]); - isci_io_request_execute_scsi_io((union ccb *)ccb_h, - device->domain->controller); + isci_log_message(1, "ISCI", "schedule %p for release\n", + device); } }