From owner-svn-src-stable-12@freebsd.org Thu May 28 23:23:50 2020 Return-Path: Delivered-To: svn-src-stable-12@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 88C2D328B14; Thu, 28 May 2020 23:23:50 +0000 (UTC) (envelope-from imp@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 49Y3ft3NcKz4N8r; Thu, 28 May 2020 23:23:50 +0000 (UTC) (envelope-from imp@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 54F8DE6BF; Thu, 28 May 2020 23:23:50 +0000 (UTC) (envelope-from imp@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id 04SNNonk043557; Thu, 28 May 2020 23:23:50 GMT (envelope-from imp@FreeBSD.org) Received: (from imp@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id 04SNNnFX043554; Thu, 28 May 2020 23:23:49 GMT (envelope-from imp@FreeBSD.org) Message-Id: <202005282323.04SNNnFX043554@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: imp set sender to imp@FreeBSD.org using -f From: Warner Losh Date: Thu, 28 May 2020 23:23:49 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-12@freebsd.org Subject: svn commit: r361608 - in stable/12/sys/dev: mpr mps X-SVN-Group: stable-12 X-SVN-Commit-Author: imp X-SVN-Commit-Paths: in stable/12/sys/dev: mpr mps X-SVN-Commit-Revision: 361608 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-stable-12@freebsd.org X-Mailman-Version: 2.1.33 Precedence: list List-Id: SVN commit messages for only the 12-stable src tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 May 2020 23:23:50 -0000 Author: imp Date: Thu May 28 23:23:49 2020 New Revision: 361608 URL: https://svnweb.freebsd.org/changeset/base/361608 Log: MFC r358308: Before issing the REMOVE_DEVICE command to the firmware, make sure that all commands have completed. It's not OK to force complete any pending commands before we send the REMOVE_DEVICE. Instead, make sure that all pending commands are complete before sending that. By trying to second guess the firmware here, we run the risk of completing commands twice, which leads to corruption. This removes the forced completion of commands introduced in r218811. So it's a partial backout of that commit, but replaces it with a more rebust mechanism. Either these commands will complete due to the TARGET RESET, or they will timeout and be aborted, but they will all complete. Add assert that all commands are complete to REMOVE_DEVICE completion routine. We attempt to assure this programatically, so we shouldn't have any commands in the queue because we've waited for them all. Any commands that make it into our action routine after we mark the target in removal will complete immediately with an error. When we're removing a target that's not a volume, advertise up the stack that it's actually gone, as opposed to having a transient selection error we should retry. Do this both in the action routine, and when we get a notification of an aborted command. We don't do this for volumes because the driver tries hard not to advertise to the OS a volume has disappeared. Apply these changes to both mpr and mps since they are based on quite similar designs. Discussed with: scottl@ Differential Revision: https://reviews.freebsd.org/D23768 Modified: stable/12/sys/dev/mpr/mpr_sas.c stable/12/sys/dev/mpr/mpr_sas.h stable/12/sys/dev/mps/mps_sas.c stable/12/sys/dev/mps/mps_sas.h Directory Properties: stable/12/ (props changed) Modified: stable/12/sys/dev/mpr/mpr_sas.c ============================================================================== --- stable/12/sys/dev/mpr/mpr_sas.c Thu May 28 22:05:50 2020 (r361607) +++ stable/12/sys/dev/mpr/mpr_sas.c Thu May 28 23:23:49 2020 (r361608) @@ -559,7 +559,6 @@ mprsas_remove_device(struct mpr_softc *sc, struct mpr_ MPI2_SCSI_TASK_MANAGE_REPLY *reply; MPI2_SAS_IOUNIT_CONTROL_REQUEST *req; struct mprsas_target *targ; - struct mpr_command *next_cm; uint16_t handle; MPR_FUNCTRACE(sc); @@ -609,7 +608,18 @@ mprsas_remove_device(struct mpr_softc *sc, struct mpr_ tm->cm_complete = mprsas_remove_complete; tm->cm_complete_data = (void *)(uintptr_t)handle; - mpr_map_command(sc, tm); + /* + * Wait to send the REMOVE_DEVICE until all the commands have cleared. + * They should be aborted or time out and we'll kick thus off there + * if so. + */ + if (TAILQ_FIRST(&targ->commands) == NULL) { + mpr_dprint(sc, MPR_INFO, "No pending commands: starting remove_device\n"); + mpr_map_command(sc, tm); + targ->pending_remove_tm = NULL; + } else { + targ->pending_remove_tm = tm; + } mpr_dprint(sc, MPR_INFO, "clearing target %u handle 0x%04x\n", targ->tid, handle); @@ -618,14 +628,6 @@ mprsas_remove_device(struct mpr_softc *sc, struct mpr_ "connector name (%4s)\n", targ->encl_level, targ->encl_slot, targ->connector_name); } - TAILQ_FOREACH_SAFE(tm, &targ->commands, cm_link, next_cm) { - union ccb *ccb; - - mpr_dprint(sc, MPR_XINFO, "Completing missed command %p\n", tm); - ccb = tm->cm_complete_data; - mprsas_set_ccbstatus(ccb, CAM_DEV_NOT_THERE); - mprsas_scsiio_complete(sc, tm); - } } static void @@ -641,7 +643,16 @@ mprsas_remove_complete(struct mpr_softc *sc, struct mp reply = (MPI2_SAS_IOUNIT_CONTROL_REPLY *)tm->cm_reply; handle = (uint16_t)(uintptr_t)tm->cm_complete_data; + targ = tm->cm_targ; + /* + * At this point, we should have no pending commands for the target. + * The remove target has just completed. + */ + KASSERT(TAILQ_FIRST(&targ->commands) == NULL, + ("%s: no commands should be pending\n", __func__)); + + /* * Currently there should be no way we can hit this case. It only * happens when we have a failure to allocate chain frames, and * task management commands don't have S/G lists. @@ -673,7 +684,6 @@ mprsas_remove_complete(struct mpr_softc *sc, struct mp */ if ((le16toh(reply->IOCStatus) & MPI2_IOCSTATUS_MASK) == MPI2_IOCSTATUS_SUCCESS) { - targ = tm->cm_targ; targ->handle = 0x0; targ->encl_handle = 0x0; targ->encl_level_valid = 0x0; @@ -1963,13 +1973,17 @@ mprsas_action_scsiio(struct mprsas_softc *sassc, union /* * If devinfo is 0 this will be a volume. In that case don't tell CAM * that the volume has timed out. We want volumes to be enumerated - * until they are deleted/removed, not just failed. + * until they are deleted/removed, not just failed. In either event, + * we're removing the target due to a firmware event telling us + * the device is now gone (as opposed to some transient event). Since + * we're opting to remove failed devices from the OS's view, we need + * to propagate that status up the stack. */ if (targ->flags & MPRSAS_TARGET_INREMOVAL) { if (targ->devinfo == 0) mprsas_set_ccbstatus(ccb, CAM_REQ_CMP); else - mprsas_set_ccbstatus(ccb, CAM_SEL_TIMEOUT); + mprsas_set_ccbstatus(ccb, CAM_DEV_NOT_THERE); xpt_done(ccb); return; } @@ -2843,15 +2857,22 @@ mprsas_scsiio_complete(struct mpr_softc *sc, struct mp * potential risk of flagging false failures in the event * of a topology-related error (e.g. a SAS expander problem * causes a command addressed to a drive to fail), but - * avoiding getting into an infinite retry loop. + * avoiding getting into an infinite retry loop. However, + * if we get them while were moving a device, we should + * fail the request as 'not there' because the device + * is effectively gone. */ - mprsas_set_ccbstatus(ccb, CAM_REQ_CMP_ERR); + if (cm->cm_targ->flags & MPRSAS_TARGET_INREMOVAL) + mprsas_set_ccbstatus(ccb, CAM_DEV_NOT_THERE); + else + mprsas_set_ccbstatus(ccb, CAM_REQ_CMP_ERR); mpr_dprint(sc, MPR_INFO, - "Controller reported %s tgt %u SMID %u loginfo %x\n", + "Controller reported %s tgt %u SMID %u loginfo %x%s\n", mpr_describe_table(mpr_iocstatus_string, le16toh(rep->IOCStatus) & MPI2_IOCSTATUS_MASK), target_id, cm->cm_desc.Default.SMID, - le32toh(rep->IOCLogInfo)); + le32toh(rep->IOCLogInfo), + (cm->cm_targ->flags & MPRSAS_TARGET_INREMOVAL) ? " departing" : ""); mpr_dprint(sc, MPR_XINFO, "SCSIStatus %x SCSIState %x xfercount %u\n", rep->SCSIStatus, rep->SCSIState, @@ -2897,6 +2918,21 @@ mprsas_scsiio_complete(struct mpr_softc *sc, struct mp if (mprsas_get_ccbstatus(ccb) != CAM_REQ_CMP) { ccb->ccb_h.status |= CAM_DEV_QFRZN; xpt_freeze_devq(ccb->ccb_h.path, /*count*/ 1); + } + + /* + * Check to see if we're removing the device. If so, and this is the + * last command on the queue, proceed with the deferred removal of the + * device. Note, for removing a volume, this won't trigger because + * pending_remove_tm will be NULL. + */ + if (cm->cm_targ->flags & MPRSAS_TARGET_INREMOVAL) { + if (TAILQ_FIRST(&cm->cm_targ->commands) == NULL && + cm->cm_targ->pending_remove_tm != NULL) { + mpr_dprint(sc, MPR_INFO, "Last pending command complete: starting remove_device\n"); + mpr_map_command(sc, cm->cm_targ->pending_remove_tm); + cm->cm_targ->pending_remove_tm = NULL; + } } mpr_free_command(sc, cm); Modified: stable/12/sys/dev/mpr/mpr_sas.h ============================================================================== --- stable/12/sys/dev/mpr/mpr_sas.h Thu May 28 22:05:50 2020 (r361607) +++ stable/12/sys/dev/mpr/mpr_sas.h Thu May 28 23:23:49 2020 (r361608) @@ -64,6 +64,7 @@ struct mprsas_target { SLIST_HEAD(, mprsas_lun) luns; TAILQ_HEAD(, mpr_command) commands; struct mpr_command *tm; + struct mpr_command *pending_remove_tm; TAILQ_HEAD(, mpr_command) timedout_commands; uint16_t exp_dev_handle; uint16_t phy_num; Modified: stable/12/sys/dev/mps/mps_sas.c ============================================================================== --- stable/12/sys/dev/mps/mps_sas.c Thu May 28 22:05:50 2020 (r361607) +++ stable/12/sys/dev/mps/mps_sas.c Thu May 28 23:23:49 2020 (r361608) @@ -556,7 +556,6 @@ mpssas_remove_device(struct mps_softc *sc, struct mps_ MPI2_SCSI_TASK_MANAGE_REPLY *reply; MPI2_SAS_IOUNIT_CONTROL_REQUEST *req; struct mpssas_target *targ; - struct mps_command *next_cm; uint16_t handle; MPS_FUNCTRACE(sc); @@ -609,18 +608,22 @@ mpssas_remove_device(struct mps_softc *sc, struct mps_ tm->cm_complete = mpssas_remove_complete; tm->cm_complete_data = (void *)(uintptr_t)handle; - mps_map_command(sc, tm); + /* + * Wait to send the REMOVE_DEVICE until all the commands have cleared. + * They should be aborted or time out and we'll kick thus off there + * if so. + */ + if (TAILQ_FIRST(&targ->commands) == NULL) { + mps_dprint(sc, MPS_INFO, "No pending commands: starting remove_device\n"); + mps_map_command(sc, tm); + targ->pending_remove_tm = NULL; + } else { + targ->pending_remove_tm = tm; + } + mps_dprint(sc, MPS_XINFO, "clearing target %u handle 0x%04x\n", targ->tid, handle); - TAILQ_FOREACH_SAFE(tm, &targ->commands, cm_link, next_cm) { - union ccb *ccb; - - mps_dprint(sc, MPS_XINFO, "Completing missed command %p\n", tm); - ccb = tm->cm_complete_data; - mpssas_set_ccbstatus(ccb, CAM_DEV_NOT_THERE); - mpssas_scsiio_complete(sc, tm); - } } static void @@ -635,8 +638,17 @@ mpssas_remove_complete(struct mps_softc *sc, struct mp reply = (MPI2_SAS_IOUNIT_CONTROL_REPLY *)tm->cm_reply; handle = (uint16_t)(uintptr_t)tm->cm_complete_data; + targ = tm->cm_targ; /* + * At this point, we should have no pending commands for the target. + * The remove target has just completed. + */ + KASSERT(TAILQ_FIRST(&targ->commands) == NULL, + ("%s: no commands should be pending\n", __func__)); + + + /* * Currently there should be no way we can hit this case. It only * happens when we have a failure to allocate chain frames, and * task management commands don't have S/G lists. @@ -670,7 +682,6 @@ mpssas_remove_complete(struct mps_softc *sc, struct mp */ if ((le16toh(reply->IOCStatus) & MPI2_IOCSTATUS_MASK) == MPI2_IOCSTATUS_SUCCESS) { - targ = tm->cm_targ; targ->handle = 0x0; targ->encl_handle = 0x0; targ->encl_slot = 0x0; @@ -1712,13 +1723,17 @@ mpssas_action_scsiio(struct mpssas_softc *sassc, union /* * If devinfo is 0 this will be a volume. In that case don't tell CAM * that the volume has timed out. We want volumes to be enumerated - * until they are deleted/removed, not just failed. + * until they are deleted/removed, not just failed. In either event, + * we're removing the target due to a firmware event telling us + * the device is now gone (as opposed to some transient event). Since + * we're opting to remove failed devices from the OS's view, we need + * to propagate that status up the stack. */ if (targ->flags & MPSSAS_TARGET_INREMOVAL) { if (targ->devinfo == 0) mpssas_set_ccbstatus(ccb, CAM_REQ_CMP); else - mpssas_set_ccbstatus(ccb, CAM_SEL_TIMEOUT); + mpssas_set_ccbstatus(ccb, CAM_DEV_NOT_THERE); xpt_done(ccb); return; } @@ -2351,15 +2366,22 @@ mpssas_scsiio_complete(struct mps_softc *sc, struct mp * potential risk of flagging false failures in the event * of a topology-related error (e.g. a SAS expander problem * causes a command addressed to a drive to fail), but - * avoiding getting into an infinite retry loop. + * avoiding getting into an infinite retry loop. However, + * if we get them while were moving a device, we should + * fail the request as 'not there' because the device + * is effectively gone. */ - mpssas_set_ccbstatus(ccb, CAM_REQ_CMP_ERR); + if (cm->cm_targ->flags & MPSSAS_TARGET_INREMOVAL) + mpssas_set_ccbstatus(ccb, CAM_DEV_NOT_THERE); + else + mpssas_set_ccbstatus(ccb, CAM_REQ_CMP_ERR); mps_dprint(sc, MPS_INFO, - "Controller reported %s tgt %u SMID %u loginfo %x\n", + "Controller reported %s tgt %u SMID %u loginfo %x%s\n", mps_describe_table(mps_iocstatus_string, le16toh(rep->IOCStatus) & MPI2_IOCSTATUS_MASK), target_id, cm->cm_desc.Default.SMID, - le32toh(rep->IOCLogInfo)); + le32toh(rep->IOCLogInfo), + (cm->cm_targ->flags & MPSSAS_TARGET_INREMOVAL) ? " departing" : ""); mps_dprint(sc, MPS_XINFO, "SCSIStatus %x SCSIState %x xfercount %u\n", rep->SCSIStatus, rep->SCSIState, @@ -2398,6 +2420,21 @@ mpssas_scsiio_complete(struct mps_softc *sc, struct mp if (mpssas_get_ccbstatus(ccb) != CAM_REQ_CMP) { ccb->ccb_h.status |= CAM_DEV_QFRZN; xpt_freeze_devq(ccb->ccb_h.path, /*count*/ 1); + } + + /* + * Check to see if we're removing the device. If so, and this is the + * last command on the queue, proceed with the deferred removal of the + * device. Note, for removing a volume, this won't trigger because + * pending_remove_tm will be NULL. + */ + if (cm->cm_targ->flags & MPSSAS_TARGET_INREMOVAL) { + if (TAILQ_FIRST(&cm->cm_targ->commands) == NULL && + cm->cm_targ->pending_remove_tm != NULL) { + mps_dprint(sc, MPS_INFO, "Last pending command complete: starting remove_device\n"); + mps_map_command(sc, cm->cm_targ->pending_remove_tm); + cm->cm_targ->pending_remove_tm = NULL; + } } mps_free_command(sc, cm); Modified: stable/12/sys/dev/mps/mps_sas.h ============================================================================== --- stable/12/sys/dev/mps/mps_sas.h Thu May 28 22:05:50 2020 (r361607) +++ stable/12/sys/dev/mps/mps_sas.h Thu May 28 23:23:49 2020 (r361608) @@ -62,6 +62,7 @@ struct mpssas_target { SLIST_HEAD(, mpssas_lun) luns; TAILQ_HEAD(, mps_command) commands; struct mps_command *tm; + struct mps_command *pending_remove_tm; TAILQ_HEAD(, mps_command) timedout_commands; uint16_t exp_dev_handle; uint16_t phy_num;