From owner-svn-src-all@freebsd.org Sun Nov 24 15:24:06 2019 Return-Path: Delivered-To: svn-src-all@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id A4F201B1ED9; Sun, 24 Nov 2019 15:24:06 +0000 (UTC) (envelope-from imp@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 47LYqB3pzLz47Nv; Sun, 24 Nov 2019 15:24:06 +0000 (UTC) (envelope-from imp@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 4C33A26253; Sun, 24 Nov 2019 15:24:06 +0000 (UTC) (envelope-from imp@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id xAOFO6Ts053878; Sun, 24 Nov 2019 15:24:06 GMT (envelope-from imp@FreeBSD.org) Received: (from imp@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id xAOFO6Bg053877; Sun, 24 Nov 2019 15:24:06 GMT (envelope-from imp@FreeBSD.org) Message-Id: <201911241524.xAOFO6Bg053877@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: imp set sender to imp@FreeBSD.org using -f From: Warner Losh Date: Sun, 24 Nov 2019 15:24:06 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: svn commit: r355056 - in head/sys/dev: mpr mps X-SVN-Group: head X-SVN-Commit-Author: imp X-SVN-Commit-Paths: in head/sys/dev: mpr mps X-SVN-Commit-Revision: 355056 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 24 Nov 2019 15:24:06 -0000 Author: imp Date: Sun Nov 24 15:24:05 2019 New Revision: 355056 URL: https://svnweb.freebsd.org/changeset/base/355056 Log: Fix leak in state machine for commands. When we get a device departed message from the firmware, we send a TARGET_REST to the device to let the firmware know we're done and as part of the recovery process. This will abort all the commands. While the documentation says the IOC is responsible for writing the completion message for all the commands pending with an aborted status, we sometimes have queued commands for the target that haven't been completed so are in the INQUEUE state. So, when we later complete the pending CCB as aborted, these commands are freed and we hit the "state not busy" panic. Elsewhere where we dequeue commands, we move the state to BUSY from INQUEUE. Do that here as well. In talking to Ken, Scott and Justin, they recommended a series of tests to see if this is 100% safe. Those tests are ongoing, but preliminary tests suggest this is safe as we see no duplicate completions when we hit this case at work. We have a machine that has a dodgy powersupply which usually doesn't apply power to a few drives, but sometimes does when the machine is under heavy load so we get a rash of the connect / disconnect messages over half an hour. Without this change, we'd see state not busy panic. With this change, the drives just annoyingly come and go without affecting the rest of the machine, but without a complete error injection test suite, it's hard to know if all edge cases are now covered or not. Discussed with: scottl, ken, gibbs Modified: head/sys/dev/mpr/mpr_sas.c head/sys/dev/mps/mps_sas.c Modified: head/sys/dev/mpr/mpr_sas.c ============================================================================== --- head/sys/dev/mpr/mpr_sas.c Sun Nov 24 15:03:35 2019 (r355055) +++ head/sys/dev/mpr/mpr_sas.c Sun Nov 24 15:24:05 2019 (r355056) @@ -624,6 +624,7 @@ mprsas_remove_device(struct mpr_softc *sc, struct mpr_ mpr_dprint(sc, MPR_XINFO, "Completing missed command %p\n", tm); ccb = tm->cm_complete_data; mprsas_set_ccbstatus(ccb, CAM_DEV_NOT_THERE); + tm->cm_state = MPR_CM_STATE_BUSY; mprsas_scsiio_complete(sc, tm); } } Modified: head/sys/dev/mps/mps_sas.c ============================================================================== --- head/sys/dev/mps/mps_sas.c Sun Nov 24 15:03:35 2019 (r355055) +++ head/sys/dev/mps/mps_sas.c Sun Nov 24 15:24:05 2019 (r355056) @@ -619,6 +619,7 @@ mpssas_remove_device(struct mps_softc *sc, struct mps_ mps_dprint(sc, MPS_XINFO, "Completing missed command %p\n", tm); ccb = tm->cm_complete_data; mpssas_set_ccbstatus(ccb, CAM_DEV_NOT_THERE); + tm->cm_state = MPS_CM_STATE_BUSY; mpssas_scsiio_complete(sc, tm); } }