From owner-svn-src-all@freebsd.org Thu Jun 18 19:16:04 2020 Return-Path: Delivered-To: svn-src-all@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 2F4263309E1; Thu, 18 Jun 2020 19:16:04 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 49ns9J0VSGz3f6N; Thu, 18 Jun 2020 19:16:04 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 0C6AAEA3C; Thu, 18 Jun 2020 19:16:04 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id 05IJG3HC097364; Thu, 18 Jun 2020 19:16:03 GMT (envelope-from mav@FreeBSD.org) Received: (from mav@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id 05IJG3lW097361; Thu, 18 Jun 2020 19:16:03 GMT (envelope-from mav@FreeBSD.org) Message-Id: <202006181916.05IJG3lW097361@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: mav set sender to mav@FreeBSD.org using -f From: Alexander Motin Date: Thu, 18 Jun 2020 19:16:03 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: svn commit: r362337 - head/sys/dev/nvme X-SVN-Group: head X-SVN-Commit-Author: mav X-SVN-Commit-Paths: head/sys/dev/nvme X-SVN-Commit-Revision: 362337 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.33 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Jun 2020 19:16:04 -0000 Author: mav Date: Thu Jun 18 19:16:03 2020 New Revision: 362337 URL: https://svnweb.freebsd.org/changeset/base/362337 Log: Make polled request timeout less invasive. Instead of panic after one second of polling, make the normal timeout handler to activate, reset the controller and abort the outstanding requests. If all of it won't happen within 10 seconds then something in the driver is likely stuck bad and panic is the only way out. In particular this fixed device hot unplug during execution of those polled commands, allowing clean device detach instead of panic. MFC after: 1 week Sponsored by: iXsystems, Inc. Modified: head/sys/dev/nvme/nvme_ctrlr.c head/sys/dev/nvme/nvme_private.h head/sys/dev/nvme/nvme_qpair.c Modified: head/sys/dev/nvme/nvme_ctrlr.c ============================================================================== --- head/sys/dev/nvme/nvme_ctrlr.c Thu Jun 18 19:03:20 2020 (r362336) +++ head/sys/dev/nvme/nvme_ctrlr.c Thu Jun 18 19:16:03 2020 (r362337) @@ -520,7 +520,7 @@ nvme_ctrlr_create_qpairs(struct nvme_controller *ctrlr } status.done = 0; - nvme_ctrlr_cmd_create_io_sq(qpair->ctrlr, qpair, + nvme_ctrlr_cmd_create_io_sq(ctrlr, qpair, nvme_completion_poll_cb, &status); nvme_completion_poll(&status); if (nvme_completion_is_error(&status.cpl)) { Modified: head/sys/dev/nvme/nvme_private.h ============================================================================== --- head/sys/dev/nvme/nvme_private.h Thu Jun 18 19:03:20 2020 (r362336) +++ head/sys/dev/nvme/nvme_private.h Thu Jun 18 19:16:03 2020 (r362337) @@ -463,20 +463,22 @@ int nvme_detach(device_t dev); * Wait for a command to complete using the nvme_completion_poll_cb. * Used in limited contexts where the caller knows it's OK to block * briefly while the command runs. The ISR will run the callback which - * will set status->done to true.usually within microseconds. A 1s - * pause means something is seriously AFU and we should panic to - * provide the proper context to diagnose. + * will set status->done to true, usually within microseconds. If not, + * then after one second timeout handler should reset the controller + * and abort all outstanding requests including this polled one. If + * still not after ten seconds, then something is wrong with the driver, + * and panic is the only way to recover. */ static __inline void nvme_completion_poll(struct nvme_completion_poll_status *status) { - int sanity = hz * 1; + int sanity = hz * 10; while (!atomic_load_acq_int(&status->done) && --sanity > 0) pause("nvme", 1); if (sanity <= 0) - panic("NVME polled command failed to complete within 1s."); + panic("NVME polled command failed to complete within 10s."); } static __inline void Modified: head/sys/dev/nvme/nvme_qpair.c ============================================================================== --- head/sys/dev/nvme/nvme_qpair.c Thu Jun 18 19:03:20 2020 (r362336) +++ head/sys/dev/nvme/nvme_qpair.c Thu Jun 18 19:16:03 2020 (r362337) @@ -956,6 +956,7 @@ nvme_qpair_submit_tracker(struct nvme_qpair *qpair, st { struct nvme_request *req; struct nvme_controller *ctrlr; + int timeout; mtx_assert(&qpair->lock, MA_OWNED); @@ -964,9 +965,14 @@ nvme_qpair_submit_tracker(struct nvme_qpair *qpair, st qpair->act_tr[tr->cid] = tr; ctrlr = qpair->ctrlr; - if (req->timeout) - callout_reset_on(&tr->timer, ctrlr->timeout_period * hz, - nvme_timeout, tr, qpair->cpu); + if (req->timeout) { + if (req->cb_fn == nvme_completion_poll_cb) + timeout = hz; + else + timeout = ctrlr->timeout_period * hz; + callout_reset_on(&tr->timer, timeout, nvme_timeout, tr, + qpair->cpu); + } /* Copy the command from the tracker to the submission queue. */ memcpy(&qpair->cmd[qpair->sq_tail], &req->cmd, sizeof(req->cmd));