From nobody Tue Nov 1 18:53:57 2022 X-Original-To: dev-commits-src-all@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4N1zhB5Vfyz4h2QW; Tue, 1 Nov 2022 18:54:02 +0000 (UTC) (envelope-from rpokala@freebsd.org) Received: from smtp.freebsd.org (smtp.freebsd.org [IPv6:2610:1c1:1:606c::24b:4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "smtp.freebsd.org", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4N1zhB55c6z4HkB; Tue, 1 Nov 2022 18:54:02 +0000 (UTC) (envelope-from rpokala@freebsd.org) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1667328842; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5TmzNYInzwAMvQX4Vzg7xm6Z4vT6XhJgfdgCUD47e7U=; b=IEoX/Jm/+bK7w/fEGA9r82axp8mx6/QWGYg+9x7hOPscvkZ2/4gWw6MMd3wRgg0H5UCeR6 BI9Mx99k7IVAkybffXMj1Gxc+dceQmyRclHTFi2iyn4HJPN/z6WWKvyM6fvajbcUVOaV2X c6pLFuw8Kqou4CKIFcWGKXXgGHwrLCGDqTkEVvYzxtRME1Yk8B7RHsYDBUrvppiYgR6KH+ 3a8YyXRyW+ub164TIkSMZDcHNSEJcs1fo4vlYzWGMRK/YdnoBVNTvbsmNQwVcaNWOOSk2d EgyXh6/fWZofTljUTGtS1f4sgHFCFBXIjsZ/uqKgPWyDIAFZqPZol4JRJ6cL6A== Received: from [192.168.1.10] (unknown [IPv6:2601:641:700:5284:509a:9aeb:cea0:2a81]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) (Authenticated sender: rpokala) by smtp.freebsd.org (Postfix) with ESMTPSA id 4N1zhB0JWgz12pP; Tue, 1 Nov 2022 18:54:01 +0000 (UTC) (envelope-from rpokala@freebsd.org) User-Agent: Microsoft-MacOutlook/16.66.22102801 Date: Tue, 01 Nov 2022 11:53:57 -0700 Subject: Re: f0f3e3e961d3 - main - ipmi: use a queue for kcs driver requests when possible From: Ravi Pokala To: Chuck Silvers , , , Message-ID: <930FB7AD-A360-460E-9638-AEEDD85AF2DE@panasas.com> Thread-Topic: f0f3e3e961d3 - main - ipmi: use a queue for kcs driver requests when possible References: <202211011755.2A1HtMll099820@gitrepo.freebsd.org> In-Reply-To: <202211011755.2A1HtMll099820@gitrepo.freebsd.org> List-Id: Commit messages for all branches of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-all@freebsd.org X-BeenThere: dev-commits-src-all@freebsd.org Mime-version: 1.0 Content-type: text/plain; charset="UTF-8" Content-transfer-encoding: quoted-printable ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1667328842; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5TmzNYInzwAMvQX4Vzg7xm6Z4vT6XhJgfdgCUD47e7U=; b=K+fKctp4ZUHG5alJq1wRGlHww3qQiT1IRB29oKHcDk0O75Df3bv+5Wv7DvzgnLCkGU+TH8 FJX13TgGEVB0eNn6lGCPpqn8GuF4uBwh5ibImtH+bQlxRicjEUcvctRgw6XJt3SqpBtvjg IvFHeLg+v/cQFbzjL+SbG0uicj2qQ2hTCR4cWs6j+T3JvatwlcyaR9JQ8hvF+JYZVRzMMk ME22cMUjmRxwl4oWSJmRfT+RDAhcdeqSXwuW9PtaEaQ/nbyO/dP0WfB4TIq2N+zd2uG5uQ dWkp0EL6VlWQh2cLNwMmWTAZghq4TIpvJoZUdWog3TzQrYeTeZRhX/iIU7OycA== ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1667328842; a=rsa-sha256; cv=none; b=yOEEMt4O3pNoAGqkCj2Wd3z9rLeL39zWIl8SyE/GunIMcSH8wSNluATcw88Mjw1G7hu0FL 7juWhK0Rseenm8afByL97MF8EdcoY7D8hNu4FoJIHJEHBo/PSwgOJg8Swy4OqJGlWUXrrG 8QIsm2iSF5XJkmJGqlmfnhNOY1eB0Yk/9NQvUxTUQCukPbJvkGZXtw+EblBDjOkqNarPq/ BjJewSkZj3boNAjjR4KENWoWUH7Nrwd7tYIaVygtIwG3uXe+K84twixKlsA8HnfQ/3LsCz yJjORHmMghccz5oFZSgizVuO+kPgYnqoT6ktA/m+SzAEnFejpvUpBitW5fTlZw== ARC-Authentication-Results: i=1; mx1.freebsd.org; none X-ThisMailContainsUnwantedMimeParts: N Hi Chuck, As best as I can tell, the stack for handling a request from userland (e.g.= `ipmitool') looks like this: ipmi_polled_enqueue_request_highpri() kcs_driver_request_queue() /* assuming not panicked or dumping core */ kcs_driver_request() sc->ipmi_driver_request() ipmi_submit_driver_request() ipmi_ioctl() And the stack for handling kernel-internal requests (i.e. watchdog) is esse= ntially the same, except for calling ipmi_submit_driver_request() directly r= ather than going through ipmi_ioctl(). Specifically, there doesn't seem to b= e anything that would put the two different types of requests on two differe= nt queues; it looks like everything goes onto the highpri queue. What am I missing? Thanks, Ravi (rpokala@) =EF=BB=BF-----Original Message----- From: on behalf of Chuck Silvers Date: 2022-11-01, Tuesday at 10:55 To: , , Subject: git: f0f3e3e961d3 - main - ipmi: use a queue for kcs driver reques= ts when possible The branch main has been updated by chs: URL: https://cgit.FreeBSD.org/src/commit/?id=3Df0f3e3e961d311f1cd938f1319= 385e7f454525f1 commit f0f3e3e961d311f1cd938f1319385e7f454525f1 Author: Chuck Silvers AuthorDate: 2022-11-01 17:55:14 +0000 Commit: Chuck Silvers CommitDate: 2022-11-01 17:55:14 +0000 ipmi: use a queue for kcs driver requests when possible The ipmi watchdog pretimeout action can trigger unintentionally in certain rare, complicated situations. What we have seen at Netflix is that the BMC can sometimes be sent a continuous stream of writes to port 0x80, and due to what is a bug or misconfiguration in the BMC software, this results in the BMC running out of memory, becoming very slow to respond to KCS requests, and eventually being rebooted by its own internal watchdog. While that is going on in the BMC, back in the host OS, a number of requests are pending in the ipmi request queue, and the kcs_loop thread is working on processing these requests. All of the KCS accesses to process those requests are timing out and eventually failing because the BMC is responding very slowly or not at all, and the kcs_loop threa= d is holding the IPMI_IO_LOCK the whole time that is going on. Meanwhile the watchdogd process in the host is trying to pat the BMC watchdog, and this process is sleeping waiting to get the IPMI_IO_LOCK. It's not entirely clear why the watchdogd process is sleeping for this lock, because the intention is that a thread holding the IPMI_IO_LOCK should not sleep and thus any thread that wants the lock should just spin to wait for it. My best guess is that the kcs_loop thread is spinning waiting for the BMC to respond for so long that it is eventually preempted, and during the brief interval when the kcs_loop thread is not running, the watchdogd thread notices that the lock holder is not running and sleeps. When the kcs_loop thread eventually finishes processin= g one request, it drops the IPMI_IO_LOCK and then immediately takes t= he lock again so it can process the next request in the queue. Because the watchdogd thread is sleeping at this point, the kcs_loo= p always wins the race to acquire the IPMI_IO_LOCK, thus starving the watchdogd thread. The callout for the watchdog pretimeout would be reset by the watchdogd thread after its request to the BMC watchdog completes, but since that request never processed, the pretimeout callout eventually fires, even though there is nothing actually wrong with the host. To prevent this saga from unfolding: - when kcs_driver_request() is called in a context where it can sl= eep, queue the request and let the worker thread process it rather th= an trying to process in the original thread. - add a new high-priority queue for driver requests, so that the watchdog patting requests will be processed as quickly as possib= le even if lots of application requests have already been queued. With these two changes, the watchdog pretimeout action does not tri= gger even if the BMC is completely out to lunch for long periods of time (as long as the watchdogd check command does not also get stuck). Sponsored by: Netflix Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D36555 --- sys/dev/ipmi/ipmi.c | 33 ++++++++++++++++++++++++++++++--- sys/dev/ipmi/ipmi_kcs.c | 28 +++++++++++++++++++++++++++- sys/dev/ipmi/ipmivars.h | 2 ++ 3 files changed, 59 insertions(+), 4 deletions(-) diff --git a/sys/dev/ipmi/ipmi.c b/sys/dev/ipmi/ipmi.c index d79690d55c68..b8705a81627b 100644 --- a/sys/dev/ipmi/ipmi.c +++ b/sys/dev/ipmi/ipmi.c @@ -208,6 +208,15 @@ ipmi_dtor(void *arg) IPMI_LOCK(sc); if (dev->ipmi_requests) { /* Throw away any pending requests for this device. */ + TAILQ_FOREACH_SAFE(req, &sc->ipmi_pending_requests_highpri, ir_link, + nreq) { + if (req->ir_owner =3D=3D dev) { + TAILQ_REMOVE(&sc->ipmi_pending_requests_highpri, req, + ir_link); + dev->ipmi_requests--; + ipmi_free_request(req); + } + } TAILQ_FOREACH_SAFE(req, &sc->ipmi_pending_requests, ir_link, nreq) { if (req->ir_owner =3D=3D dev) { @@ -579,13 +588,19 @@ ipmi_dequeue_request(struct ipmi_softc *sc) IPMI_LOCK_ASSERT(sc); - while (!sc->ipmi_detaching && TAILQ_EMPTY(&sc->ipmi_pending_requests)= ) + while (!sc->ipmi_detaching && TAILQ_EMPTY(&sc->ipmi_pending_requests)= && + TAILQ_EMPTY(&sc->ipmi_pending_requests_highpri)) cv_wait(&sc->ipmi_request_added, &sc->ipmi_requests_lock); if (sc->ipmi_detaching) return (NULL); - req =3D TAILQ_FIRST(&sc->ipmi_pending_requests); - TAILQ_REMOVE(&sc->ipmi_pending_requests, req, ir_link); + req =3D TAILQ_FIRST(&sc->ipmi_pending_requests_highpri); + if (req !=3D NULL) + TAILQ_REMOVE(&sc->ipmi_pending_requests_highpri, req, ir_link); + else { + req =3D TAILQ_FIRST(&sc->ipmi_pending_requests); + TAILQ_REMOVE(&sc->ipmi_pending_requests, req, ir_link); + } return (req); } @@ -601,6 +616,17 @@ ipmi_polled_enqueue_request(struct ipmi_softc *sc,= struct ipmi_request *req) return (0); } +int +ipmi_polled_enqueue_request_highpri(struct ipmi_softc *sc, struct ipmi= _request *req) +{ + + IPMI_LOCK_ASSERT(sc); + + TAILQ_INSERT_TAIL(&sc->ipmi_pending_requests_highpri, req, ir_link); + cv_signal(&sc->ipmi_request_added); + return (0); +} + /* * Watchdog event handler. */ @@ -817,6 +843,7 @@ ipmi_startup(void *arg) mtx_init(&sc->ipmi_requests_lock, "ipmi requests", NULL, MTX_DEF); mtx_init(&sc->ipmi_io_lock, "ipmi io", NULL, MTX_DEF); cv_init(&sc->ipmi_request_added, "ipmireq"); + TAILQ_INIT(&sc->ipmi_pending_requests_highpri); TAILQ_INIT(&sc->ipmi_pending_requests); /* Initialize interface-dependent state. */ diff --git a/sys/dev/ipmi/ipmi_kcs.c b/sys/dev/ipmi/ipmi_kcs.c index df3b37614eb7..5908ec88e039 100644 --- a/sys/dev/ipmi/ipmi_kcs.c +++ b/sys/dev/ipmi/ipmi_kcs.c @@ -32,6 +32,7 @@ __FBSDID("$FreeBSD$"); #include #include #include +#include #include #include #include @@ -490,7 +491,21 @@ kcs_startup(struct ipmi_softc *sc) } static int -kcs_driver_request(struct ipmi_softc *sc, struct ipmi_request *req, in= t timo) +kcs_driver_request_queue(struct ipmi_softc *sc, struct ipmi_request *r= eq, int timo) +{ + int error; + + IPMI_LOCK(sc); + ipmi_polled_enqueue_request_highpri(sc, req); + error =3D msleep(req, &sc->ipmi_requests_lock, 0, "ipmireq", timo); + if (error =3D=3D 0) + error =3D req->ir_error; + IPMI_UNLOCK(sc); + return (error); +} + +static int +kcs_driver_request_poll(struct ipmi_softc *sc, struct ipmi_request *re= q) { int i, ok; @@ -504,6 +519,17 @@ kcs_driver_request(struct ipmi_softc *sc, struct i= pmi_request *req, int timo) return (req->ir_error); } +static int +kcs_driver_request(struct ipmi_softc *sc, struct ipmi_request *req, in= t timo) +{ + + if (KERNEL_PANICKED() || dumping) + return (kcs_driver_request_poll(sc, req)); + else + return (kcs_driver_request_queue(sc, req, timo)); +} + + int ipmi_kcs_attach(struct ipmi_softc *sc) { diff --git a/sys/dev/ipmi/ipmivars.h b/sys/dev/ipmi/ipmivars.h index b0548ee3d7c3..1468e86be73b 100644 --- a/sys/dev/ipmi/ipmivars.h +++ b/sys/dev/ipmi/ipmivars.h @@ -111,6 +111,7 @@ struct ipmi_softc { uint8_t ipmi_dev_support; /* IPMI_ADS_* */ struct cdev *ipmi_cdev; TAILQ_HEAD(,ipmi_request) ipmi_pending_requests; + TAILQ_HEAD(,ipmi_request) ipmi_pending_requests_highpri; int ipmi_driver_requests_polled; eventhandler_tag ipmi_power_cycle_tag; eventhandler_tag ipmi_watchdog_tag; @@ -237,6 +238,7 @@ void ipmi_complete_request(struct ipmi_softc *, str= uct ipmi_request *); struct ipmi_request *ipmi_dequeue_request(struct ipmi_softc *); void ipmi_free_request(struct ipmi_request *); int ipmi_polled_enqueue_request(struct ipmi_softc *, struct ipmi_reque= st *); +int ipmi_polled_enqueue_request_highpri(struct ipmi_softc *, struct ip= mi_request *); int ipmi_submit_driver_request(struct ipmi_softc *, struct ipmi_reques= t *, int);