Date: Tue, 1 Nov 2022 14:55:51 -0700 From: Chuck Silvers <chuq@chuq.com> To: Ravi Pokala <rpokala@freebsd.org> Cc: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org Subject: Re: f0f3e3e961d3 - main - ipmi: use a queue for kcs driver requests when possible Message-ID: <Y2GV5zRzPj%2BvSynz@spathi.chuq.com> In-Reply-To: <930FB7AD-A360-460E-9638-AEEDD85AF2DE@panasas.com> References: <202211011755.2A1HtMll099820@gitrepo.freebsd.org> <930FB7AD-A360-460E-9638-AEEDD85AF2DE@panasas.com>
next in thread | previous in thread | raw e-mail | index | archive | help
Hi Ravi, the usual path (the IPMI_SYSTEM_INTERFACE_ADDR_TYPE case) for an "application" request (ie. one from userland) is: ipmi_polled_enqueue_request() sc->ipmi_enqueue_request() ipmi_ioctl() which uses the regular (low priority) queue. in the non-IPMI_SYSTEM_INTERFACE_ADDR_TYPE case (aka. the IPMB case), ipmi_ioctl() does issue two driver requests to set up the IPMB stuff before then issuing the real application request with sc->ipmi_enqueue_request() as above. I don't know much about this IPMB thing, it looks like a kind of pass-through interface for communicating with another device that sits behind the main BMC IPMI controller. I suppose that it would make sense to change these two IPMB setup driver requests to use the regular (low priority) queue (since sleeping is ok in that path), but I don't think I have any way to test that since our systems don't use this path. -Chuck On Tue, Nov 01, 2022 at 11:53:57AM -0700, Ravi Pokala wrote: > Hi Chuck, > > As best as I can tell, the stack for handling a request from userland (e.g. `ipmitool') looks like this: > > ipmi_polled_enqueue_request_highpri() > kcs_driver_request_queue() /* assuming not panicked or dumping core */ > kcs_driver_request() > sc->ipmi_driver_request() > ipmi_submit_driver_request() > ipmi_ioctl() > > And the stack for handling kernel-internal requests (i.e. watchdog) is essentially the same, except for calling ipmi_submit_driver_request() directly rather than going through ipmi_ioctl(). Specifically, there doesn't seem to be anything that would put the two different types of requests on two different queues; it looks like everything goes onto the highpri queue. > > What am I missing? > > Thanks, > > Ravi (rpokala@) > > -----Original Message----- > From: <owner-src-committers@freebsd.org> on behalf of Chuck Silvers <chs@FreeBSD.org> > Date: 2022-11-01, Tuesday at 10:55 > To: <src-committers@FreeBSD.org>, <dev-commits-src-all@FreeBSD.org>, <dev-commits-src-main@FreeBSD.org> > Subject: git: f0f3e3e961d3 - main - ipmi: use a queue for kcs driver requests when possible > > The branch main has been updated by chs: > > URL: https://cgit.FreeBSD.org/src/commit/?id=f0f3e3e961d311f1cd938f1319385e7f454525f1 > > commit f0f3e3e961d311f1cd938f1319385e7f454525f1 > Author: Chuck Silvers <chs@FreeBSD.org> > AuthorDate: 2022-11-01 17:55:14 +0000 > Commit: Chuck Silvers <chs@FreeBSD.org> > CommitDate: 2022-11-01 17:55:14 +0000 > > ipmi: use a queue for kcs driver requests when possible > > The ipmi watchdog pretimeout action can trigger unintentionally in > certain rare, complicated situations. What we have seen at Netflix > is that the BMC can sometimes be sent a continuous stream of > writes to port 0x80, and due to what is a bug or misconfiguration > in the BMC software, this results in the BMC running out of memory, > becoming very slow to respond to KCS requests, and eventually being > rebooted by its own internal watchdog. While that is going on in > the BMC, back in the host OS, a number of requests are pending in > the ipmi request queue, and the kcs_loop thread is working on > processing these requests. All of the KCS accesses to process > those requests are timing out and eventually failing because the > BMC is responding very slowly or not at all, and the kcs_loop thread > is holding the IPMI_IO_LOCK the whole time that is going on. > Meanwhile the watchdogd process in the host is trying to pat the > BMC watchdog, and this process is sleeping waiting to get the > IPMI_IO_LOCK. It's not entirely clear why the watchdogd process > is sleeping for this lock, because the intention is that a thread > holding the IPMI_IO_LOCK should not sleep and thus any thread > that wants the lock should just spin to wait for it. My best guess > is that the kcs_loop thread is spinning waiting for the BMC to > respond for so long that it is eventually preempted, and during > the brief interval when the kcs_loop thread is not running, > the watchdogd thread notices that the lock holder is not running > and sleeps. When the kcs_loop thread eventually finishes processing > one request, it drops the IPMI_IO_LOCK and then immediately takes the > lock again so it can process the next request in the queue. > Because the watchdogd thread is sleeping at this point, the kcs_loop > always wins the race to acquire the IPMI_IO_LOCK, thus starving > the watchdogd thread. The callout for the watchdog pretimeout > would be reset by the watchdogd thread after its request to the BMC > watchdog completes, but since that request never processed, the > pretimeout callout eventually fires, even though there is nothing > actually wrong with the host. > > To prevent this saga from unfolding: > > - when kcs_driver_request() is called in a context where it can sleep, > queue the request and let the worker thread process it rather than > trying to process in the original thread. > - add a new high-priority queue for driver requests, so that the > watchdog patting requests will be processed as quickly as possible > even if lots of application requests have already been queued. > > With these two changes, the watchdog pretimeout action does not trigger > even if the BMC is completely out to lunch for long periods of time > (as long as the watchdogd check command does not also get stuck). > > Sponsored by: Netflix > Reviewed by: imp > Differential Revision: https://reviews.freebsd.org/D36555 > --- > sys/dev/ipmi/ipmi.c | 33 ++++++++++++++++++++++++++++++--- > sys/dev/ipmi/ipmi_kcs.c | 28 +++++++++++++++++++++++++++- > sys/dev/ipmi/ipmivars.h | 2 ++ > 3 files changed, 59 insertions(+), 4 deletions(-) > > diff --git a/sys/dev/ipmi/ipmi.c b/sys/dev/ipmi/ipmi.c > index d79690d55c68..b8705a81627b 100644 > --- a/sys/dev/ipmi/ipmi.c > +++ b/sys/dev/ipmi/ipmi.c > @@ -208,6 +208,15 @@ ipmi_dtor(void *arg) > IPMI_LOCK(sc); > if (dev->ipmi_requests) { > /* Throw away any pending requests for this device. */ > + TAILQ_FOREACH_SAFE(req, &sc->ipmi_pending_requests_highpri, ir_link, > + nreq) { > + if (req->ir_owner == dev) { > + TAILQ_REMOVE(&sc->ipmi_pending_requests_highpri, req, > + ir_link); > + dev->ipmi_requests--; > + ipmi_free_request(req); > + } > + } > TAILQ_FOREACH_SAFE(req, &sc->ipmi_pending_requests, ir_link, > nreq) { > if (req->ir_owner == dev) { > @@ -579,13 +588,19 @@ ipmi_dequeue_request(struct ipmi_softc *sc) > > IPMI_LOCK_ASSERT(sc); > > - while (!sc->ipmi_detaching && TAILQ_EMPTY(&sc->ipmi_pending_requests)) > + while (!sc->ipmi_detaching && TAILQ_EMPTY(&sc->ipmi_pending_requests) && > + TAILQ_EMPTY(&sc->ipmi_pending_requests_highpri)) > cv_wait(&sc->ipmi_request_added, &sc->ipmi_requests_lock); > if (sc->ipmi_detaching) > return (NULL); > > - req = TAILQ_FIRST(&sc->ipmi_pending_requests); > - TAILQ_REMOVE(&sc->ipmi_pending_requests, req, ir_link); > + req = TAILQ_FIRST(&sc->ipmi_pending_requests_highpri); > + if (req != NULL) > + TAILQ_REMOVE(&sc->ipmi_pending_requests_highpri, req, ir_link); > + else { > + req = TAILQ_FIRST(&sc->ipmi_pending_requests); > + TAILQ_REMOVE(&sc->ipmi_pending_requests, req, ir_link); > + } > return (req); > } > > @@ -601,6 +616,17 @@ ipmi_polled_enqueue_request(struct ipmi_softc *sc, struct ipmi_request *req) > return (0); > } > > +int > +ipmi_polled_enqueue_request_highpri(struct ipmi_softc *sc, struct ipmi_request *req) > +{ > + > + IPMI_LOCK_ASSERT(sc); > + > + TAILQ_INSERT_TAIL(&sc->ipmi_pending_requests_highpri, req, ir_link); > + cv_signal(&sc->ipmi_request_added); > + return (0); > +} > + > /* > * Watchdog event handler. > */ > @@ -817,6 +843,7 @@ ipmi_startup(void *arg) > mtx_init(&sc->ipmi_requests_lock, "ipmi requests", NULL, MTX_DEF); > mtx_init(&sc->ipmi_io_lock, "ipmi io", NULL, MTX_DEF); > cv_init(&sc->ipmi_request_added, "ipmireq"); > + TAILQ_INIT(&sc->ipmi_pending_requests_highpri); > TAILQ_INIT(&sc->ipmi_pending_requests); > > /* Initialize interface-dependent state. */ > diff --git a/sys/dev/ipmi/ipmi_kcs.c b/sys/dev/ipmi/ipmi_kcs.c > index df3b37614eb7..5908ec88e039 100644 > --- a/sys/dev/ipmi/ipmi_kcs.c > +++ b/sys/dev/ipmi/ipmi_kcs.c > @@ -32,6 +32,7 @@ __FBSDID("$FreeBSD$"); > #include <sys/param.h> > #include <sys/systm.h> > #include <sys/bus.h> > +#include <sys/conf.h> > #include <sys/condvar.h> > #include <sys/eventhandler.h> > #include <sys/kernel.h> > @@ -490,7 +491,21 @@ kcs_startup(struct ipmi_softc *sc) > } > > static int > -kcs_driver_request(struct ipmi_softc *sc, struct ipmi_request *req, int timo) > +kcs_driver_request_queue(struct ipmi_softc *sc, struct ipmi_request *req, int timo) > +{ > + int error; > + > + IPMI_LOCK(sc); > + ipmi_polled_enqueue_request_highpri(sc, req); > + error = msleep(req, &sc->ipmi_requests_lock, 0, "ipmireq", timo); > + if (error == 0) > + error = req->ir_error; > + IPMI_UNLOCK(sc); > + return (error); > +} > + > +static int > +kcs_driver_request_poll(struct ipmi_softc *sc, struct ipmi_request *req) > { > int i, ok; > > @@ -504,6 +519,17 @@ kcs_driver_request(struct ipmi_softc *sc, struct ipmi_request *req, int timo) > return (req->ir_error); > } > > +static int > +kcs_driver_request(struct ipmi_softc *sc, struct ipmi_request *req, int timo) > +{ > + > + if (KERNEL_PANICKED() || dumping) > + return (kcs_driver_request_poll(sc, req)); > + else > + return (kcs_driver_request_queue(sc, req, timo)); > +} > + > + > int > ipmi_kcs_attach(struct ipmi_softc *sc) > { > diff --git a/sys/dev/ipmi/ipmivars.h b/sys/dev/ipmi/ipmivars.h > index b0548ee3d7c3..1468e86be73b 100644 > --- a/sys/dev/ipmi/ipmivars.h > +++ b/sys/dev/ipmi/ipmivars.h > @@ -111,6 +111,7 @@ struct ipmi_softc { > uint8_t ipmi_dev_support; /* IPMI_ADS_* */ > struct cdev *ipmi_cdev; > TAILQ_HEAD(,ipmi_request) ipmi_pending_requests; > + TAILQ_HEAD(,ipmi_request) ipmi_pending_requests_highpri; > int ipmi_driver_requests_polled; > eventhandler_tag ipmi_power_cycle_tag; > eventhandler_tag ipmi_watchdog_tag; > @@ -237,6 +238,7 @@ void ipmi_complete_request(struct ipmi_softc *, struct ipmi_request *); > struct ipmi_request *ipmi_dequeue_request(struct ipmi_softc *); > void ipmi_free_request(struct ipmi_request *); > int ipmi_polled_enqueue_request(struct ipmi_softc *, struct ipmi_request *); > +int ipmi_polled_enqueue_request_highpri(struct ipmi_softc *, struct ipmi_request *); > int ipmi_submit_driver_request(struct ipmi_softc *, struct ipmi_request *, > int); > >
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?Y2GV5zRzPj%2BvSynz>