Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 01 Nov 2022 15:09:09 -0700
From:      Ravi Pokala <rpokala@freebsd.org>
To:        Chuck Silvers <chuq@chuq.com>
Cc:        <src-committers@FreeBSD.org>, <dev-commits-src-all@FreeBSD.org>, <dev-commits-src-main@FreeBSD.org>
Subject:   Re: f0f3e3e961d3 - main - ipmi: use a queue for kcs driver requests when possible
Message-ID:  <E65B5AB0-0EF7-4BA1-8440-104AA09A695D@panasas.com>
In-Reply-To: <Y2GV5zRzPj%2BvSynz@spathi.chuq.com>
References:  <202211011755.2A1HtMll099820@gitrepo.freebsd.org> <930FB7AD-A360-460E-9638-AEEDD85AF2DE@panasas.com> <Y2GV5zRzPj%2BvSynz@spathi.chuq.com>

next in thread | previous in thread | raw e-mail | index | archive | help
My understanding is that IPMB and KCS are different physical-layer interfac=
es to the BMC, and only one of them is active on any given motherboard. So i=
f the kernel is using KCS, then so is userland. This can be grepped out of /=
var/run/dmesg.boot:

    ipmi0: <IPMI System Interface> port 0xca2,0xca3 on acpi0
    ipmi0: KCS mode found at io 0xca2 on acpi
    ipmi0: IPMI device rev. 1, firmware rev. 1.73, version 2.0, device supp=
ort mask 0xbf
    ipmi0: Number of channels 2
    ipmi0: Attached watchdog
    ipmi0: Establishing power cycle handler

-Ravi (rpokala@)

=EF=BB=BF-----Original Message-----
From: Chuck Silvers <chuq@chuq.com>
Date: 2022-11-01, Tuesday at 14:55
To: Ravi Pokala <rpokala@freebsd.org>
Cc: <src-committers@FreeBSD.org>, <dev-commits-src-all@FreeBSD.org>, <dev-c=
ommits-src-main@FreeBSD.org>
Subject: Re: f0f3e3e961d3 - main - ipmi: use a queue for kcs driver request=
s when possible

    Hi Ravi,

    the usual path (the IPMI_SYSTEM_INTERFACE_ADDR_TYPE case) for an "appli=
cation" request
    (ie. one from userland) is:

    ipmi_polled_enqueue_request()
    sc->ipmi_enqueue_request()
    ipmi_ioctl()

    which uses the regular (low priority) queue.

    in the non-IPMI_SYSTEM_INTERFACE_ADDR_TYPE case (aka. the IPMB case),
    ipmi_ioctl() does issue two driver requests to set up the IPMB stuff be=
fore
    then issuing the real application request with sc->ipmi_enqueue_request=
() as above.
    I don't know much about this IPMB thing, it looks like a kind of pass-t=
hrough
    interface for communicating with another device that sits behind the ma=
in
    BMC IPMI controller.

    I suppose that it would make sense to change these two IPMB setup drive=
r requests
    to use the regular (low priority) queue (since sleeping is ok in that p=
ath),
    but I don't think I have any way to test that since our systems don't u=
se this path.

    -Chuck


    On Tue, Nov 01, 2022 at 11:53:57AM -0700, Ravi Pokala wrote:
    > Hi Chuck,
    >=20
    > As best as I can tell, the stack for handling a request from userland=
 (e.g. `ipmitool') looks like this:
    >=20
    > ipmi_polled_enqueue_request_highpri()
    > kcs_driver_request_queue() /* assuming not panicked or dumping core *=
/
    > kcs_driver_request()
    > sc->ipmi_driver_request()
    > ipmi_submit_driver_request()
    > ipmi_ioctl()
    >=20
    > And the stack for handling kernel-internal requests (i.e. watchdog) i=
s essentially the same, except for calling ipmi_submit_driver_request() dire=
ctly rather than going through ipmi_ioctl(). Specifically, there doesn't see=
m to be anything that would put the two different types of requests on two d=
ifferent queues; it looks like everything goes onto the highpri queue.
    >=20
    > What am I missing?
    >=20
    > Thanks,
    >=20
    > Ravi (rpokala@)
    >=20
    > =EF=BB=BF-----Original Message-----
    > From: <owner-src-committers@freebsd.org> on behalf of Chuck Silvers <=
chs@FreeBSD.org>
    > Date: 2022-11-01, Tuesday at 10:55
    > To: <src-committers@FreeBSD.org>, <dev-commits-src-all@FreeBSD.org>, =
<dev-commits-src-main@FreeBSD.org>
    > Subject: git: f0f3e3e961d3 - main - ipmi: use a queue for kcs driver =
requests when possible
    >=20
    >     The branch main has been updated by chs:
    >=20
    >     URL: https://cgit.FreeBSD.org/src/commit/?id=3Df0f3e3e961d311f1cd93=
8f1319385e7f454525f1
    >=20
    >     commit f0f3e3e961d311f1cd938f1319385e7f454525f1
    >     Author:     Chuck Silvers <chs@FreeBSD.org>
    >     AuthorDate: 2022-11-01 17:55:14 +0000
    >     Commit:     Chuck Silvers <chs@FreeBSD.org>
    >     CommitDate: 2022-11-01 17:55:14 +0000
    >=20
    >         ipmi: use a queue for kcs driver requests when possible
    >=20
    >         The ipmi watchdog pretimeout action can trigger unintentional=
ly in
    >         certain rare, complicated situations.  What we have seen at N=
etflix
    >         is that the BMC can sometimes be sent a continuous stream of
    >         writes to port 0x80, and due to what is a bug or misconfigura=
tion
    >         in the BMC software, this results in the BMC running out of m=
emory,
    >         becoming very slow to respond to KCS requests, and eventually=
 being
    >         rebooted by its own internal watchdog.  While that is going o=
n in
    >         the BMC, back in the host OS, a number of requests are pendin=
g in
    >         the ipmi request queue, and the kcs_loop thread is working on
    >         processing these requests.  All of the KCS accesses to proces=
s
    >         those requests are timing out and eventually failing because =
the
    >         BMC is responding very slowly or not at all, and the kcs_loop=
 thread
    >         is holding the IPMI_IO_LOCK the whole time that is going on.
    >         Meanwhile the watchdogd process in the host is trying to pat =
the
    >         BMC watchdog, and this process is sleeping waiting to get the
    >         IPMI_IO_LOCK.  It's not entirely clear why the watchdogd proc=
ess
    >         is sleeping for this lock, because the intention is that a th=
read
    >         holding the IPMI_IO_LOCK should not sleep and thus any thread
    >         that wants the lock should just spin to wait for it.  My best=
 guess
    >         is that the kcs_loop thread is spinning waiting for the BMC t=
o
    >         respond for so long that it is eventually preempted, and duri=
ng
    >         the brief interval when the kcs_loop thread is not running,
    >         the watchdogd thread notices that the lock holder is not runn=
ing
    >         and sleeps.  When the kcs_loop thread eventually finishes pro=
cessing
    >         one request, it drops the IPMI_IO_LOCK and then immediately t=
akes the
    >         lock again so it can process the next request in the queue.
    >         Because the watchdogd thread is sleeping at this point, the k=
cs_loop
    >         always wins the race to acquire the IPMI_IO_LOCK, thus starvi=
ng
    >         the watchdogd thread.  The callout for the watchdog pretimeou=
t
    >         would be reset by the watchdogd thread after its request to t=
he BMC
    >         watchdog completes, but since that request never processed, t=
he
    >         pretimeout callout eventually fires, even though there is not=
hing
    >         actually wrong with the host.
    >=20
    >         To prevent this saga from unfolding:
    >=20
    >          - when kcs_driver_request() is called in a context where it =
can sleep,
    >            queue the request and let the worker thread process it rat=
her than
    >            trying to process in the original thread.
    >          - add a new high-priority queue for driver requests, so that=
 the
    >            watchdog patting requests will be processed as quickly as =
possible
    >            even if lots of application requests have already been que=
ued.
    >=20
    >         With these two changes, the watchdog pretimeout action does n=
ot trigger
    >         even if the BMC is completely out to lunch for long periods o=
f time
    >         (as long as the watchdogd check command does not also get stu=
ck).
    >=20
    >         Sponsored by:   Netflix
    >         Reviewed by:    imp
    >         Differential Revision:  https://reviews.freebsd.org/D36555
    >     ---
    >      sys/dev/ipmi/ipmi.c     | 33 ++++++++++++++++++++++++++++++---
    >      sys/dev/ipmi/ipmi_kcs.c | 28 +++++++++++++++++++++++++++-
    >      sys/dev/ipmi/ipmivars.h |  2 ++
    >      3 files changed, 59 insertions(+), 4 deletions(-)
    >=20
    >     diff --git a/sys/dev/ipmi/ipmi.c b/sys/dev/ipmi/ipmi.c
    >     index d79690d55c68..b8705a81627b 100644
    >     --- a/sys/dev/ipmi/ipmi.c
    >     +++ b/sys/dev/ipmi/ipmi.c
    >     @@ -208,6 +208,15 @@ ipmi_dtor(void *arg)
    >      	IPMI_LOCK(sc);
    >      	if (dev->ipmi_requests) {
    >      		/* Throw away any pending requests for this device. */
    >     +		TAILQ_FOREACH_SAFE(req, &sc->ipmi_pending_requests_highpri, ir=
_link,
    >     +		    nreq) {
    >     +			if (req->ir_owner =3D=3D dev) {
    >     +				TAILQ_REMOVE(&sc->ipmi_pending_requests_highpri, req,
    >     +				    ir_link);
    >     +				dev->ipmi_requests--;
    >     +				ipmi_free_request(req);
    >     +			}
    >     +		}
    >      		TAILQ_FOREACH_SAFE(req, &sc->ipmi_pending_requests, ir_link,
    >      		    nreq) {
    >      			if (req->ir_owner =3D=3D dev) {
    >     @@ -579,13 +588,19 @@ ipmi_dequeue_request(struct ipmi_softc *sc)
    >=20
    >      	IPMI_LOCK_ASSERT(sc);
    >=20
    >     -	while (!sc->ipmi_detaching && TAILQ_EMPTY(&sc->ipmi_pending_req=
uests))
    >     +	while (!sc->ipmi_detaching && TAILQ_EMPTY(&sc->ipmi_pending_req=
uests) &&
    >     +	    TAILQ_EMPTY(&sc->ipmi_pending_requests_highpri))
    >      		cv_wait(&sc->ipmi_request_added, &sc->ipmi_requests_lock);
    >      	if (sc->ipmi_detaching)
    >      		return (NULL);
    >=20
    >     -	req =3D TAILQ_FIRST(&sc->ipmi_pending_requests);
    >     -	TAILQ_REMOVE(&sc->ipmi_pending_requests, req, ir_link);
    >     +	req =3D TAILQ_FIRST(&sc->ipmi_pending_requests_highpri);
    >     +	if (req !=3D NULL)
    >     +		TAILQ_REMOVE(&sc->ipmi_pending_requests_highpri, req, ir_link)=
;
    >     +	else {
    >     +		req =3D TAILQ_FIRST(&sc->ipmi_pending_requests);
    >     +		TAILQ_REMOVE(&sc->ipmi_pending_requests, req, ir_link);
    >     +	}
    >      	return (req);
    >      }
    >=20
    >     @@ -601,6 +616,17 @@ ipmi_polled_enqueue_request(struct ipmi_soft=
c *sc, struct ipmi_request *req)
    >      	return (0);
    >      }
    >=20
    >     +int
    >     +ipmi_polled_enqueue_request_highpri(struct ipmi_softc *sc, struc=
t ipmi_request *req)
    >     +{
    >     +
    >     +	IPMI_LOCK_ASSERT(sc);
    >     +
    >     +	TAILQ_INSERT_TAIL(&sc->ipmi_pending_requests_highpri, req, ir_l=
ink);
    >     +	cv_signal(&sc->ipmi_request_added);
    >     +	return (0);
    >     +}
    >     +
    >      /*
    >       * Watchdog event handler.
    >       */
    >     @@ -817,6 +843,7 @@ ipmi_startup(void *arg)
    >      	mtx_init(&sc->ipmi_requests_lock, "ipmi requests", NULL, MTX_DE=
F);
    >      	mtx_init(&sc->ipmi_io_lock, "ipmi io", NULL, MTX_DEF);
    >      	cv_init(&sc->ipmi_request_added, "ipmireq");
    >     +	TAILQ_INIT(&sc->ipmi_pending_requests_highpri);
    >      	TAILQ_INIT(&sc->ipmi_pending_requests);
    >=20
    >      	/* Initialize interface-dependent state. */
    >     diff --git a/sys/dev/ipmi/ipmi_kcs.c b/sys/dev/ipmi/ipmi_kcs.c
    >     index df3b37614eb7..5908ec88e039 100644
    >     --- a/sys/dev/ipmi/ipmi_kcs.c
    >     +++ b/sys/dev/ipmi/ipmi_kcs.c
    >     @@ -32,6 +32,7 @@ __FBSDID("$FreeBSD$");
    >      #include <sys/param.h>
    >      #include <sys/systm.h>
    >      #include <sys/bus.h>
    >     +#include <sys/conf.h>
    >      #include <sys/condvar.h>
    >      #include <sys/eventhandler.h>
    >      #include <sys/kernel.h>
    >     @@ -490,7 +491,21 @@ kcs_startup(struct ipmi_softc *sc)
    >      }
    >=20
    >      static int
    >     -kcs_driver_request(struct ipmi_softc *sc, struct ipmi_request *r=
eq, int timo)
    >     +kcs_driver_request_queue(struct ipmi_softc *sc, struct ipmi_requ=
est *req, int timo)
    >     +{
    >     +	int error;
    >     +
    >     +	IPMI_LOCK(sc);
    >     +	ipmi_polled_enqueue_request_highpri(sc, req);
    >     +	error =3D msleep(req, &sc->ipmi_requests_lock, 0, "ipmireq", timo=
);
    >     +	if (error =3D=3D 0)
    >     +		error =3D req->ir_error;
    >     +	IPMI_UNLOCK(sc);
    >     +	return (error);
    >     +}
    >     +
    >     +static int
    >     +kcs_driver_request_poll(struct ipmi_softc *sc, struct ipmi_reque=
st *req)
    >      {
    >      	int i, ok;
    >=20
    >     @@ -504,6 +519,17 @@ kcs_driver_request(struct ipmi_softc *sc, st=
ruct ipmi_request *req, int timo)
    >      	return (req->ir_error);
    >      }
    >=20
    >     +static int
    >     +kcs_driver_request(struct ipmi_softc *sc, struct ipmi_request *r=
eq, int timo)
    >     +{
    >     +
    >     +	if (KERNEL_PANICKED() || dumping)
    >     +		return (kcs_driver_request_poll(sc, req));
    >     +	else
    >     +		return (kcs_driver_request_queue(sc, req, timo));
    >     +}
    >     +
    >     +
    >      int
    >      ipmi_kcs_attach(struct ipmi_softc *sc)
    >      {
    >     diff --git a/sys/dev/ipmi/ipmivars.h b/sys/dev/ipmi/ipmivars.h
    >     index b0548ee3d7c3..1468e86be73b 100644
    >     --- a/sys/dev/ipmi/ipmivars.h
    >     +++ b/sys/dev/ipmi/ipmivars.h
    >     @@ -111,6 +111,7 @@ struct ipmi_softc {
    >      	uint8_t			ipmi_dev_support;	/* IPMI_ADS_* */
    >      	struct cdev		*ipmi_cdev;
    >      	TAILQ_HEAD(,ipmi_request) ipmi_pending_requests;
    >     +	TAILQ_HEAD(,ipmi_request) ipmi_pending_requests_highpri;
    >      	int			ipmi_driver_requests_polled;
    >      	eventhandler_tag	ipmi_power_cycle_tag;
    >      	eventhandler_tag	ipmi_watchdog_tag;
    >     @@ -237,6 +238,7 @@ void	ipmi_complete_request(struct ipmi_softc =
*, struct ipmi_request *);
    >      struct ipmi_request *ipmi_dequeue_request(struct ipmi_softc *);
    >      void	ipmi_free_request(struct ipmi_request *);
    >      int	ipmi_polled_enqueue_request(struct ipmi_softc *, struct ipmi=
_request *);
    >     +int	ipmi_polled_enqueue_request_highpri(struct ipmi_softc *, str=
uct ipmi_request *);
    >      int	ipmi_submit_driver_request(struct ipmi_softc *, struct ipmi_=
request *,
    >      	    int);
    >=20
    >=20





Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?E65B5AB0-0EF7-4BA1-8440-104AA09A695D>