Date:      Mon, 23 May 2022 20:40:01 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 264191] debugnet panics with mbuf cache with multiple instances of the same driver
Message-ID:  <bug-264191-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264191

            Bug ID: 264191
           Summary: debugnet panics with mbuf cache with multiple
                    instances of the same driver
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: bdrewery@FreeBSD.org

1. debugnet_mbuf_reinit() is racy.

With netdump we would only populate the mbuf cache when a device was
*configured*. Now we populate the cache when the device comes up and if it
*supports* debugnet. Thus if we have a driver with multiple devices then each
device coming up will cause debugnet_mbuf_reinit() to race between multiple
threads while touching the mbufqs. This is easily fixed but leaves more issues.
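
One way to picture the "easily fixed" part is to serialize the whole
drain-and-refill under a single lock. The following is only a userspace
sketch of that idea, not the debugnet code: struct dn_model_cache,
dn_model_reinit() and the pthread mutex are hypothetical stand-ins, and the
kernel would use its own locking primitive.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace stand-in for the shared mbuf cache state. */
struct dn_model_cache {
	pthread_mutex_t lock;	/* assumed fix: serializes every reinit */
	unsigned queued;	/* buffers currently on the model queue */
	unsigned clsize;	/* cluster size the queue was filled for */
	unsigned reinits;	/* completed reinit passes */
};

static struct dn_model_cache dn_model = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
};

/*
 * Model of a cache reinit: drain the queue, then refill it.  Holding the
 * lock across both steps keeps a second caller from observing (or
 * refilling) a half-drained queue.
 */
static unsigned
dn_model_reinit(struct dn_model_cache *c, unsigned nbufs, unsigned clsize)
{
	unsigned result;

	pthread_mutex_lock(&c->lock);
	c->queued = 0;			/* drain */
	c->clsize = clsize;
	while (c->queued < nbufs)	/* refill */
		c->queued++;
	c->reinits++;
	result = c->queued;
	pthread_mutex_unlock(&c->lock);
	return (result);
}
```

In this model each device's link-up thread would call dn_model_reinit(), and
whichever caller runs last leaves a fully populated queue behind. It does
nothing about the size mismatch described in item 2.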

Doing this during driver link up makes sense because we may not configure the
device until after panic in ddb with .netdump.

2. dn_buf_import() may overflow an mbuf from the queue with trash_init() on
<without INVARIANTS>.

If one device uses jumbo frames (MTU 9000) and the other the normal MTU of
1500, the hwm/dn_clsize can become MJUM9BYTES (9216).
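
As a rough illustration of how the shared size ends up at 9216, here is a
small userspace model of a high water mark that only ever grows. The names
(model_clsize_for_mtu(), model_link_up(), model_hwm_clsize) are hypothetical,
and the MTU-to-cluster-size mapping is a simplifying assumption; real drivers
choose their cluster sizes in their own ways.

```c
#include <assert.h>

#define MODEL_MCLBYTES   2048u	/* standard cluster, typical default */
#define MODEL_MJUM9BYTES 9216u	/* 9k jumbo cluster */

/* Assumed mapping: normal MTU fits a standard cluster, else 9k jumbo. */
static unsigned
model_clsize_for_mtu(unsigned mtu)
{
	return (mtu <= 1500 ? MODEL_MCLBYTES : MODEL_MJUM9BYTES);
}

/* Shared high water mark across all devices of the model. */
static unsigned model_hwm_clsize;

/* Called once per device link-up; returns the resulting shared size. */
static unsigned
model_link_up(unsigned mtu)
{
	unsigned clsize = model_clsize_for_mtu(mtu);

	if (clsize > model_hwm_clsize)
		model_hwm_clsize = clsize;	/* never shrinks */
	return (model_hwm_clsize);
}
```

Once the jumbo device has come up, the shared size stays at 9216 even when
the MTU 1500 device comes up afterwards.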

[This next part may only be a problem for something like mlx4 which has some
cached mbufs of its own. This can be seen in mlx4_en_alloc_buf() where it
appears to always keep 1 extra mbuf around for each ring. It appears it may use
that mbuf at panic time if mlx4_en_alloc_mbuf() fails. The issue I ran into
downstream was a very different allocation scenario but the FreeBSD version
appears to have a similar issue.]

If the device that is used at dump time has an MTU of 1500 it is possible for
the device to return a smaller mbuf to the dn_clustq than expected for that
zone (vs the high water mark of 9216). When it is removed in dn_buf_import() it
has trash_init(9216) run over it rather than the expected MCLBYTES size.
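
To make the overflow concrete, the sketch below models a cached buffer that
remembers its real capacity and refuses to scrub past it. This is not the
debugnet implementation and not a proposed patch: struct model_buf and
model_import_scrub() are hypothetical, memset() stands in for trash_init(),
and rejecting undersized buffers is just one possible policy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MCLBYTES   2048u	/* standard cluster, typical default */
#define MODEL_MJUM9BYTES 9216u	/* 9k jumbo cluster */

/* Hypothetical cached buffer that records how large it really is. */
struct model_buf {
	unsigned char *data;
	size_t cap;
};

/*
 * Scrub a buffer on import using the cache-wide cluster size.  Returns the
 * number of bytes scrubbed, or 0 if the buffer is smaller than the cache
 * expects, in which case it is left untouched instead of being overrun.
 */
static size_t
model_import_scrub(struct model_buf *b, size_t cache_clsize)
{
	if (b->cap < cache_clsize)
		return (0);		/* undersized: would overflow */
	memset(b->data, 0xde, cache_clsize);
	return (cache_clsize);
}
```

Without the capacity check, a 2048-byte buffer handed back by the MTU 1500
device would have 9216 bytes written over it, which is the overflow described
above.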

-- 
You are receiving this mail because:
You are the assignee for the bug.


