Date:      Thu, 15 Mar 2012 00:39:30 -0700
From:      Juli Mallett <jmallett@FreeBSD.org>
To:        freebsd-net@freebsd.org
Subject:   MSI-X + em(4) = Refresh mbufs: hdr dmamap load failure - 22
Message-ID:  <CACVs6=9rTNAjEEdy7sBNEWPtoTdkx7eifZisQF5JTESAorQeJQ@mail.gmail.com>

All,

On both stable/9 and trunk I see that with one of the 82571EB or the
82574L I am flooded with messages of the form:

Refresh mbufs: hdr dmamap load failure - 22

If I disable msix, then the messages go away.  I am not sure why msix
vs. non-msix would matter in this case, unless in the msix case
spurious interrupts are causing em_rxeof to be called without any
packets available.  If that happens then perhaps
e1000_rx_unrefreshed() is called when no buffers have been processed
and then em_refresh_mbufs wrongly refreshes the whole ring?

This seems like it would be a problem because the
bus_dmamap_load_mbuf_sg code is called unconditionally, even when a
new mbuf isn't being allocated.  In that case, the mapping already
exists.  Wouldn't it be necessary to unload and then reload the mbuf?
So either it's a bug that em_refresh_mbufs is being called at all, or
it's naively reusing mbufs in a way that actually guarantees an error,
right?  Also, in the case where it frees, only m_free is called -- is
there never a case where that should be an m_freem?  I can imagine
some, but they are likely impossible with the receive path of the
driver.  (I don't know for sure because the receive path and the mbuf
refresh code keep changing and I've been unable to keep up.)
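
To make the hypothesis above concrete, here is a minimal userland sketch
in C.  It is a model only, not the driver's code and not the bus_dma(9)
API: struct model_map, struct model_slot, model_load() and
model_unload() are hypothetical stand-ins, and the rule that loading an
already-loaded map fails with EINVAL (errno 22) is an assumption taken
from the reasoning above, not confirmed behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real bus_dma(9) types. */
struct model_map {
	bool loaded;	/* true while a buffer is mapped through this map */
};

struct model_slot {
	struct model_map map;
	bool has_mbuf;	/* true if the slot still holds an old buffer */
};

/* ASSUMPTION: loading an already-loaded map fails with EINVAL. */
static int
model_load(struct model_map *map)
{
	assert(map != NULL);
	if (map->loaded)
		return (EINVAL);
	map->loaded = true;
	return (0);
}

static void
model_unload(struct model_map *map)
{
	assert(map != NULL);
	map->loaded = false;
}

/* Refresh every slot and load unconditionally; returns the failure count. */
static int
refresh_naive(struct model_slot *ring, size_t n)
{
	int failures = 0;

	for (size_t i = 0; i < n; i++) {
		if (!ring[i].has_mbuf)
			ring[i].has_mbuf = true;	/* "allocate" a buffer */
		if (model_load(&ring[i].map) != 0)
			failures++;
	}
	return (failures);
}

/* Same walk, but a reused buffer is unloaded before it is loaded again. */
static int
refresh_unload_first(struct model_slot *ring, size_t n)
{
	int failures = 0;

	for (size_t i = 0; i < n; i++) {
		if (ring[i].has_mbuf)
			model_unload(&ring[i].map);
		else
			ring[i].has_mbuf = true;
		if (model_load(&ring[i].map) != 0)
			failures++;
	}
	return (failures);
}
```

In this model, the first refresh_naive() pass over an empty ring
succeeds, a second pass over the same ring fails on every slot, and
refresh_unload_first() succeeds on the already-populated ring.  Whether
the real driver behaves this way depends on the assumption stated above.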

I don't know which part it is, of course, because I don't know what
port it's coming from.  Like three other printfs in the driver where
knowing which device is involved matters tremendously, this one uses a
bare printf and not a device_printf.  I could modify the driver, but for now
disabling msix is easier than continuing to load new kernels to try to
debug the problem.
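
The difference between the two logging styles can be sketched in
userland C.  This is an illustration only: struct model_dev and both
helper functions are hypothetical, and the "em1" unit used in the note
after the code is made up.  The one fact relied on is that
device_printf(9) prefixes the message with the device name and unit
followed by ": ".

```c
#include <stdio.h>

/* Hypothetical stand-in for device_t: a driver name and a unit number. */
struct model_dev {
	const char *name;
	int unit;
};

/* Bare-printf style: nothing in the message identifies the port. */
static int
format_bare(char *buf, size_t len, int error)
{
	return (snprintf(buf, len,
	    "Refresh mbufs: hdr dmamap load failure - %d\n", error));
}

/* device_printf style: the message is prefixed with "<name><unit>: ". */
static int
format_with_device(char *buf, size_t len, const struct model_dev *dev,
    int error)
{
	return (snprintf(buf, len,
	    "%s%d: Refresh mbufs: hdr dmamap load failure - %d\n",
	    dev->name, dev->unit, error));
}
```

With a model_dev of { "em", 1 } and error 22, the second helper produces
"em1: Refresh mbufs: hdr dmamap load failure - 22", which is enough to
tell the ports apart in the log.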

Is anyone else seeing this?  Has anyone further investigated the
problem?  Is there a patch floating around and I just haven't found
the right search terms?

Thanks in advance,
Juli.

PS: Yes, I know this is kind of a crappy bug report, sorry.  I've had
a limited amount of time to investigate so far, and don't want to
delay reporting it until I am able to get more time with the
problematic hardware.


