Date:      Sat, 22 Mar 2014 17:41:23 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Christopher Forgeron <csforgeron@gmail.com>
Cc:        FreeBSD Net <freebsd-net@freebsd.org>, Garrett Wollman <wollman@freebsd.org>, Jack Vogel <jfvogel@gmail.com>, Markus Gebert <markus.gebert@hostpoint.ch>
Subject:   Re: 9.2 ixgbe tx queue hang
Message-ID:  <1752303953.1405506.1395524483238.JavaMail.root@uoguelph.ca>
In-Reply-To: <CAB2_NwADUfs+bKV9QE_C4B1vchnzGWr1TK4C7wP8Fh8m94_mHA@mail.gmail.com>

Christopher Forgeron wrote:
> 
> Ah yes, I see it now: Line #658
> 
> #if defined(INET) || defined(INET6)
> 	/* Initialize to max value. */
> 	if (ifp->if_hw_tsomax == 0)
> 		ifp->if_hw_tsomax = IP_MAXPACKET;
> 	KASSERT(ifp->if_hw_tsomax <= IP_MAXPACKET &&
> 	    ifp->if_hw_tsomax >= IP_MAXPACKET / 8,
> 	    ("%s: tsomax outside of range", __func__));
> #endif
> 
> 
> Should this be the location where it's being set rather than in
> ixgbe? I would assume that other drivers could fall prey to this
> issue.
> 
All of this should be prepended with "I'm an NFS guy, not a networking
guy, so I might be wrong".

Other drivers (and ixgbe for the 82598 chip) can handle a packet that is
split across more than 32 mbufs. (I think the 82598 handles 100; grep for
SCATTER in the *.h files under sys/dev/ixgbe.)
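
(From memory, the relevant defines look something like the two below; please
double-check the source, since I'm quoting them from memory:)

	#define IXGBE_82598_SCATTER	100
	#define IXGBE_82599_SCATTER	32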

Now, since several drivers do have this 32-mbuf limit, I can see an argument
for making the default a little smaller so that these drivers work, since any
driver can override the default. (About now someone usually jumps in and says
something along the lines of "You can't do that until all the drivers that
can handle IP_MAXPACKET are fixed to set if_hw_tsomax", and since I can't fix
drivers I can't test, that pretty much puts a stop to it.)

You see, the problem isn't that IP_MAXPACKET is too big, but that the hardware
has a limit of 32 non-contiguous chunks (mbufs) per packet, and 32 * MCLBYTES = 64K.
(Hardware/network drivers that can handle 35 or more chunks per packet (they like
 to call them transmit segments, although ixgbe uses the term scatter) shouldn't
 have any problems.)

I have an untested patch that adds a tsomaxseg count to be used along with the
tsomax byte limit, so that a driver could inform tcp_output() that it can only
handle 32 mbufs and tcp_output() would then limit a TSO segment using both. I
can't test it, though, so who knows when/if that might happen.
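
The rough idea in tcp_output() would be something like the sketch below
(untested; if_hw_tsomaxseg is just the name I'm using for the new field, it
doesn't exist in head, and the variable names are placeholders):

	/*
	 * Sketch only: clamp the TSO payload by the driver's byte limit
	 * and also by how many 2K clusters (mbufs) it can scatter/gather.
	 */
	max_len = ifp->if_hw_tsomax - hdrlen;
	if (max_len > ifp->if_hw_tsomaxseg * MCLBYTES - hdrlen)
		max_len = ifp->if_hw_tsomaxseg * MCLBYTES - hdrlen;
	if (len > max_len)
		len = max_len;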

I also have a patch that modifies NFS to use pagesize clusters (reducing the
mbuf count in the list), but that one causes grief when testing on i386 (it
seems to run out of kernel memory to the point where it can't allocate
something called "boundary tags" and pretty much wedges the machine at that
point). Since I don't know how to fix this (I thought of making the patch
"amd64 only"), I can't really commit it to head, either.
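
(The gist of that patch is just to grab page size clusters for the reply data,
so a 64K reply chain needs roughly 16 mbufs instead of 32+. A minimal sketch,
not the actual patch:)

	/* Sketch: use a 4K (page size) jumbo cluster instead of a regular
	 * 2K cluster, halving the mbuf count in the chain. */
	m = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, MJUMPAGESIZE);
	if (m == NULL)
		m = m_getcl(M_NOWAIT, MT_DATA, M_PKTHDR);	/* fall back to 2K */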

As such, I think it's going to be "fix the drivers one at a time" and tell
folks to "disable TSO or limit rsize,wsize to 32K" when they run into trouble.
(As you might have guessed, I'd rather just be "the NFS guy", but since NFS
 "triggers the problem" I\m kinda stuck with it;-)

> Also should we not also subtract ETHER_VLAN_ENCAP_LEN from tsomax to
> make sure VLANs fit?
> 
No idea. (I wouldn't know a VLAN if it jumped up and tried to
bite me on the nose.;-) So, I have no idea what handles this, but
if it means the total ethernet header size can be > 14 bytes, then I'd agree.
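
If that's the case, whatever sets if_hw_tsomax could leave room for the bigger
header, something like this (untested sketch, not the attached patch):

	/* Sketch: leave room for an ethernet + VLAN header so the frame
	 * handed to the hardware stays within its limits. */
	ifp->if_hw_tsomax = IP_MAXPACKET -
	    (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN);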

> Perhaps there is something in the newer network code that is filling
> up the frames to the point where they are full - thus a TSO =
> IP_MAXPACKET is just now causing problems.
> 
Yea, I have no idea why this didn't bite running 9.1. (Did 9.1 have
TSO enabled by default?)

> I'm back on the 9.2-STABLE ixgbe with the tso patch for now. I'll
> make it run overnight while copying a few TB of data to make sure
> it's stable there before investigating the 10.0-STABLE driver more.
> 
I have no idea what needs to be changed to back-port a 10.0 driver to
9.2.

Good luck with it and thanks for what you've learned so far, rick

> ..and there is still the case of the denied jumbo clusters on boot -
> something else is off someplace.
> 
> BTW - In all of this, I did not mention that my ix0 uses a MTU of
> 9000 - I assume others assumed this.
> 
> 
> On Fri, Mar 21, 2014 at 11:39 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> 
> 
> 
> Christopher Forgeron wrote:
> > It may be a little early, but I think that's it!
> > 
> > It's been running without error for nearly an hour - It's very rare
> > it
> > would go this long under this much load.
> > 
> > I'm going to let it run longer, then abort and install the kernel
> > with the
> > extra printfs so I can see what value ifp->if_hw_tsomax is before
> > you
> > set
> > it.
> > 
> I think you'll just find it set to 0. Code in if_attach_internal()
> { in sys/net/if.c } sets it to IP_MAXPACKET (which is 65535) if it
> is 0. In other words, if the if_attach routine in the driver doesn't
> set it, this code sets it to the maximum possible value.
> 
> Here's the snippet:
> /* Initialize to max value. */
> 657 if (ifp->if_hw_tsomax == 0)
> 658 ifp->if_hw_tsomax = IP_MAXPACKET;
> 
> Anyhow, this sounds like progress.
> 
> As far as NFS is concerned, I'd rather set it to a smaller value
> (maybe 56K) so that m_defrag() doesn't need to be called, but I
> suspect others wouldn't like this.
> 
> Hopefully Jack can decide if this patch is ok?
> 
> Thanks yet again for doing this testing, rick
> ps: I've attached it again, so Jack (and anyone else who reads this)
> can look at it.
> pss: Please report if it keeps working for you.
> 
> 
> 
> > It still had netstat -m denied entries on boot, but they are not
> > climbing
> > like they did before:
> > 
> > 
> > $ uptime
> > 9:32PM up 25 mins, 4 users, load averages: 2.43, 6.15, 4.65
> > $ netstat -m
> > 21556/7034/28590 mbufs in use (current/cache/total)
> > 4080/3076/7156/6127254 mbuf clusters in use
> > (current/cache/total/max)
> > 4080/2281 mbuf+clusters out of packet secondary zone in use
> > (current/cache)
> > 0/53/53/3063627 4k (page size) jumbo clusters in use
> > (current/cache/total/max)
> > 16444/118/16562/907741 9k jumbo clusters in use
> > (current/cache/total/max)
> > 0/0/0/510604 16k jumbo clusters in use (current/cache/total/max)
> > 161545K/9184K/170729K bytes allocated to network
> > (current/cache/total)
> > 17972/2230/4111 requests for mbufs denied
> > (mbufs/clusters/mbuf+clusters)
> > 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> > 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> > 35/8909/0 requests for jumbo clusters denied (4k/9k/16k)
> > 0 requests for sfbufs denied
> > 0 requests for sfbufs delayed
> > 0 requests for I/O initiated by sendfile
> > 
> > - Started off bad with the 9k denials, but it's not going up!
> > 
> > uptime
> > 10:20PM up 1:13, 6 users, load averages: 2.10, 3.15, 3.67
> > root@SAN0:/usr/home/aatech # netstat -m
> > 21569/7141/28710 mbufs in use (current/cache/total)
> > 4080/3308/7388/6127254 mbuf clusters in use
> > (current/cache/total/max)
> > 4080/2281 mbuf+clusters out of packet secondary zone in use
> > (current/cache)
> > 0/53/53/3063627 4k (page size) jumbo clusters in use
> > (current/cache/total/max)
> > 16447/121/16568/907741 9k jumbo clusters in use
> > (current/cache/total/max)
> > 0/0/0/510604 16k jumbo clusters in use (current/cache/total/max)
> > 161575K/9702K/171277K bytes allocated to network
> > (current/cache/total)
> > 17972/2261/4111 requests for mbufs denied
> > (mbufs/clusters/mbuf+clusters)
> > 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> > 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> > 35/8913/0 requests for jumbo clusters denied (4k/9k/16k)
> > 0 requests for sfbufs denied
> > 0 requests for sfbufs delayed
> > 0 requests for I/O initiated by sendfile
> > 
> > This is the 9.2 ixgbe that I'm patching into 10.0, I'll move into
> > the
> > base
> > 10.0 code tomorrow.
> > 
> > 
> > On Fri, Mar 21, 2014 at 8:44 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> > 
> > > Christopher Forgeron wrote:
> > > > 
> > > > Hello all,
> > > > 
> > > > I ran Jack's ixgbe MJUM9BYTES removal patch, and let iometer
> > > > hammer
> > > > away at the NFS store overnight - But the problem is still
> > > > there.
> > > > 
> > > > 
> > > > From what I read, I think the MJUM9BYTES removal is probably
> > > > good
> > > > cleanup (as long as it doesn't trade performance on a lightly
> > > > memory
> > > > loaded system for performance on a heavily memory loaded
> > > > system).
> > > > If
> > > > I can stabilize my system, I may attempt those benchmarks.
> > > > 
> > > > 
> > > > I think the fix will be obvious at boot for me - My 9.2 has a
> > > > 'clean'
> > > > netstat
> > > > - Until I can boot and see a 'netstat -m' that looks similar to
> > > > that,
> > > > I'm going to have this problem.
> > > > 
> > > > 
> > > > Markus: Do your systems show denied mbufs at boot like mine
> > > > does?
> > > > 
> > > > 
> > > > Turning off TSO works for me, but at a performance hit.
> > > > 
> > > > I'll compile Rick's patch (and extra debugging) this morning
> > > > and
> > > > let
> > > > you know soon.
> > > > 
> > > > On Thu, Mar 20, 2014 at 11:47 PM, Christopher Forgeron <csforgeron@gmail.com> wrote:
> > > > 
> > > > 
> > > > BTW - I think this will end up being a TSO issue, not the patch
> > > > that
> > > > Jack applied.
> > > > 
> > > > When I boot Jack's patch (MJUM9BYTES removal) this is what
> > > > netstat -m
> > > > shows:
> > > > 
> > > > 21489/2886/24375 mbufs in use (current/cache/total)
> > > > 4080/626/4706/6127254 mbuf clusters in use
> > > > (current/cache/total/max)
> > > > 4080/587 mbuf+clusters out of packet secondary zone in use
> > > > (current/cache)
> > > > 16384/50/16434/3063627 4k (page size) jumbo clusters in use
> > > > (current/cache/total/max)
> > > > 0/0/0/907741 9k jumbo clusters in use (current/cache/total/max)
> > > > 
> > > > 0/0/0/510604 16k jumbo clusters in use
> > > > (current/cache/total/max)
> > > > 79068K/2173K/81241K bytes allocated to network
> > > > (current/cache/total)
> > > > 18831/545/4542 requests for mbufs denied
> > > > (mbufs/clusters/mbuf+clusters)
> > > > 
> > > > 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> > > > 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> > > > 15626/0/0 requests for jumbo clusters denied (4k/9k/16k)
> > > > 
> > > > 0 requests for sfbufs denied
> > > > 0 requests for sfbufs delayed
> > > > 0 requests for I/O initiated by sendfile
> > > > 
> > > > Here is an un-patched boot:
> > > > 
> > > > 21550/7400/28950 mbufs in use (current/cache/total)
> > > > 4080/3760/7840/6127254 mbuf clusters in use
> > > > (current/cache/total/max)
> > > > 4080/2769 mbuf+clusters out of packet secondary zone in use
> > > > (current/cache)
> > > > 0/42/42/3063627 4k (page size) jumbo clusters in use
> > > > (current/cache/total/max)
> > > > 16439/129/16568/907741 9k jumbo clusters in use
> > > > (current/cache/total/max)
> > > > 
> > > > 0/0/0/510604 16k jumbo clusters in use
> > > > (current/cache/total/max)
> > > > 161498K/10699K/172197K bytes allocated to network
> > > > (current/cache/total)
> > > > 18345/155/4099 requests for mbufs denied
> > > > (mbufs/clusters/mbuf+clusters)
> > > > 
> > > > 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> > > > 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> > > > 3/3723/0 requests for jumbo clusters denied (4k/9k/16k)
> > > > 
> > > > 0 requests for sfbufs denied
> > > > 0 requests for sfbufs delayed
> > > > 0 requests for I/O initiated by sendfile
> > > > 
> > > > 
> > > > 
> > > > See how removing the MJUM9BYTES is just pushing the problem
> > > > from
> > > > the
> > > > 9k jumbo cluster into the 4k jumbo cluster?
> > > > 
> > > > Compare this to my FreeBSD 9.2 STABLE machine from ~ Dec 2013 :
> > > > Exact
> > > > same hardware, revisions, zpool size, etc. Just it's running an
> > > > older FreeBSD.
> > > > 
> > > > # uname -a
> > > > FreeBSD SAN1.XXXXX 9.2-STABLE FreeBSD 9.2-STABLE #0: Wed Dec 25
> > > > 15:12:14 AST 2013 aatech@FreeBSD-Update
> > > > Server:/usr/obj/usr/src/sys/GENERIC amd64
> > > > 
> > > > root@SAN1:/san1 # uptime
> > > > 7:44AM up 58 days, 38 mins, 4 users, load averages: 0.42, 0.80,
> > > > 0.91
> > > > 
> > > > root@SAN1:/san1 # netstat -m
> > > > 37930/15755/53685 mbufs in use (current/cache/total)
> > > > 4080/10996/15076/524288 mbuf clusters in use
> > > > (current/cache/total/max)
> > > > 4080/5775 mbuf+clusters out of packet secondary zone in use
> > > > (current/cache)
> > > > 0/692/692/262144 4k (page size) jumbo clusters in use
> > > > (current/cache/total/max)
> > > > 32773/4257/37030/96000 9k jumbo clusters in use
> > > > (current/cache/total/max)
> > > > 
> > > > 0/0/0/508538 16k jumbo clusters in use
> > > > (current/cache/total/max)
> > > > 312599K/67011K/379611K bytes allocated to network
> > > > (current/cache/total)
> > > > 
> > > > 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> > > > 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> > > > 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> > > > 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> > > > 0/0/0 sfbufs in use (current/peak/max)
> > > > 0 requests for sfbufs denied
> > > > 0 requests for sfbufs delayed
> > > > 0 requests for I/O initiated by sendfile
> > > > 0 calls to protocol drain routines
> > > > 
> > > > Lastly, please note this link:
> > > > 
> > > > http://lists.freebsd.org/pipermail/freebsd-net/2012-October/033660.html
> > > > 
> > > Hmm, this mentioned the ethernet header being in the TSO segment.
> > > I
> > > think
> > > I already mentioned my TCP/IP is rusty and I know diddly about
> > > TSO.
> > > However, at a glance it does appear the driver uses
> > > ether_output()
> > > for
> > > TSO segments and, as such, I think an ethernet header is
> > > prepended
> > > to the
> > > TSO segment. (This makes sense, since how else would the hardware
> > > know
> > > what ethernet header to use for the TCP segments generated.)
> > > 
> > > I think prepending the ethernet header could push the total
> > > length
> > > over 64K, given a default if_hw_tsomax == IP_MAXPACKET. And over
> > > 64K
> > > isn't going to fit in 32 * 2K (mclbytes) clusters, etc and so
> > > forth.
> > > 
> > > Anyhow, I think the attached patch will reduce if_hw_tsomax, so
> > > that
> > > the result should fit in 32 clusters and avoid EFBIG for this
> > > case,
> > > so it might be worth a try?
> > > (I still can't think of why the CSUM_TSO bit isn't set for the
> > > printf()
> > > case, but it seems TSO segments could generate EFBIG errors.)
> > > 
> > > Maybe worth a try, rick
> > > 
> > > > It's so old that I assume the TSO leak that he speaks of has
> > > > been
> > > > patched, but perhaps not. More things to look into tomorrow.
> > > > 
> > > 
> 
> 


