Date: Fri, 21 Mar 2014 18:21:38 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Christopher Forgeron <csforgeron@gmail.com>
Cc: FreeBSD Net <freebsd-net@freebsd.org>, Jack Vogel <jfvogel@gmail.com>, Markus Gebert <markus.gebert@hostpoint.ch>
Subject: Re: 9.2 ixgbe tx queue hang (packets that exceed 65535bytes in length)
Message-ID: <994670630.1147191.1395440498646.JavaMail.root@uoguelph.ca>
In-Reply-To: <CAB2_NwCKS7a-BUrKxo_WBYCP-_VWDeX_8e0jS1ek3S3L13=VZQ@mail.gmail.com>

Christopher Forgeron wrote:
> (Pardon me, for some reason my gmail is sending on my cut-n-pastes if I cr
> down too fast)
>
> First set of logs:
>
> Mar 21 11:07:00 SAN0 kernel: before pklen=65542 actl=65542 csum=4116
Ok, so this isn't a TSO segment then, unless I don't understand how
the csum flags are used, which is quite possible.
Assuming that you printed this out in decimal:
4116 -> 0x1014
Looking in mbuf.h, 0x1014 is
CSUM_SCTP_VALID | CSUM_FRAGMENT | CSUM_UDP

Alternately, if 4116 is hex, then it is:
CSUM_TCP_IPV6 | CSUM_IP_CHECKED | CSUM_FRAGMENT | CSUM_UDP | CSUM_TCP

Either way, CSUM_TSO isn't set, so it doesn't appear to be a TCP TSO
segment? (But you said that disabling TSO fixed the problem, so colour
me confused by this. ;-)

Sorry, but my rusty networking knowledge is confused by this, so maybe
someone else can explain it?
(I don't think any packet handed to the net interface should exceed
65535 bytes. Am I right?)

Anyhow, all I can say is that I think these mbuf chains should fail
with EFBIG, since they are too big. I have no idea where they come from,
and I don't know why this would lead to exhaustion of the transmit
descriptor entries, which seems to be when things get really wedged.
(From what little I can see in the driver sources, these transmit
descriptor entries should be released via interrupts, but I've only
glanced at it.)

Sorry, but I think this will need someone conversant with the
networking side to figure out, rick

> Mar 21 11:07:00 SAN0 kernel: after mbcnt=33 pklen=65542 actl=65542
> Mar 21 11:07:00 SAN0 kernel: before pklen=65542 actl=65542 csum=4116
> Mar 21 11:07:00 SAN0 kernel: after mbcnt=33 pklen=65542 actl=65542
> Mar 21 11:07:00 SAN0 kernel: before pklen=65542 actl=65542 csum=4116
> Mar 21 11:07:00 SAN0 kernel: after mbcnt=33 pklen=65542 actl=65542
> Mar 21 11:07:00 SAN0 kernel: before pklen=65542 actl=65542 csum=4116
> Mar 21 11:07:00 SAN0 kernel: after mbcnt=33 pklen=65542 actl=65542
> Mar 21 11:07:00 SAN0 kernel: before pklen=65542 actl=65542 csum=4116
>
> Here's a few later on.
>
> Mar 21 11:10:09 SAN0 kernel: before pklen=65538 actl=65538 csum=4116
> Mar 21 11:10:09 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 21 11:10:09 SAN0 kernel: before pklen=65538 actl=65538 csum=4116
> Mar 21 11:10:09 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 21 11:10:09 SAN0 kernel: before pklen=65538 actl=65538 csum=4116
> Mar 21 11:10:09 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 21 11:10:09 SAN0 kernel: before pklen=65538 actl=65538 csum=4116
> Mar 21 11:10:09 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
>
> Mar 21 11:23:00 SAN0 kernel: after mbcnt=33 pklen=65546 actl=65546
> Mar 21 11:23:01 SAN0 kernel: before pklen=65546 actl=65546 csum=4116
> Mar 21 11:23:01 SAN0 kernel: after mbcnt=33 pklen=65546 actl=65546
> Mar 21 11:23:03 SAN0 kernel: before pklen=65546 actl=65546 csum=4116
> Mar 21 11:23:03 SAN0 kernel: after mbcnt=33 pklen=65546 actl=65546
> Mar 21 11:23:04 SAN0 kernel: before pklen=65546 actl=65546 csum=4116
> Mar 21 11:23:04 SAN0 kernel: after mbcnt=33 pklen=65546 actl=65546
>
> Mar 21 11:41:25 SAN0 kernel: before pklen=65538 actl=65538 csum=4116
> Mar 21 11:41:25 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 21 11:41:25 SAN0 kernel: before pklen=65538 actl=65538 csum=4116
> Mar 21 11:41:25 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 21 11:41:25 SAN0 kernel: before pklen=65538 actl=65538 csum=4116
> Mar 21 11:41:25 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 21 11:41:25 SAN0 kernel: before pklen=65538 actl=65538 csum=4116
> Mar 21 11:41:25 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 21 11:41:26 SAN0 kernel: before pklen=65538 actl=65538 csum=4116
> Mar 21 11:41:26 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 21 11:41:26 SAN0 kernel: before pklen=65538 actl=65538 csum=4116
> Mar 21 11:41:26 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
>
> To be clear, I changed tp->t_tsomax to IP_MAXPACKET at ~ 777 in
> sys/netinet/tcp_output.c like so:
>
>         if (len > IP_MAXPACKET - hdrlen) {
>                 len = IP_MAXPACKET - hdrlen;
>                 sendalot = 1;
>         }
>
> I notice there is more that is different between 9.1 and 10 for this
> file:
> http://fxr.watson.org/fxr/diff/netinet/tcp_output.c?v=FREEBSD10;diffval=FREEBSD91;diffvar=v
>
> I'm going to attempt inserting a 9.1 tcp_output.c and see if that
> makes any difference.
>
> Otherwise, I await further ideas from the list.
>
> Thanks.
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to
> "freebsd-net-unsubscribe@freebsd.org"
>
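
For anyone wanting to re-check the csum decoding discussed above, here is a minimal userland sketch that names the bits set in a csum_flags value and tests a packet length against the 65535-byte limit. The flag table is an assumption: it is copied by hand from what I believe the FreeBSD 9.x sys/sys/mbuf.h defines, and other releases assign these bits differently, so verify it against the header of the kernel actually running. The names decode_csum_flags(), exceeds_ip_maxpacket(), csum_flags_9x and SKETCH_IP_MAXPACKET are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * ASSUMPTION: bit values as in FreeBSD 9.x sys/sys/mbuf.h, copied by hand.
 * Other releases define csum_flags differently; check the real header.
 */
struct csum_flag {
	unsigned int bit;
	const char *name;
};

static const struct csum_flag csum_flags_9x[] = {
	{ 0x0001, "CSUM_IP" },
	{ 0x0002, "CSUM_TCP" },
	{ 0x0004, "CSUM_UDP" },
	{ 0x0008, "CSUM_IP_FRAGS" },
	{ 0x0010, "CSUM_FRAGMENT" },
	{ 0x0020, "CSUM_TSO" },
	{ 0x0040, "CSUM_SCTP" },
	{ 0x0080, "CSUM_SCTP_IPV6" },
	{ 0x0100, "CSUM_IP_CHECKED" },
	{ 0x0200, "CSUM_IP_VALID" },
	{ 0x0400, "CSUM_DATA_VALID" },
	{ 0x0800, "CSUM_PSEUDO_HDR" },
	{ 0x1000, "CSUM_SCTP_VALID" },
	{ 0x2000, "CSUM_UDP_IPV6" },
	{ 0x4000, "CSUM_TCP_IPV6" },
};

/* Same value as IP_MAXPACKET in netinet/ip.h (16-bit IPv4 total length). */
#define SKETCH_IP_MAXPACKET 65535UL

/*
 * Write the names of the bits set in 'flags' into 'buf', lowest bit first,
 * separated by " | ". Returns the number of names written; stops early if
 * 'buf' is too small.
 */
int
decode_csum_flags(unsigned int flags, char *buf, size_t buflen)
{
	size_t i, used = 0;
	int count = 0;

	if (buflen == 0)
		return (0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(csum_flags_9x) / sizeof(csum_flags_9x[0]); i++) {
		int n;

		if ((flags & csum_flags_9x[i].bit) == 0)
			continue;
		n = snprintf(buf + used, buflen - used, "%s%s",
		    count > 0 ? " | " : "", csum_flags_9x[i].name);
		if (n < 0 || (size_t)n >= buflen - used)
			break;		/* output truncated */
		used += (size_t)n;
		count++;
	}
	return (count);
}

/* Nonzero if a packet of 'pktlen' bytes is larger than IP_MAXPACKET. */
int
exceeds_ip_maxpacket(unsigned long pktlen)
{
	return (pktlen > SKETCH_IP_MAXPACKET);
}
```

Calling decode_csum_flags(4116, buf, sizeof(buf)) covers the decimal reading of the logged value and decode_csum_flags(0x4116, buf, sizeof(buf)) the hex reading; exceeds_ip_maxpacket(65542) checks the pklen from the first set of logs. Keeping the table as data rather than a chain of ifs makes it easy to swap in the flag values from a different release.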