Date: Thu, 20 Mar 2014 22:13:22 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Christopher Forgeron <csforgeron@gmail.com>
Cc: freebsd-net@freebsd.org
Subject: Re: 9.2 ixgbe tx queue hang
Message-ID: <1543350122.637684.1395368002237.JavaMail.root@uoguelph.ca>
In-Reply-To: <CAB2_NwB=21H5pcx=Wzz5gV38eRN+tfwhY28m2FZhdEi6X3JE7g@mail.gmail.com>
Christopher Forgeron wrote:
> Output from the patch you gave me (I have screens of it... let me know
> what you're hoping to see.)
>
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: before pklen=65538 actl=65538

Hmm. I think this means that the loop that generates TSO segments in
tcp_output() is broken, since I'm pretty sure that the maximum size
should be IP_MAXPACKET (65535). Either that, or some non-TCP socket is
trying to send a packet that exceeds IP_MAXPACKET for some reason.

Would it be possible to add a printf() for m->m_pkthdr.csum_flags to the
"before" case, in the "if" that generates the "before" printf? I didn't
think to put this in, but CSUM_TSO will be set if it is a TSO segment, I
think? My networking is very rusty. (If how to add this isn't obvious,
just email and I'll update the patch. I've also appended a couple of
rough sketches below the quoted text: one of the sort of check the patch
makes, and one of the csum_flags printf I'm asking about.)

Thanks for doing this, rick

> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: before pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: before pklen=65542 actl=65542
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65542 actl=65542
> Mar 20 16:37:22 SAN0 kernel: before pklen=65542 actl=65542
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65542 actl=65542
> Mar 20 16:37:22 SAN0 kernel: before pklen=65542 actl=65542
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65542 actl=65542
> Mar 20 16:37:22 SAN0 kernel: before pklen=65542 actl=65542
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65542 actl=65542
> Mar 20 16:37:22 SAN0 kernel: before pklen=65542 actl=65542
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65542 actl=65542
> Mar 20 16:37:22 SAN0 kernel: before pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: before pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: before pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: before pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: before pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: before pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: before pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 20 16:37:23 SAN0 kernel: before pklen=65538 actl=65538
> Mar 20 16:37:23 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 20 16:37:23 SAN0 kernel: before pklen=65542 actl=65542
> Mar 20 16:37:23 SAN0 kernel: after mbcnt=33 pklen=65542 actl=65542
> Mar 20 16:37:23 SAN0 kernel: before pklen=65542 actl=65542
> Mar 20 16:37:23 SAN0 kernel: after mbcnt=33 pklen=65542 actl=65542
> Mar 20 16:37:23 SAN0 kernel: before pklen=65542 actl=65542
> Mar 20 16:37:23 SAN0 kernel: after mbcnt=33 pklen=65542 actl=65
>
> On Wed, Mar 19, 2014 at 11:29 PM, Rick Macklem <rmacklem@uoguelph.ca>
> wrote:
>
> > Christopher Forgeron wrote:
> > > Hello,
> > >
> > > I can report this problem as well on 10.0-RELEASE.
> > >
> > > I think it's the same as kern/183390?
> > >
> > > I have two physically identical machines, one running 9.2-STABLE,
> > > and one on 10.0-RELEASE.
> > > My 10.0 machine used to be running 9.0-STABLE for over a year
> > > without any problems.
> > >
> > > I'm not having the problems with 9.2-STABLE as far as I can tell,
> > > but it does seem to be a load-based issue more than anything. Since
> > > my 9.2 system is in production, I'm unable to load it to see if the
> > > problem exists there. I have a ping_logger.py running on it now to
> > > see if it's experiencing problems briefly or not.
> > >
> > > I am able to reproduce it fairly reliably within 15 min of a reboot
> > > by loading the server via NFS with iometer and some large NFS file
> > > copies at the same time. I seem to need to sustain ~2 Gbps for a
> > > few minutes.
> >
> > If you can easily do so, testing with the attached patch might shed
> > some light on the problem. It just adds a couple of diagnostic checks
> > before and after m_defrag() is called when bus_dmamap_load_mbuf_sg()
> > returns EFBIG.
> >
> > If the "before" printf happens, it would suggest a problem with the
> > loop in tcp_output() that creates TSO segments.
> >
> > If the "after" printf happens, it would suggest that m_defrag()
> > somehow doesn't create a list of 32 or fewer mbufs for the TSO
> > segment.
> >
> > I don't have any ix hardware, so this patch is completely untested.
> >
> > Just something maybe worth trying, rick
> >
> > > It will happen with just ix0 (no lagg) or with lagg enabled across
> > > ix0 and ix1.
> > >
> > > I've been load-testing new FreeBSD-10.0-RELEASE SAN's for
> > > production use here, so I'm quite willing to put time into this to
> > > help find out where it's coming from. It took me a day to track
> > > down my iometer issues as being network related, and another day
> > > to isolate and write scripts to reproduce.
> > >
> > > The symptom I notice is:
> > >
> > > - A running flood ping (ping -f 172.16.0.31) to the same hardware
> > >   (running 9.2) will come back with "ping: sendto: File too large"
> > >   when the problem occurs
> > > - Network connectivity is very spotty during these incidents
> > > - It can run with sporadic ping errors, or it can run a straight
> > >   set of errors for minutes at a time
> > > - After a long run of ping errors, ESXi will show a disconnect
> > >   from the hosted NFS stores on this machine.
> > > - I've yet to see it happen right after boot. Fastest is around 5
> > >   min, normally it's within 15 min.
> > >
> > > System Specs:
> > >
> > > - Dell PowerEdge M610x Blade
> > > - 2 Xeon 6600 @ 2.40GHz (24 Cores total)
> > > - 96 Gig RAM
> > > - 35.3 TB ZFS Mirrored pool, lz4 compression on my test pool (ZFS
> > >   pool is the latest)
> > > - Intel 520-DA2 10 Gb dual-port Blade Mezz. Cards
> > >
> > > Currently this 10.0 testing machine is clean for all sysctl's other
> > > than hw.intr_storm_threshold=9900. I have the problem if that's set
> > > or not, so I leave it on.
> > >
> > > (I used to set manual nmbclusters, etc. as per the Intel Readme.doc,
> > > but I notice that the defaults on the new 10.0 system are larger. I
> > > did try using all of the old sysctl's from an older 9.0-STABLE, and
> > > still had the problem, but it did seem to take longer to occur? I
> > > haven't run enough tests to confirm that time observation is true.)
> > >
> > > What logs / info can I provide to help?
> > >
> > > I have written a small script called ping_logger.py that pings an
> > > IP, and checks to see if there is an error.
> > > On error it will execute and log:
> > >
> > > - netstat -m
> > > - sysctl hw.ix
> > > - sysctl dev.ix
> > >
> > > then go back to pinging. It will also log those values on the
> > > startup of the script, and every 5 min (so you can see the
> > > progression on the system). I can add any number of things to the
> > > reporting, so I'm looking for suggestions.
> > >
> > > This results in some large log files, but I can email a .gz
> > > directly to anyone who need them, or perhaps put it up on a
> > > website.
> > >
> > > I will also make the ping_logger.py script available if anyone
> > > else wants it.
> > >
> > > LASTLY:
> > >
> > > The one thing I can see that is different in my 10.0 System and my
> > > 9.2 is:
> > >
> > > 9.2's netstat -m:
> > >
> > > 37965/16290/54255 mbufs in use (current/cache/total)
> > > 4080/8360/12440/524288 mbuf clusters in use (current/cache/total/max)
> > > 4080/4751 mbuf+clusters out of packet secondary zone in use (current/cache)
> > > 0/452/452/262144 4k (page size) jumbo clusters in use (current/cache/total/max)
> > > 32773/4129/36902/96000 9k jumbo clusters in use (current/cache/total/max)
> > > 0/0/0/508538 16k jumbo clusters in use (current/cache/total/max)
> > > 312608K/59761K/372369K bytes allocated to network (current/cache/total)
> > > 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> > > 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> > > 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> > > 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> > > 0/0/0 sfbufs in use (current/peak/max)
> > > 0 requests for sfbufs denied
> > > 0 requests for sfbufs delayed
> > > 0 requests for I/O initiated by sendfile
> > > 0 calls to protocol drain routines
> > >
> > > 10.0's netstat -m:
> > >
> > > 21512/24448/45960 mbufs in use (current/cache/total)
> > > 4080/16976/21056/6127254 mbuf clusters in use (current/cache/total/max)
> > > 4080/16384 mbuf+clusters out of packet secondary zone in use (current/cache)
> > > 0/23/23/3063627 4k (page size) jumbo clusters in use (current/cache/total/max)
> > > 16384/158/16542/907741 9k jumbo clusters in use (current/cache/total/max)
> > > 0/0/0/510604 16k jumbo clusters in use (current/cache/total/max)
> > > 160994K/41578K/202572K bytes allocated to network (current/cache/total)
> > > 17488/13290/20464 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> > > 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> > > 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> > > 7/16462/0 requests for jumbo clusters denied (4k/9k/16k)
> > > 0 requests for sfbufs denied
> > > 0 requests for sfbufs delayed
> > > 0 requests for I/O initiated by sendfile
> > >
> > > Way more mbuf clusters in use, but also I never get denied/delayed
> > > results in 9.2 - but I have them in 10.0 right away after a reboot.
> > >
> > > Thanks for any help..
> > > _______________________________________________
> > > freebsd-net@freebsd.org mailing list
> > > http://lists.freebsd.org/mailman/listinfo/freebsd-net
> > > To unsubscribe, send any mail to
> > > "freebsd-net-unsubscribe@freebsd.org"
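
P.S. Since the patch itself isn't quoted above, here's roughly the sort
of check it does around m_defrag() when bus_dmamap_load_mbuf_sg()
returns EFBIG: walk the mbuf chain, count mbufs and bytes, and printf
when the packet is bigger than IP_MAXPACKET or, after defragging, still
spans more than 32 mbufs. This is only a from-memory sketch for
illustration, not the patch itself; the helper name and the driver-side
names in the usage comment (txr, map, segs, nsegs) are made up, and as
I said, I have no ix hardware, so it is completely untested.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>
#include <netinet/in.h>
#include <netinet/ip.h>		/* IP_MAXPACKET */

/*
 * Walk an mbuf chain, counting mbufs and bytes, and complain if the
 * packet is larger than IP_MAXPACKET or (when maxmbufs != 0) spans
 * more than maxmbufs mbufs.  "tag" distinguishes the "before" and
 * "after" cases in the log.
 */
static void
ix_tso_chain_check(struct mbuf *m_head, const char *tag, int maxmbufs)
{
	struct mbuf *m;
	int actl, mbcnt;

	actl = mbcnt = 0;
	for (m = m_head; m != NULL; m = m->m_next) {
		actl += m->m_len;
		mbcnt++;
	}
	if (m_head->m_pkthdr.len > IP_MAXPACKET ||
	    (maxmbufs != 0 && mbcnt > maxmbufs))
		printf("%s mbcnt=%d pklen=%d actl=%d\n", tag, mbcnt,
		    m_head->m_pkthdr.len, actl);
}

/*
 * Called from the transmit routine, roughly like this (driver-side
 * names assumed for illustration only):
 *
 *	error = bus_dmamap_load_mbuf_sg(txr->txtag, map, m_head,
 *	    segs, &nsegs, BUS_DMA_NOWAIT);
 *	if (error == EFBIG) {
 *		ix_tso_chain_check(m_head, "before", 0);
 *		m_head = m_defrag(m_head, M_NOWAIT);
 *		... handle a NULL return from m_defrag() ...
 *		ix_tso_chain_check(m_head, "after", 32);
 *		... then retry the map load as the driver already does ...
 *	}
 */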
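
And here's roughly what I mean by adding a printf() for
m->m_pkthdr.csum_flags in the "before" case, so we can see whether
CSUM_TSO is set on the over-sized chain (i.e. whether it really is a
TSO segment from tcp_output() or some other kind of packet). Again,
just an untested sketch with a made-up helper name; it would be called
from the same spot as the "before" check above.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>

/*
 * Print the packet header length and checksum flags; "tso" is 1 when
 * CSUM_TSO is set, i.e. the chain really is a TSO segment.
 */
static void
ix_print_csum_flags(struct mbuf *m_head)
{

	printf("before pklen=%d csum_flags=0x%jx tso=%d\n",
	    m_head->m_pkthdr.len,
	    (uintmax_t)m_head->m_pkthdr.csum_flags,
	    (m_head->m_pkthdr.csum_flags & CSUM_TSO) != 0);
}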