Date: Sat, 22 Mar 2014 17:18:14 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Christopher Forgeron <csforgeron@gmail.com>
Cc: FreeBSD Net <freebsd-net@freebsd.org>, Garrett Wollman <wollman@freebsd.org>, Jack Vogel <jfvogel@gmail.com>, Markus Gebert <markus.gebert@hostpoint.ch>
Subject: Re: 9.2 ixgbe tx queue hang
Message-ID: <1055107814.1401328.1395523094565.JavaMail.root@uoguelph.ca>
In-Reply-To: <CAB2_NwDRAxmnszh7jKKPfvxBdgaA9Z0CcJ9c1wSNncKb55Td5w@mail.gmail.com>

Christopher Forgeron wrote:
> Status Update: Hopeful, but not done.
>
> So the 9.2-STABLE ixgbe with Rick's TSO patch has been running all night
> while iometer hammered away at it. It's got over 8 hours of test time on
> it.
>
> It's still running, the CPU queues are not clogged, and everything is
> functional.
>
> However, my ping_logger.py did record 23 incidents of "sendto: File too
> large" over the 8 hour run.
>
Well, you could try making if_hw_tsomax somewhat smaller. (I can't see
how the packet including ethernet header would be more than 64K with the
patch, but?? For example, the ether_output() code can call ng_output()
and I have no idea if that might grow the data size of the packet?)

To be honest, the optimum for NFS would be setting if_hw_tsomax == 56K,
since that would avoid the overhead of the m_defrag() calls. However,
it is suboptimal for other TCP transfers.

One other thing you could do (if you still have them) is scan the logs
for the code with my previous printf() patch and see if there is ever
a size > 65549 in it. If there is, then if_hw_tsomax needs to be smaller
by at least that size - 65549. (65535 + 14 == 65549)

If I were you, I'd try setting it to 57344 (56K) instead of
"num_segs * MCLBYTES - ETHER_HDR_LEN"
ie. replace
    ifp->if_hw_tsomax = adapter->num_segs * MCLBYTES - ETHER_HDR_LEN;
with
    ifp->if_hw_tsomax = 57344;
in the patch.

Then see if all the errors go away. (Jack probably won't like making it
that small, but it will show if decreasing it a bit will completely
fix the problem.)

> That's really nothing compared to what I usually run into - Normally I'd
> have 23 incidents within a 5 minute span.
>
> During those 23 incidents, (ping_logger.py triggers a cpuset ping) I see
> it's having the same symptoms of clogging on a few CPU cores. That clogging
> does go away, a symptom that Markus says he sometimes experiences.
>
> So I would say the TSO patch makes things remarkably better, but something
> else is still up. Unfortunately, with the TSO patch in place it's now
> harder to trigger the error, so testing will be more difficult.
>
> Could someone confirm for me where the jumbo clusters denied/mbuf denied
> counters come from for netstat? Would it be from a m_defrag call that
> fails?
>
I'm not familiar enough with the mbuf/uma allocators to "confirm" it,
but I believe the "denied" refers to cases where m_getjcl() fails to get
a jumbo mbuf and returns NULL.

If this were to happen in m_defrag(), it would return NULL and the ix
driver returns ENOBUFS, so this is not the case for EFBIG errors.

I don't know if increasing the limits for the jumbo mbufs via sysctl
will help. If you are using the code without Jack's patch, which uses
9K mbufs, then I think it can fragment the address space and result in
no 9K contiguous areas to allocate from. (I'm just going by what Garrett
and others have said about this.)

> I feel the netstat -m stats on boot are part of this issue - I was able to
> greatly reduce them during one of my test iterations. I'm going to see if I
> can repeat that with the TSO patch.
>
> Getting this working on the 10-STABLE ixgbe:
>
> Mike's contributed some edits (slightly different thread) I want to try on
> that driver. At the same time, a diff of 9.2 <-> 10.0 may give hints, as
> the 10.0 driver with TSO patch has issues quickly, and frequently... it's
> doing something that aggravates this condition.
>
>
> Thanks for all the help, please keep the suggestions or tidbits of info
> coming.
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to
> "freebsd-net-unsubscribe@freebsd.org"
>
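
The size arithmetic in the reply above (65535 + 14 == 65549, and the "num_segs * MCLBYTES - ETHER_HDR_LEN" expression) can be sketched as a few standalone C helpers. This is only an illustration and not part of the patch. The sk_ names are hypothetical. The 2048-byte cluster size and the value 32 for num_segs are assumptions made for the example, and 65600 is a made-up logged size.

```c
#include <assert.h>

/* Local stand-ins for the constants named in the mail (assumed values). */
#define SK_ETHER_HDR_LEN 14     /* the "+ 14" in 65535 + 14 == 65549 */
#define SK_IP_MAXPACKET  65535  /* largest IP datagram size */
#define SK_MCLBYTES      2048   /* assumed mbuf cluster size */

/* Largest frame (IP datagram plus ethernet header) the mail expects to see. */
long sk_frame_limit(void)
{
    return SK_IP_MAXPACKET + SK_ETHER_HDR_LEN;
}

/*
 * Given a size taken from the printf() logs, return the minimum amount by
 * which if_hw_tsomax would need to shrink ("that size - 65549"), or 0 if
 * the logged size is within the limit.
 */
long sk_tsomax_reduction(long logged_size)
{
    long limit = sk_frame_limit();

    return (logged_size > limit) ? logged_size - limit : 0;
}

/* The expression the patch uses, with num_segs passed in by the caller. */
long sk_tsomax_from_segs(long num_segs)
{
    return num_segs * SK_MCLBYTES - SK_ETHER_HDR_LEN;
}

/* Quick self-check of the numbers quoted in the mail. */
void sk_self_check(void)
{
    assert(sk_frame_limit() == 65549);
    assert(56 * 1024 == 57344);     /* the suggested 56K setting */
}
```

With these assumptions, a logged size of 65600 would mean shrinking if_hw_tsomax by at least 51 bytes, and sk_tsomax_from_segs(32) gives 65522, which is well above the suggested 57344.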