Date:      Thu, 20 Mar 2014 22:13:22 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Christopher Forgeron <csforgeron@gmail.com>
Cc:        freebsd-net@freebsd.org
Subject:   Re: 9.2 ixgbe tx queue hang
Message-ID:  <1543350122.637684.1395368002237.JavaMail.root@uoguelph.ca>
In-Reply-To: <CAB2_NwB=21H5pcx=Wzz5gV38eRN+tfwhY28m2FZhdEi6X3JE7g@mail.gmail.com>

Christopher Forgeron wrote:
> 
> Output from the patch you gave me (I have screens of it; let me know
> what you're hoping to see).
> 
> 
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: before pklen=65538 actl=65538
Hmm. I think this means that the loop that generates TSO segments in
tcp_output() is broken, since I'm pretty sure that the maximum size
should be IP_MAXPACKET (65535).

Either that or some non-TCP socket is trying to send a packet that
exceeds IP_MAXPACKET for some reason.
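
To make that concrete, here is a rough sketch (not the actual attached
patch; the helper name and placement are made up) of what the "before"
check amounts to: walk the mbuf chain and complain if either the recorded
packet header length or the summed data length already exceeds
IP_MAXPACKET before the chain is handed to bus_dmamap_load_mbuf_sg(),
since m_defrag() can never fix an over-sized chain:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>
#include <netinet/in.h>
#include <netinet/ip.h>		/* IP_MAXPACKET */

/* Hypothetical helper, for illustration only. */
static void
tso_len_sanity_check(struct mbuf *m_head)
{
	struct mbuf *m;
	int actl;

	/* Sum the data actually present in the chain. */
	actl = 0;
	for (m = m_head; m != NULL; m = m->m_next)
		actl += m->m_len;

	if (m_head->m_pkthdr.len > IP_MAXPACKET || actl > IP_MAXPACKET)
		printf("before pklen=%d actl=%d\n",
		    m_head->m_pkthdr.len, actl);
}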

Would it be possible to add a printf() for m->m_pkthdr.csum_flags
to the before case, in the "if" that generates the before printf?
I didn't think to put this in, but I believe CSUM_TSO will be set if it
is a TSO segment; my networking is very rusty.
(If how to add this isn't obvious, just email and I'll update
 the patch.)
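
For example, extending the sketch above (hypothetical, not the actual
patch; the variable names follow the sketch rather than the real code),
the "before" printf could become:

	/* %jx plus the cast keeps this safe whether csum_flags is
	 * 32 or 64 bits wide; CSUM_TSO comes from sys/mbuf.h. */
	if (m_head->m_pkthdr.len > IP_MAXPACKET || actl > IP_MAXPACKET)
		printf("before pklen=%d actl=%d csum_flags=0x%jx%s\n",
		    m_head->m_pkthdr.len, actl,
		    (uintmax_t)m_head->m_pkthdr.csum_flags,
		    (m_head->m_pkthdr.csum_flags & CSUM_TSO) != 0 ?
		    " TSO" : "");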

Thanks for doing this, rick

> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: before pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: before pklen=65542 actl=65542
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65542 actl=65542
> Mar 20 16:37:22 SAN0 kernel: before pklen=65542 actl=65542
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65542 actl=65542
> Mar 20 16:37:22 SAN0 kernel: before pklen=65542 actl=65542
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65542 actl=65542
> Mar 20 16:37:22 SAN0 kernel: before pklen=65542 actl=65542
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65542 actl=65542
> Mar 20 16:37:22 SAN0 kernel: before pklen=65542 actl=65542
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65542 actl=65542
> Mar 20 16:37:22 SAN0 kernel: before pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: before pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: before pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: before pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: before pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: before pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: before pklen=65538 actl=65538
> Mar 20 16:37:22 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 20 16:37:23 SAN0 kernel: before pklen=65538 actl=65538
> Mar 20 16:37:23 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 20 16:37:23 SAN0 kernel: before pklen=65542 actl=65542
> Mar 20 16:37:23 SAN0 kernel: after mbcnt=33 pklen=65542 actl=65542
> Mar 20 16:37:23 SAN0 kernel: before pklen=65542 actl=65542
> Mar 20 16:37:23 SAN0 kernel: after mbcnt=33 pklen=65542 actl=65542
> Mar 20 16:37:23 SAN0 kernel: before pklen=65542 actl=65542
> Mar 20 16:37:23 SAN0 kernel: after mbcnt=33 pklen=65542 actl=65
> 
> 
> 
> 
> 
> On Wed, Mar 19, 2014 at 11:29 PM, Rick Macklem <rmacklem@uoguelph.ca>
> wrote:
> 
> 
> 
> Christopher Forgeron wrote:
> > Hello,
> > 
> > 
> > 
> > I can report this problem as well on 10.0-RELEASE.
> > 
> > 
> > 
> > I think it's the same as kern/183390?
> > 
> > 
> > 
> > I have two physically identical machines, one running 9.2-STABLE,
> > and one on 10.0-RELEASE.
> > 
> > 
> > 
> > My 10.0 machine used to be running 9.0-STABLE for over a year
> > without any problems.
> > 
> > 
> > 
> > I'm not having the problems with 9.2-STABLE as far as I can tell,
> > but it does seem to be a load-based issue more than anything. Since
> > my 9.2 system is in production, I'm unable to load it to see if the
> > problem exists there. I have a ping_logger.py running on it now to
> > see if it's experiencing problems briefly or not.
> > 
> > 
> > 
> > I am able to reproduce it fairly reliably within 15 min of a reboot
> > by loading the server via NFS with iometer and some large NFS file
> > copies at the same time. I seem to need to sustain ~2 Gbps for a few
> > minutes.
> > 
> If you can easily do so, testing with the attached patch might shed
> some light on the problem. It just adds a couple of diagnostic checks
> before and after m_defrag() is called when bus_dmamap_load_mbuf_sg()
> returns EFBIG.
> 
> If the "before" printf happens, it would suggest a problem with the
> loop in tcp_output() that creates TSO segments.
> 
> If the "after" printf happens, it would suggest that m_defrag() somehow
> doesn't create a list of 32 or fewer mbufs for the TSO segment.
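
As a concrete illustration (again only a sketch, not the attached patch;
the helper name is made up and 32 stands in for the driver's
scatter/gather segment limit), the "after" side amounts to counting the
mbufs that m_defrag() hands back once bus_dmamap_load_mbuf_sg() has
returned EFBIG:

/* Hypothetical helper, for illustration only. */
static void
defrag_result_check(struct mbuf *m_head)
{
	struct mbuf *m;
	int actl, mbcnt;

	/* Count the mbufs and total the data in the defragmented chain. */
	actl = mbcnt = 0;
	for (m = m_head; m != NULL; m = m->m_next) {
		mbcnt++;
		actl += m->m_len;
	}
	if (mbcnt > 32)
		printf("after mbcnt=%d pklen=%d actl=%d\n",
		    mbcnt, m_head->m_pkthdr.len, actl);
}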
> 
> I don't have any ix hardware, so this patch is completely untested.
> 
> Just something maybe worth trying, rick
> 
> 
> > 
> > 
> > It will happen with just ix0 (no lagg) or with lagg enabled across
> > ix0 and ix1.
> > 
> > 
> > 
> > I've been load-testing new FreeBSD-10.0-RELEASE SANs for production
> > use here, so I'm quite willing to put time into this to help find out
> > where it's coming from. It took me a day to track down my iometer
> > issues as being network related, and another day to isolate and write
> > scripts to reproduce.
> > 
> > 
> > 
> > The symptoms I notice are:
> > 
> > - A running flood ping (ping -f 172.16.0.31) to the same hardware
> > (running 9.2) will come back with "ping: sendto: File too large"
> > when the problem occurs
> > 
> > - Network connectivity is very spotty during these incidents
> > 
> > - It can run with sporadic ping errors, or it can run a straight
> > set of errors for minutes at a time
> > 
> > - After a long run of ping errors, ESXi will show a disconnect
> > from the hosted NFS stores on this machine.
> > 
> > - I've yet to see it happen right after boot. Fastest is around
> > 5 min, normally it's within 15 min.
> > 
> > 
> > 
> > System Specs:
> > 
> > 
> > 
> > - Dell PowerEdge M610x Blade
> > 
> > - 2 Xeon 6600 @ 2.40GHz (24 Cores total)
> > 
> > - 96 Gig RAM
> > 
> > - 35.3 TB ZFS Mirrored pool, lz4 compression on my test pool
> > (ZFS pool is the latest)
> > 
> > - Intel 520-DA2 10 Gb dual-port Blade Mezz. Cards
> > 
> > 
> > 
> > Currently this 10.0 testing machine is clean for all sysctls other
> > than hw.intr_storm_threshold=9900. I have the problem whether that's
> > set or not, so I leave it on.
> > 
> > 
> > 
> > (I used to set manual nmbclusters, etc. as per the Intel Readme.doc,
> > but I notice that the defaults on the new 10.0 system are larger. I
> > did try using all of the old sysctls from an older 9.0-STABLE and
> > still had the problem, but it did seem to take longer to occur? I
> > haven't run enough tests to confirm that timing observation is true.)
> > 
> > 
> > 
> > What logs / info can I provide to help?
> > 
> > 
> > 
> > I have written a small script called ping_logger.py that pings an
> > IP, and checks to see if there is an error. On error it will execute
> > and log:
> > 
> > - netstat -m
> 
> > 
> > - sysctl hw.ix
> > 
> > - sysctl dev.ix
> > 
> > 
> > 
> > then go back to pinging. It will also log those values on the startup
> > of the script, and every 5 min (so you can see the progression on the
> > system). I can add any number of things to the reporting, so I'm
> > looking for suggestions.
> > 
> > 
> > 
> > This results in some large log files, but I can email a .gz directly
> > to anyone who needs them, or perhaps put it up on a website.
> > 
> > 
> > 
> > I will also make the ping_logger.py script available if anyone else
> > wants it.
> > 
> > 
> > 
> > 
> > 
> > LASTLY:
> > 
> > 
> > 
> > The one thing I can see that is different between my 10.0 system and
> > my 9.2 is:
> > 
> > 
> > 
> > 9.2's netstat -m:
> 
> > 
> > 
> > 
> > 37965/16290/54255 mbufs in use (current/cache/total)
> > 4080/8360/12440/524288 mbuf clusters in use (current/cache/total/max)
> > 4080/4751 mbuf+clusters out of packet secondary zone in use (current/cache)
> > 0/452/452/262144 4k (page size) jumbo clusters in use (current/cache/total/max)
> > 32773/4129/36902/96000 9k jumbo clusters in use (current/cache/total/max)
> > 0/0/0/508538 16k jumbo clusters in use (current/cache/total/max)
> > 312608K/59761K/372369K bytes allocated to network (current/cache/total)
> > 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> > 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> > 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> > 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> > 0/0/0 sfbufs in use (current/peak/max)
> > 0 requests for sfbufs denied
> > 0 requests for sfbufs delayed
> > 0 requests for I/O initiated by sendfile
> > 0 calls to protocol drain routines
> > 
> > 
> > 
> > 
> > 
> > 10.0's netstat -m:
> 
> > 
> > 
> > 
> > 21512/24448/45960 mbufs in use (current/cache/total)
> > 4080/16976/21056/6127254 mbuf clusters in use (current/cache/total/max)
> > 4080/16384 mbuf+clusters out of packet secondary zone in use (current/cache)
> > 0/23/23/3063627 4k (page size) jumbo clusters in use (current/cache/total/max)
> > 16384/158/16542/907741 9k jumbo clusters in use (current/cache/total/max)
> > 0/0/0/510604 16k jumbo clusters in use (current/cache/total/max)
> > 160994K/41578K/202572K bytes allocated to network (current/cache/total)
> > 17488/13290/20464 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> > 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> > 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> > 7/16462/0 requests for jumbo clusters denied (4k/9k/16k)
> > 0 requests for sfbufs denied
> > 0 requests for sfbufs delayed
> > 0 requests for I/O initiated by sendfile
> > 
> > 
> > 
> > Way more mbuf clusters in use, but also I never get denied/delayed
> > results in 9.2 - but I have them in 10.0 right away after a reboot.
> > 
> > 
> > 
> > Thanks for any help.
> 
> 


