Date: Mon, 24 Mar 2014 11:56:55 -0300
From: Christopher Forgeron <csforgeron@gmail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: FreeBSD Net <freebsd-net@freebsd.org>, Garrett Wollman <wollman@freebsd.org>, Jack Vogel <jfvogel@gmail.com>, Markus Gebert <markus.gebert@hostpoint.ch>
Subject: Re: 9.2 ixgbe tx queue hang
Message-ID: <CAB2_NwCHM9D1HZSMsuQQ-dYNAt-t2721jKqfO=2h3M4qdumY7w@mail.gmail.com>
In-Reply-To: <CAB2_NwAbHzFqa8RM5pwV7Yy5t=96JwzaF%2BSdjJN9kK3uhKKn_w@mail.gmail.com>
References: <CAB2_NwAcDPM6YKNLQMC0=YSp%2Bn9nBpXGJQR9ajbgbfcQFoWYPw@mail.gmail.com> <1164414873.1690348.1395622026185.JavaMail.root@uoguelph.ca> <CAB2_NwAbHzFqa8RM5pwV7Yy5t=96JwzaF%2BSdjJN9kK3uhKKn_w@mail.gmail.com>
I'm going to split this into different posts to focus on each topic. This
is about setting IP_MAXPACKET to 65495.

Update on Last Night's Run:

(Last night's run is a kernel with IP_MAXPACKET = 65495)

- Uptime on this run: 10:53AM up 13:21, 5 users, load averages: 1.98, 2.09, 2.13

- Ping logger records no ping errors for the entire run.

- At Mar 24th 10:57 I did a grep through the night's log for 'before'
  (which is the printf logging that Rick suggested a few days ago), and
  saved it to before_total.txt

- With wc -l on before_total.txt I can see that we have 504 lines, thus
  504 incidents of the packet being above IP_MAXPACKET during this run.

- I did

    tr -c '[:alnum:]' '[\n*]' < before_total.txt | sort | uniq -c | sort -nr | head -50

  to list the most common words, ignoring the non-pklen output. The
  relevant output is:

    344 65498 (3)
    330 65506 (11)
    330 65502 (7)

  - First # being the # of times. (Each pklen is printed twice in the
    log, thus 2x the total line count.)
  - Last (#) being the byte overrun from 65495.
  - A fairly even distribution of each type of packet overrun.

You will recall that my IP_MAXPACKET is 65495, so each of these packet
lengths represents an overshoot.

The fact that we have only 3 different types of overrun is good - it
suggests a non-random event, more like a broken 'if' statement for a
particular case.

If IP_MAXPACKET were set to 65535 as it normally is, I would have had 504
incidents of errors, with a chance that any one of them could have blocked
the queue for a considerable time.

Question: Should there be logic that discards packets that are over
IP_MAXPACKET to ensure that we don't end up in a blocked queue situation
again?

Moving forward, I am doing two things:

1) I'm running a longer test with TSO disabled on my ix0 adapter. I want
   to make sure that over, say, 4 hours I don't have even 1 packet over
   65495. This will at least narrow the issue down to TSO-related code.

2) I have tcpdump running, to see if I can capture the packets over 65495.
   Here is my command. Any suggestions on additional switches I should
   include?

     tcpdump -ennvvXS greater 65495

I'll report in on this again once I have new info.

Thanks for reading.

On Mon, Mar 24, 2014 at 2:14 AM, Christopher Forgeron <csforgeron@gmail.com> wrote:

> Hi,
>
> I'll follow up more tomorrow, as it's late and I don't have time for
> detail.
>
> The basic TSO patch didn't work, as packets were still going over
> 65535 by a fair amount. I thought I wrote that earlier, but I am dumping a
> lot of info into a few threads, so I apologize if I'm not as concise as I
> could be.
>
> However, setting IP_MAXPACKET did. 4 hours of continuous run-time, no
> issues. No lost pings, no issues. Of course this isn't a fix - but it helps
> isolate the problem.
>
> > what the story is a few months down the road.
> >
> > Thanks for the patches, will have to start giving them code-names so
> > we can keep them straight. :-) I guess we have printf, tsomax, and
> > this one.
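
As a cross-check on the tr | sort | uniq tally of before_total.txt described in this message, here is a minimal Python sketch that counts each over-limit length and prints the byte overrun next to it. It is only an illustration: the names tally_overruns, report and ceiling are made up, the sample log lines are hypothetical (the exact printf format from the logging patch is not shown in this message), and the sketch simply assumes that lengths appear in the log as plain decimal numbers.

```python
import re
from collections import Counter

IP_MAXPACKET = 65495  # limit used in the test kernel discussed in this message


def tally_overruns(lines, limit=IP_MAXPACKET, ceiling=131072):
    """Count every decimal number in `lines` that is above `limit`.

    `ceiling` is an assumed upper bound, used only to skip unrelated large
    numbers such as timestamps; it is not taken from the kernel source.
    """
    counts = Counter()
    for line in lines:
        for token in re.findall(r"\d+", line):
            value = int(token)
            if limit < value <= ceiling:
                counts[value] += 1
    return counts


def report(counts, limit=IP_MAXPACKET):
    """Format rows as 'count length (overrun)', most frequent first."""
    return [
        f"{n} {length} ({length - limit})"
        for length, n in counts.most_common()
    ]


if __name__ == "__main__":
    # Hypothetical log lines; the real format may differ.
    sample = [
        "before pklen=65498 actl=65498",
        "before pklen=65506 actl=65506",
        "before pklen=65498 actl=65498",
    ]
    for row in report(tally_overruns(sample)):
        print(row)
```

To run it against the real file, pass the open file object instead of the sample list, for example tally_overruns(open("before_total.txt")). Because every number in a line is counted, a length that is printed twice per log line is counted twice, which matches the 2x factor noted for the tr-based tally.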