From: Christopher Forgeron
To: Rick Macklem
Cc: FreeBSD Net, Garrett Wollman, Jack Vogel, Markus Gebert
Date: Mon, 24 Mar 2014 11:56:55 -0300
Subject: Re: 9.2 ixgbe tx queue hang
References: <1164414873.1690348.1395622026185.JavaMail.root@uoguelph.ca>
List-Id: Networking and TCP/IP with FreeBSD

I'm going to split this into different posts to focus on each topic. This one is about setting IP_MAXPACKET to 65495.

Update on last night's run (a kernel with IP_MAXPACKET = 65495):

- Uptime on this run: 10:53AM up 13:21, 5 users, load averages: 1.98, 2.09, 2.13
- The ping logger recorded no ping errors for the entire run.
- At 10:57 on Mar 24th I grepped through the night's log for 'before' (the printf logging that Rick suggested a few days ago) and saved the result to before_total.txt.
- wc -l on before_total.txt shows 504 lines, thus 504 incidents of a packet being above IP_MAXPACKET during this run.
- To list the most common words, I ran:

    tr -c '[:alnum:]' '[\n*]' < before_total.txt | sort | uniq -c | sort -nr | head -50

  Ignoring the non-pklen output, the relevant output is:

    344 65498 (3)
    330 65506 (11)
    330 65502 (7)

  - The first number is the number of occurrences. (Each pklen is printed twice in the log, thus 2x the total line count.)
  - The last number, in parentheses, is the byte overrun from 65495.
  - The three types of packet overrun are fairly evenly distributed.

You will recall that my IP_MAXPACKET is 65495, so each of these packet lengths represents an overshoot. The fact that we have only 3 different types of overrun is good: it suggests a non-random event, more like a broken 'if' statement for a particular case.

If IP_MAXPACKET were set to 65535 as it normally is, I would have had 504 incidents of errors, with a chance that any one of them could have blocked the queue for a considerable time.

Question: Should there be logic that discards packets that are over IP_MAXPACKET, to ensure that we don't end up in a blocked queue situation again?

Moving forward, I am doing two things:

1) I'm running a longer test with TSO disabled on my ix0 adapter.
   I want to make sure that over, say, 4 hours I don't have even 1 packet over 65495. This will at least isolate the issue to TSO-related code.

2) I have tcpdump running, to see if I can capture the packets over 65495. Here is my command. Any suggestions on additional switches I should include?

    tcpdump -ennvvXS greater 65495

I'll report in on this again once I have new info.

Thanks for reading.

On Mon, Mar 24, 2014 at 2:14 AM, Christopher Forgeron wrote:

> Hi,
>
> I'll follow up more tomorrow, as it's late and I don't have time for
> detail.
>
> The basic TSO patch didn't work, as packets were still going over
> 65535 by a fair amount. I thought I wrote that earlier, but I am dumping a
> lot of info into a few threads, so I apologize if I'm not as concise as I
> could be.
>
> However, setting IP_MAXPACKET did. 4 hours of continuous run-time, no
> issues. No lost pings, no issues. Of course this isn't a fix - but it helps
> isolate the problem.
>
> what the story is a few months down the road.
> >
> >
> > Thanks for the patches, will have to start giving them code-names so
> > we can keep them straight. :-) I guess we have printf, tsomax, and
> > this one.
> >
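
As a cross-check on the overrun figures in the update above (3, 7 and 11 bytes), here is a minimal shell sketch of the arithmetic. It assumes only the patched IP_MAXPACKET value of 65495 from this test kernel; the function name overrun is made up for illustration and is not part of any patch in this thread.

```shell
#!/bin/sh
# Assumption: IP_MAXPACKET as patched for this test run (the stock value is 65535).
IP_MAXPACKET=65495

# overrun PKLEN: print how many bytes PKLEN exceeds IP_MAXPACKET by.
# (Hypothetical helper, for illustration only.)
overrun() {
    echo $(( $1 - IP_MAXPACKET ))
}

# The three packet lengths reported in the log summary.
for pklen in 65498 65502 65506; do
    printf '%s exceeds %s by %s bytes\n' \
        "$pklen" "$IP_MAXPACKET" "$(overrun "$pklen")"
done
```

The same subtraction could be applied to any other pklen value that shows up in the printf logging, which would make a fourth overrun size easy to spot if one appears in a later run.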