From owner-freebsd-net@FreeBSD.ORG Thu Mar 20 10:03:48 2014
Date: Thu, 20 Mar 2014 18:03:43 +0800 (CST)
From: mstian88
To: net@freebsd.org
Subject: some problem about netmap

./pkt-gen -i eth1 -f rx -X

The printed info shows slot->len is 2048. Why?
The packets were sent with tcpreplay.
When multiple 1514-byte packets are received at once (pkt-gen sees avail > 1), slot->len is 2048,
like this:

[2213]:slotlen[1514],curidx[189],bufidx[24767],avail[3],iwhile[475],ifor[16],buf_ofs[5603328],rings[32]
[2214]:slotlen[2048],curidx[190],bufidx[24768],avail[2],iwhile[475],ifor[16],buf_ofs[5603328],rings[32]
[2215]:slotlen[926],curidx[191],bufidx[24769],avail[1],iwhile[475],ifor[16],buf_ofs[5603328],rings[32]
[2216]:slotlen[1514],curidx[192],bufidx[24770],avail[1],iwhile[476],ifor[16],buf_ofs[5603328],rings[32]

From owner-freebsd-net@FreeBSD.ORG Thu Mar 20 10:41:00 2014
From: Markus Gebert
Date: Thu, 20 Mar 2014 11:40:18 +0100
Subject: Re: 9.2 ixgbe tx queue hang
To: Christopher Forgeron
Cc: freebsd-net@freebsd.org, Rick Macklem, Jack Vogel

On 19.03.2014, at 20:17, Christopher Forgeron wrote:

> Hello,
>
> I can report this problem as well on 10.0-RELEASE.
>
> I think it's the same as kern/183390?

Possible. We still see this on NFS clients only, but I'm not convinced that NFS is the only trigger.

> I have two physically identical machines, one running 9.2-STABLE, and one
> on 10.0-RELEASE.
>
> My 10.0 machine used to be running 9.0-STABLE for over a year without any
> problems.
>
> I'm not having the problems with 9.2-STABLE as far as I can tell, but it
> does seem to be a load-based issue more than anything. Since my 9.2 system
> is in production, I'm unable to load it to see if the problem exists there.
> I have a ping_logger.py running on it now to see if it's experiencing
> problems briefly or not.

In our case, when it happens, the problem persists for quite some time (minutes or hours) unless we intervene (ifconfig or reboot).

> I am able to reproduce it fairly reliably within 15 min of a reboot by
> loading the server via NFS with iometer and some large NFS file copies at
> the same time. I seem to need to sustain ~2 Gbps for a few minutes.

That's probably why we can't reproduce it reliably here. Although our blade servers have 10gig cards, the affected ones are connected to a 1gig switch.

> It will happen with just ix0 (no lagg) or with lagg enabled across ix0 and
> ix1.

Same here.
> I've been load-testing new FreeBSD-10.0-RELEASE SAN's for production use
> here, so I'm quite willing to put time into this to help find out where
> it's coming from. It took me a day to track down my iometer issues as
> being network related, and another day to isolate and write scripts to
> reproduce.
>
> The symptom I notice is:
>
> - A running flood ping (ping -f 172.16.0.31) to the same hardware
> (running 9.2) will come back with "ping: sendto: File too large" when the
> problem occurs
>
> - Network connectivity is very spotty during these incidents
>
> - It can run with sporadic ping errors, or it can run a straight
> set of errors for minutes at a time
>
> - After a long run of ping errors, ESXi will show a disconnect
> from the hosted NFS stores on this machine.
>
> - I've yet to see it happen right after boot. Fastest is around 5
> min, normally it's within 15 min.

Can you try this when the problem occurs?

for CPU in 0 1 2 3 4 5 6 7; do echo "CPU${CPU}"; cpuset -l ${CPU} ping -i 0.2 -c 2 -W 1 10.0.0.1 | grep sendto; done

It ties ping to one CPU at a time in order to exercise the different tx queues of your ix interface. If the pings reliably fail only on some queues, your problem is more likely to be the same as ours.

Also, if you have dtrace available:

kldload dtraceall
dtrace -n 'fbt:::return / arg1 == EFBIG && execname == "ping" / { stack(); }'

Run this while you send pings over the affected interface. It will give you hints about where the EFBIG error comes from.

> […]

Markus