From: Daniel Braniss
Date: Wed, 26 Aug 2015 08:56:27 +0300
Subject: Re: ix(intel) vs mlxen(mellanox) 10Gb performance
To: Rick Macklem
Cc: Hans Petter Selasky, pyunyh@gmail.com, FreeBSD Net, FreeBSD stable, Gleb Smirnoff

> On Aug 24, 2015, at 3:25 PM, Rick Macklem wrote:
>
> Daniel Braniss wrote:
>>
>>> On 24 Aug 2015, at 10:22, Hans Petter Selasky wrote:
>>>
>>> On 08/24/15 01:02, Rick Macklem wrote:
>>>> The other thing is the degradation seems to cut the rate by about half
>>>> each time.
>>>> 300-->150-->70 I have no idea if this helps to explain it.
>>>
>>> Might be a NUMA binding issue for the processes involved.
>>>
>>> man cpuset
>>>
>>> --HPS
>>
>> I can’t see how this is relevant, given that the same host, using the
>> mellanox/mlxen,
>> behaves much better.
> Well, the "ix" driver has a bunch of tunables for things like "number of queues"
> and although I'll admit I don't understand how these queues are used, I think
> they are related to CPUs and their caches. There is also something called IXGBE_FDIR,
> which others have recommended be disabled. (The code is #ifdef IXGBE_FDIR, but I don't
> know if it is defined for your kernel?) There are also tunables for interrupt rate and
> something called hw.ixgbe_tx_process_limit, which appears to limit the number of packets
> to send or something like that?
> (I suspect Hans would understand this stuff much better than I do, since I don't understand
> it at all.;-)
>
But how does this explain the fact that, at the same time, the throughput to the NetApp
is about 70MB/s while to a FreeBSD server it’s above 150MB/s? (Window size negotiation?)
Switching off TSO evens out this difference.

> At a glance, the mellanox driver looks very different.
>
>> I’m getting different results with the intel/ix depending on who is the nfs
>> server
>>
> Who knows until you figure out what is actually going on. It could just be the timing of
> handling the write RPCs or when the different servers send acks for the TCP segments or ...
> that causes this for one server and not another.
>
> One of the principles used when investigating airplane accidents is to "never assume anything"
> and just try to collect the facts until the pieces of the puzzle fall into place. I think the
> same principle works for this kind of stuff.
> I once had a case where a specific read of one NFS file would fail on certain machines.
> I won't bore you with the details, but after weeks we got to the point where we had a lab
> of identical machines (exactly the same hardware and exactly the same software loaded on them)
> and we could reproduce this problem on about half the machines and not the other half. We
> (myself and the guy I worked with) finally noticed the failing machines were on network ports
> for a given switch. We moved the net cables to another switch and the problem went away.
> --> This particular network switch was broken in such a way that it would garble one specific
> packet consistently, but worked fine for everything else.
> My point here is that, if someone had suggested the "network switch might be broken" at the
> beginning of investigating this, I would have probably dismissed it, based on "the network is
> working just fine", but in the end, that was the problem.
> --> I am not suggesting you have a broken network switch, just "don't take anything off the
> table until you know what is actually going on".
>
> And to be honest, you may never know, but it is fun to try and solve these puzzles.

One needs to find the clues …
At the moment:
	when things go bad, they stay bad
		ix/nfs/tcp/tso and NetApp
	when things are ok, the numbers fluctuate, which is probably due to loads on the system,
	but they are far above the 70MB/s (100 to 200MB/s)

> Beyond what I already suggested, I'd look at the "ix" driver's stats and tunables and
> see if any of the tunables has an effect. (And, yes, it will take time to work through these.)
>
> Good luck with it, rick
>
>>
>> danny
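
For the suggestion above to look at the "ix" driver's stats, here is a minimal sketch of one way to compare two saved snapshots of the counters, one taken while throughput is fine and one after it has dropped. It assumes the snapshots were saved as plain text in the usual sysctl "name: value" format (for example from something like `sysctl dev.ix.0`). The helper names parse_sysctl and changed_counters are hypothetical, and the counter names in the usage lines are made up for illustration only, not real driver output.

```python
def parse_sysctl(text):
    """Parse 'name: value' lines into a dict, keeping only integer values."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if not sep:
            continue  # not a 'name: value' line
        try:
            counters[name.strip()] = int(value.strip())
        except ValueError:
            continue  # skip non-numeric entries such as description strings
    return counters


def changed_counters(before, after):
    """Return {name: delta} for counters present in both snapshots that changed."""
    b = parse_sysctl(before)
    a = parse_sysctl(after)
    return {name: a[name] - b[name] for name in b if name in a and a[name] != b[name]}


# Usage with made-up counter names (hypothetical, for illustration only):
good = "dev.ix.0.example_drops: 10\ndev.ix.0.example_queues: 8\n"
bad = "dev.ix.0.example_drops: 250\ndev.ix.0.example_queues: 8\n"
print(changed_counters(good, bad))
```

Only counters whose values differ between the two snapshots are reported, which narrows down which stats are worth reading about first. It says nothing about which change, if any, is the cause.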