Date: Wed, 30 May 2012 20:33:25 -0700
From: Kevin Oberman <kob6558@gmail.com>
To: Andrew Gallatin <gallatin@cs.duke.edu>
Cc: freebsd-net@freebsd.org, Andrew Gallatin <gallatin@myri.com>
Subject: Re: Major performance hit with ToS setting
Message-ID: <CAN6yY1v%2Bvf=SW7WDGHxCkJtOdj8K3f450jNxFWK_Jc%2B-pFg0nA@mail.gmail.com>
In-Reply-To: <4FBF88CE.20209@cs.duke.edu>
References: <CAN6yY1sLxFJ18ANO7nQqLetnJiT-K6pHC-X3yT1dWuWGa0VLUg@mail.gmail.com> <4FBF88CE.20209@cs.duke.edu>

On Fri, May 25, 2012 at 6:27 AM, Andrew Gallatin <gallatin@cs.duke.edu> wrote:
> On 05/24/12 18:55, Kevin Oberman wrote:
>>
>> This is, of course, on a 10G interface. On 7.3 there is little
>
> Hi Kevin,
>
> What you're seeing looks almost like a checksum is bad, or
> there is some other packet damage. Do you see any
> error counters increasing if you run netstat -s before
> and after the test & compare the results?
>
> Thinking that, perhaps, this was a bug in my mxge(4), I attempted
> to reproduce it this morning between 8.3 and 9.0 boxes and
> failed to see the bad behavior.
>
> % nuttcp-6.1.2 -c32t -t diablo1-m < /dev/zero
>  9161.7500 MB /  10.21 sec = 7526.5792 Mbps 53 %TX 97 %RX 0 host-retrans 0.11 msRTT
> % nuttcp-6.1.2 -t diablo1-m < /dev/zero
>  9140.6180 MB /  10.21 sec = 7509.8270 Mbps 53 %TX 97 %RX 0 host-retrans 0.11 msRTT
>
> However, I don't have any 8.2-r box handy, so I cannot
> exactly repro your experiment...

Drew and Bjorn,

At this point the flying fickle finger of fate (oops, just dated myself) is pointing to a bug in the CUBIC congestion control, which we run. But it's really weird in several ways. I built another system from the same source files and it works fine, unlike all of the existing systems. I need to confirm that all systems have identical hardware, including the Myricom firmware. I suspect some edge case that bites only in unusual circumstances.

I used SIFTR at the suggestion of Lawrence Stewart, who headed the project to bring pluggable congestion algorithms to FreeBSD, and found really odd congestion behavior. First, I do see a triple ACK, but the congestion window suddenly drops from 73K to 8K. If I understand CUBIC, it should halve the congestion window, which is not what is happening. It then increases slowly (in slow start) to 82K. While the slow-start bytes are INCREASING, the congestion window again goes to 8K while the SS size moves from 36K up to 52K.
It just continues to bounce wildly between 8K (always the low point) and a peak between 64K and 82K. The swings start at 83K and, over the first few seconds, the peaks drop to about 64K.

I am trying to think of any way that anything other than the CC algorithm could do this, but have not come up with one so far. I will try installing Hamilton and see how it works. On the other hand, how could changing the ToS bits trigger this behavior?

I have sent all of my data to Lawrence Stewart and I expect to hear from him soon, but I'd appreciate it if you could provide any other ideas on what could cause this.

I might also mention that about 4 years ago, when I was testing 10G cards, I saw something similar (using tcptrace) when testing between a Myricom card and a Chelsio, but that is a pretty vague data point and I no longer have the traces to examine.

Again, if you want to look at network performance issues, SIFTR is an awesome tool.

-- 
R. Kevin Oberman, Network Engineer
E-mail: kob6558@gmail.com
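
As a rough check on the window sizes reported in the message, the sketch below computes what a single multiplicative decrease would leave of a 73K congestion window and compares it with the 8K that was observed. The function name `cwnd_after_loss`, the reading of "K" as 1024 bytes, and the sample factors are assumptions made for illustration and are not taken from the FreeBSD sources: 0.5 is the halving described in the message, and 0.7 is the `beta_cubic` value given in RFC 8312 (implementations may use other values).

```python
# Illustrative only: compare the window a single multiplicative decrease
# would leave with the ~8K window reported in the message.

KIB = 1024  # assumption: "K" in the message means 1024 bytes


def cwnd_after_loss(cwnd_bytes, beta):
    """Return the congestion window after one multiplicative decrease.

    beta is the fraction of the window that is kept after a loss event.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must be between 0 and 1")
    return int(cwnd_bytes * beta)


observed_before = 73 * KIB  # window before the loss event (from the message)
observed_after = 8 * KIB    # window after the loss event (from the message)

# Assumed factors: 0.5 = classic halving, 0.7 = beta_cubic in RFC 8312.
expected = {beta: cwnd_after_loss(observed_before, beta) for beta in (0.5, 0.7)}

for beta, cwnd in expected.items():
    print(f"beta={beta}: expected about {cwnd // KIB}K, "
          f"observed {observed_after // KIB}K")

# Fraction of the window that actually survived the drop.
observed_ratio = observed_after / observed_before
print(f"observed ratio: {observed_ratio:.2f}")
```

Under either assumed factor the window should remain in the tens of kilobytes, whereas the observed drop keeps only about 11% of it, which fits the suspicion that something other than a normal multiplicative decrease is resetting the window.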