Date:      Fri, 22 Oct 2010 12:40:52 +0530
From:      Sriram Gorti <gsriram@gmail.com>
To:        Lawrence Stewart <lstewart@freebsd.org>
Cc:        freebsd-net@freebsd.org, Andre Oppermann <andre@freebsd.org>
Subject:   Re: Question on TCP reassembly counter
Message-ID:  <AANLkTinvt4kCQNkf1ueDw0CFaYE9SELsBK8nR2yQKytZ@mail.gmail.com>
In-Reply-To: <4CBB6CE9.1030009@freebsd.org>
References:  <AANLkTikWWmrnBy_DGgSsDbh6NAzWGKCWiFPnCRkwoDRi@mail.gmail.com> <4CA5D1F0.3000307@freebsd.org> <4CA9B6AC.20403@freebsd.org> <4CBB6CE9.1030009@freebsd.org>

Hi,

On Mon, Oct 18, 2010 at 3:08 AM, Lawrence Stewart <lstewart@freebsd.org> wrote:
> On 10/04/10 22:12, Lawrence Stewart wrote:
>> On 10/01/10 22:20, Andre Oppermann wrote:
>>> On 01.10.2010 12:01, Sriram Gorti wrote:
>>>> Hi,
>>>>
>>>> The following is an observation from testing our XLR/XLS network
>>>> driver with 16 concurrent instances of netperf on FreeBSD-CURRENT.
>>>> Based on this observation, I have a question that I hope to get some
>>>> help with here.
>>>>
>>>> When running 16 concurrent netperf instances (each for about 20
>>>> seconds), it was found that after some number of runs performance
>>>> degraded badly (by almost a factor of 5), and all subsequent runs
>>>> remained that way. We started debugging from the TCP side, as other
>>>> driver tests were doing fine for comparably long durations on the
>>>> same board and software.
>>>>
>>>> netstat indicated the following:
>>>>
>>>> $ netstat -s -f inet -p tcp | grep discarded
>>>>         0 discarded for bad checksums
>>>>         0 discarded for bad header offset fields
>>>>         0 discarded because packet too short
>>>>         7318 discarded due to memory problems
>>>>
>>>> We then traced the "discarded due to memory problems" count to the
>>>> following counter:
>>>>
>>>> $ sysctl -a net.inet.tcp.reass
>>>> net.inet.tcp.reass.overflows: 7318
>>>> net.inet.tcp.reass.maxqlen: 48
>>>> net.inet.tcp.reass.cursegments: 1594    <--- corresponds to the
>>>> V_tcp_reass_qsize variable
>>>> net.inet.tcp.reass.maxsegments: 1600
>>>>
>>>> Our guess for the need for reassembly (in this low-packet-loss test
>>>> setup) was the lack of per-flow classification in the driver, which
>>>> spreads incoming packets across the 16 h/w cpus instead of sending
>>>> all packets of a flow to the same cpu. While we are working on
>>>> addressing this driver limitation, we debugged further to see how/why
>>>> V_tcp_reass_qsize grew (assuming the count of out-of-order segments
>>>> should have dropped to zero at the end of each run). It turned out
>>>> the counter had been growing since the initial runs, but performance
>>>> only degraded once it got close to maxsegments. We then looked at
>>>> vmstat as well to see how many reassembly segments were being lost,
>>>> but none were. We could not reconcile "no lost segments" with the
>>>> growth of this counter across test runs.
>>>
>>> A patch is in the works to properly autoscale the reassembly queue
>>> and should be committed shortly.
>>>
>>>> $ sysctl net.inet.tcp.reass ; vmstat -z | egrep "FREE|mbuf|tcpre"
>>>> net.inet.tcp.reass.overflows: 0
>>>> net.inet.tcp.reass.maxqlen: 48
>>>> net.inet.tcp.reass.cursegments: 147
>>>> net.inet.tcp.reass.maxsegments: 1600
>>>> ITEM                  SIZE   LIMIT    USED    FREE      REQ FAIL SLEEP
>>>> mbuf_packet:           256,      0,   4096,   3200, 5653833,   0,    0
>>>> mbuf:                  256,      0,      1,   2048, 4766910,   0,    0
>>>> mbuf_cluster:         2048,  25600,   7296,      6,    7297,   0,    0
>>>> mbuf_jumbo_page:      4096,  12800,      0,      0,       0,   0,    0
>>>> mbuf_jumbo_9k:        9216,   6400,      0,      0,       0,   0,    0
>>>> mbuf_jumbo_16k:      16384,   3200,      0,      0,       0,   0,    0
>>>> mbuf_ext_refcnt:         4,      0,      0,      0,       0,   0,    0
>>>> tcpreass:               20,   1690,      0,    845, 1757074,   0,    0
>>>>
>>>> In view of these observations, my question is: is it possible for the
>>>> V_tcp_reass_qsize variable to be updated unsafely on SMP? (The
>>>> particular flavor of XLS used in the test had 4 cores with 4 h/w
>>>> threads/core.) I see that the tcp_reass() function assumes some lock
>>>> is held, but I am not sure whether it is the per-socket lock or the
>>>> global TCP lock.
>>>
>>> The updating of the global counter is indeed unsafe and becomes
>>> obsolete with the autotuning patch.
>>>
>>> The patch has been reviewed by me and is ready for commit. However,
>>> lstewart@ is currently writing his thesis and has only very little
>>> spare time. I'll send you the patch in private email so you can
>>> continue your testing.
>>
>> Quick update on this: patch is blocked while waiting for Jeff to review
>> some related UMA changes. As soon as I get the all clear I'll push
>> everything into head.
>
> Revision 213913 of the svn head branch finally has all patches. If you
> encounter any additional odd behaviour related to reassembly or notice
> net.inet.tcp.reass.overflows increasing, please let me know.
>

Thanks for the fix. Tried it on XLR/XLS and the earlier tests pass
now. net.inet.tcp.reass.overflows was always zero after the tests (and
in the samples I took while the tests were running).

One observation, though: net.inet.tcp.reass.cursegments was non-zero
(it was just 1) after 30 rounds, where each round is (as earlier)
15 concurrent instances of netperf for 20s. This was on the netserver
side, and the counter was zero before the netperf runs. On the other
hand, Andre told me (in a separate mail) that this counter is no longer
relevant - so should I just ignore it?
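
For the record, here is a minimal userland sketch of the kind of lost
update we were suspecting earlier. This is not the actual tcp_reass()
code, just an illustration under the assumption that the old global
counter was bumped with a plain ++/-- and no lock or atomic op: two
threads doing balanced increments and decrements can still leave the
counter non-zero.

#include <pthread.h>
#include <stdio.h>

#define ITERATIONS 1000000

static volatile long qsize;     /* stand-in for the old global counter */

static void *
worker(void *arg)
{
        (void)arg;
        for (long i = 0; i < ITERATIONS; i++) {
                qsize++;        /* "segment queued": load, add, store */
                qsize--;        /* "segment delivered": load, sub, store */
        }
        return (NULL);
}

int
main(void)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);

        /* Every ++ had a matching --, yet updates can be lost. */
        printf("final qsize = %ld (expected 0)\n", qsize);
        return (0);
}

Built with "cc -pthread" and run on a multi-core machine, this will
typically print a non-zero value, which is consistent with a global
counter drifting even though no segments are actually leaked.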

---
Sriram Gorti

> Cheers,
> Lawrence
>


