Date:      Tue, 4 Feb 2020 12:37:08 -0800
From:      Navdeep Parhar <np@FreeBSD.org>
To:        Slawa Olhovchenkov <slw@zxy.spb.ru>
Cc:        freebsd-net@freebsd.org
Subject:   Re: Chelsio NETMAP performance
Message-ID:  <3a8dfebd-aa26-84ad-a03a-0271b61a89a3@FreeBSD.org>
In-Reply-To: <20200204162005.GC8012@zxy.spb.ru>
References:  <20200203201728.GC8028@zxy.spb.ru> <863de9e1-42cc-6f3a-5c1f-1bf737714c9f@FreeBSD.org> <20200203222321.GB8012@zxy.spb.ru> <6868f207-d054-3d45-b60d-eaf7115760c1@FreeBSD.org> <20200204162005.GC8012@zxy.spb.ru>

On 2/4/20 8:20 AM, Slawa Olhovchenkov wrote:
> On Mon, Feb 03, 2020 at 02:39:03PM -0800, Navdeep Parhar wrote:
>
>> On 2/3/20 2:23 PM, Slawa Olhovchenkov wrote:
>>> On Mon, Feb 03, 2020 at 01:39:52PM -0800, Navdeep Parhar wrote:
>>>
>>>> On 2/3/20 12:17 PM, Slawa Olhovchenkov wrote:
>>>>> I am trying to use a Chelsio T540-CR in netmap mode and see poor
>>>>> performance (compared to an Intel 82599ES).
>>>>
>>>> What approximate FreeBSD version is this?
>>>
>>> 12.1-STABLE
>>>
>>>>>
>>>>> The same application can receive only about 8.9Mpps, compared to
>>>>> 12.5Mpps with the Intel card.
>>>>>
>>>>> A pmc profile shows most of the time spent in:
>>>>>
>>>>> 49.76%  [17802]    service_nm_rxq @ /boot/kernel/if_cxgbe.ko
>>>>>  100.0%  [17802]     t4_vi_intr
>>>>>   100.0%  [17802]      ithread_loop @ /boot/kernel/kernel
>>>>>    100.0%  [17802]       fork_exit
>>>>>
>>>>>
>>>>> to be exact, at this line:
>>>>>
>>>>>         while ((d->rsp.u.type_gen & F_RSPD_GEN) == nm_rxq->iq_gen) {
>>>>>
>>>>> Is this the maximum limit for this vendor?
>>>>
>>>> No, a T540 should be able to sink full 10Gbps (14.88Mpps) on a single rx
>>>> queue.  Try adding this to your loader.conf:
>>>>
>>>> hw.cxgbe.toecaps_allowed="0"
>>>>
>>>> Then try the simple netmap "pkt-gen -f rx" instead of any custom app
>>>> and see how many pps it's able to sink.
>>>
>>> Thanks! `hw.cxgbe.toecaps_allowed="0"` allows my application to
>>> receive 14Mpps too!
>>>
>>> Now I get only about 10% less performance compared to Intel, which I
>>> see in the higher Chelsio interrupt CPU time (top shows about 30% for
>>> every interrupt handler). Is this normal? Is it possible to optimize?
>>
>> Try changing the interrupt holdoff timer for the netmap rx queues.
>>
>> This shows the list of timers available (in microseconds):
>> # sysctl dev.t5nex.0.holdoff_timers
>>
>> nm_holdoff_tmr_idx is a 0-based index into the list above.  So if the
>> tmr idx is 0 you are using the 0th (first) value from the list of
>> timers.  Try increasing nm_holdoff_tmr_idx and see if that brings down=

>> the interrupt rate under control.
>>
>> # sysctl hw.cxgbe.nm_holdoff_tmr_idx=3/4/5
>
> OK, the interrupt rate goes down, but the interrupt CPU time stays about
> the same.  (Interrupt CPU time for the Intel card is about 0, compared
> to 25% for Chelsio.)

I think iflib runs a lot of stuff in taskqueues rather than the driver
ithread so the CPU accounting may vary.  Use dtrace to see if
netmap_rx_irq is being called by an ithread or a taskqueue to figure out
which driver does what.
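
For example (a rough sketch; the probe may need adjusting if
netmap_rx_irq gets inlined in your kernel):

# dtrace -n 'fbt::netmap_rx_irq:entry { @[stack(), execname] = count(); }'

The aggregated stacks (and the process names, e.g. intr for an ithread
vs. kernel for a taskqueue thread) will show where the calls originate.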

Are you also transmitting a lot out of this node or is it mostly Rx?
There's no need to worry about Tx updates (and the interrupts they might
generate) if this is an Rx-mostly workload.

> Most of the time is spent in service_nm_rxq(), in the while() check.
> Is it possible to do some prefetch?
> A trivial `__builtin_prefetch(64 + (char *)d);` in the body of the loop
> doesn't change anything.
>
> Is it possible to do a batch prefetch before the loop?

Prefetches are not possible here.  That while condition is waiting for
the ownership bit of the rx descriptor to flip, indicating there is
work for the driver to do.
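
As a simplified illustration (a sketch only, with made-up names, not the
actual cxgbe code), the generation-bit handshake on the ring looks
roughly like this:

#include <stdint.h>

#define GEN_BIT 0x80                    /* stand-in for F_RSPD_GEN */

struct rx_desc {
        volatile uint8_t type_gen;      /* generation bit, written last by the NIC */
        uint8_t rest[63];               /* rest of the 64B descriptor */
};

static void handle_descriptor(struct rx_desc *);

static void
drain_ring(struct rx_desc *ring, int ring_size, struct rx_desc *d,
    uint8_t sw_gen)
{
        /*
         * The NIC DMA-writes each descriptor and flips its generation
         * bit last; software consumes entries while the bit matches
         * its own notion of the current generation, and flips that
         * notion on every wrap around the ring (the hardware does the
         * same on its side).
         */
        while ((d->type_gen & GEN_BIT) == sw_gen) {
                handle_descriptor(d);           /* entry now owned by software */
                if (++d == &ring[ring_size]) {
                        d = &ring[0];
                        sw_gen ^= GEN_BIT;      /* generation flips on wrap */
                }
        }
}

Until the NIC's DMA write actually lands, that cache line simply holds
the stale previous-generation contents, so prefetching it early only
fetches data that will be overwritten; the load in the while condition
has to wait for the device either way.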

Regards,
Navdeep




