Date: Wed, 5 Feb 2020 14:38:32 +0300
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
To: Navdeep Parhar <np@FreeBSD.org>
Cc: freebsd-net@freebsd.org
Subject: Re: Chelsio NETMAP performance
Message-ID: <20200205113832.GE8012@zxy.spb.ru>
In-Reply-To: <3a8dfebd-aa26-84ad-a03a-0271b61a89a3@FreeBSD.org>
References: <20200203201728.GC8028@zxy.spb.ru> <863de9e1-42cc-6f3a-5c1f-1bf737714c9f@FreeBSD.org> <20200203222321.GB8012@zxy.spb.ru> <6868f207-d054-3d45-b60d-eaf7115760c1@FreeBSD.org> <20200204162005.GC8012@zxy.spb.ru> <3a8dfebd-aa26-84ad-a03a-0271b61a89a3@FreeBSD.org>
On Tue, Feb 04, 2020 at 12:37:08PM -0800, Navdeep Parhar wrote:

> >> nm_holdoff_tmr_idx is a 0-based index into the list above.  So if the
> >> tmr idx is 0 you are using the 0th (first) value from the list of
> >> timers.  Try increasing nm_holdoff_tmr_idx and see if that brings down
> >> the interrupt rate under control.
> >>
> >> # sysctl hw.cxgbe.nm_holdoff_tmr_idx=3/4/5
> >
> > OK, interrupt rate go down, but interrupt time about same.
> > (interrupt rate for intel card about 0, compared to 25% chelsio).
>
> I think iflib runs a lot of stuff in taskqueues rather than the driver
> ithread so the CPU accounting may vary.  Use dtrace to see if

I don't think that is the cause: the worker's CPU core, which runs no
syscalls and has only the bound worker thread and the NIC IRQ handler on
it, shows about 100% user CPU time.  Maybe some cache-miss work is
performed later, at poll(2) time, in the Intel driver, whereas the
Chelsio driver does it at interrupt time?

> netmap_rx_irq is being called by an ithread or a taskqueue to figure out
> what driver does what.

Can you explain some more?  I am not sure which dtrace probe to use or
how to evaluate the result afterwards.

> Are you also transmitting a lot out of this node or is it mostly Rx?
> There's no need to worry about Tx updates (and the interrupts they might
> generate) if this is an Rx-mostly workload.

It depends on the traffic.  This is DDoS protection; in the case of a
SYN flood, Tx is about the same as Rx.  In any case Tx (as far as I can
see) is significantly cheaper than Rx, by at least 10x.  But there are
nuances when both run simultaneously.

> > Most time spent in service_nm_rxq(), in while() check.
> > Is this posible to do some prefetch?
> > Trivial `__builtin_prefetch(64+(char*)d);` in body of loop don't
> > change anything.
> >
> > Is this posible to do batch prefetch before cycle?
>
> prefetches are not possible here.  That while condition is waiting for
> the ownership bit of the rx descriptor to flip, indicating there is
> work for the driver to do.

Is there no way to do some estimation?  For example, count the packets
pending in the Rx queue?
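To make the last question concrete, here is a minimal sketch in C of a
consumer loop on a descriptor ring with an ownership (generation) bit,
plus a look-ahead that counts descriptors already handed over without
consuming them.  Everything in it (struct rx_desc, struct rx_ring,
RING_SIZE, ring_produce_n(), ring_pending(), ring_consume()) is
hypothetical and is not taken from the cxgbe driver; the "hardware" is
simulated in software and a real driver would also need a read barrier
after observing the bit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 8u            /* hypothetical ring size */

/* Hypothetical Rx descriptor; not the real cxgbe layout. */
struct rx_desc {
    uint32_t len;               /* payload length written by the "NIC" */
    uint8_t  gen;               /* ownership/generation bit */
};

struct rx_ring {
    struct rx_desc desc[RING_SIZE];
    uint32_t cidx;              /* next descriptor the driver consumes */
    uint8_t  gen;               /* value that marks a descriptor as ready */
    uint32_t hw_pidx;           /* simulation only: producer index */
    uint8_t  hw_gen;            /* simulation only: producer generation */
};

static void ring_init(struct rx_ring *r)
{
    memset(r, 0, sizeof(*r));   /* all descriptors start with gen == 0 */
    r->gen = 1;                 /* so the first pass expects gen == 1 */
    r->hw_gen = 1;
}

/* Simulated hardware: publish n packets. Caller must not overrun the ring. */
static void ring_produce_n(struct rx_ring *r, unsigned n, uint32_t len)
{
    while (n-- > 0) {
        r->desc[r->hw_pidx].len = len;
        r->desc[r->hw_pidx].gen = r->hw_gen;    /* flip ownership last */
        if (++r->hw_pidx == RING_SIZE) {
            r->hw_pidx = 0;
            r->hw_gen ^= 1;
        }
    }
}

/* Look ahead: count ready descriptors, up to budget, without consuming. */
static unsigned ring_pending(const struct rx_ring *r, unsigned budget)
{
    uint32_t idx = r->cidx;
    uint8_t gen = r->gen;
    unsigned n = 0;

    assert(budget <= RING_SIZE);
    while (n < budget && r->desc[idx].gen == gen) {
        n++;
        if (++idx == RING_SIZE) {
            idx = 0;
            gen ^= 1;           /* expected value flips on wrap */
        }
    }
    return n;
}

/* Consume up to budget ready descriptors; returns how many were taken. */
static unsigned ring_consume(struct rx_ring *r, unsigned budget)
{
    unsigned n = 0;

    while (n < budget && r->desc[r->cidx].gen == r->gen) {
        /* a real driver would pass the buffer on here */
        n++;
        if (++r->cidx == RING_SIZE) {
            r->cidx = 0;
            r->gen ^= 1;
        }
    }
    return n;
}
```

The look-ahead reads the same ownership bits the consumer loop reads, so
in this sketch it touches the same cache lines and does not by itself
avoid the misses; it only shows what "count the pending packets" would
mean on such a ring.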
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20200205113832.GE8012>