Date: Thu, 13 Oct 2005 14:36:19 -0600
From: Scott Long <scottl@samsco.org>
To: sean@mcneil.com
Cc: freebsd-amd64@freebsd.org, Mike Tancsa <mike@sentex.net>
Subject: Re: realtek performance (was Re: good ATI chipset results)
Message-ID: <434EC543.7010903@samsco.org>
In-Reply-To: <1129235139.2203.1.camel@server.mcneil.com>
References: <6.2.3.4.0.20051013090818.07a5c9a0@64.7.153.2>
    <200510131322.45968.jhb@freebsd.org>
    <434EB98B.6080503@samsco.org>
    <200510131617.53621.jhb@freebsd.org>
    <1129235139.2203.1.camel@server.mcneil.com>

Sean McNeil wrote:
> On Thu, 2005-10-13 at 16:17 -0400, John Baldwin wrote:
>
>>On Thursday 13 October 2005 03:46 pm, Scott Long wrote:
>>
>>>John Baldwin wrote:
>>>
>>>>On Thursday 13 October 2005 01:07 pm, Sean McNeil wrote:
>>>>
>>>>>On Thu, 2005-10-13 at 11:49 -0400, John Baldwin wrote:
>>>>>
>>>>>>On Thursday 13 October 2005 11:13 am, Sean McNeil wrote:
>>>>>>
>>>>>>>On Thu, 2005-10-13 at 09:17 -0400, Mike Tancsa wrote:
>>>>>>>
>>>>>>>>Haven't really seen anyone else use this board, but I have had good
>>>>>>>>luck with it so far
>>>>>>>>
>>>>>>>>http://www.ecs.com.tw/ECSWeb/Products/ProductsDetail.aspx?DetailID=506&MenuID=90&LanID=0
>>>>>>>>
>>>>>>>>It's a micro ATX form factor with built-in video, and the onboard NIC
>>>>>>>>is a realtek. (Although it's not the fastest NIC, its driver is stable
>>>>>>>>and mature, especially compared to the headaches people seem to have
>>>>>>>>with the NVIDIA NICs.)
>>>>>>>
>>>>>>>Is this the RealTek 8169S Single-chip Gigabit Ethernet?
>>>>>>>
>>>>>>>For those interested, here are some changes I always use to increase
>>>>>>>the performance of the above NIC. With these mods, I can stream over
>>>>>>>20 MBps video multicast and do other stuff over the network without
>>>>>>>issues. Without the changes, xmit is horrible with severe UDP packet
>>>>>>>loss.
>>>>>>
>>>>>>So, I see two changes. One is to up the number of descriptors from 32
>>>>>>rx and 64 tx to 64 rx and 64 tx on some models and 1024 rx and 1024 tx
>>>>>>on other modules. The other thing is that you seem to pessimize TX
>>>>>>performance by always forcing the send packets to be coalesced into one
>>>>>>mbuf (which requires doing an alloc and then copying all of the data)
>>>>>>instead of making use of scatter/gather for sending packets. Do you
>>>>>>need both changes or do just the higher descriptor counts make the
>>>>>>difference?
>>>>>
>>>>>Actually, I've found that the higher descriptor counts do not make a
>>>>>noticeable difference. The only thing that mattered was to eliminate
>>>>>the scatter/gather of sending packets. I can't remember why I left the
>>>>>descriptor increase in there. I think it was to get the best use out of
>>>>>the hardware.
>>>>
>>>>Hmm, odd. Scott, do you have any ideas why m_defrag() plus one
>>>>descriptor would be faster than s/g dma for re(4)?
>>>
>>>There are two things that I would consider. First is that
>>>bus_dmamap_load_mbuf_sg() should be used, as that cuts out some
>>>indirection (and thus latency) in the code. Second is that not all DMA
>>>engines are created equal, and I honestly wouldn't expect a whole lot
>>>out of Realtek given the price point of this chip. It might be
>>>optimized for operating on only a single S/G element, for example.
>>>Maybe it's really slow at pre-fetching s/g elements, or maybe it has
>>>some sort of a stall after each DMA segment transfer while it restarts
>>>a state machine. I've seen evidence in other hardware that only one
>>>S/G element should be used even though there are slots for 2 (or 3 in
>>>the case of 9k jumbo frames). One thing to keep in mind is the
>>>difference in the driver models between Windows and BSD that Bill Paul
>>>talked about the other day. In the Windows world, the driver owns the
>>>network packet memory, whereas in BSD the stack owns it (in the form of
>>>mbufs). This means that the Windows driver can pre-allocate a
>>>contiguous slab and populate the descriptor rings with it without ever
>>>having to worry about s/g fragmentation, while in BSD fragmentation is
>>>a fact of life. So it's likely yet another case of hardware being
>>>optimized for certain characteristics of Windows at the expense of
>>>other operating systems.
>>
>>Ok. Sean, do you think you can trim the patch down to just the m_defrag()
>>changes and test that to make sure that is all that is needed?
>
>
> Certainly. I can even write a small comment that specifies it is most
> likely some sort of hardware limitation of DMAing fragmented data :)
>
> I'm knee deep in something else at the moment, so it might be a day or
> two. I'll try to get it to you sooner.
>
> Sean
>
>

Do also consider using bus_dmamap_load_mbuf_sg(). It does make a
difference on gige drivers.

Scott
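
The trade-off debated in this thread (one DMA descriptor per packet fragment versus an allocation plus a copy into one contiguous buffer) can be illustrated with a small user-space sketch. This is not kernel code and does not call m_defrag() or bus_dmamap_load_mbuf_sg(); the names struct frag, struct seg, build_sg() and coalesce() are hypothetical stand-ins invented for the illustration, and the real re(4) driver and mbuf API differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an mbuf: one fragment of a packet. */
struct frag {
    const unsigned char *data;
    size_t len;
    struct frag *next;
};

/* Hypothetical stand-in for one DMA segment (one TX descriptor). */
struct seg {
    const unsigned char *addr;
    size_t len;
};

/* Total payload bytes across the whole fragment chain. */
static size_t chain_len(const struct frag *f)
{
    size_t total = 0;
    for (; f != NULL; f = f->next)
        total += f->len;
    return total;
}

/*
 * Scatter/gather path: fill one segment per non-empty fragment, no copy.
 * Returns the number of segments used, or -1 if the chain needs more
 * than maxsegs segments.
 */
static int build_sg(const struct frag *f, struct seg *segs, int maxsegs)
{
    int n = 0;
    assert(segs != NULL);
    for (; f != NULL; f = f->next) {
        if (f->len == 0)
            continue;
        if (n == maxsegs)
            return -1;
        segs[n].addr = f->data;
        segs[n].len = f->len;
        n++;
    }
    return n;
}

/*
 * Coalescing path: allocate one contiguous buffer and copy every
 * fragment into it, so the hardware would see a single segment.
 * Returns NULL on allocation failure; the caller frees the buffer.
 */
static unsigned char *coalesce(const struct frag *f, size_t *out_len)
{
    size_t total = chain_len(f);
    size_t off = 0;
    unsigned char *buf = malloc(total > 0 ? total : 1);

    if (buf == NULL)
        return NULL;
    for (; f != NULL; f = f->next) {
        if (f->len > 0) {
            memcpy(buf + off, f->data, f->len);
            off += f->len;
        }
    }
    *out_len = total;
    return buf;
}
```

For a three-fragment chain, build_sg() consumes three descriptors and copies nothing, while coalesce() pays for one allocation and one pass over the data but leaves a single segment. Which is faster on a given NIC depends on how its DMA engine handles multiple segments, which is the open question in the thread.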