Date: Thu, 13 Oct 2005 13:46:19 -0600
From: Scott Long <scottl@samsco.org>
To: John Baldwin <jhb@freebsd.org>
Cc: freebsd-amd64@freebsd.org, Mike Tancsa <mike@sentex.net>
Subject: Re: realtek performance (was Re: good ATI chipset results)
Message-ID: <434EB98B.6080503@samsco.org>
In-Reply-To: <200510131322.45968.jhb@freebsd.org>
References: <6.2.3.4.0.20051013090818.07a5c9a0@64.7.153.2> <200510131149.24411.jhb@freebsd.org> <1129223254.9093.3.camel@server.mcneil.com> <200510131322.45968.jhb@freebsd.org>
John Baldwin wrote:
> On Thursday 13 October 2005 01:07 pm, Sean McNeil wrote:
>
>>On Thu, 2005-10-13 at 11:49 -0400, John Baldwin wrote:
>>
>>>On Thursday 13 October 2005 11:13 am, Sean McNeil wrote:
>>>
>>>>On Thu, 2005-10-13 at 09:17 -0400, Mike Tancsa wrote:
>>>>
>>>>>Haven't really seen anyone else use this board, but I have had good
>>>>>luck with it so far
>>>>>
>>>>>http://www.ecs.com.tw/ECSWeb/Products/ProductsDetail.aspx?DetailID=506&MenuID=90&LanID=0
>>>>>
>>>>>It's a micro ATX form factor with built-in video and the onboard NIC is
>>>>>a realtek. (Although it's not the fastest NIC, its driver is stable
>>>>>and mature-- especially compared to the headaches people seem to have
>>>>>with the NVIDIA NICs.)
>>>>
>>>>Is this the RealTek 8169S Single-chip Gigabit Ethernet?
>>>>
>>>>For those interested, here are some changes I always use to increase
>>>>the performance of the above NIC. With these mods, I can stream over
>>>>20 MBps video multicast and do other stuff over the network without
>>>>issues. Without the changes, xmit is horrible with severe UDP packet
>>>>loss.
>>>
>>>So, I see two changes. One is to up the number of descriptors from 32 rx
>>>and 64 tx to 64 rx and 64 tx on some models and 1024 rx and 1024 tx on
>>>other models. The other thing is that you seem to pessimize TX
>>>performance by always forcing the send packets to be coalesced into one
>>>mbuf (which requires doing an alloc and then copying all of the data)
>>>instead of making use of scatter/gather for sending packets. Do you need
>>>both changes or do just the higher descriptor counts make the difference?
>>
>>Actually, I've found that the higher descriptor counts do not make a
>>noticeable difference. The only thing that mattered was to eliminate
>>the scatter/gather of sending packets. I can't remember why I left the
>>descriptor increase in there. I think it was to get the best use out of
>>the hardware.
>
>
> Hmm, odd.
> Scott, do you have any ideas why m_defrag() plus one descriptor
> would be faster than s/g dma for re(4)?
>

There are two things that I would consider. First, bus_dmamap_load_mbuf_sg() should be used, as that cuts out some indirection (and thus latency) in the code. Second, not all DMA engines are created equal, and I honestly wouldn't expect a whole lot out of Realtek given the price point of this chip. It might be optimized for operating on only a single S/G element, for example. Maybe it's really slow at pre-fetching S/G elements, or maybe it has some sort of stall after each DMA segment transfer while it restarts a state machine. I've seen evidence in other hardware that only one S/G element should be used even though there are slots for 2 (or 3 in the case of 9k jumbo frames).

One thing to keep in mind is the difference in driver models between Windows and BSD that Bill Paul talked about the other day. In the Windows world, the driver owns the network packet memory, whereas in BSD the stack owns it (in the form of mbufs). This means that a Windows driver can pre-allocate a contiguous slab and populate the descriptor rings with it without ever having to worry about S/G fragmentation, while in BSD fragmentation is a fact of life. So it's likely yet another case of hardware being optimized for certain characteristics of Windows at the expense of other operating systems.

Scott