Date: Thu, 13 Oct 2005 16:17:51 -0400
From: John Baldwin <jhb@freebsd.org>
To: Scott Long <scottl@samsco.org>
Cc: freebsd-amd64@freebsd.org, Mike Tancsa <mike@sentex.net>
Subject: Re: realtek performance (was Re: good ATI chipset results)
Message-ID: <200510131617.53621.jhb@freebsd.org>
In-Reply-To: <434EB98B.6080503@samsco.org>
References: <6.2.3.4.0.20051013090818.07a5c9a0@64.7.153.2> <200510131322.45968.jhb@freebsd.org> <434EB98B.6080503@samsco.org>

On Thursday 13 October 2005 03:46 pm, Scott Long wrote:
> John Baldwin wrote:
> > On Thursday 13 October 2005 01:07 pm, Sean McNeil wrote:
> >>On Thu, 2005-10-13 at 11:49 -0400, John Baldwin wrote:
> >>>On Thursday 13 October 2005 11:13 am, Sean McNeil wrote:
> >>>>On Thu, 2005-10-13 at 09:17 -0400, Mike Tancsa wrote:
> >>>>>Haven't really seen anyone else use this board, but I have had good
> >>>>>luck with it so far
> >>>>>
> >>>>>http://www.ecs.com.tw/ECSWeb/Products/ProductsDetail.aspx?DetailID=506&MenuID=90&LanID=0
> >>>>>
> >>>>>It's a micro ATX form factor with built-in video and the onboard NIC is
> >>>>>a realtek. (Although it's not the fastest NIC, its driver is stable
> >>>>>and mature-- especially compared to the headaches people seem to have
> >>>>>with the NVIDIA NICs.)
> >>>>
> >>>>Is this the RealTek 8169S Single-chip Gigabit Ethernet?
> >>>>
> >>>>For those interested, here are some changes I always use to increase
> >>>>the performance of the above NIC. With these mods, I can stream over
> >>>>20 MBps video multicast and do other stuff over the network without
> >>>>issues. Without the changes, xmit is horrible with severe UDP packet
> >>>>loss.
> >>>
> >>>So, I see two changes. One is to up the number of descriptors from 32
> >>> rx and 64 tx to 64 rx and 64 tx on some models and 1024 rx and 1024 tx
> >>> on other models. The other thing is that you seem to pessimize TX
> >>> performance by always forcing the send packets to be coalesced into one
> >>> mbuf (which requires doing an alloc and then copying all of the data)
> >>> instead of making use of scatter/gather for sending packets. Do you
> >>> need both changes or do just the higher descriptor counts make the
> >>> difference?
> >>
> >>Actually, I've found that the higher descriptor counts do not make a
> >>noticeable difference. The only thing that mattered was to eliminate
> >>the scatter/gather of sending packets. I can't remember why I left the
> >>descriptor increase in there. I think it was to get the best use out of
> >>the hardware.
> >
> > Hmm, odd. Scott, do you have any ideas why m_defrag() plus one
> > descriptor would be faster than s/g dma for re(4)?
>
> There are two things that I would consider. First is that
> bus_dmamap_load_mbuf_sg() should be used, as that cuts out some
> indirection (and thus latency) in the code. Second is that not all DMA
> engines are created equal, and I honestly wouldn't expect a whole lot
> out of Realtek given the price point of this chip. It might be
> optimized for operating on only a single S/G element, for example.
> Maybe it's really slow at pre-fetching s/g elements, or maybe it has
> some sort of a stall after each DMA segment transfer while it restarts
> a state machine. I've seen evidence in other hardware that only one
> S/G element should be used even though there are slots for 2 (or 3 in
> the case of 9k jumbo frames). One thing to keep in mind is the
> difference in the driver models between Windows and BSD that Bill Paul
> talked about the other day. In the Windows world, the driver owns the
> network packet memory, whereas in BSD the stack owns it (in the form of
> mbufs). This means that the driver can pre-allocate a contiguous slab
> and populate the descriptor rings with it without ever having to worry
> about s/g fragmentation, while in BSD fragmentation is a fact of life.
> So it's likely yet another case of hardware being optimized for certain
> characteristics of Windows at the expense of other operating systems.

Ok. Sean, do you think you can trim the patch down to just the m_defrag()
changes and test that to make sure that is all that is needed?

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org
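
For readers following the thread, the trade-off being debated can be sketched in plain userspace C. This is only a rough model under stated assumptions, not driver code: `struct frag`, `sg_descriptors()`, `chain_length()`, and `coalesce()` are hypothetical stand-ins invented for illustration, where a real driver would work with mbuf chains, m_defrag(), and the bus_dma interface. It shows that a scatter/gather send costs one descriptor per fragment, while coalescing costs an allocation plus a copy but leaves a single contiguous segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an mbuf chain: one node per packet fragment. */
struct frag {
    struct frag *next;
    size_t len;
    unsigned char *data;
};

/* Descriptors a scatter/gather send would consume: one per fragment. */
static size_t sg_descriptors(const struct frag *chain)
{
    size_t n = 0;
    for (; chain != NULL; chain = chain->next)
        n++;
    return n;
}

/* Total payload bytes across the whole chain. */
static size_t chain_length(const struct frag *chain)
{
    size_t total = 0;
    for (; chain != NULL; chain = chain->next)
        total += chain->len;
    return total;
}

/*
 * Coalesce the chain into one contiguous fragment. This is the
 * "alloc and then copy all of the data" cost mentioned in the thread.
 * Returns NULL if allocation fails; the caller frees data and node.
 */
static struct frag *coalesce(const struct frag *chain)
{
    size_t total = chain_length(chain);
    size_t off = 0;
    struct frag *one = malloc(sizeof(*one));

    if (one == NULL)
        return NULL;
    one->data = malloc(total > 0 ? total : 1);
    if (one->data == NULL) {
        free(one);
        return NULL;
    }
    for (; chain != NULL; chain = chain->next) {
        if (chain->len > 0)
            memcpy(one->data + off, chain->data, chain->len);
        off += chain->len;
    }
    assert(off == total);   /* every byte was copied exactly once */
    one->next = NULL;
    one->len = total;
    return one;
}
```

With a three-fragment chain, `sg_descriptors()` reports three descriptors before `coalesce()` and one after it. Whether the saved descriptors outweigh the copy depends on the DMA engine, which is exactly the question raised above for re(4).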