Date: Thu, 13 Oct 2005 13:25:39 -0700
From: Sean McNeil <sean@mcneil.com>
To: John Baldwin <jhb@freebsd.org>
Cc: freebsd-amd64@freebsd.org, Mike Tancsa <mike@sentex.net>
Subject: Re: realtek performance (was Re: good ATI chipset results)
Message-ID: <1129235139.2203.1.camel@server.mcneil.com>
In-Reply-To: <200510131617.53621.jhb@freebsd.org>
References: <6.2.3.4.0.20051013090818.07a5c9a0@64.7.153.2> <200510131322.45968.jhb@freebsd.org> <434EB98B.6080503@samsco.org> <200510131617.53621.jhb@freebsd.org>

On Thu, 2005-10-13 at 16:17 -0400, John Baldwin wrote:
> On Thursday 13 October 2005 03:46 pm, Scott Long wrote:
> > John Baldwin wrote:
> > > On Thursday 13 October 2005 01:07 pm, Sean McNeil wrote:
> > >>On Thu, 2005-10-13 at 11:49 -0400, John Baldwin wrote:
> > >>>On Thursday 13 October 2005 11:13 am, Sean McNeil wrote:
> > >>>>On Thu, 2005-10-13 at 09:17 -0400, Mike Tancsa wrote:
> > >>>>>Haven't really seen anyone else use this board, but I have had good
> > >>>>>luck with it so far.
> > >>>>>
> > >>>>>http://www.ecs.com.tw/ECSWeb/Products/ProductsDetail.aspx?DetailID=506&MenuID=90&LanID=0
> > >>>>>
> > >>>>>It's a micro ATX form factor board with built-in video, and the
> > >>>>>onboard NIC is a Realtek. (Although it's not the fastest NIC, its
> > >>>>>driver is stable and mature, especially compared to the headaches
> > >>>>>people seem to have with the NVIDIA NICs.)
> > >>>>
> > >>>>Is this the RealTek 8169S Single-chip Gigabit Ethernet?
> > >>>>
> > >>>>For those interested, here are some changes I always use to increase
> > >>>>the performance of the above NIC. With these mods, I can stream over
> > >>>>20 MBps of multicast video and do other work over the network without
> > >>>>issues. Without the changes, transmit performance is horrible, with
> > >>>>severe UDP packet loss.
> > >>>
> > >>>So, I see two changes. One is to raise the number of descriptors from
> > >>>32 rx and 64 tx to 64 rx and 64 tx on some models, and to 1024 rx and
> > >>>1024 tx on other models. The other is that you seem to pessimize TX
> > >>>performance by always forcing outgoing packets to be coalesced into
> > >>>one mbuf (which requires an allocation followed by a copy of all of
> > >>>the data) instead of using scatter/gather DMA to send them. Do you
> > >>>need both changes, or do the higher descriptor counts alone make the
> > >>>difference?
> > >>
> > >>Actually, I've found that the higher descriptor counts do not make a
> > >>noticeable difference.
> > >>The only thing that mattered was eliminating the scatter/gather of
> > >>outgoing packets. I can't remember why I left the descriptor increase
> > >>in there. I think it was to get the best use out of the hardware.
> > >
> > > Hmm, odd. Scott, do you have any ideas why m_defrag() plus one
> > > descriptor would be faster than s/g DMA for re(4)?
> >
> > There are two things that I would consider. First, bus_dmamap_load_mbuf_sg()
> > should be used, as that cuts out some indirection (and thus latency) in
> > the code. Second, not all DMA engines are created equal, and I honestly
> > wouldn't expect a whole lot out of Realtek given the price point of this
> > chip. It might be optimized for operating on only a single S/G element,
> > for example. Maybe it's really slow at prefetching S/G elements, or maybe
> > it has some sort of stall after each DMA segment transfer while it
> > restarts a state machine. I've seen evidence in other hardware that only
> > one S/G element should be used even though there are slots for 2 (or 3
> > in the case of 9k jumbo frames). One thing to keep in mind is the
> > difference in the driver models between Windows and BSD that Bill Paul
> > talked about the other day. In the Windows world, the driver owns the
> > network packet memory, whereas in BSD the stack owns it (in the form of
> > mbufs). This means that the driver can pre-allocate a contiguous slab
> > and populate the descriptor rings with it without ever having to worry
> > about S/G fragmentation, while in BSD fragmentation is a fact of life.
> > So it's likely yet another case of hardware being optimized for certain
> > characteristics of Windows at the expense of other operating systems.
>
> Ok. Sean, do you think you can trim the patch down to just the m_defrag()
> changes and test that to make sure that is all that is needed?

Certainly.
I can even write a small comment noting that this is most likely some
sort of hardware limitation when DMAing fragmented data :) I'm knee deep
in something else at the moment, so it might be a day or two. I'll try to
get it to you sooner.

Sean
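For readers less familiar with the trade-off discussed in this thread, here is a minimal Python model of the two TX paths. It assumes a packet is just a list of byte fragments standing in for an mbuf chain, and that a TX descriptor is a (buffer, length) pair; a real re(4) descriptor holds a bus address, not a Python object. `TxDesc`, `load_sg` and `load_coalesced` are hypothetical names, not the FreeBSD `m_defrag()` or `bus_dmamap_load_mbuf_sg()` API. The sketch only shows the accounting: scatter/gather spends one descriptor per fragment and copies nothing, while coalescing spends one descriptor per packet at the cost of an allocation and a full copy.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical stand-ins, NOT FreeBSD's struct mbuf or re(4) descriptors:
# a packet is a list of byte fragments (like an mbuf chain), and a TX
# descriptor is a (buffer, length) pair.


@dataclass
class TxDesc:
    buf: bytes
    length: int


def load_sg(chain: List[bytes], ring_slots: int) -> Tuple[List[TxDesc], int]:
    """Scatter/gather: one descriptor per fragment, nothing copied.

    Returns (descriptors, bytes_copied)."""
    if len(chain) > ring_slots:
        raise ValueError("not enough free descriptors for this chain")
    descs = [TxDesc(frag, len(frag)) for frag in chain]
    return descs, 0


def load_coalesced(chain: List[bytes]) -> Tuple[List[TxDesc], int]:
    """m_defrag()-style (modelled, not the real call): allocate one buffer,
    copy every fragment into it, and use a single descriptor for the packet.

    Returns (descriptors, bytes_copied)."""
    flat = bytearray()
    for frag in chain:
        flat += frag  # the extra copy that the s/g path avoids
    return [TxDesc(bytes(flat), len(flat))], len(flat)


# Example: Ethernet, IP and UDP headers plus payload in separate fragments.
packet = [bytes(14), bytes(20), bytes(8), b"\xab" * 1316]

sg_descs, sg_copied = load_sg(packet, ring_slots=64)
co_descs, co_copied = load_coalesced(packet)
print(len(sg_descs), sg_copied)  # 4 0
print(len(co_descs), co_copied)  # 1 1358
```

Both paths hand the hardware the same bytes; the model cannot say why the 8169S would prefer the single-segment form. Per Scott's message, that is most likely a property of the chip's DMA engine (slow S/G prefetch or a per-segment stall), which no software model captures.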