Date: Mon, 8 Apr 2002 20:36:31 -0400 (EDT)
From: Andrew Gallatin <gallatin@cs.duke.edu>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: freebsd-hackers@freebsd.org
Subject: Re: performance of mbufs vs contig buffers?
Message-ID: <15538.14223.494295.766977@grasshopper.cs.duke.edu>
In-Reply-To: <3CB21FCF.6B018811@mindspring.com>
References: <15538.5971.620626.548508@grasshopper.cs.duke.edu> <3CB21FCF.6B018811@mindspring.com>

Terry Lambert writes:
> Andrew Gallatin wrote:
> > After updating the firmware on our 2 gigabit nic to allow enough
> > scatter entries per packet to stock the 9K (jumbo frame) receive
> > rings with cluster mbufs rather than contigmalloc'ed buffers(*), I
> > noticed a dramatic performance decrease: netperf TCP_STREAM
> > performance dropped from 1.6Gb/sec to 1.2Gb/sec.
>
> [ ... ]
>
> > Is it possible that my problems are being caused by cache misses
> > on cluster mbufs occurring when copying out to userspace as another
> > packet is being DMA'ed up?  I'd thought that since the cache line
> > size is 32 bytes, I'd be pretty much equally screwed either way.
>
> [ ... ]
>
> > Does anybody have any ideas why contigmalloc'ed buffers are so much
> > quicker?
>
> Instrument m_pullup(), and see how much it's being called in
> both cases.  Probably you are seeing the 2 byte misalignment
> of the TCP payload in the ethernet packet.

The TCP payload is aligned.  We stock the rings so that the ethernet
header is intentionally misaligned, which makes the IP portion of the
packet land aligned.  (Actually, we encapsulate the ethernet traffic
behind another 16-bit header, so everything ends up aligned without
the +2/-2 stuff.)

> My other guess would be that the clusters you are dealing
> with are non-contiguous.  This has both scatter/gather
> implications, and cache-line implications when using them.

Please elaborate...  What sort of scatter/gather implications?
Microbenchmarks don't show much of a difference DMA'ing to
non-contiguous vs. contiguous pages (over 400MB/sec in all cases).
Also, we get close to link speed DMA'ing to user space, and with page
coloring, that virtually guarantees that the pages are not physically
contiguous.

Based on the UDP behaviour, I think it's cache implications.  The
bottleneck seems to be when copyout() reads the recently DMA'ed data.
The driver reads the first few dozen bytes (so as to touch up the
csum by subtracting off the extra bits the DMA engines added in).  We
do hardware csum offloading, so the entire packet is not read until
copyout() is called.

> Having thought about this problem before, I think that what
> you probably need is to chunk the buffers up, and treat them
> as M_EXT type mbufs (e.g. go with contigmalloc).

I really, really hate doing this for a variety of reasons, mainly
that the user may not expect the NIC driver to be doing this, and it
may take her a while to realize that adjusting NMBCLUSTERS has no
effect.

Although...  Hmmm..  I could use a small amount of private buffers
while I have them & then fall back to contig buffers when I run out.
I'd still like to fully understand the problem, though; sweeping it
under the rug bothers me.

> To be able to use "generic" mbufs for this, what's really
> needed is the ability to have variable size mbufs.  At the
> very least, I think a single mbuf should be of a size so
> that the MTU fits inside it.  Fixing this would be a large
> amount of work, and the gain is uncertain.
>
> You can get a minor idea of the available gain by looking
> at the Tigon II firmware changes to use page based buffer
> allocations, per Bill Paul & Co..

If you're thinking of what I'm thinking of (the zero copy stuff), I
wrote that code. ;)

I seem to remember you talking about seeing a 10% speedup from using
4MB pages for cluster mbufs.  How did you do that?  I'd like to see
what effect it has with this workload.

Thanks!

Drew
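
The "+2/-2 stuff" mentioned above is the conventional receive-buffer
alignment trick: advance a cluster's data pointer by ETHER_ALIGN (2
bytes, from <net/ethernet.h>) before handing it to the NIC, so that
the 14-byte ethernet header ends on a 4-byte boundary and the IP
header behind it lands aligned.  A minimal sketch of a 4.x-era refill
routine, for illustration only (rx_newbuf() is a made-up name, not
code from the driver under discussion):

/*
 * Allocate a cluster mbuf for the receive ring and offset its data
 * pointer so the IP header will be 4-byte aligned after DMA.
 */
#include <sys/param.h>
#include <sys/mbuf.h>
#include <net/ethernet.h>	/* ETHER_ALIGN */

static struct mbuf *
rx_newbuf(void)
{
	struct mbuf *m;

	MGETHDR(m, M_DONTWAIT, MT_DATA);
	if (m == NULL)
		return (NULL);
	MCLGET(m, M_DONTWAIT);
	if ((m->m_flags & M_EXT) == 0) {
		m_freem(m);
		return (NULL);
	}
	m->m_len = m->m_pkthdr.len = MCLBYTES;

	/* Deliberately misalign the ethernet header by 2 bytes. */
	m_adj(m, ETHER_ALIGN);

	/* m->m_data is now the address handed to the DMA engine. */
	return (m);
}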
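
The contigmalloc/M_EXT approach Terry suggests would look roughly
like the sketch below: carve one physically contiguous slab into 9K
jumbo buffers and attach each slice to an mbuf as external storage.
This is only an illustration under stated assumptions: the jumbo_*
names and pool size are invented, and the MEXTADD() macro's argument
list and ext_free callback signature have changed across FreeBSD
releases (the form sketched here is roughly the later seven-argument
one), so check sys/mbuf.h for the release you build against.

/*
 * Hypothetical private jumbo-buffer pool: one contigmalloc'ed slab is
 * carved into fixed-size receive buffers that get attached to mbufs
 * as M_EXT external storage.
 */
#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/malloc.h>
#include <sys/mbuf.h>

#define JUMBO_BUFSZ	9216	/* one jumbo-frame receive buffer */
#define JUMBO_NBUFS	256	/* pool size (arbitrary) */

static caddr_t jumbo_base;	/* contigmalloc'ed backing store */

static int
jumbo_pool_init(void)
{
	/* One physically contiguous slab, kept below 4GB for 32-bit DMA. */
	jumbo_base = contigmalloc(JUMBO_BUFSZ * JUMBO_NBUFS, M_DEVBUF,
	    M_NOWAIT, 0, 0xffffffff, PAGE_SIZE, 0);
	return (jumbo_base == NULL ? ENOMEM : 0);
}

static void
jumbo_free(void *buf, void *args)
{
	/* Put the buffer back on the driver's free list (not shown). */
}

static struct mbuf *
jumbo_newbuf(caddr_t buf)
{
	struct mbuf *m;

	MGETHDR(m, M_DONTWAIT, MT_DATA);
	if (m == NULL)
		return (NULL);

	/* Attach the private buffer as external storage. */
	MEXTADD(m, buf, JUMBO_BUFSZ, jumbo_free, NULL, 0, EXT_NET_DRV);
	if ((m->m_flags & M_EXT) == 0) {
		m_freem(m);
		return (NULL);
	}
	m->m_len = m->m_pkthdr.len = JUMBO_BUFSZ;
	return (m);
}

The receive-refill path would hand out jumbo_newbuf() buffers while
the pool lasts and drop back to plain cluster mbufs once it is
exhausted, which is the fallback idea mentioned above.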