Date: Thu, 4 Jul 2002 16:59:02 -0400 (EDT)
From: Andrew Gallatin <gallatin@cs.duke.edu>
To: Bosko Milekic <bmilekic@unixdaemons.com>
Cc: "Kenneth D. Merry" <ken@kdm.org>, current@FreeBSD.ORG, net@FreeBSD.ORG
Subject: virtually contig jumbo mbufs (was Re: new zero copy sockets snapshot)
Message-ID: <15652.46870.463359.853754@grasshopper.cs.duke.edu>
In-Reply-To: <20020620134723.A22954@unixdaemons.com>
References: <20020618223635.A98350@panzer.kdm.org> <xzpelf3ida1.fsf@flood.ping.uio.no> <20020619090046.A2063@panzer.kdm.org> <20020619120641.A18434@unixdaemons.com> <15633.17238.109126.952673@grasshopper.cs.duke.edu> <20020619233721.A30669@unixdaemons.com> <15633.62357.79381.405511@grasshopper.cs.duke.edu> <20020620114511.A22413@unixdaemons.com> <15634.534.696063.241224@grasshopper.cs.duke.edu> <20020620134723.A22954@unixdaemons.com>
Bosko Milekic writes:
 > > One question.  I've observed some really anomalous behaviour under
 > > -stable with my Myricom GM driver (2Gb/s + 2Gb/s link speed, dual
 > > 1GHz PIII).  When I use 4K mbufs for receives, the best speed I see
 > > is about 1300Mb/sec.  However, if I use private 9K physically
 > > contiguous buffers I see 1850Mb/sec (iperf TCP).
 > >
 > > The obvious conclusion is that there's a lot of overhead in setting
 > > up the DMA engines, but that's not the case; we have a fairly quick
 > > chain DMA engine.  I've provided a "control" by breaking my
 > > contiguous buffers down into 4K chunks so that I do the same number
 > > of DMAs in both cases, and I still see ~1850Mb/sec for the 9K
 > > buffers.
 > >
 > > A coworker suggested that the problem was that when doing copyouts
 > > to userspace, the PIII was doing speculative reads and loading the
 > > cache with the next page.  However, we then start copying from a
 > > totally different address using discontiguous buffers, so we
 > > effectively take 2x the number of cache misses we'd need to.  Does
 > > that sound reasonable to you?  I need to try malloc'ing virtually
 > > contiguous and physically discontiguous buffers & see if I get the
 > > same (good) performance...
 >
 >   I believe that the Intel chips do "virtual page caching" and that
 > the logic that does the virtual -> physical address translation sits
 > between the L2 cache and RAM.  If that is indeed the case, then your
 > idea of testing with virtually contiguous pages is a good one.
 >   Unfortunately, I don't know if the PIII is doing speculative
 > cache-loads, but it could very well be the case.  If it is, and if in
 > fact the chip does caching based on virtual addresses, then providing
 > it with virtually contiguous address space may yield better results.
 > If you try this, please let me know.  I'm extremely interested in
 > seeing the results!

contigmalloc'ed private jumbo mbufs (same as bge, if_ti, etc.):

% iperf -c ugly-my -l 32k -fm
------------------------------------------------------------
Client connecting to ugly-my, TCP port 5001
TCP window size: 0.2 MByte (default)
------------------------------------------------------------
[  3] local 192.168.1.3 port 1031 connected with 192.168.1.4 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  2137 MBytes  1792 Mbits/sec

malloc'ed, physically discontiguous private jumbo mbufs:

% iperf -c ugly-my -l 32k -fm
------------------------------------------------------------
Client connecting to ugly-my, TCP port 5001
TCP window size: 0.2 MByte (default)
------------------------------------------------------------
[  3] local 192.168.1.3 port 1029 connected with 192.168.1.4 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  2131 MBytes  1788 Mbits/sec

So I'd be willing to believe that the 4Mb/sec loss was due to the extra
overhead of setting up the two additional DMAs.

So it looks like this idea would work.

Drew

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-net" in the body of the message
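
For reference, a minimal sketch of the two allocation strategies being
compared above.  This is not taken from the GM driver: the function names
(jumbo_alloc_contig, jumbo_alloc_virt), the JUMBO_LEN value, and the use of
M_DEVBUF are illustrative assumptions, and the step that attaches the buffer
to an mbuf as external storage (MEXTADD) is omitted because its exact
signature differs between FreeBSD branches.

/*
 * Hedged sketch, not driver code: the two ways of getting a 9K receive
 * buffer discussed above.  The contigmalloc(9) prototype and the header
 * it lives in vary between FreeBSD branches, so treat this as a sketch
 * of the idea rather than a drop-in patch.
 */
#include <sys/param.h>
#include <sys/malloc.h>

#define JUMBO_LEN	9018	/* 9K jumbo frame; illustrative value */

/* Physically contiguous jumbo buffer (bge/if_ti style). */
static void *
jumbo_alloc_contig(void)
{
	/*
	 * One physically contiguous chunk: a single DMA segment, but the
	 * allocation can fail once physical memory becomes fragmented.
	 */
	return (contigmalloc(JUMBO_LEN, M_DEVBUF, M_NOWAIT,
	    0,			/* lowest acceptable physical address */
	    0xffffffff,		/* highest acceptable physical address */
	    PAGE_SIZE,		/* alignment */
	    0));		/* no boundary restriction */
}

/* Virtually contiguous, physically discontiguous jumbo buffer. */
static void *
jumbo_alloc_virt(void)
{
	/*
	 * A kernel malloc(9) of more than a page returns KVA that is
	 * virtually contiguous while the backing pages need not be
	 * adjacent: the NIC gets one DMA descriptor per physical
	 * segment, but copyout() to userland still walks one linear
	 * region, which is the case Drew's numbers favour.
	 */
	return (malloc(JUMBO_LEN, M_DEVBUF, M_NOWAIT));
}

Per the iperf numbers above, the second form costs almost nothing on the
wire (1788 vs. 1792 Mbits/sec) while avoiding the need for physically
contiguous allocations at run time.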