Date:        Mon, 07 Jul 2008 15:37:28 +0200
From:        Andre Oppermann <andre@freebsd.org>
To:          Bruce Evans <brde@optusnet.com.au>
Cc:          FreeBSD Net <freebsd-net@FreeBSD.org>, Ingo Flaschberger <if@xip.at>, Paul <paul@gtcomm.net>
Subject:     Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
Message-ID:  <48721C18.4060109@freebsd.org>
In-Reply-To: <20080707213356.G7572@besplex.bde.org>
References:  <4867420D.7090406@gtcomm.net> <ea7b9c170806302050p2a3a5480t29923a4ac2d7c852@mail.gmail.com> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <alpine.LFD.1.10.0807021052041.557@filebunker.xip.at> <486B4F11.6040906@gtcomm.net> <alpine.LFD.1.10.0807021155280.557@filebunker.xip.at> <486BC7F5.5070604@gtcomm.net> <20080703160540.W6369@delplex.bde.org> <486C7F93.7010308@gtcomm.net> <20080703195521.O6973@delplex.bde.org> <486D35A0.4000302@gtcomm.net> <alpine.LFD.1.10.0807041106591.19613@filebunker.xip.at> <486DF1A3.9000409@gtcomm.net> <alpine.LFD.1.10.0807041303490.20760@filebunker.xip.at> <486E65E6.3060301@gtcomm.net> <alpine.LFD.1.10.0807052356130.2145@filebunker.xip.at> <4871DB8E.5070903@freebsd.org> <20080707191918.B4703@besplex.bde.org> <4871FB66.1060406@freebsd.org> <20080707213356.G7572@besplex.bde.org>
Bruce Evans wrote:
> On Mon, 7 Jul 2008, Andre Oppermann wrote:
>
>> Bruce Evans wrote:
>>> What are the other overheads?  I calculate 1.644Mpps counting the
>>> inter-frame gap, with 64-byte packets and 64-header_size payloads.
>>> If the 64 bytes is for the payload, then the max is much lower.
>>
>> The theoretical maximum at 64byte frames is 1,488,100.  I've looked
>> up my notes; the 1.244Mpps number can be adjusted to 1.488Mpps.
>
> Where is the extra?  I still get 1.644736 Mpps (10^9/(8*64+96)).
> 1.488095 is for 64 bits extra (10^9/(8*64+96+64)).

The preamble has 64 bits and is in addition to the inter-frame gap.

>>>>> I hoped to reach 1Mpps with the hardware I mentioned some mails
>>>>> before, but 2Mpps is far far away.  Currently I get 160kpps via
>>>>> a 32-bit/33MHz PCI bus and a 1.2GHz mobile Pentium.
>>>>
>>>> This is more or less expected.  PCI32 is not able to sustain high
>>>> packet rates.  The bus setup times kill the speed.  For larger
>>>> packets the ratio gets much better and some reasonable throughput
>>>> can be achieved.
>>>
>>> I get about 640 kpps without forwarding (sendto: slightly faster;
>>> recvfrom: slightly slower) on a 2.2GHz A64.  Underclocking the
>>> memory from 200MHz to 100MHz only reduces the speed by about 10%,
>>> while not overclocking the CPU by 10% reduces the speed by the same
>>> 10%, so the system is apparently still mainly CPU-bound.
>>
>> On PCI32@33MHz?  He's using a 1.2GHz Mobile Pentium on top of that.
>
> Yes.  My example shows that FreeBSD is more CPU-bound than I/O-bound
> up to CPUs considerably faster than a 1.2GHz Pentium (though the
> PentiumM is fast relative to its clock speed).  The memory interface
> may matter more than the CPU clock.
>
>>>> NetFPGA doesn't have enough TCAM space to be useful for real
>>>> routing (as in an Internet-sized routing table).  The trick many
>>>> embedded networking CPUs use is cache prefetching that is
>>>> integrated with the network controller.  The first 64-128 bytes of
>>>> every packet are transferred automatically into the L2 cache by
>>>> the hardware.  This allows relatively slow CPUs (700MHz Broadcom
>>>> BCM1250 in the Cisco NPE-G1 or 1.67GHz Freescale 7448 in the
>>>> NPE-G2) to get more than 1Mpps.  Until something like this is
>>>> possible on Intel or AMD x86 CPUs we have a ceiling limited by RAM
>>>> speed.
>>>
>>> Does using fa$ter memory (speed and/or latency) help here?  64
>>> bytes is so small that latency may be more of a problem, especially
>>> without a prefetch.
>>
>> Latency.  For IPv4 packet forwarding only one cache line per packet
>> is fetched.  More memory speed only helps with the DMA from/to the
>> network card.
>
> I use low-end memory, but on the machine that does 640 kpps it
> somehow has latency almost 4 times lower than on new FreeBSD cluster
> machines (~42 nsec instead of ~150).  perfmon (fixed for AXP and A64)
> and hwpmc report an average of 11 k8-dc-misses per sendto() while
> sending via bge at 640 kpps.  11 * 42 accounts for 462 nsec out of
> the 1562 per packet at this rate.  11 * 150 = 1650 would probably
> make this rate unachievable despite the system having 20 times as
> much CPU and bus.

We were talking about routing here, that is, a packet received on one
network interface and sent out on another.  It crosses the PCI bus
twice.

-- 
Andre
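For reference, a small stand-alone C program that reproduces both
figures discussed above.  It is only a sketch of the wire-rate
arithmetic and assumes the 802.3 framing constants (96-bit inter-frame
gap, 64-bit preamble+SFD, 1 Gbit/s line rate); it is not taken from
either mail.

/*
 * Theoretical maximum packet rate on Gigabit Ethernet for a given
 * frame size, with and without counting the preamble+SFD.
 */
#include <stdio.h>

#define	LINE_RATE	1000000000.0	/* bits per second */
#define	PREAMBLE_BITS	64		/* preamble + SFD, 8 bytes */
#define	IFG_BITS	96		/* minimum inter-frame gap */

static double
max_pps(int frame_bytes, int count_preamble)
{
	double bits;

	bits = 8.0 * frame_bytes + IFG_BITS +
	    (count_preamble ? PREAMBLE_BITS : 0);
	return (LINE_RATE / bits);
}

int
main(void)
{
	/* ~1644736 pps: 64-byte frame + IFG only (Bruce's figure). */
	printf("no preamble: %.0f pps\n", max_pps(64, 0));
	/* ~1488095 pps: 64-byte frame + IFG + preamble (the usual 1.488Mpps). */
	printf("preamble:    %.0f pps\n", max_pps(64, 1));
	return (0);
}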
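The cache-miss budget can be checked with a similarly small sketch.
The inputs (640 kpps, 11 k8-dc-misses per sendto(), ~42 nsec versus
~150 nsec memory latency) are the figures quoted in the mail, not
fresh measurements.

/*
 * Per-packet latency budget at a given packet rate, and how much of
 * it is consumed by D-cache misses at two different memory latencies.
 */
#include <stdio.h>

int
main(void)
{
	double pps = 640000.0;			/* observed send rate */
	double budget_ns = 1e9 / pps;		/* ~1562 ns per packet */
	int misses = 11;			/* k8-dc-misses per sendto() */
	double fast = misses * 42.0;		/* low-latency memory */
	double slow = misses * 150.0;		/* cluster-machine latency */

	printf("budget per packet: %.0f ns\n", budget_ns);
	printf("42 ns misses:  %.0f ns (%.0f%% of budget)\n",
	    fast, 100.0 * fast / budget_ns);
	printf("150 ns misses: %.0f ns (%.0f%% of budget)\n",
	    slow, 100.0 * slow / budget_ns);
	return (0);
}

With 150 nsec misses the 11 per-packet misses alone exceed the 1562 nsec
budget, which is why the 640 kpps rate would likely be unachievable on
the higher-latency machines.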