Date: Mon, 18 Apr 2011 23:09:06 +0200 (CEST)
From: Ingo Flaschberger <if@xip.at>
To: "K. Macy" <kmacy@freebsd.org>
Cc: freebsd-net@freebsd.org
Subject: Re: Routing enhancement - reduce routing table locking
Message-ID: <alpine.LRH.2.00.1104182209120.8693@filebunker.xip.at>
In-Reply-To: <BANLkTim6HMGibDB4ucs+tEfqv-LBnF4O-w@mail.gmail.com>
References: <alpine.LRH.2.00.1104050303140.2152@filebunker.xip.at> <alpine.LRH.2.00.1104061426350.2152@filebunker.xip.at> <alpine.LRH.2.00.1104180051450.8693@filebunker.xip.at> <BANLkTik39HvVire6Hzi9U6J2BwKV7apCCg@mail.gmail.com> <alpine.LRH.2.00.1104181852420.8693@filebunker.xip.at> <BANLkTim0hoHDnrweYz+vc7zOvMubddJmGg@mail.gmail.com> <BANLkTim6HMGibDB4ucs+tEfqv-LBnF4O-w@mail.gmail.com>
> It occurred to me that I should add a couple of qualifications to the
> previous statements. 1.6Mpps is line rate for GigE and I only know of
> it to be achievable by igb hardware. The most I've seen em hardware
> achieve is 1.1Mpps. Furthermore, in order to achieve that you would
> have to enable IFNET_MULTIQUEUE in the driver, because by default the
> driver uses the traditional (slow) IFQ as opposed to overloading
> if_transmit and doing its own queueing when needed. Support for
> efficient multi-queue software queueing is provided by buf_ring, a
> lock-free multi-producer ring buffer written just for this purpose.
>
> Thus, the fairly low transmit rate may be attributable to driver locking.

The quad-core hardware is currently in production, so I can only test with the single-core 1.2 GHz Pentium M. Also no igb cards, only em cards (82541GI), with polling.

8.2 i386 with rmlock-copy patch, 400k /32 routes, 64-byte packets:

             fastfw      standard    flowtable
1 dest:      85607pps    57189pps    57216pps
rand. dest:  83945pps    54976pps    55007pps

Standard routing seems to be as fast as flowtable. flowtable does not support fastforward; the rmlock-copy patch does.

8.2 i386 without the patch, 400k /32 routes, 64-byte packets:

             fastfw      standard    flowtable
1 dest:      84792pps    55357pps    55515pps
rand. dest:  80156pps    52320pps    52300pps

So even on a single-CPU system, less locking improves performance, but as you mentioned above, the bottleneck of this system is the "desktop" PCI network cards. I would really like to see some comparable tests with 10GbE hardware.

Kind regards,
Ingo Flaschberger
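For readers unfamiliar with the buf_ring you mention: the idea is that multiple transmitting threads can enqueue packets without taking a shared mutex. The following is only a rough illustrative sketch (not the actual FreeBSD buf_ring code; the names `mp_ring`, `mp_ring_enqueue`, and the ring size are made up for the example) of the usual approach: each producer claims a slot with compare-and-swap, then publishes it in claim order for a single consumer.

```c
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 8                   /* must be a power of two */
#define RING_MASK (RING_SIZE - 1)

struct mp_ring {
	_Atomic unsigned prod_head;   /* next slot producers will claim */
	_Atomic unsigned prod_tail;   /* slots published to the consumer */
	_Atomic unsigned cons_head;   /* consumer read index */
	void		*slots[RING_SIZE];
};

/* Multi-producer enqueue: safe to call concurrently from several
 * threads. Returns 0 on success, -1 if the ring is full. */
static int
mp_ring_enqueue(struct mp_ring *r, void *item)
{
	unsigned head, next;

	do {
		head = atomic_load(&r->prod_head);
		next = head + 1;
		/* Full when all RING_SIZE slots are claimed. */
		if (next - atomic_load(&r->cons_head) > RING_SIZE)
			return (-1);
	} while (!atomic_compare_exchange_weak(&r->prod_head, &head, next));

	/* The slot at 'head' now belongs to this thread alone. */
	r->slots[head & RING_MASK] = item;

	/* Publish in claim order: wait until earlier producers have
	 * advanced prod_tail past their own slots. */
	while (atomic_load(&r->prod_tail) != head)
		;	/* spin */
	atomic_store(&r->prod_tail, next);
	return (0);
}

/* Single-consumer dequeue. Returns NULL when the ring is empty. */
static void *
mp_ring_dequeue(struct mp_ring *r)
{
	unsigned c = atomic_load(&r->cons_head);
	void *item;

	if (c == atomic_load(&r->prod_tail))
		return (NULL);
	item = r->slots[c & RING_MASK];
	atomic_store(&r->cons_head, c + 1);
	return (item);
}
```

A driver's if_transmit path would try the lock-free enqueue first and only fall back to taking the tx lock to drain the ring, which is why it avoids the contention of the traditional IFQ.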