Date: Sun, 05 Apr 2009 19:29:10 +0200
From: Ivan Voras <ivoras@freebsd.org>
To: freebsd-net@freebsd.org
Subject: Re: Advice on a multithreaded netisr patch?
Message-ID: <grappq$tsg$1@ger.gmane.org>
In-Reply-To: <alpine.BSF.2.00.0904051440460.12639@fledge.watson.org>
References: <gra7mq$ei8$1@ger.gmane.org> <alpine.BSF.2.00.0904051422280.12639@fledge.watson.org> <grac1s$p56$1@ger.gmane.org> <alpine.BSF.2.00.0904051440460.12639@fledge.watson.org>
Robert Watson wrote:
>
> On Sun, 5 Apr 2009, Ivan Voras wrote:
>
>>>> I thought this has something to do with NIC moderation (em) but
>>>> can't really explain it. The bad performance part (not the jump) is
>>>> also visible over the loopback interface.
>>>
>>> FYI, if you want high performance, you really want a card supporting
>>> multiple input queues -- igb, cxgb, mxge, etc. if_em-only cards are
>>> fundamentally less scalable in an SMP environment because they
>>> require input or output to occur only from one CPU at a time.
>>
>> Makes sense, but on the other hand - I see people are routing at least
>> 250,000 packets per second per direction with these cards, so they
>> probably aren't the bottleneck (pro/1000 pt on pci-e).
>
> The argument is not that they are slower (although they probably are a
> bit slower), rather that they introduce serialization bottlenecks by
> requiring synchronization between CPUs in order to distribute the work.
> Certainly some of the scalability issues in the stack are not a result
> of that, but a good number are.

I'd like to understand more. If (in netisr) I have an mbuf with headers,
has the data already been transferred from the card, or is it magically
"not here yet"? In the first case, the packet reception code path is
unchanged up to the point where the mbuf is queued on a thread that
handles it later (or is the influence of "other" data, like timers and
internal TCP reassembly buffers, really that large?). In the second
case, why?

> Historically, we've had a number of bottlenecks in, say, the bulk data
> receive and send paths, such as:
>
> - Initial receipt and processing of packets on a single CPU as a result
>   of a single input queue from the hardware. Addressed by using
>   multiple input queue hardware with appropriately configured drivers
>   (generally the default is to use multiple input queues in 7.x and 8.x
>   for supporting hardware).

As the card and the OS can already process many packets per second for
something fairly complex like routing (http://www.tancsa.com/blast.html),
and TCP chokes swi:net at 100% of a core, isn't this an indication that
there's certainly more room for improvement even with single-queue,
old-fashioned NICs?
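To make the distribution question concrete, here is a minimal userspace
sketch (my own illustration, not the actual netisr or driver code; all
names and numbers are invented) of the idea behind multiple input
queues: hash each packet's flow to one of several worker threads, so
that only packets of the same flow are serialized on one queue while
different flows are processed in parallel:

/*
 * Userspace sketch only, not kernel code.  Packets are hashed by flow
 * to one of several worker threads; packets of one flow stay in order
 * on one queue, different flows proceed in parallel.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define NWORKERS 4
#define QLEN     128

struct pkt {
        uint32_t src_ip, dst_ip;
        uint16_t src_port, dst_port;
};

struct workq {
        pthread_mutex_t lock;
        pthread_cond_t  cv;
        struct pkt      q[QLEN];
        int             head, tail, count;
};

static struct workq queues[NWORKERS];

/* Trivial flow hash; real hardware/stacks use Toeplitz/RSS or similar. */
static unsigned
flow_hash(const struct pkt *p)
{
        uint32_t h;

        h = p->src_ip ^ p->dst_ip ^
            ((uint32_t)p->src_port << 16 | p->dst_port);
        h ^= h >> 16;
        return (h % NWORKERS);
}

/* "Driver" side: enqueue the packet on the queue its flow hashes to. */
static void
dispatch(const struct pkt *p)
{
        struct workq *wq = &queues[flow_hash(p)];

        pthread_mutex_lock(&wq->lock);
        if (wq->count < QLEN) {
                wq->q[wq->tail] = *p;
                wq->tail = (wq->tail + 1) % QLEN;
                wq->count++;
                pthread_cond_signal(&wq->cv);
        }                       /* else drop, as a real input path would */
        pthread_mutex_unlock(&wq->lock);
}

/* "Protocol" side: each worker processes only its own queue. */
static void *
worker(void *arg)
{
        struct workq *wq = arg;
        struct pkt p;

        for (;;) {
                pthread_mutex_lock(&wq->lock);
                while (wq->count == 0)
                        pthread_cond_wait(&wq->cv, &wq->lock);
                p = wq->q[wq->head];
                wq->head = (wq->head + 1) % QLEN;
                wq->count--;
                pthread_mutex_unlock(&wq->lock);
                printf("worker %ld: %x:%u -> %x:%u\n",
                    (long)(wq - queues), (unsigned)p.src_ip,
                    (unsigned)p.src_port, (unsigned)p.dst_ip,
                    (unsigned)p.dst_port);
        }
        return (NULL);
}

int
main(void)
{
        pthread_t tid[NWORKERS];
        int i;

        for (i = 0; i < NWORKERS; i++) {
                pthread_mutex_init(&queues[i].lock, NULL);
                pthread_cond_init(&queues[i].cv, NULL);
                pthread_create(&tid[i], NULL, worker, &queues[i]);
        }
        for (i = 0; i < 16; i++) {
                struct pkt p = { 0x0a000001, 0x0a000002,
                    (uint16_t)(1024 + i), 80 };
                dispatch(&p);
        }
        sleep(1);
        return (0);
}

With a single input queue every packet contends for the same lock and
the same CPU, which is the serialization Robert describes; the hash is
what lets the work spread out without reordering packets within a flow.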