Date:      Sun, 05 Apr 2009 19:29:10 +0200
From:      Ivan Voras <ivoras@freebsd.org>
To:        freebsd-net@freebsd.org
Subject:   Re: Advice on a multithreaded netisr  patch?
Message-ID:  <grappq$tsg$1@ger.gmane.org>
In-Reply-To: <alpine.BSF.2.00.0904051440460.12639@fledge.watson.org>
References:  <gra7mq$ei8$1@ger.gmane.org> <alpine.BSF.2.00.0904051422280.12639@fledge.watson.org> <grac1s$p56$1@ger.gmane.org> <alpine.BSF.2.00.0904051440460.12639@fledge.watson.org>

Robert Watson wrote:
>
> On Sun, 5 Apr 2009, Ivan Voras wrote:
>
>>>> I thought this has something to do with NIC moderation (em) but
>>>> can't really explain it. The bad performance part (not the jump) is
>>>> also visible over the loopback interface.
>>>
>>> FYI, if you want high performance, you really want a card supporting
>>> multiple input queues -- igb, cxgb, mxge, etc.  if_em-only cards are
>>> fundamentally less scalable in an SMP environment because they
>>> require input or output to occur only from one CPU at a time.
>>
>> Makes sense, but on the other hand - I see people are routing at least
>> 250,000 packets per second per direction with these cards, so they
>> probably aren't the bottleneck (pro/1000 pt on pci-e).
>
> The argument is not that they are slower (although they probably are a
> bit slower), rather that they introduce serialization bottlenecks by
> requiring synchronization between CPUs in order to distribute the work.
> Certainly some of the scalability issues in the stack are not a result
> of that, but a good number are.

I'd like to understand more. If (in netisr) I have an mbuf with headers,
is this data already transferred from the card, or is it magically "not
here yet"?

In the first case, the packet reception code path is unchanged up to the
point where the packet is queued on a thread that handles it later (or
is the influence of "other" data like timers and internal TCP reassembly
buffers really that large?). In the second case, why?
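
To make the question concrete, this is roughly how I picture the receive
path once the hardware is done with the frame. It's only a sketch --
rx_newpkt() is a made-up helper, not code from any real driver:

#include <sys/param.h>
#include <sys/mbuf.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_var.h>

/*
 * Toy RX handler: by the time this runs, the NIC has already DMA'd the
 * frame into an mbuf cluster the driver posted to its RX ring, so the
 * payload is in host memory before the stack ever sees the packet.
 */
static void
rx_newpkt(struct ifnet *ifp, struct mbuf *m)
{
        m->m_pkthdr.rcvif = ifp;

        /*
         * Hand the frame to the stack (ether_input() for Ethernet).
         * Higher up, ether_demux() either processes the packet to
         * completion in this context (net.isr.direct=1) or queues it
         * for the protocol's netisr thread -- but in both cases the
         * mbuf already carries the complete frame.
         */
        (*ifp->if_input)(ifp, m);
}

If that picture is right, the data is already here and the question is
only which CPU continues processing it.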

> Historically, we've had a number of bottlenecks in, say, the bulk data
> receive and send paths, such as:
>
> - Initial receipt and processing of packets on a single CPU as a result of a
>   single input queue from the hardware.  Addressed by using multiple input
>   queue hardware with appropriately configured drivers (generally the default
>   is to use multiple input queues in 7.x and 8.x for supporting hardware).

As the card and the OS can already process many packets per second for
something as complex as routing (http://www.tancsa.com/blast.html), and
TCP chokes swi:net at 100% of a core, isn't that an indication that
there's still room for improvement even with single-queue, old-fashioned
NICs?
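
For reference, the direction I'm experimenting with in the patch is,
very roughly, to spread flows over several netisr threads even when the
hardware has only one input queue. The sketch below is illustrative
only, not the actual patch -- flow_to_worker() and nworkers are made-up
names, and it assumes the IP header is contiguous at the front of the
mbuf:

#include <sys/param.h>
#include <sys/mbuf.h>
#include <netinet/in.h>
#include <netinet/in_systm.h>
#include <netinet/ip.h>

/*
 * Map a packet to one of 'nworkers' worker threads by hashing the IP
 * addresses, so that all packets of one flow stay on the same thread
 * while different flows spread across CPUs.
 */
static u_int
flow_to_worker(struct mbuf *m, u_int nworkers)
{
        struct ip *ip = mtod(m, struct ip *);
        uint32_t h;

        h = ip->ip_src.s_addr ^ ip->ip_dst.s_addr;
        h ^= h >> 16;

        return (h % nworkers);
}

Keeping a flow on one thread avoids reordering within a TCP connection,
which is why hashing looks safer to me than round-robin distribution.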

