Date: Mon, 1 May 2006 11:38:39 +1000 (EST)
From: lukem.freebsd@cse.unsw.edu.au
To: Robert Watson <rwatson@FreeBSD.org>
Cc: Marcos Bedinelli <bedinelli@madhaus.cns.utoronto.ca>, freebsd-net@freebsd.org, Jeremie Le Hen <jeremie@le-hen.org>
Subject: Re: [fbsd] Re: [fbsd] Network performance in a dual CPU system
Message-ID: <Pine.LNX.4.61.0605011117210.14405@wagner.orchestra.cse.unsw.EDU.AU>
In-Reply-To: <20060427154718.J75848@fledge.watson.org>
References: <7bb8f24157080b6aaacb897a99259df9@madhaus.cns.utoronto.ca> <20060427093916.GC84148@obiwan.tataz.chchile.org> <20060427145252.I75848@fledge.watson.org> <20060427143814.GD84148@obiwan.tataz.chchile.org> <20060427154718.J75848@fledge.watson.org>
On Thu, 27 Apr 2006, Robert Watson wrote:

> Yes -- basically, what this setting does is turn a deferred dispatch of the
> protocol level processing into a direct function invocation.

This reminds me of a problem I saw about a year ago, where the number of
entries in the DMA ring was much greater (IIRC 256) than the number of
entries in the IP input queue (IIRC hardcoded at 50). So what would end up
happening under high load was that lots of packets got dumped when the
driver tried to enqueue them onto the IP input queue (a toy illustration of
this drop path is sketched at the end of this message).

If you are finding that direct dispatch gives you a really big performance
increase on some workloads, you might like to check that the reason isn't
simply that you have avoided overflowing this queue.

> - Increase the time it takes to pull packets out of the card -- we process
> each packet to completion rather than pulling them out in sets and batching
> them. This pushes drop on overload into the card instead of the IP queue,
> which has some benefits and some costs.

The nice thing about doing it this way is that it is less prone to
performance degradation under overload, since you don't dequeue (and hence
do work on) packets which will later be discarded.

> The reason for the strong source ordering is that some protocols, TCP in
> particular, respond really badly to misordering, which they detect as a
> loss and force retransmit for. If we introduce multiple netisrs naively
> by simply having the different threads working from the same IP input
> queue, then we can potentially pull packets from the same source into
> different workers, and process them at different rates, resulting in
> misordering being introduced. While we'd process packets with greater
> parallelism, and hence possibly faster, we'd toast the end-to-end
> protocol properties and make everyone really unhappy.

Would it be possible to improve the behaviour of the TCP implementation so
that out-of-order reception was acceptable? (Per-flow dispatch, the other
way around this, is sketched below.)

Someone else asked a question about polling. Pretty much all modern network
interfaces support interrupt moderation of some description, so there really
is no need to use polling any more: the interfaces themselves no longer
generate excessive interrupt rates. The performance difference we are seeing
with polling is more likely because it schedules packet processing better
than the current model does. For example, most driver implementations just
spin dequeueing packets until their DMA rings are empty, but that doesn't
work so well when you have fixed-size queues elsewhere which are filling up.
If you look at the polling code, it only dequeues a small number of packets
at a time and allows them to be processed before it continues dequeueing
(compare the two loops in the last sketch below). I would bet that if the
packet dispatch model gets improved, we can ditch polling entirely, at least
for modern network interfaces.

--
Luke Macpherson
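
[Sketch 1] The IP input queue overflow described above comes down to a
fixed-length FIFO whose enqueue path silently drops when the queue is full.
The following is only a toy userspace illustration of that arithmetic, not
the kernel code; the names (toyq, toyq_enqueue) and the use of 256/50 as the
ring and queue sizes follow the IIRC figures above:

    #include <stdio.h>

    #define IPINTRQ_MAXLEN 50   /* analogous to the hardcoded IP input queue limit */
    #define DMA_RING_SIZE  256  /* entries drained from the card in one interrupt  */

    struct toyq {
        int len, maxlen, drops;
    };

    /* Illustrative helper: enqueue one packet, dropping it if the queue is full. */
    static int toyq_enqueue(struct toyq *q)
    {
        if (q->len >= q->maxlen) {
            q->drops++;
            return -1;          /* packet dumped, the failure mode described above */
        }
        q->len++;
        return 0;
    }

    int main(void)
    {
        struct toyq ipintrq = { 0, IPINTRQ_MAXLEN, 0 };

        /* One interrupt drains a full DMA ring into the much smaller input queue. */
        for (int i = 0; i < DMA_RING_SIZE; i++)
            toyq_enqueue(&ipintrq);

        printf("queued %d, dropped %d of %d\n",
               ipintrq.len, ipintrq.drops, DMA_RING_SIZE);
        return 0;
    }

With nothing consuming the queue between enqueues, 206 of the 256 packets
are dropped after having already been pulled off the card, which is exactly
the work-then-discard cost mentioned above.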
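[Sketch 2] On the misordering point: the general technique for getting
parallelism without breaking per-flow ordering is to hash each packet's flow
identifiers to a single worker, so packets from one connection always land
on the same queue. This is a rough illustration of that idea only; the hash
function, field names, and worker count here are made up and are not a claim
about what FreeBSD's netisr code actually does:

    #include <stdint.h>
    #include <stdio.h>

    #define NWORKERS 4

    /* Illustrative flow key; a real stack would take these from the packet headers. */
    struct flow {
        uint32_t src_ip, dst_ip;
        uint16_t src_port, dst_port;
    };

    /* Toy hash: any function works, as long as the same flow always maps to
     * the same worker, so packets of one flow can never be reordered by
     * being processed at different rates in different threads. */
    static unsigned flow_to_worker(const struct flow *f)
    {
        uint32_t h = f->src_ip ^ f->dst_ip ^
                     (((uint32_t)f->src_port << 16) | f->dst_port);
        h ^= h >> 16;
        return h % NWORKERS;
    }

    int main(void)
    {
        struct flow a = { 0x0a000001, 0x0a000002, 12345, 80 };
        struct flow b = { 0x0a000003, 0x0a000002, 23456, 80 };

        printf("flow a -> worker %u\n", flow_to_worker(&a));
        printf("flow b -> worker %u\n", flow_to_worker(&b));
        return 0;
    }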
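[Sketch 3] To make the polling scheduling argument concrete: an
interrupt-style handler drains the whole DMA ring in one pass, while the
polling code takes only a bounded number of packets per pass and lets them
be processed before coming back for more. This is a toy model of that
difference; the budget of 5 and the queue limit of 10 are arbitrary numbers
chosen for illustration, not values from any driver:

    #include <stdio.h>

    #define RING_SIZE    64
    #define QUEUE_LIMIT  10   /* downstream queue, e.g. the IP input queue */
    #define POLL_BUDGET   5   /* packets taken per poll pass (illustrative) */

    /* Downstream consumer: pretend the stack processes a small batch between
     * poll passes, but gets no chance to run while the whole ring is being
     * drained in one go. */
    static void process_some(int *queued)
    {
        *queued = (*queued > POLL_BUDGET) ? *queued - POLL_BUDGET : 0;
    }

    static void enqueue(int *queued, int n, int *drops)
    {
        while (n-- > 0) {
            if (*queued >= QUEUE_LIMIT)
                (*drops)++;       /* queue full: packet dumped */
            else
                (*queued)++;
        }
    }

    int main(void)
    {
        int queued, drops, remaining;

        /* Interrupt-style: drain the whole ring before anything downstream runs. */
        queued = 0; drops = 0;
        enqueue(&queued, RING_SIZE, &drops);
        printf("drain whole ring: dropped %d of %d\n", drops, RING_SIZE);

        /* Polling-style: small batches, with processing interleaved between them. */
        queued = 0; drops = 0;
        for (remaining = RING_SIZE; remaining > 0; remaining -= POLL_BUDGET) {
            enqueue(&queued, remaining < POLL_BUDGET ? remaining : POLL_BUDGET, &drops);
            process_some(&queued);
        }
        printf("budgeted polling: dropped %d of %d\n", drops, RING_SIZE);
        return 0;
    }

In this toy model the drain-everything loop drops most of the ring at the
fixed-size queue, while the budgeted loop drops nothing, because the
consumer gets to keep up between batches.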