Date: Tue, 29 Oct 2013 15:03:10 -0700
From: Navdeep Parhar <np@FreeBSD.org>
To: Andre Oppermann <andre@freebsd.org>, Luigi Rizzo <rizzo@iet.unipi.it>
Cc: Randall Stewart <rrs@lakerest.net>, "freebsd-net@freebsd.org" <net@freebsd.org>
Subject: Re: MQ Patch.
Message-ID: <5270309E.5090403@FreeBSD.org>
In-Reply-To: <527027CE.5040806@freebsd.org>
References: <40948D79-E890-4360-A3F2-BEC34A389C7E@lakerest.net> <526FFED9.1070704@freebsd.org> <CA+hQ2+gTc87M0f5pvFeW_GCZDogrLkT_1S2bKHngNcDEBUeZYQ@mail.gmail.com> <52701D8B.8050907@freebsd.org> <527022AC.4030502@FreeBSD.org> <527027CE.5040806@freebsd.org>
On 10/29/13 14:25, Andre Oppermann wrote:
> On 29.10.2013 22:03, Navdeep Parhar wrote:
>> On 10/29/13 13:41, Andre Oppermann wrote:
>>> Let me jump in here and explain roughly the ideas/path I'm exploring
>>> in creating and eventually implementing a big picture for drivers,
>>> queues, queue management, various QoS and so on:
>>>
>>> Situation: We're still mostly based on the old 4.4BSD IFQ model with
>>> a couple of work-arounds (sndring, drbr), and the bit-rotten ALTQ we
>>> have in tree isn't helpful at all.
>>>
>>> Steps:
>>>
>>> 1. Take the soft-queuing method out of the ifnet layer and make it
>>>    a property of the driver, so that the upper stack (or actually the
>>>    protocol L3/L2 mapping/encapsulation layer) calls (*if_transmit)
>>>    without any queuing at that point.  It is then up to the driver
>>>    to decide how it multiplexes multi-core access to its queue(s)
>>>    and how they are configured.
>>
>> It would work out much better if the kernel were aware of the number of
>> tx queues of a multiq driver and explicitly selected one in if_transmit.
>> The driver has no information on the CPU affinity etc. of the
>> applications generating the traffic; the kernel does.  In general, the
>> kernel has a much better "global view" of the system, and some of the
>> stuff currently in the drivers really should move up into the stack.
>
> I've been thinking a lot about this and have come to the preliminary
> conclusion that the upper stack should not tell the driver which queue
> to use.  There are way too many possible approaches, each better or
> worse depending on the use case.  Also, we have a big problem with
> cores vs. queues mismatches either way (more cores than queues, or more
> queues than cores, though the latter is much less of a problem).
>
> For now I see these primary multi-hardware-queue approaches to be
> implemented first:
>
> a) The driver's (*if_transmit) takes the flowid from the mbuf header and
>    selects one of the N hardware DMA rings based on it.  Each of the DMA
>    rings is protected by a lock.  The assumption here is that with enough
>    DMA rings the contention on each of them will be relatively low, and
>    ideally a flow and its ring more or less stick to the core that sends
>    lots of packets into that flow.  Of course it is a statistical
>    certainty that some bouncing will be going on.
>
> b) The driver assigns the DMA rings to particular cores, which can then,
>    via a critnest++ (a critical section), drive them lockless.  The
>    driver's (*if_transmit) looks up the core it got called on and pushes
>    the traffic out on that DMA ring.  The problem is the upper stack's
>    affinity, which is not guaranteed.  This has two consequences: there
>    may be reordering of packets of the same flow because the protocol's
>    send function happens to be called from a different core the second
>    time, or the driver's (*if_transmit) has to switch to the right core
>    to complete the transmit for this flow if the upper stack
>    migrated/bounced around.  It is rather difficult to assure full
>    affinity from userspace down through the upper stack and then to the
>    driver.
>
> c) Non-multi-queue capable hardware uses a kernel-provided set of
>    functions to manage the contention for the single resource of a DMA
>    ring.
>
> The point here is that the driver is the right place to make these
> decisions because the upper stack lacks (and shouldn't care about) the
> actual available hardware and its capabilities.  All necessary
> information is available to the driver as well, through the appropriate
> mbuf header fields and the core it is called on.
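
Just so we're picturing the same thing for (a): I assume you mean roughly
the pattern below.  This is only a sketch with made-up mydrv_* names and a
fictional mydrv_ring_enqueue(); it is not code from any driver in the tree.

/*
 * Sketch of (a): hash the stack-supplied flowid onto one of N DMA
 * rings, each with its own lock.  mydrv_ring_enqueue() stands in for
 * the part that actually writes tx descriptors.
 */
#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mbuf.h>
#include <sys/mutex.h>
#include <sys/pcpu.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_var.h>

struct mydrv_txq {			/* one per hardware DMA ring */
	struct mtx	lock;
	/* descriptor ring, stats, ... */
};

struct mydrv_softc {
	struct mydrv_txq *txq;		/* array of ntxq rings */
	u_int		  ntxq;
};

static int mydrv_ring_enqueue(struct mydrv_txq *, struct mbuf *);

static int
mydrv_transmit(struct ifnet *ifp, struct mbuf *m)
{
	struct mydrv_softc *sc = ifp->if_softc;
	struct mydrv_txq *txq;
	int rc;

	/* Pick a ring from the stack's flowid; fall back to the CPU. */
	if (m->m_flags & M_FLOWID)
		txq = &sc->txq[m->m_pkthdr.flowid % sc->ntxq];
	else
		txq = &sc->txq[curcpu % sc->ntxq];

	mtx_lock(&txq->lock);
	rc = mydrv_ring_enqueue(txq, m);
	mtx_unlock(&txq->lock);

	return (rc);
}

The mechanics are simple enough; my question is whether that modulo
belongs in every driver or up in the stack.
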
I mildly disagree with most of this, specifically with the part that the
driver is the right place to make these decisions.  But you did say this
was a "preliminary conclusion", so there's hope yet ;-)  Let's wait till
you have an early implementation and we are all able to experiment with it.

To be continued...

Regards,
Navdeep
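
P.S. For what it's worth, here is the rough shape of what I meant by the
kernel selecting the queue.  Everything below is invented for illustration
(there is no mq_ifinfo and no stack_pick_txq anywhere in the tree); the
point is only where the flow-to-queue mapping lives, not these names.

#include <sys/param.h>
#include <sys/mbuf.h>
#include <sys/pcpu.h>

/* Invented: the ifnet would advertise its tx queue count to the stack. */
struct mq_ifinfo {
	u_int	ntxq;
};

/*
 * Invented helper, called by the stack rather than the driver.  The
 * stack knows the sending thread's CPU and the connection's affinity;
 * the driver doesn't.  The chosen index could be cached per connection
 * so a flow stays on one queue, and the driver's (*if_transmit) would
 * simply honor it.
 */
static u_int
stack_pick_txq(const struct mq_ifinfo *info, const struct mbuf *m)
{
	if (m->m_flags & M_FLOWID)	/* flow hash set by the stack/hw */
		return (m->m_pkthdr.flowid % info->ntxq);
	return (curcpu % info->ntxq);	/* fall back to the sending CPU */
}

That would keep the policy (affinity, rebalancing, QoS) in one place
instead of having it reimplemented slightly differently in every multiq
driver.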