Date: Tue, 29 Oct 2013 16:20:08 -0400
From: Randall Stewart <rrs@lakerest.net>
To: Luigi Rizzo <rizzo@iet.unipi.it>
Cc: Andre Oppermann <andre@freebsd.org>, "freebsd-net@freebsd.org" <net@freebsd.org>
Subject: Re: MQ Patch.
Message-ID: <13BF1F55-EC13-482B-AF7D-59AE039F877D@lakerest.net>
In-Reply-To: <CA+hQ2+gTc87M0f5pvFeW_GCZDogrLkT_1S2bKHngNcDEBUeZYQ@mail.gmail.com>
References: <40948D79-E890-4360-A3F2-BEC34A389C7E@lakerest.net> <526FFED9.1070704@freebsd.org> <CA+hQ2+gTc87M0f5pvFeW_GCZDogrLkT_1S2bKHngNcDEBUeZYQ@mail.gmail.com>
Luigi: comments inline.

On Oct 29, 2013, at 3:58 PM, Luigi Rizzo wrote:

> My short, top-post comment is that I'd rather see some more
> coordination with Andre, and especially some high-level README
> or other form of documentation explaining the architecture
> you have in mind, before this goes in.
>
> To expand my point of view (and please do not read me as negative;
> I am trying to be constructive, to avoid future troubles, and
> to volunteer to help with the design and implementation):
>
> (I'll omit issues re. style and unrelated patches in the diff
> because they are premature.)
>
> 1. Having multiple separate software queues attached to a physical queue
> makes sense only if we have a clear and documented plan
> for scheduling traffic from these queues into the hardware one.
> Otherwise it ends up being just another confusing hack
> that makes it difficult to reason about device drivers.
>
> We already have something similar now (with the drbr queue on top,
> used in some cases when the hw ring overflows), plus the ALTQ hooks,
> and without documentation this does not seem to improve the
> current situation.

Well, I can't get Adara to give up how it uses these in its product. I was lucky to get them to give back the low-level work.

The problem with ALTQ is that it is really broken if you want any sort of decent performance with queueing. However, with a small bit of work (i.e. throw away the ALTQ queues themselves and have ALTQ place the ac_qos number in here and queue the packet) you could have ALTQ able to transmit at line rate and still have proper QoS.

> 2. QoS is not just priority scheduling or AQM a la RED/CoDel/PI,
> but a coherent framework where you can classify/partition traffic
> into separate queues, and apply one of several queue management
> (taildrop/RED/CoDel/whatever) and scheduling (which queue to serve next)
> policies in an efficient way.
>
> Linux mostly gets this right (they even support hierarchical schedulers).
Which is also what ALTQ attempts to do. Again, I can't get Adara to give up their top-level code, but someone *could* (hint, hint) hook ALTQ up to this and have a reasonable performance model with ALTQ...

> Dummynet has a reasonable architecture, although not hierarchical,
> and it operates at the IP level (or possibly at layer 2),
> which is probably too high (but not necessarily).
> We can also recycle the components, i.e. the classifier in ipfw
> and the scheduling algorithms. I am happy to help on this.
>
> ALTQ is too old, complex, inefficient, and unmaintained to be considered.

Exactly.

> And I cannot comment on your code because you don't really explain
> what you want to do and how. CoDel/PI are only queue management,
> not QoS; and strict priority is just one (and probably the worst) policy
> one can have.

Of course, but you need them if you want to prevent bufferbloat.

> One comment I can make, however, is that 256 queues are
> way too few for a proper system. You need the number to be
> dynamic and much larger (e.g. using the flowid as a key).
>
> So, to conclude: I fully support any plan to design something that lets us
> implement scheduling (and QoS, if you want to call it that)
> in a reasonable way, but what is in your patch now does not really
> seem to improve the current situation in any way.

It's a step towards fixing that which I am allowed to give. I can see why companies get frustrated with trying to give anything to the project.

R

> cheers
> luigi
>
>
> On Tue, Oct 29, 2013 at 11:30 AM, Andre Oppermann <andre@freebsd.org> wrote:
> On 29.10.2013 11:50, Randall Stewart wrote:
> Hi:
>
> As discussed at vBSDcon with andre/emaste and gnn, I am sending
> this patch out to all of you ;-)
>
> I wasn't at vBSDcon but it's good that you're sending it (again).
;)

> I have previously sent it to gnn, andre, jhb, rwatson, and several other
> of the usual suspects (as gnn put it) and received dead silence.
>
> Sorry 'bout that. Too many things going on recently.
>
> What does this patch do?
>
> Well, it adds the ability to do multi-queue at the driver level. Basically,
> any driver that uses the new interface gets under it N queues (default
> is 8) for each physical transmit ring it has. The driver picks up
> its queue 0 first, then queue 1, up to the max.
>
> To make sure I understand this correctly: there are 8 soft queues for each real
> transmit ring, correct? And the driver will dequeue the lowest-numbered
> queue for as long as there are packets in it. Like a hierarchical strict
> queuing discipline.
>
> This is prone to head-of-line blocking and starvation by higher-priority
> queues. It may become a big problem under adverse traffic patterns.
>
> This allows you to prioritize packets. Also in here is the start of some
> work I will be doing for AQM... think either PI or CoDel ;-)
>
> Right now that's pretty simple and appears (in a few drivers) as the ability
> to limit the amount of data on the ring… which can help reduce buffer
> bloat. That needs to be refined a lot more.
>
> We actually have two queues, the soft queue and the hardware ring, which
> can both be rather large, leading to various issues as you mention.
>
> I've started work on an FF contract to rethink the whole IFQ* model and
> to propose and benchmark different approaches. After that, to convert all
> drivers in the tree to the chosen model(s) and get rid of the legacy. In
> general the choice of model will be done in the driver and no longer by
> the ifnet layer. One or (most likely) more optimized models will be
> provided by the kernel for drivers to choose from. The idea is that most,
> if not all, drivers use these standard kernel-provided models to avoid
> code duplication.
> However, as the pace of new features is quite high,
> we give the driver full discretion to choose and experiment
> with its own ways of dealing with it. This is under the assumption
> that once a new model has been found it is later moved to the kernel
> side and subsequently used by other drivers as well.
>
>
> This work is donated by Adara Networks and has been discussed in several
> of the past vendor summits.
>
> I plan on committing this before the IETF unless I hear major objections.
>
> There seem to be a couple of whitespace issues where first there is a tab
> and then actual whitespace for the second one, and others all over the place.
>
> There seem to be a number of unrelated changes in sys/dev/cesa/cesa.c,
> sys/dev/fdt/fdt_common.c, sys/dev/fdt/simplebus.c, sys/kern/subr_bus.c,
> and usr.sbin/ofwdump/ofwdump.c.
>
> It would be good to separate out the soft multi-queue changes from the ring
> depth changes and do each in at least one commit.
>
> There are two separate changes to sys/dev/oce/: one is a renaming of the lock
> macros and the other is the change to drbr.
>
> The changes to sys/kern/subr_bufring.c are not style compliant, and we normally
> don't use Linux "wb()" barriers in FreeBSD native code. The atomic_* functions
> should be used instead.
>
> Why would we need a multi-consumer dequeue?
>
> The new bufring functions on a first glance do seem to be safe on architectures
> with a more relaxed memory ordering / cache coherency model than x86.
>
> The atomic dance in a number of drbr_* functions doesn't seem to make much sense,
> and a single spin-lock may result in fewer atomic operations and bus lock cycles.
>
> There is a huge amount of include pollution in sys/net/drbr.h, which we are
> currently trying to get rid of and to avoid in the future.
>
>
> I like the general conceptual approach, but the implementation feels bumpy and
> not (yet) ready for prime time.
> In any case I'd like to take forward conceptual
> parts for the FF-sponsored IFQ* rework.
>
> --
> Andre
>
>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>
>
> --
> -----------------------------------------+-------------------------------
> Prof. Luigi RIZZO, rizzo@iet.unipi.it  . Dip. di Ing. dell'Informazione
> http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
> TEL +39-050-2211611                    . via Diotisalvi 2
> Mobile +39-338-6809875                 . 56122 PISA (Italy)
> -----------------------------------------+-------------------------------

------------------------------
Randall Stewart
803-317-4952 (cell)