Date: Sun, 29 Mar 2015 03:33:55 +0300
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
To: Adrian Chadd <adrian@freebsd.org>
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: Re: irq cpu binding
Message-ID: <20150329003354.GK23643@zxy.spb.ru>
In-Reply-To: <CAJ-VmokSHHm3kMwz=bp7VbgZwADD2_pEr27NdzUfkGq1U=x_sw@mail.gmail.com>
References: <20150328201219.GF23643@zxy.spb.ru>
 <CAJ-Vmo=wecgoVYcS14gsOnT86p=HEMdao65aXTi7jLfVVyOELg@mail.gmail.com>
 <20150328221621.GG23643@zxy.spb.ru>
 <CAJ-Vmomd6Z5Ou7cvV1Kg4m=X2907507hqKMWiz6ssZ45Pi_-Dg@mail.gmail.com>
 <20150328224634.GH23643@zxy.spb.ru>
 <CAJ-VmokwGgHGP6AjBcGbyJShBPX6dyJjjNeCBcjxLi1obaiRtQ@mail.gmail.com>
 <20150328230533.GI23643@zxy.spb.ru>
 <CAJ-VmongWE_z7Rod8-SoFmyiLqiTbHtSaAwjgAs05L_Z3jrWXA@mail.gmail.com>
 <20150328234116.GJ23643@zxy.spb.ru>
 <CAJ-VmokSHHm3kMwz=bp7VbgZwADD2_pEr27NdzUfkGq1U=x_sw@mail.gmail.com>
On Sat, Mar 28, 2015 at 04:58:53PM -0700, Adrian Chadd wrote:
> Hi,
>
> * It turns out that fragments were being 100% handled out of order
> (compared to non-fragments in the same stream) when doing fragment
> reassembly, because the current system was assuming direct dispatch
> netisr and not checking any packet contents for whether they're on the
> wrong CPU. I checked. It's not noticable unless you go digging, but
> it's absolutely happening. That's why I spun a lot of cycles looking
> at the IP fragment reassembly path and which methods get called on the
> frames as they're reinjected.

With a fragmented packet, the first fragment (which may not actually
arrive first) carries the L4 information and gets dispatched to the
correct bucket, while the other fragments carry no such information and
can be dispatched anywhere. As I understand it, the IP stack gathers
the whole packet before processing it anyway. So all we need is to do
the processing on the CPU where the first fragment arrived (a toy
sketch of what I mean is at the end of this mail).

> * We're going to have modify drivers, because the way drivers
> currently assign interrupts, pick CPUs for queues, auto-select how
> many queues to use, etc is all completely adhoc and not consistent. So

Yes. I don't see a problem here (except for re-binding IRQs with
cpuset; see the second sketch at the end). All the interesting drivers
already provide a tunable to control how many queues to use. What I
don't know is how to automate that choice for cases like:

- one 1-port card
- one 2-port card
- one port of a 2-port card
- two 1-port cards
- two different cards
....

Manual selection is acceptable here.

> yeah, we're going to change the drivers and they're going to be
> consistent and configurable. That way you can choose how you want to
> distribute work and pin or not pin things - and it's not done adhoc
> differently in each driver. Even igb, ixgbe and cxgbe differ in how
> they implement these three things.
>
> * For RSS, there'll be a consistent configuration for what the
> hardware is doing with hashing, rather than it being driver dependent.
> Again, otherwise you may end up with some NICs doing 2-tuple hashing
> where others are doing 4-tuple hashing, and behaviour changes
> dramatically based on what NIC you're using.

What is the problem there? I am not interested in how the NIC does its
hashing (in any case, the hashing for direct and return traffic is
different -- this is not Tilera). All I need is for flows to be
distributed across CPUs, to balance the load and reduce lock
contention.

> * For applications - I'm not sure yet, but at the minimum the librss
> API I have vaguely sketched out and coded up in a git branch lets you
> pull out the list of buckets and which CPU it's on. I'm going to
> extend that a bit more, but it should be enough for things like nginx
> to say "ok, start up one nginx process per RSS bucket, and here's the
> CPU set for it to bind to." You said it has worker groups - that's
> great; I want that to be auto configured.

For applications, the minimum is that (per socket) select/kqueue/accept
only return flows that arrived on the CPU the caller is running on at
the time of the select/kqueue/accept (yes, for this to work correctly
the application must be pinned to that CPU). Then the application does
not need to know anything about buckets and so on (the last sketch at
the end of this mail shows the pinning and event-loop side).

After that, an arriving packet drives the IRQ handler, the ithread, the
driver interrupt thread, the TCP stack, select/accept, read, write and
tcp_output -- all on the same CPU. I may be wrong, but this should
preserve the L2/L3 caches.

Where am I misunderstanding?
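
Here is the toy sketch I mentioned for the fragment case. It is only my
own illustration (the hash functions are stand-ins for the real RSS
hash, and this is not the actual netisr/IP reassembly code): the first
fragment can be hashed on the full 4-tuple, the rest only on the
2-tuple, so they may land on different CPUs unless the stack
re-dispatches reassembly to the first fragment's CPU.

/*
 * Illustration only -- not the real FreeBSD code.  Only the first
 * fragment carries the TCP/UDP header, so only it can be hashed on the
 * 4-tuple; later fragments fall back to the 2-tuple and may therefore
 * be dispatched to a different CPU.
 */
#include <stdint.h>
#include <stdio.h>

#define NCPU 8

struct frag {
        uint32_t src_ip, dst_ip;
        uint16_t src_port, dst_port;    /* valid only when has_l4 != 0 */
        int      has_l4;                /* 1 for the first fragment */
};

/* toy hashes, stand-ins for the real RSS/Toeplitz hash */
static uint32_t
h2(uint32_t s, uint32_t d)
{
        return (s * 31 + d);
}

static uint32_t
h4(uint32_t s, uint32_t d, uint16_t sp, uint16_t dp)
{
        return ((h2(s, d) * 31 + sp) * 31 + dp);
}

static int
dispatch_cpu(const struct frag *f)
{
        if (f->has_l4)
                return (h4(f->src_ip, f->dst_ip, f->src_port,
                    f->dst_port) % NCPU);
        /* later fragments: ports unknown, only the 2-tuple is usable */
        return (h2(f->src_ip, f->dst_ip) % NCPU);
}

int
main(void)
{
        struct frag first = { 0x0a000001, 0x0a000002, 1234, 80, 1 };
        struct frag rest  = { 0x0a000001, 0x0a000002, 0, 0, 0 };

        printf("first fragment  -> CPU %d\n", dispatch_cpu(&first));
        printf("other fragments -> CPU %d\n", dispatch_cpu(&rest));
        /*
         * If these differ, reassembly runs on the "wrong" CPU unless
         * the stack re-dispatches to the first fragment's CPU.
         */
        return (0);
}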
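
And this is what I mean by re-binding an IRQ with cpuset: the
programmatic equivalent of "cpuset -l <cpu> -x <irq>". Untested sketch,
needs root, and the IRQ id is the one vmstat -i reports for the queue.

/*
 * Untested sketch: move an interrupt to a single CPU from userland,
 * using the same cpuset_setaffinity() call that cpuset(1) -x uses.
 */
#include <sys/param.h>
#include <sys/cpuset.h>

#include <err.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
        cpuset_t mask;
        int cpu, irq;

        if (argc != 3)
                errx(1, "usage: irqbind <irq> <cpu>");
        irq = atoi(argv[1]);
        cpu = atoi(argv[2]);

        CPU_ZERO(&mask);
        CPU_SET(cpu, &mask);
        if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_IRQ, irq,
            sizeof(mask), &mask) != 0)
                err(1, "cpuset_setaffinity(CPU_WHICH_IRQ, %d)", irq);
        return (0);
}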
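
Finally, a rough sketch of the application side I described: a worker
pinned to one CPU with cpuset_setaffinity() and accepting through its
own kqueue. The port and the single hard-coded CPU are just for
illustration; the missing piece -- the kernel delivering only flows
whose bucket maps to this CPU -- is exactly what is being discussed in
this thread.

/*
 * Sketch only: pin the process to a CPU, then accept and serve
 * connections from a kqueue loop so the whole read/write path stays
 * on that CPU (and in its L2/L3 cache).
 */
#include <sys/param.h>
#include <sys/cpuset.h>
#include <sys/event.h>
#include <sys/socket.h>

#include <netinet/in.h>

#include <err.h>
#include <string.h>
#include <unistd.h>

static void
pin_to_cpu(int cpu)
{
        cpuset_t mask;

        CPU_ZERO(&mask);
        CPU_SET(cpu, &mask);
        /* id -1 == the calling process */
        if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
            sizeof(mask), &mask) != 0)
                err(1, "cpuset_setaffinity");
}

static void
worker(int cpu, int lsock)
{
        struct kevent kev, ev;
        int kq, s;

        pin_to_cpu(cpu);

        if ((kq = kqueue()) == -1)
                err(1, "kqueue");
        EV_SET(&kev, lsock, EVFILT_READ, EV_ADD, 0, 0, NULL);
        if (kevent(kq, &kev, 1, NULL, 0, NULL) == -1)
                err(1, "kevent register");

        for (;;) {
                if (kevent(kq, NULL, 0, &ev, 1, NULL) == -1)
                        err(1, "kevent wait");
                if ((s = accept(lsock, NULL, NULL)) == -1)
                        continue;
                /* read/write/close happen here, on the same CPU */
                close(s);
        }
}

int
main(void)
{
        struct sockaddr_in sin;
        int lsock;

        if ((lsock = socket(AF_INET, SOCK_STREAM, 0)) == -1)
                err(1, "socket");
        memset(&sin, 0, sizeof(sin));
        sin.sin_len = sizeof(sin);
        sin.sin_family = AF_INET;
        sin.sin_port = htons(8080);
        sin.sin_addr.s_addr = htonl(INADDR_ANY);
        if (bind(lsock, (struct sockaddr *)&sin, sizeof(sin)) == -1)
                err(1, "bind");
        if (listen(lsock, 128) == -1)
                err(1, "listen");

        /* one worker pinned to CPU 0, just to show the mechanics */
        worker(0, lsock);
        return (0);
}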