Date: Tue, 24 Sep 2013 09:47:24 +0100
From: Joe Holden <lists@rewt.org.uk>
To: freebsd-net@freebsd.org
Subject: Re: Network stack changes
Message-ID: <5241519C.9040908@rewt.org.uk>
In-Reply-To: <201309240958.06172.zec@fer.hr>
References: <521E41CB.30700@yandex-team.ru> <523F4F14.9090404@yandex-team.ru> <CAEW%2BogZttyScUBQQWht%2BYGfLEDU_APcoRyYeMy_wDseAcZwVnA@mail.gmail.com> <201309240958.06172.zec@fer.hr>
On 24/09/2013 08:58, Marko Zec wrote:
> On Tuesday 24 September 2013 00:46:46 Sami Halabi wrote:
>> Hi,
>>
>>> http://info.iet.unipi.it/~luigi/papers/20120601-dxr.pdf
>>> http://www.nxlab.fer.hr/dxr/stable_8_20120824.diff
>>
>> I've tried the diff in 10-current; it applied cleanly but I had errors
>> compiling the new kernel... is there any work to make it work? I'd love
>> to test it.
>
> Even if you made it compile on current, you could only run synthetic tests
> measuring lookup performance using streams of random keys, as outlined in
> the paper (btw. the paper at Luigi's site is an older draft; the final
> version with slightly revised benchmarks is available here:
> http://www.sigcomm.org/sites/default/files/ccr/papers/2012/October/2378956-2378961.pdf)
>
> I.e. the code only hooks into the routing API for testing purposes, but is
> completely disconnected from the forwarding path.
>
aha! How much work would it be to enable it to be used?

> We have a prototype in the works which combines DXR with netmap in userspace
> and is capable of sustaining well above line-rate forwarding with
> full-sized BGP views using Intel 10G cards on commodity multicore machines.
> The work was somewhat stalled during the summer, but I plan to wrap it up
> and release the code by the end of this year. With recent advances in
> netmap it might also be feasible to merge DXR and netmap entirely inside
> the kernel, but I've not explored that path yet...
>
mmm, forwarding using netmap would be pretty awesome...

> Marko
>
>> Sami
>>
>> On Sun, Sep 22, 2013 at 11:12 PM, Alexander V. Chernikov <
>> melifaro@yandex-team.ru> wrote:
>>> On 29.08.2013 15:49, Adrian Chadd wrote:
>>>> Hi,
>>>
>>> Hello Adrian!
>>> I'm very sorry for the looong reply.
>>>
>>>> There's a lot of good stuff to review here, thanks!
>>>>
>>>> Yes, the ixgbe RX lock needs to die in a fire. It's kinda pointless to
>>>> keep locking things like that on a per-packet basis. We should be able
>>>> to do this in a cleaner way - we can defer RX into a CPU-pinned
>>>> taskqueue and convert the interrupt handler to a fast handler that
>>>> just schedules that taskqueue. We can ignore the ithread entirely
>>>> here.
>>>>
>>>> What do you think?
>>>
>>> Well, it sounds good :) But performance numbers and Jack's opinion are
>>> more important :)
>>>
>>> Are you going to Malta?
>>>
>>>> Totally pie-in-the-sky handwaving at this point:
>>>>
>>>> * create an array of mbuf pointers for completed mbufs;
>>>> * populate the mbuf array;
>>>> * pass the array up to ether_demux().
>>>>
>>>> For vlan handling, it may end up populating its own list of mbufs to
>>>> push up to ether_demux(). So maybe we should extend the API to have a
>>>> bitmap of packets to actually handle from the array, so we can pass up
>>>> a larger array of mbufs, note which ones are for the destination and
>>>> then the upcall can mark which frames it has consumed.
>>>>
>>>> I specifically wonder how much work/benefit we may see by doing:
>>>>
>>>> * batching packets into lists so various steps can batch-process
>>>> things rather than run to completion;
>>>> * batching the processing of a list of frames under a single lock
>>>> instance - e.g., if the forwarding code could do the forwarding lookup
>>>> for 'n' packets under a single lock, then pass that list of frames up
>>>> to inet_pfil_hook() to do the work under one lock, etc., etc.
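For readers following along, here is a rough userspace sketch of the "array of frames plus consumed bitmap" API shape described above. It is an illustration only, under the assumption that each layer flags the frames it takes ownership of; struct pkt, pkt_batch, demux_batch() and vlan_input_batch() are hypothetical stand-ins, not the real mbuf, ether_demux() or if_vlan interfaces.

/*
 * Hypothetical sketch: plain C modelling the batched-demux idea.
 * Each upcall sees the whole batch and sets a bit for every frame it
 * consumed; the caller then finishes only the frames left unclaimed.
 */
#include <stdint.h>
#include <stdio.h>

#define BATCH_MAX 64

struct pkt {				/* stand-in for struct mbuf */
	uint16_t vlan_tag;
	int	 len;
};

struct pkt_batch {
	struct pkt *pkts[BATCH_MAX];
	int	    count;
	uint64_t    consumed;		/* bit i set => pkts[i] taken by an upcall */
};

/* Hypothetical vlan upcall: claims every tagged frame in the batch. */
static void
vlan_input_batch(struct pkt_batch *b)
{
	for (int i = 0; i < b->count; i++) {
		if (b->pkts[i]->vlan_tag != 0)
			b->consumed |= (1ULL << i);
	}
}

/* Stand-in for ether_demux(): run the upcalls, then handle the remainder. */
static void
demux_batch(struct pkt_batch *b)
{
	vlan_input_batch(b);
	for (int i = 0; i < b->count; i++) {
		if (b->consumed & (1ULL << i))
			continue;	/* already taken by the vlan layer */
		printf("frame %d (%d bytes) handled by default path\n",
		    i, b->pkts[i]->len);
	}
}

int
main(void)
{
	struct pkt p0 = { .vlan_tag = 0,   .len = 64 };
	struct pkt p1 = { .vlan_tag = 100, .len = 1500 };
	struct pkt_batch b = { .pkts = { &p0, &p1 }, .count = 2, .consumed = 0 };

	demux_batch(&b);
	return (0);
}

The point of the bitmap is that one larger array can be passed up a single time and each upcall only marks what it owns, instead of every layer building its own per-layer list of mbufs.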
>>>
>>> I'm thinking the same way, but we're stuck with the 'forwarding lookup' due
>>> to the problem with the egress interface pointer, as I mentioned earlier.
>>> However, it is interesting to see how much it helps, regardless of locking.
>>>
>>> Currently I'm thinking that we should try to change radix to something
>>> different (it seems that it can be checked fast) and see what happens.
>>> Luigi's performance numbers for our radix are too awful, and there is a
>>> patch implementing an alternative trie:
>>> http://info.iet.unipi.it/~luigi/papers/20120601-dxr.pdf
>>> http://www.nxlab.fer.hr/dxr/stable_8_20120824.diff
>>>
>>>> Here, the processing would look less like "grab lock and process to
>>>> completion" and more like "mark and sweep" - i.e., we have a list of
>>>> frames that we mark as needing processing and mark as having been
>>>> processed at each layer, so we know where to next dispatch them.
>>>>
>>>> I still have some tool coding to do with PMC before I even think about
>>>> tinkering with this, as I'd like to measure stuff like per-packet
>>>> latency as well as top-level processing overhead (i.e.,
>>>> CPU_CLK_UNHALTED.THREAD_P / lagg0 TX bytes/pkts, RX bytes/pkts, NIC
>>>> interrupts on that core, etc.)
>>>
>>> That will be great to see!
>>>
>>>> Thanks,
>>>>
>>>> -adrian
>>>
>>> _______________________________________________
>>> freebsd-net@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>>
>> --
>> Sami Halabi
>> Information Systems Engineer
>> NMS Projects Expert
>> FreeBSD SysAdmin Expert
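To make the batched-lookup argument above concrete, here is a minimal userspace sketch contrasting per-packet locking with doing the forwarding lookup for a whole array of frames under one lock acquisition. Everything in it is an illustrative stand-in: lookup_one(), the toy destination-to-egress mapping and the pthread mutex are assumptions for the sketch, not the kernel's radix/DXR lookup or rtable locking.

/*
 * Hypothetical sketch of "do the forwarding lookup for 'n' packets under a
 * single lock": one lock acquisition amortised over the whole batch instead
 * of a lock/unlock pair per frame.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define BATCH_MAX 64

static pthread_mutex_t rtable_lock = PTHREAD_MUTEX_INITIALIZER;

/* Toy lookup: pretend the top bits of the destination pick the egress port. */
static int
lookup_one(uint32_t dst)
{
	return (int)(dst >> 24) & 0x0f;
}

/* Per-packet style: lock and unlock once per frame (run-to-completion). */
static void
forward_per_packet(const uint32_t *dst, int *egress, int n)
{
	for (int i = 0; i < n; i++) {
		pthread_mutex_lock(&rtable_lock);
		egress[i] = lookup_one(dst[i]);
		pthread_mutex_unlock(&rtable_lock);
	}
}

/* Batched style: one lock acquisition for the whole array of frames. */
static void
forward_batch(const uint32_t *dst, int *egress, int n)
{
	pthread_mutex_lock(&rtable_lock);
	for (int i = 0; i < n; i++)
		egress[i] = lookup_one(dst[i]);
	pthread_mutex_unlock(&rtable_lock);
}

int
main(void)
{
	uint32_t dst[BATCH_MAX];
	int egress[BATCH_MAX];

	for (int i = 0; i < BATCH_MAX; i++)
		dst[i] = (uint32_t)(i * 0x01010101u);	/* synthetic keys */

	forward_per_packet(dst, egress, BATCH_MAX);
	forward_batch(dst, egress, BATCH_MAX);
	printf("first egress: %d, last egress: %d\n",
	    egress[0], egress[BATCH_MAX - 1]);
	return (0);
}

The win the thread is after comes from amortising the lock/unlock cost (and the associated cache traffic) over n frames, which is exactly what a list- or array-based API between the driver, the forwarding lookup and inet_pfil_hook() would allow.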