Date: Wed, 13 Jan 2016 20:25:30 +0800
From: Sepherosa Ziehau <sepherosa@gmail.com>
To: Boris Astardzhiev <boris.astardzhiev@gmail.com>
Cc: Adrian Chadd <adrian.chadd@gmail.com>, Mark Delany <c2h@romeo.emu.st>,
    FreeBSD Net <freebsd-net@freebsd.org>
Subject: Re: Does FreeBSD have sendmmsg or recvmmsg system calls?
Message-ID: <CAMOc5czFpg=zk6si5miG5VX-pY25m4=uW9bZiap=-+RwmidOOQ@mail.gmail.com>
In-Reply-To: <CAP=KkTwG0SVUmrBuWm33EC-tG4tMTdF5rLZQ_u6G1=-ujnfjkA@mail.gmail.com>
References: <20160104101747.58347.qmail@f5-external.bushwire.net>
    <20160104194044.GD3625@kib.kiev.ua>
    <20160104210741.32812.qmail@f5-external.bushwire.net>
    <CAP=KkTwfpjec2Tgnm4PRR3u8t4GEqN9Febm5HRcqapifBG-B6g@mail.gmail.com>
    <CA+hQ2+h4NNz9tgSpjJdv7fXteq5tAR7o3LvjV=u08NHjRLPwmA@mail.gmail.com>
    <CAP=KkTzFUDsZwDDLD3n97xJW0qLVZMPduZGSX+eXC3UuLpVjMg@mail.gmail.com>
    <20160107161213.GZ3625@kib.kiev.ua>
    <CA+hQ2+g6OB3MmZrW5hzNSnkcqKaKf1XGDraHfWXtSrowxKuL5g@mail.gmail.com>
    <20160107192840.GF3625@kib.kiev.ua>
    <20160108172323.W1815@besplex.bde.org>
    <20160108075815.3243.qmail@f5-external.bushwire.net>
    <CAJ-VmonYPhcN-gikuYQU_k5GaTAqTijoxR_0ORV4BZqsHMRJSg@mail.gmail.com>
    <20160108204606.G2420@besplex.bde.org>
    <CAJ-Vmom26mukSv3JmsmNiAONvpc6f1bQ+ujO25qefGHY=5przA@mail.gmail.com>
    <CAP=KkTwG0SVUmrBuWm33EC-tG4tMTdF5rLZQ_u6G1=-ujnfjkA@mail.gmail.com>
On Tue, Jan 12, 2016 at 10:53 PM, Boris Astardzhiev
<boris.astardzhiev@gmail.com> wrote:
> Hello again,
>
> In my spare time I did the following simple libc-only implementation of
> the syscalls.
> I did some tests in a VM adapting these experiments:
> https://blog.cloudflare.com/how-to-receive-a-million-packets/

On DragonFly, I could do 1.3Mtrans/s (one trans == 18B UDP reception and
then send back) without the {recv,send}mmsg() API on a 4C/8T Ivy Bridge
i7 easily.  I think only SO_REUSEPORT and the CPU hint (DragonFly has an
SO_CPUHINT getsockopt) matter in their test.

Thanks,
sephe

>
> Any comments about the diff are greatly appreciated.
>
> Best regards,
> Boris Astardzhiev
>
> On Fri, Jan 8, 2016 at 7:02 PM, Adrian Chadd <adrian.chadd@gmail.com>
> wrote:
>
>> On 8 January 2016 at 03:02, Bruce Evans <brde@optusnet.com.au> wrote:
>> > On Fri, 8 Jan 2016, Adrian Chadd wrote:
>> >
>> >> On 7 January 2016 at 23:58, Mark Delany <c2h@romeo.emu.st> wrote:
>> >>>
>> >>> On 08Jan16, Bruce Evans allegedly wrote:
>> >>>>
>> >>>> If the NIC can't reach line rate
>> >>>
>> >>>> Network stack overheads are also enormous.
>> >>>
>> >>> Bruce makes some excellent points.
>> >>>
>> >>> I challenge anyone to get line-rate UDP out of FreeBSD (or Linux)
>> >>> on a 1G NIC, let alone a 10G NIC listening on a single port. It was
>> >>> exactly my frustration with UDP performance that led me down the
>> >>> path of *mmsg() and netmap.
>> >>>
>> >>> Frankly, this is an opportunity for FreeBSD, as UDP performance
>> >>> appears to be a neglected area.
>> >>
>> >> I'm there, on 16 threads.
>> >>
>> >> I'd rather we do it on two or three, as a lot of time is wasted in
>> >> producer/consumer locking. But yeah, 500k tx/rx should be doable per
>> >> CPU with only locking changes.
>>
>> .. and I did mean "kernel producer/consumer locking changes."
>>
>> > Line rate for 1 Gbps is about 1500 kpps (small packets).
>> >
>> > With I218V2 (em), I see enormous lock contention above 3 or 4 (user)
>> > threads, and 8 are slightly slower than 1. 1 doesn't saturate the
>> > NIC, and 2 is optimal.
>> >
>>
>> The RSS support in -HEAD lets you get away with parallelising UDP
>> streams very nicely.
>>
>> The framework is pretty simple (!):
>>
>> * drivers ask the RSS code for the RSS config and RSS hash to use, and
>>   configure the hardware appropriately;
>> * the netisr input paths check for the existence of the RSS hash and
>>   will calculate it in software if required;
>> * v4/v6 reassembly is done (at the IP level, /not/ at the protocol
>>   level), and if it needs a new RSS hash / netisr reinjection, that'll
>>   happen;
>> * the PCB lookup code for listen sockets now allows one listen socket
>>   per RSS bucket - the RSS / PCBGROUPS code already extended the PCB
>>   to have one PCB table per RSS bucket (as well as a global one).
>>
>> So:
>>
>> * userland code queries RSS for the CPU and RSS bucket setup;
>> * you then create one listen socket per RSS bucket, bind it to the
>>   local thread (if you want) and tell it "you're in RSS bucket X";
>> * .. and then in the UDP case, for locally-bound sockets, the
>>   transmit/receive path does not require modifying the global PCB
>>   state, so the locking is kept per-RSS-bucket and scales linearly
>>   with the number of CPUs you have (until you hit the NIC queue
>>   limits.)
>>
>> https://github.com/erikarn/freebsd-rss/
>>
>> and:
>>
>> http://adrianchadd.blogspot.com/2014/06/hacking-on-receive-side-scaling-rss-on.html
>> http://adrianchadd.blogspot.com/2014/07/application-awareness-of-receive-side.html
>> http://adrianchadd.blogspot.com/2014/08/receive-side-scaling-figuring-out-how.html
>> http://adrianchadd.blogspot.com/2014/09/receive-side-scaling-testing-udp.html
>> http://adrianchadd.blogspot.com/2014/10/more-rss-udp-tests-this-time-on-dell.html
>>
>> -adrian
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"

-- 
Tomorrow Will Never Die