Date: Wed, 13 Jan 2016 20:25:30 +0800
Subject: Re: Does FreeBSD have sendmmsg or recvmmsg system calls?
From: Sepherosa Ziehau
To: Boris Astardzhiev
Cc: Adrian Chadd, Mark Delany, FreeBSD Net
List-Id: Networking and TCP/IP with FreeBSD

On Tue, Jan 12, 2016 at 10:53 PM, Boris Astardzhiev wrote:
> Hello again,
>
> In my spare time I did the following simple libc-only implementation of the
> syscalls.
> I did some tests in a VM adapting these experiments:
> https://blog.cloudflare.com/how-to-receive-a-million-packets/

On Dragonfly, I could do 1.3Mtrans/s (one trans == 18B UDP reception
and then send back) w/o the {recv,send}mmsg() API on a 4C/8T
ivy-bridge i7 easily. I think only SO_REUSEPORT and cpu hint (Dfly
has SO_CPUHINT getsockopt) matter in their test.

Thanks,
sephe

>
> Any comments about the diff are greatly appreciated.
>
> Best regards,
> Boris Astardzhiev
>
> On Fri, Jan 8, 2016 at 7:02 PM, Adrian Chadd wrote:
>
>> On 8 January 2016 at 03:02, Bruce Evans wrote:
>> > On Fri, 8 Jan 2016, Adrian Chadd wrote:
>> >
>> >> On 7 January 2016 at 23:58, Mark Delany wrote:
>> >>>
>> >>> On 08Jan16, Bruce Evans allegedly wrote:
>> >>>>
>> >>>> If the NIC can't reach line rate
>> >>>
>> >>>> Network stack overheads are also enormous.
>> >>>
>> >>> Bruce makes some excellent points.
>> >>>
>> >>> I challenge anyone to get line rate UDP out of FBSD (or Linux) for a
>> >>> 1G NIC let alone a 10G NIC listening to a single port. It was exactly
>> >>> my frustration with UDP performance that led me down the path of
>> >>> *mmsg() and netmap.
>> >>>
>> >>> Frankly this is an opportunity for FBSD as UDP performance appears to
>> >>> be a neglected area.
>> >>
>> >> I'm there, on 16 threads.
>> >>
>> >> I'd rather we do it on two or three, as a lot of time is wasted in
>> >> producer/consumer locking. but yeah, 500k tx/rx should be doable per
>> >> CPU with only locking changes.
>>
>> .. and I did mean "kernel producer/consumer locking changes."
>>
>> >
>> > Line rate for 1 Gbps is about 1500 kpps (small packets).
>> >
>> > With I218V2 (em), I see enormous lock contention above 3 or 4 (user)
>> > threads, and 8 are slightly slower than 1. 1 doesn't saturate the NIC,
>> > and 2 is optimal.
>> >
>>
>> The RSS support in -HEAD lets you get away with parallelising UDP
>> streams very nicely.
>>
>> The framework is pretty simple (!):
>>
>> * drivers ask the RSS code for the RSS config and RSS hash to use, and
>> configure the hardware appropriately;
>> * the netisr input paths check the existence of the RSS hash and will
>> calculate it in software if required;
>> * v4/v6 reassembly is done (at the IP level, /not/ at the protocol
>> level) and if it needs a new RSS hash / netisr reinjection, that'll
>> happen;
>> * the PCB lookup code for listen sockets now allows one listen socket
>> per RSS bucket - as the RSS / PCBGROUPS code already extended the PCB
>> to have one PCB table per RSS bucket (as well as a global one);
>>
>> So:
>>
>> * userland code queries RSS for the CPU and RSS bucket setup;
>> * you then create one listen socket per RSS bucket, bind it to the
>> local thread (if you want) and tell it "you're in RSS bucket X";
>> * .. and then in the UDP case for local-bound sockets, the
>> transmit/receive path does not require modifying the global PCB state,
>> so the locking is kept per-RSS bucket, and scales linearly with the
>> number of CPUs you have (until you hit the NIC queue limits.)
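
The userland steps in the list above (one socket per bucket, each owned by its own worker) might look roughly like the sketch below. It is only an illustration under stated assumptions: it uses Python and the generic SO_REUSEPORT option as a stand-in, not the FreeBSD RSS socket options, and `open_worker_sockets` is a made-up helper name. How the kernel spreads incoming datagrams across such sockets differs between systems, so the sketch shows only the socket layout, not the RSS bucket assignment.

```python
import socket


def open_worker_sockets(workers, host="127.0.0.1", port=0):
    """Open one UDP socket per worker, all bound to the same address.

    Hypothetical helper: SO_REUSEPORT stands in for the per-bucket
    listen sockets described in the thread.
    """
    socks = []
    for _ in range(workers):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        # SO_REUSEPORT has to be set on every socket before bind().
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        s.bind((host, port))
        # If port was 0, reuse whatever port the first bind picked.
        port = s.getsockname()[1]
        socks.append(s)
    return socks


if __name__ == "__main__":
    sockets = open_worker_sockets(4)
    print("shared port:", sockets[0].getsockname()[1])
    for s in sockets:
        s.close()
```

Each worker thread or process would then loop on its own socket, so no two workers contend for the same receive buffer in userland.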
>>
>> https://github.com/erikarn/freebsd-rss/
>>
>> and:
>>
>> http://adrianchadd.blogspot.com/2014/06/hacking-on-receive-side-scaling-rss-on.html
>>
>> http://adrianchadd.blogspot.com/2014/07/application-awareness-of-receive-side.html
>>
>> http://adrianchadd.blogspot.com/2014/08/receive-side-scaling-figuring-out-how.html
>>
>> http://adrianchadd.blogspot.com/2014/09/receive-side-scaling-testing-udp.html
>>
>> http://adrianchadd.blogspot.com/2014/10/more-rss-udp-tests-this-time-on-dell.html
>>
>> -adrian
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>>

-- 
Tomorrow Will Never Die
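
For reference, the "about 1500 kpps" line-rate figure quoted in the thread can be reproduced with a little arithmetic. The sketch below assumes standard Ethernet framing: a 64-byte minimum frame plus 8 bytes of preamble/SFD and a 12-byte inter-frame gap on the wire. `max_pps` is a name chosen here for illustration only.

```python
PREAMBLE_SFD = 8      # bytes of preamble plus start-of-frame delimiter
MIN_FRAME = 64        # minimum Ethernet frame, including the FCS
INTERFRAME_GAP = 12   # mandatory idle time, expressed in bytes


def max_pps(link_bps, frame_bytes=MIN_FRAME):
    """Upper bound on frames per second for a given link speed."""
    wire_bits = (PREAMBLE_SFD + frame_bytes + INTERFRAME_GAP) * 8
    return link_bps // wire_bits


if __name__ == "__main__":
    print(max_pps(10**9))    # 1488095, i.e. roughly 1500 kpps at 1 Gbps
    print(max_pps(10**10))   # 14880952 at 10 Gbps
```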