From owner-freebsd-stable@FreeBSD.ORG Sun Sep 21 22:08:19 2014
Date: Sun, 21 Sep 2014 15:08:18 -0700
From: "K. Macy"
To: Rumen Telbizov
Cc: Tom Elite, "freebsd-stable@freebsd.org"
Subject: Re: FreeBSD 10 network performance problems
List-Id: Production branch of FreeBSD source code

What you're dealing with is hardly an edge case. Most people don't need to
push more than a couple of Gbps in production.

Flowtable is hardly "untested." However, it has at times been a source of
friction because it is somewhat brittle: the limit on the number of cache
entries it can store is frequently too low for systems with very large
numbers of active flows. Without raising that limit substantially, such
systems fail in rather spectacular fashion.

Additionally, flowtable was not written with the intent of being a routing
cache. It was developed to support stateful multipath routing for load
balancing. In its current incarnation, stripped of much of the code for its
initial purpose, it is really just a band-aid around locking problems in
the routing code.

That said, the handful of commercial users of FreeBSD that I personally
know of who do push large amounts of traffic (tens of Gbps) per system all
have flowtable enabled. Unfortunately, at least in terms of what is in
HEAD, little has been done to fix the contention that flowtable works
around.

For your purposes, the response that Adrian gave you is the closest to
"optimal." I hope that helps.
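Concretely, "raising this limit" means compiling flowtable in and sizing
the flow cache at boot. A minimal sketch follows; FLOWTABLE is the kernel
option name, but the exact tunable and sysctl names shown
(net.flowtable.nmbflows, net.flowtable.enable) are assumed from the
9.x/10.x-era flowtable code and should be verified against your source
tree before use:

    kernel configuration (compile flowtable in):
        options FLOWTABLE

    /boot/loader.conf (size the table well above the peak number of
    concurrent flows; tunable name assumed):
        net.flowtable.nmbflows="1048576"

    at runtime (sysctl node assumed; second command lists the
    flowtable counters for inspection):
        # sysctl net.flowtable.enable=1
        # sysctl net.flowtable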
-K

On Sun, Sep 21, 2014 at 2:31 PM, Rumen Telbizov wrote:
> Thank you for your answers, Adrian and Vladislav.
>
> Adrian:
> I read this paper,
> http://conferences.sigcomm.org/sigcomm/2009/workshops/presto/papers/p37.pdf,
> and was left with the impression that the lock contention on *rtentry*
> had been solved some time around the FreeBSD 8 release with the new
> routing architecture and flowtable. I was wondering whether this is
> really the case or whether I am dealing with an edge case here. I am
> cc'ing Qing Li and Kip Macy for further visibility and comments
> (original report at
> http://lists.freebsd.org/pipermail/freebsd-stable/2014-September/080170.html).
>
> On the other hand, https://wiki.freebsd.org/NetworkPerformanceTuning
> advises: "*Do not use FLOWTABLE. It is still untested (2012-02-23).*"
> Is that still the case? As mentioned previously, I tried this kernel
> option earlier and it had no effect.
>
> Additionally, on https://wiki.freebsd.org/NewNetworking I saw that there
> are still open items with regard to "*rtentry locking*" and "*Contention
> between CPUs when forwarding between multi-queue interfaces*". I am not
> quite sure whether this is what I am dealing with.
>
> I also wonder whether this lock contention is something new or whether I
> am dealing with some strange edge case. I have read that people are able
> to push 10Gbit/s on FreeBSD 9.2
> (https://calomel.org/network_performance.html). Is anybody else seeing
> this at around 4-5Gbit/s?
>
> Vladislav:
> Here are the details that you requested (freshly booted system):
>
> # pciconf -lv | grep -A 4 ix\[0-9\]
> ix0@pci0:5:0:0: class=0x020000 card=0x00038086 chip=0x10fb8086 rev=0x01 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = '82599EB 10-Gigabit SFI/SFP+ Network Connection'
>     class      = network
>     subclass   = ethernet
> ix1@pci0:5:0:1: class=0x020000 card=0x00038086 chip=0x10fb8086 rev=0x01 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = '82599EB 10-Gigabit SFI/SFP+ Network Connection'
>     class      = network
>     subclass   = ethernet
>
> # netstat -m
> 100358/50182/150540 mbufs in use (current/cache/total)
> 2048/47288/49336/1526116 mbuf clusters in use (current/cache/total/max)
> 2048/47287 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/7/7/763057 4k (page size) jumbo clusters in use (current/cache/total/max)
> 98300/11/98311/226091 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/127176 16k jumbo clusters in use (current/cache/total/max)
> 913885K/107248K/1021134K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
>
> # ngctl list | wc -l
> 2
>
> # sysctl -a | egrep 'net.(inet.(tcp|udp)|graph|isr)'
> net.inet.tcp.rfc1323: 1
> net.inet.tcp.mssdflt: 536
> net.inet.tcp.keepidle: 7200000
> net.inet.tcp.keepintvl: 75000
> net.inet.tcp.sendspace: 32768
> net.inet.tcp.recvspace: 65536
> net.inet.tcp.keepinit: 75000
> net.inet.tcp.delacktime: 100
> net.inet.tcp.v6mssdflt: 1220
> net.inet.tcp.cc.algorithm: newreno
> net.inet.tcp.cc.available: newreno
> net.inet.tcp.hostcache.cachelimit: 15360
> net.inet.tcp.hostcache.hashsize: 512
> net.inet.tcp.hostcache.bucketlimit: 30
> net.inet.tcp.hostcache.count: 6
> net.inet.tcp.hostcache.expire: 3600
> net.inet.tcp.hostcache.prune: 300
> net.inet.tcp.hostcache.purge: 0
> net.inet.tcp.log_in_vain: 0
> net.inet.tcp.blackhole: 0
> net.inet.tcp.delayed_ack: 1
> net.inet.tcp.drop_synfin: 0
> net.inet.tcp.rfc3042: 1
> net.inet.tcp.rfc3390: 1
> net.inet.tcp.experimental.initcwnd10: 1
> net.inet.tcp.rfc3465: 1
> net.inet.tcp.abc_l_var: 2
> net.inet.tcp.ecn.enable: 0
> net.inet.tcp.ecn.maxretries: 1
> net.inet.tcp.insecure_rst: 0
> net.inet.tcp.recvbuf_auto: 1
> net.inet.tcp.recvbuf_inc: 16384
> net.inet.tcp.recvbuf_max: 2097152
> net.inet.tcp.path_mtu_discovery: 1
> net.inet.tcp.tso: 1
> net.inet.tcp.sendbuf_auto: 1
> net.inet.tcp.sendbuf_inc: 8192
> net.inet.tcp.sendbuf_max: 2097152
> net.inet.tcp.reass.maxsegments: 95400
> net.inet.tcp.reass.cursegments: 0
> net.inet.tcp.reass.overflows: 0
> net.inet.tcp.sack.enable: 1
> net.inet.tcp.sack.maxholes: 128
> net.inet.tcp.sack.globalmaxholes: 65536
> net.inet.tcp.sack.globalholes: 0
> net.inet.tcp.minmss: 216
> net.inet.tcp.log_debug: 0
> net.inet.tcp.tcbhashsize: 262144
> net.inet.tcp.do_tcpdrain: 1
> net.inet.tcp.pcbcount: 17
> net.inet.tcp.icmp_may_rst: 1
> net.inet.tcp.isn_reseed_interval: 0
> net.inet.tcp.soreceive_stream: 0
> net.inet.tcp.syncookies: 1
> net.inet.tcp.syncookies_only: 0
> net.inet.tcp.syncache.bucketlimit: 30
> net.inet.tcp.syncache.cachelimit: 15375
> net.inet.tcp.syncache.count: 0
> net.inet.tcp.syncache.hashsize: 512
> net.inet.tcp.syncache.rexmtlimit: 3
> net.inet.tcp.syncache.rst_on_sock_fail: 1
> net.inet.tcp.msl: 30000
> net.inet.tcp.rexmit_min: 30
> net.inet.tcp.rexmit_slop: 200
> net.inet.tcp.always_keepalive: 1
> net.inet.tcp.fast_finwait2_recycle: 0
> net.inet.tcp.finwait2_timeout: 60000
> net.inet.tcp.keepcnt: 8
> net.inet.tcp.rexmit_drop_options: 0
> net.inet.tcp.per_cpu_timers: 0
> net.inet.tcp.timer_race: 0
> net.inet.tcp.maxtcptw: 27767
> net.inet.tcp.nolocaltimewait: 0
> net.inet.udp.checksum: 1
> net.inet.udp.maxdgram: 9216
> net.inet.udp.recvspace: 42080
> net.inet.udp.log_in_vain: 0
> net.inet.udp.blackhole: 0
> net.isr.dispatch: direct
> net.isr.maxthreads: 1
> net.isr.bindthreads: 0
> net.isr.maxqlimit: 10240
> net.isr.defaultqlimit: 256
> net.isr.maxprot: 16
> net.isr.numthreads: 1
> net.graph.threads: 12
> net.graph.maxalloc: 4096
> net.graph.maxdata: 512
> net.graph.abi_version: 12
> net.graph.msg_version: 8
> net.graph.maxdgram: 20480
> net.graph.recvspace: 20480
> net.graph.family: 32
> net.graph.data.proto: 1
> net.graph.control.proto: 2
>
> Once again, I am ready to provide additional metrics and run more tests
> upon request.
>
> Thank you,
> Rumen Telbizov
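For anyone trying to confirm where the contention discussed in this thread
actually lives, one way to profile a forwarding box is statistical sampling
with hwpmc(4) and pmcstat(8). A minimal sketch, assuming a CPU for which
hwpmc exposes the generic "instructions" event alias (check with
pmccontrol -L) and traffic already flowing:

    # kldload hwpmc
    # pmcstat -S instructions -O /tmp/sample.pmc
    (sample system-wide for ~30s under load, then stop with Ctrl-C)
    # pmcstat -R /tmp/sample.pmc -G /tmp/callgraph.txt

If rtentry lock contention is the bottleneck, the expected signature is a
large share of samples in the locking primitives near the top of the
callgraph, with call chains passing through the routing and output paths.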