From owner-freebsd-stable@FreeBSD.ORG Sun Sep 21 22:08:19 2014
Date: Sun, 21 Sep 2014 15:08:18 -0700
From: "K. Macy"
To: Rumen Telbizov
Cc: Tom Elite, "freebsd-stable@freebsd.org"
Subject: Re: FreeBSD 10 network performance problems
List-Id: Production branch of FreeBSD source code

What you're dealing with is hardly an edge case. Most people don't need to
push more than a couple of Gbps in production.

Flowtable is hardly "untested." However, it has at times been a source of
friction because it is somewhat brittle: the limit on the number of cache
entries it can store is frequently too low for systems with very large
numbers of active flows. Without raising that limit substantially, such
systems fail in rather spectacular fashion.

Additionally, flowtable was not written with the intent of being a routing
cache. It was developed to support stateful multipath routing for load
balancing. In its current incarnation, stripped of much of the code for its
initial purpose, it is really just a band-aid around locking problems in
the routing code.

That said, the handful of commercial users of FreeBSD that I personally
know of who do push large amounts of traffic (tens of Gbps) per system all
have flowtable enabled. Unfortunately, at least in terms of what is in
HEAD, little has been done to fix the contention that flowtable works
around.

For your purposes, the response that Adrian gave you is the closest to
"optimal." I hope that helps.
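Concretely, "raising this limit" means compiling flowtable in and sizing
the flow cache at boot. A minimal sketch follows; FLOWTABLE is the kernel
option name, but the exact tunable and sysctl names shown
(net.flowtable.nmbflows, net.flowtable.enable) are assumed from the
9.x/10.x-era flowtable code and should be verified against your source
tree before use:

    kernel configuration (compile flowtable in):
        options FLOWTABLE

    /boot/loader.conf (size the table well above the peak number of
    concurrent flows; tunable name assumed):
        net.flowtable.nmbflows="1048576"

    at runtime (sysctl node assumed; second command lists the
    flowtable counters for inspection):
        # sysctl net.flowtable.enable=1
        # sysctl net.flowtable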
-K

On Sun, Sep 21, 2014 at 2:31 PM, Rumen Telbizov wrote:
> Thank you for your answers, Adrian and Vladislav.
>
> Adrian:
> I read this paper,
> http://conferences.sigcomm.org/sigcomm/2009/workshops/presto/papers/p37.pdf,
> and was left with the impression that the lock contention on *rtentry*
> had been solved some time around the FreeBSD 8 release with the new
> routing architecture and flowtable. I was wondering whether this is
> really the case or whether I am dealing with an edge case here. I am
> cc'ing Qing Li and Kip Macy for further visibility and comments
> (original report at
> http://lists.freebsd.org/pipermail/freebsd-stable/2014-September/080170.html).
>
> On the other hand, https://wiki.freebsd.org/NetworkPerformanceTuning
> advises: "*Do not use FLOWTABLE. It is still untested (2012-02-23).*"
> Is that still the case? As mentioned previously, I tried this kernel
> option earlier and it had no effect.
>
> Additionally, on https://wiki.freebsd.org/NewNetworking I saw that there
> are still open items with regard to "*rtentry locking*" and "*Contention
> between CPUs when forwarding between multi-queue interfaces*". I am not
> quite sure whether this is what I am dealing with.
>
> I also wonder whether this lock contention is something new or whether I
> am dealing with some strange edge case. I have read that people are able
> to push 10Gbit/s on FreeBSD 9.2
> (https://calomel.org/network_performance.html). Is anybody else seeing
> this at around 4-5Gbit/s?
>
> Vladislav:
> Here are the details that you requested (freshly booted system):
>
> # pciconf -lv | grep -A 4 ix\[0-9\]
> ix0@pci0:5:0:0: class=0x020000 card=0x00038086 chip=0x10fb8086 rev=0x01 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = '82599EB 10-Gigabit SFI/SFP+ Network Connection'
>     class      = network
>     subclass   = ethernet
> ix1@pci0:5:0:1: class=0x020000 card=0x00038086 chip=0x10fb8086 rev=0x01 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = '82599EB 10-Gigabit SFI/SFP+ Network Connection'
>     class      = network
>     subclass   = ethernet
>
> # netstat -m
> 100358/50182/150540 mbufs in use (current/cache/total)
> 2048/47288/49336/1526116 mbuf clusters in use (current/cache/total/max)
> 2048/47287 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/7/7/763057 4k (page size) jumbo clusters in use (current/cache/total/max)
> 98300/11/98311/226091 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/127176 16k jumbo clusters in use (current/cache/total/max)
> 913885K/107248K/1021134K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
>
> # ngctl list | wc -l
> 2
>
> # sysctl -a | egrep 'net.(inet.(tcp|udp)|graph|isr)'
> net.inet.tcp.rfc1323: 1
> net.inet.tcp.mssdflt: 536
> net.inet.tcp.keepidle: 7200000
> net.inet.tcp.keepintvl: 75000
> net.inet.tcp.sendspace: 32768
> net.inet.tcp.recvspace: 65536
> net.inet.tcp.keepinit: 75000
> net.inet.tcp.delacktime: 100
> net.inet.tcp.v6mssdflt: 1220
> net.inet.tcp.cc.algorithm: newreno
> net.inet.tcp.cc.available: newreno
> net.inet.tcp.hostcache.cachelimit: 15360
> net.inet.tcp.hostcache.hashsize: 512
> net.inet.tcp.hostcache.bucketlimit: 30
> net.inet.tcp.hostcache.count: 6
> net.inet.tcp.hostcache.expire: 3600
> net.inet.tcp.hostcache.prune: 300
> net.inet.tcp.hostcache.purge: 0
> net.inet.tcp.log_in_vain: 0
> net.inet.tcp.blackhole: 0
> net.inet.tcp.delayed_ack: 1
> net.inet.tcp.drop_synfin: 0
> net.inet.tcp.rfc3042: 1
> net.inet.tcp.rfc3390: 1
> net.inet.tcp.experimental.initcwnd10: 1
> net.inet.tcp.rfc3465: 1
> net.inet.tcp.abc_l_var: 2
> net.inet.tcp.ecn.enable: 0
> net.inet.tcp.ecn.maxretries: 1
> net.inet.tcp.insecure_rst: 0
> net.inet.tcp.recvbuf_auto: 1
> net.inet.tcp.recvbuf_inc: 16384
> net.inet.tcp.recvbuf_max: 2097152
> net.inet.tcp.path_mtu_discovery: 1
> net.inet.tcp.tso: 1
> net.inet.tcp.sendbuf_auto: 1
> net.inet.tcp.sendbuf_inc: 8192
> net.inet.tcp.sendbuf_max: 2097152
> net.inet.tcp.reass.maxsegments: 95400
> net.inet.tcp.reass.cursegments: 0
> net.inet.tcp.reass.overflows: 0
> net.inet.tcp.sack.enable: 1
> net.inet.tcp.sack.maxholes: 128
> net.inet.tcp.sack.globalmaxholes: 65536
> net.inet.tcp.sack.globalholes: 0
> net.inet.tcp.minmss: 216
> net.inet.tcp.log_debug: 0
> net.inet.tcp.tcbhashsize: 262144
> net.inet.tcp.do_tcpdrain: 1
> net.inet.tcp.pcbcount: 17
> net.inet.tcp.icmp_may_rst: 1
> net.inet.tcp.isn_reseed_interval: 0
> net.inet.tcp.soreceive_stream: 0
> net.inet.tcp.syncookies: 1
> net.inet.tcp.syncookies_only: 0
> net.inet.tcp.syncache.bucketlimit: 30
> net.inet.tcp.syncache.cachelimit: 15375
> net.inet.tcp.syncache.count: 0
> net.inet.tcp.syncache.hashsize: 512
> net.inet.tcp.syncache.rexmtlimit: 3
> net.inet.tcp.syncache.rst_on_sock_fail: 1
> net.inet.tcp.msl: 30000
> net.inet.tcp.rexmit_min: 30
> net.inet.tcp.rexmit_slop: 200
> net.inet.tcp.always_keepalive: 1
> net.inet.tcp.fast_finwait2_recycle: 0
> net.inet.tcp.finwait2_timeout: 60000
> net.inet.tcp.keepcnt: 8
> net.inet.tcp.rexmit_drop_options: 0
> net.inet.tcp.per_cpu_timers: 0
> net.inet.tcp.timer_race: 0
> net.inet.tcp.maxtcptw: 27767
> net.inet.tcp.nolocaltimewait: 0
> net.inet.udp.checksum: 1
> net.inet.udp.maxdgram: 9216
> net.inet.udp.recvspace: 42080
> net.inet.udp.log_in_vain: 0
> net.inet.udp.blackhole: 0
> net.isr.dispatch: direct
> net.isr.maxthreads: 1
> net.isr.bindthreads: 0
> net.isr.maxqlimit: 10240
> net.isr.defaultqlimit: 256
> net.isr.maxprot: 16
> net.isr.numthreads: 1
> net.graph.threads: 12
> net.graph.maxalloc: 4096
> net.graph.maxdata: 512
> net.graph.abi_version: 12
> net.graph.msg_version: 8
> net.graph.maxdgram: 20480
> net.graph.recvspace: 20480
> net.graph.family: 32
> net.graph.data.proto: 1
> net.graph.control.proto: 2
>
> Once again, I am ready to provide additional metrics and run more tests
> upon request.
>
> Thank you,
> Rumen Telbizov
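For anyone trying to confirm where the contention discussed in this thread
actually lives, one way to profile a forwarding box is statistical sampling
with hwpmc(4) and pmcstat(8). A minimal sketch, assuming a CPU for which
hwpmc exposes the generic "instructions" event alias (check with
pmccontrol -L) and traffic already flowing:

    # kldload hwpmc
    # pmcstat -S instructions -O /tmp/sample.pmc
    (sample system-wide for ~30s under load, then stop with Ctrl-C)
    # pmcstat -R /tmp/sample.pmc -G /tmp/callgraph.txt

If rtentry lock contention is the bottleneck, the expected signature is a
large share of samples in the locking primitives near the top of the
callgraph, with call chains passing through the routing and output paths.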