Date: Sun, 21 Sep 2014 15:08:18 -0700
From: "K. Macy" <kmacy@freebsd.org>
To: Rumen Telbizov <telbizov@gmail.com>
Cc: Tom Elite <qingli@freebsd.org>, "freebsd-stable@freebsd.org" <freebsd-stable@freebsd.org>
Subject: Re: FreeBSD 10 network performance problems
Message-ID: <CAHM0Q_MXmn2P=vfFgaj5pZSqcTSm3h+KPAotc4K_8qpQUFh1dQ@mail.gmail.com>
In-Reply-To: <CAENR+_UwjMGoOqKkhCeL4zmLnSgTiABoeR-7x71MBvOnCF8z+A@mail.gmail.com>
References: <CAENR+_VDVvnY0zWKVXHOjz2vWw27s+yVrz9ZFokZ=p6P6oFNvw@mail.gmail.com>
 <1411259605.674669006.339g4pd4@frv35.fwdcdn.com>
 <CAENR+_UwjMGoOqKkhCeL4zmLnSgTiABoeR-7x71MBvOnCF8z+A@mail.gmail.com>
What you're dealing with is hardly an edge case. Most people don't need to
push more than a couple of Gbps in production.

Flowtable is hardly "untested." However, it has been a source of friction at
times because it can be somewhat brittle: the limit on the number of cache
entries it can store is frequently too low for people with very large numbers
of active flows. Without raising this limit substantially, these systems will
fail in a rather spectacular fashion (a sketch of the relevant tunables
follows the quoted message below). Additionally, flowtable was not written
with the intent of being a routing cache. It was developed to support
stateful multipath routing for load balancing. In its current incarnation,
stripped of much of the code for its initial purpose, it's really just a
band-aid around locking problems in routing.

That said, the handful of commercial users of FreeBSD that do have large
amounts of traffic (10s of Gbps) per system that I personally know of all
have flowtable enabled. Unfortunately, at least in terms of what is in HEAD,
little has been done to fix the contention that flowtable works around. For
your purposes, the response that Adrian gave you is the closest to "optimal."

I hope that helps.

-K

On Sun, Sep 21, 2014 at 2:31 PM, Rumen Telbizov <telbizov@gmail.com> wrote:
> Thank you for your answers, Adrian and Vladislav.
>
> Adrian:
> I read this paper,
> http://conferences.sigcomm.org/sigcomm/2009/workshops/presto/papers/p37.pdf,
> and I was left with the impression that the lock contention on *rtentry*
> had been solved some time around the FreeBSD 8 release with the new routing
> architecture and flowtable. I was wondering if this is really the case or
> whether I am dealing with an edge case here. I cc Qing Li and Kip Macy for
> further visibility and comments (original report at
> http://lists.freebsd.org/pipermail/freebsd-stable/2014-September/080170.html).
>
> On the other hand, https://wiki.freebsd.org/NetworkPerformanceTuning
> advises: "*Do not use FLOWTABLE. It is still untested (2012-02-23).*" Is
> that still the case? As mentioned previously, I tried this kernel option
> earlier and it had no effect.
>
> Additionally, on https://wiki.freebsd.org/NewNetworking I saw that there
> are still open items with regard to "*rtentry locking*" and "*Contention
> between CPUs when forwarding between multi-queue interfaces*". Not quite
> sure if this is what I am dealing with.
>
> I also wonder if this lock contention is something new or whether I am
> dealing with some strange edge case. I read that people are able to push
> 10Gbit/s on FreeBSD 9.2 (https://calomel.org/network_performance.html).
> Anybody else seeing this around 4-5Gbit/s?
>
>
> Vladislav:
> Here are the details that you requested (freshly booted system):
> # pciconf -lv | grep -A 4 ix\[0-9\]
> ix0@pci0:5:0:0: class=0x020000 card=0x00038086 chip=0x10fb8086 rev=0x01 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = '82599EB 10-Gigabit SFI/SFP+ Network Connection'
>     class      = network
>     subclass   = ethernet
> ix1@pci0:5:0:1: class=0x020000 card=0x00038086 chip=0x10fb8086 rev=0x01 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = '82599EB 10-Gigabit SFI/SFP+ Network Connection'
>     class      = network
>     subclass   = ethernet
>
> # netstat -m
> 100358/50182/150540 mbufs in use (current/cache/total)
> 2048/47288/49336/1526116 mbuf clusters in use (current/cache/total/max)
> 2048/47287 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/7/7/763057 4k (page size) jumbo clusters in use (current/cache/total/max)
> 98300/11/98311/226091 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/127176 16k jumbo clusters in use (current/cache/total/max)
> 913885K/107248K/1021134K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
>
> # ngctl list | wc -l
> 2
>
> # sysctl -a | egrep 'net.(inet.(tcp|udp)|graph|isr)'
> net.inet.tcp.rfc1323: 1
> net.inet.tcp.mssdflt: 536
> net.inet.tcp.keepidle: 7200000
> net.inet.tcp.keepintvl: 75000
> net.inet.tcp.sendspace: 32768
> net.inet.tcp.recvspace: 65536
> net.inet.tcp.keepinit: 75000
> net.inet.tcp.delacktime: 100
> net.inet.tcp.v6mssdflt: 1220
> net.inet.tcp.cc.algorithm: newreno
> net.inet.tcp.cc.available: newreno
> net.inet.tcp.hostcache.cachelimit: 15360
> net.inet.tcp.hostcache.hashsize: 512
> net.inet.tcp.hostcache.bucketlimit: 30
> net.inet.tcp.hostcache.count: 6
> net.inet.tcp.hostcache.expire: 3600
> net.inet.tcp.hostcache.prune: 300
> net.inet.tcp.hostcache.purge: 0
> net.inet.tcp.log_in_vain: 0
> net.inet.tcp.blackhole: 0
> net.inet.tcp.delayed_ack: 1
> net.inet.tcp.drop_synfin: 0
> net.inet.tcp.rfc3042: 1
> net.inet.tcp.rfc3390: 1
> net.inet.tcp.experimental.initcwnd10: 1
> net.inet.tcp.rfc3465: 1
> net.inet.tcp.abc_l_var: 2
> net.inet.tcp.ecn.enable: 0
> net.inet.tcp.ecn.maxretries: 1
> net.inet.tcp.insecure_rst: 0
> net.inet.tcp.recvbuf_auto: 1
> net.inet.tcp.recvbuf_inc: 16384
> net.inet.tcp.recvbuf_max: 2097152
> net.inet.tcp.path_mtu_discovery: 1
> net.inet.tcp.tso: 1
> net.inet.tcp.sendbuf_auto: 1
> net.inet.tcp.sendbuf_inc: 8192
> net.inet.tcp.sendbuf_max: 2097152
> net.inet.tcp.reass.maxsegments: 95400
> net.inet.tcp.reass.cursegments: 0
> net.inet.tcp.reass.overflows: 0
> net.inet.tcp.sack.enable: 1
> net.inet.tcp.sack.maxholes: 128
> net.inet.tcp.sack.globalmaxholes: 65536
> net.inet.tcp.sack.globalholes: 0
> net.inet.tcp.minmss: 216
> net.inet.tcp.log_debug: 0
> net.inet.tcp.tcbhashsize: 262144
> net.inet.tcp.do_tcpdrain: 1
> net.inet.tcp.pcbcount: 17
> net.inet.tcp.icmp_may_rst: 1
> net.inet.tcp.isn_reseed_interval: 0
> net.inet.tcp.soreceive_stream: 0
> net.inet.tcp.syncookies: 1
> net.inet.tcp.syncookies_only: 0
> net.inet.tcp.syncache.bucketlimit: 30
> net.inet.tcp.syncache.cachelimit: 15375
> net.inet.tcp.syncache.count: 0
> net.inet.tcp.syncache.hashsize: 512
> net.inet.tcp.syncache.rexmtlimit: 3
> net.inet.tcp.syncache.rst_on_sock_fail: 1
> net.inet.tcp.msl: 30000
> net.inet.tcp.rexmit_min: 30
> net.inet.tcp.rexmit_slop: 200
> net.inet.tcp.always_keepalive: 1
> net.inet.tcp.fast_finwait2_recycle: 0
> net.inet.tcp.finwait2_timeout: 60000
> net.inet.tcp.keepcnt: 8
> net.inet.tcp.rexmit_drop_options: 0
> net.inet.tcp.per_cpu_timers: 0
> net.inet.tcp.timer_race: 0
> net.inet.tcp.maxtcptw: 27767
> net.inet.tcp.nolocaltimewait: 0
> net.inet.udp.checksum: 1
> net.inet.udp.maxdgram: 9216
> net.inet.udp.recvspace: 42080
> net.inet.udp.log_in_vain: 0
> net.inet.udp.blackhole: 0
> net.isr.dispatch: direct
> net.isr.maxthreads: 1
> net.isr.bindthreads: 0
> net.isr.maxqlimit: 10240
> net.isr.defaultqlimit: 256
> net.isr.maxprot: 16
> net.isr.numthreads: 1
> net.graph.threads: 12
> net.graph.maxalloc: 4096
> net.graph.maxdata: 512
> net.graph.abi_version: 12
> net.graph.msg_version: 8
> net.graph.maxdgram: 20480
> net.graph.recvspace: 20480
> net.graph.family: 32
> net.graph.data.proto: 1
> net.graph.control.proto: 2
>
> Once again, I am ready to provide additional metrics and run more tests
> upon request.
>
> Thank you,
> Rumen Telbizov
>
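For reference, here is a minimal sketch of the flowtable knobs discussed at the
top of this reply, assuming a 9.x/10.x kernel built with "options FLOWTABLE".
The exact tunable names vary between releases (flowtable was reworked during
the 10.x cycle), so confirm what your kernel actually exposes with
"sysctl net.flowtable" before copying anything; the values below are
illustrative, not recommendations:

    # /boot/loader.conf -- size the flow cache at boot; the tunable name
    # net.flowtable.nmbflows is an assumption based on 9.x-era trees,
    # verify it with `sysctl net.flowtable` on your release
    net.flowtable.nmbflows="262144"

    # /etc/sysctl.conf -- enable the flow cache at runtime
    net.flowtable.enable=1

If the flow limit is left at its default and the box carries a very large
number of active flows, that is exactly the "fail in a rather spectacular
fashion" case described above.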
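Separately, the quoted sysctl output shows netisr running with a single,
unbound thread and direct dispatch (net.isr.maxthreads: 1,
net.isr.bindthreads: 0, net.isr.dispatch: direct). Whether spreading netisr
work across CPUs helps at all depends on where the contention really is (it
does nothing for rtentry locking itself), but for completeness these are the
standard knobs; the values are illustrative only:

    # /boot/loader.conf
    net.isr.maxthreads="8"        # roughly one netisr thread per core (illustrative)
    net.isr.bindthreads="1"       # pin each netisr thread to a CPU
    net.isr.dispatch="deferred"   # queue to netisr threads instead of direct dispatch

net.isr.dispatch can also be changed at runtime with sysctl; the other two are
boot-time tunables only.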