Date: Fri, 6 May 2016 21:11:29 -0300
From: Ze Claudio Pastore <zclaudio@bsd.com.br>
To: Ryan Stone <rysto32@gmail.com>
Cc: freebsd-net <freebsd-net@freebsd.org>
Subject: Re: Regression? VLAN packet drop after upgrading from r281235
Message-ID: <CAEGk6G7Bgri-TvL-MDMcTuK3vfY5w2Nw=O8immWQeyaetaohpA@mail.gmail.com>
In-Reply-To: <CAEGk6G4SxNfb8Ph=Cq0rRATPvFwFqF9jgg%2BsMvMUhc8z554osw@mail.gmail.com>
References: <CAEGk6G4rq=yE14rDcxhJZZ0drstr=fse%2B9aemVYqdt68Gg=bpQ@mail.gmail.com>
    <CAFMmRNyY67RGyb8%2BaS=HCLEpzki3n0JiT5QYXO5xnjz5vyYxMA@mail.gmail.com>
    <CAEGk6G4SxNfb8Ph=Cq0rRATPvFwFqF9jgg%2BsMvMUhc8z554osw@mail.gmail.com>

OK, I submitted a bug report, in case someone else gets a similar problem.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209351

2016-04-27 18:10 GMT-03:00 Zé Claudio Pastore <zclaudio@bsd.com.br>:

> Hello Ryan,
>
> 2016-04-27 17:28 GMT-03:00 Ryan Stone <rysto32@gmail.com>:
>
>> From a quick look at the vlan code, I can identify a few cases that might
>> cause that counter to increment:
>>
>> 1) Error from the underlying ixgbe device.  Does "netstat -dI ix0" show
>> that the driver has been dropping packets?
>>
>
> No, it does not increase drop counters on ix port, only on the vlan device.
>
>>
>> 2) Link down events on the underlying NIC.  I believe that link flaps
>> will be logged to /var/log/messages and dmesg; do you see anything there
>> that might correspond to the time of the packet drops?
>>
>
> No, dmesg is clean, only a couple down/up link when I actually did
> disconnect the port, and no other message on /var/log/messages that grabs
> my attention.
>
>>
>> 3) If VLAN_HWTAGGING is disabled through ifconfig on the port, then in
>> theory a low memory event could cause the packet to be dropped.  Does
>> "netstat -m" show that "requests for mbufs denied" increasing?
>>
>
> Here is the ifconfig -v output for the vlan6 on the 10.1-STABLE system
>
> vlan6: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>         options=303<RXCSUM,TXCSUM,TSO4,TSO6>
>         ether a0:36:9f:2a:6d:ae
>         inet6 fe80::a236:9fff:fe2a:6dae%vlan6 prefixlen 64 scopeid 0x19
>         inet6 2804:1054:bad:b1fe::1 prefixlen 64
>         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>         media: Ethernet autoselect (10Gbase-SR <full-duplex>)
>         status: active
>         vlan: 3005 parent interface: ix3
>         groups: vlan
>
> And here it is on the 10.3-STABLE system. I don't know why the only
> difference is that no options were printed on the newer system; everything
> else is the same.
>
> vlan6: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>         ether a0:36:9f:2a:6d:ae
>         inet6 fe80::a236:9fff:fe2a:6dae%vlan6 prefixlen 64 scopeid 0x19
>         inet6 2804:1054:bad:b1fe::1 prefixlen 64
>         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>         media: Ethernet autoselect (10Gbase-SR <full-duplex>)
>         status: active
>         vlan: 3005 parent interface: ix3
>         groups: vlan
>
> This is the netstat -m output when the system has packet loss. Denied and
> delayed counters are zeroed.
>
> % netstat -m
> 12365/21040/33405 mbufs in use (current/cache/total)
> 12310/14530/26840/505076 mbuf clusters in use (current/cache/total/max)
> 12310/14508 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/225/225/252538 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/74826 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/42089 16k jumbo clusters in use (current/cache/total/max)
> 27711K/35220K/62931K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
>
>>
>> On Wed, Apr 27, 2016 at 2:41 PM, Zé Claudio Pastore <zclaudio@bsd.com.br>
>> wrote:
>>
>>> Hello,
>>>
>>> On a BGP border router I help manage, we run FreeBSD 10.1-STABLE,
>>> version r281235, and it has worked fine for several years now.
>>>
>>> We have around 4Gbit/s and 1.8Mpps routed at peak, while per port
>>> interface we peak at 300Kpps.
>>>
>>> Our quality metrics are measured with:
>>>
>>> ping -s 1472 -i 0.1 <our-other-ibgp-router>
>>>
>>> As well as bidirectional iperf.
>>>
>>> This metric is similar to how Speedy Test and SIMET tests are done, which
>>> our customers use as a reference.
>>>
>>> Systems working w/o problem:
>>> - 10.1-STABLE / r281235
>>>
>>> Systems tested with drops:
>>> - 10.2-STABLE / r292035M
>>> - 10.3-STABLE / r298705
>>> - 11.0-CURRENT / r295683 (downloaded snapshot from ftp.freebsd.org)
>>> - 11.0-CURRENT Melifaro Routing Branch / r297731M
>>>
>>> While testing, when errors happen I can see output errs on the vlan port
>>> in the output from "netstat -w1 -I vlan6"
>>>
>>>             input          vlan6           output
>>>    packets  errs idrops      bytes    packets  errs      bytes colls
>>>          1     0      0         66      30557     2   33310968     0
>>>          1     0      0        105      31458     3   33912219     0
>>>          2     0      0       2954      32001     8   34983986     0
>>>          1     0      0       1512      33150     6   35942558     0
>>>          1     0      0       1512      33654     4   37311862     0
>>>          1     0      0       1512      34825     3   38213793     0
>>>          3     0      0       1683      35376     4   39488912     0
>>>          5     0      0       7280      32423     3   35551869     0
>>>
>>> Problems may happen under high load (~200Kpps) or low load (~30Kpps) on a
>>> vlan port. The observed frame loss never happens on untagged ports, only
>>> on vlan ports. The observed loss happens with packets sized 900 bytes and
>>> above, but the loss rate is noticeably higher with packets close to 1400
>>> (1472 is my reference size).
>>>
>>> Loss rate on all listed systems different from r281235 is 9-19% with
>>> ping(1) and iperf, while it's 0% on r281235.
>>>
>>> First I believed it to be an Intel driver error on systems newer than
>>> 10.1. My reference cards are dual port 82599EB 10-Gigabit SFI/SFP+ Network
>>> Connection (2x2 on x8 PCIe bus, total 4x10G). But yesterday I replaced
>>> Intel with Chelsio T5 and the problem is still exactly the same, so it's
>>> not related to the card vendor.
>>>
>>> I always test the very same hardware. I have two SSD drives in this
>>> router, one for the 10.1 which just runs fine and the other disk to test
>>> the various versions of FreeBSD.
>>>
>>> Only minor loader and sysctl confs are tweaked:
>>>
>>> kern.hz=2000
>>> net.inet.ip.redirect=1             # do not send IP redirects
>>> net.inet.ip.accept_sourceroute=0   # drop source routed packets since they ca
>>> net.inet.ip.sourceroute=0          # if source routed packets are accepted th
>>> net.inet.tcp.drop_synfin=1         # SYN/FIN packets get dropped on initial c
>>> net.inet.udp.blackhole=1           # drop udp packets destined for closed soc
>>> net.inet.tcp.blackhole=2           # drop tcp packets destined for closed por
>>> security.bsd.see_other_uids=0
>>>
>>> Can anyone suggest what might be a fix/tuning for this behavior? Was there
>>> any relevant change on vlan code from particular revisions close to the
>>> one I run on 10.1 and later which would lead to such a big difference?
>>> _______________________________________________
>>> freebsd-net@freebsd.org mailing list
>>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
>>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
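
As a rough cross-check of the quoted "netstat -w1 -I vlan6" samples, the interface-level output error ratio can be computed as below. This is only a minimal sketch in Python: the sample tuples are copied from the table in the quoted message, and the name `output_error_ratio` is an illustrative choice, not something taken from the thread or from FreeBSD.

```python
# (output packets, output errs) per one-second interval, copied from the
# quoted "netstat -w1 -I vlan6" table. The tuple layout is an assumption
# made for this sketch.
SAMPLES = [
    (30557, 2), (31458, 3), (32001, 8), (33150, 6),
    (33654, 4), (34825, 3), (35376, 4), (32423, 3),
]


def output_error_ratio(samples):
    """Return total output errs divided by total output packets."""
    packets = sum(p for p, _ in samples)
    errs = sum(e for _, e in samples)
    # Avoid dividing by zero when no packets were sent in the window.
    return errs / packets if packets else 0.0


if __name__ == "__main__":
    print(f"output error ratio: {output_error_ratio(SAMPLES):.4%}")
```

The same function could be fed samples from the parent interface (ix3 in the quoted ifconfig output) to compare the two counters over the same window.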