From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 217606] Bridge stops working after some days
Date: Tue, 07 Mar 2017 09:08:24 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217606

            Bug ID: 217606
           Summary: Bridge stops working after some days
           Product: Base System
           Version: 11.0-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: aiko@torrentkino.de

Hello,

we recently upgraded our bridging firewalls from 10.1-RELEASE-pxx to
11.0-RELEASE-p8, and since then they stop passing traffic after some time,
in this case after ~4 days. One of them stopped yesterday evening. (We have
a failover mechanism to reduce the impact.)

$ uptime
 9:26AM  up 4 days, 19:22, 2 users, load averages: 0.12, 0.06, 0.01

bridge0 consists of ix0/ix1:

ix0: port 0xecc0-0xecdf mem 0xd9e80000-0xd9efffff,0xd9ff8000-0xd9ffbfff
     irq 48 at device 0.0 numa-domain 0 on pci2
ix1: port 0xece0-0xecff mem 0xd9f00000-0xd9f7ffff,0xd9ffc000-0xd9ffffff
     irq 52 at device 0.1 numa-domain 0 on pci2

When the failure occurs I see the following for IPv4. The bridge carries
IPv6 as well, with the same problem.

ix0: A load balancer is asking for its default GW. No reply...

$ tcpdump -i ix0 \( arp \)
09:37:47.330361 ARP, Request who-has A.A.A.A tell B.B.B.B, length 46

ix1: The default GW actually sends a reply. I can see it on ix1.

$ tcpdump -i ix1 \( arp \)
09:38:59.328956 ARP, Request who-has A.A.A.A tell B.B.B.B, length 46
09:38:59.329374 ARP, Reply A.A.A.A is-at 00:00:0a:0b:0c:0d (oui Cisco), length 46

A tcpdump on bridge0 shows the same as ix1.
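When it happens again, the bridge state itself can be checked with the
standard if_bridge/ifconfig and tcpdump tooling. A minimal sketch (interface
names as in our setup; nothing here has been verified against the broken
state yet):

$ ifconfig bridge0 addr          # learned MAC address cache; the GW's MAC should be listed behind ix1
$ ifconfig bridge0               # member list, per-member flags and STP state
$ tcpdump -e -n -i bridge0 arp   # -e prints the Ethernet header (src/dst MAC) of each frame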
Some numbers from the currently non-working system:

$ netstat -m
82409/6901/89310 mbufs in use (current/cache/total)
38692/4094/42786/1015426 mbuf clusters in use (current/cache/total/max)
38692/4065 mbuf+clusters out of packet secondary zone in use (current/cache)
0/192/192/507713 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/150433 9k jumbo clusters in use (current/cache/total/max)
0/0/0/84618 16k jumbo clusters in use (current/cache/total/max)
97986K/10681K/108667K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed

$ netstat -b -d -h -i bridge0
Name    Mtu Network  Address              Ipkts Ierrs Idrop Ibytes   Opkts Oerrs Obytes  Coll  Drop
ix0    1.5K          00:00:00:00:00:0a      12G     0     0    11T    7.9G     0   1.1T     0  335k
ix1    1.5K          00:00:00:00:00:0b     7.9G     0     0   1.2T     12G     0    11T     0     0
bridg  1.5K          00:00:00:00:00:0c      20G     0     0    12T     20G  335k    12T     0     0

What I did so far:

# Disable Ethernet flow control
# https://wiki.freebsd.org/10gFreeBSD/Router
dev.ix.0.fc=0
dev.ix.1.fc=0

# Disable TSO
cloned_interfaces="bridge0"
ifconfig_bridge0="addm ix0 addm ix1 up"
ifconfig_ix0="up -tso"
ifconfig_ix1="up -tso"

I found the following bug reports:

2004: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=185633
2016: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=212749

And since this system uses PF with scrubbing, I applied this patch manually:

https://reviews.freebsd.org/D7780

So far without success.

Shutting down ix0/ix1 and bringing them back up makes bridge0 responsive
again (the exact commands are in the P.S. below), but time now works
against me.

Netstat after that procedure:

$ netstat -m
33281/56284/89565 mbufs in use (current/cache/total)
33280/9756/43036/2015426 mbuf clusters in use (current/cache/total/max)
33280/9730 mbuf+clusters out of packet secondary zone in use (current/cache)
0/192/192/507713 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/150433 9k jumbo clusters in use (current/cache/total/max)
0/0/0/84618 16k jumbo clusters in use (current/cache/total/max)
74880K/34351K/109231K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed

Kind regards,
Aiko
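P.S. For reference, the recovery mentioned above is just cycling the two
members by hand. A minimal sketch (the exact invocations may differ; bridge
membership is not touched, only the link state):

$ ifconfig ix0 down
$ ifconfig ix1 down
$ ifconfig ix0 up
$ ifconfig ix1 up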