From: Melissa Jenkins <melissa-freebsd@littlebluecar.co.uk>
Date: Thu, 2 Sep 2010 09:13:46 +0100
To: freebsd-net@freebsd.org
Message-Id: <5C261F16-6530-47EE-B1C1-BA38CD6D8B01@littlebluecar.co.uk>
Subject: NFE adapter 'hangs'
List-Id: Networking and TCP/IP with FreeBSD
Hiya,

I've been having trouble with two different machines (FreeBSD 8.0-p3 and FreeBSD 7.0-p5) using the nfe network adapter. The machines are, respectively, a Sun X2200 (AMD64) and a Sun X2100 M2 (AMD64), and both are running the amd64 kernel.

Basically, what appears to happen is that traffic stops flowing through the interface and 'No buffer space available' errors are produced when trying to send ICMP packets. All established connections appear to hang.

The machines are running as packet routers, with nfe0 acting as the 'LAN' side. PF is being used for filtering, NAT, BINAT and RDR. The same PF configuration works correctly on two other servers using different network adapters; one of those is configured with pfsync & CARP, the other isn't.

The problem occurs under a fairly light session load (< 100 active states in PF), though the more states there are, the sooner it happens. It may be related to packet rates, as putting high-bandwidth clients behind the box reproduces the problem very quickly (several minutes). This is reinforced by the fact that the problem first manifested when we upgraded one of the leased lines.

Executing ifconfig nfe0 down && ifconfig nfe0 up will restart traffic flow.

Neither box is heavily loaded, generally around ~1.5 Mb/s, and the problem doesn't appear to be related to the amount of traffic: re-routing 95% of traffic around the server brought no improvement. The traffic profile is fairly random, a mix of TCP and UDP, mostly flowing OUT of nfe0. It is all L3, and there are fewer than 5 hosts on the segment attached to the nfe interface.

The boxes are in different locations and are connected to different types of Cisco switches. Both appear to autonegotiate correctly, and the switch ports show no status changes.
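As a stop-gap I've been thinking about automating that down/up cycle from cron. Just a sketch, not a fix: the target address here is simply the LAN host from the ping test further down, and PING/IFCONFIG are overridable so the logic can be exercised without root or a real nfe interface.

```shell
#!/bin/sh
# Hypothetical watchdog sketch for the hang described above: if a known
# host on the nfe0 segment stops answering a single ping, cycle the
# interface. -t is the FreeBSD ping reply timeout in seconds.
check_and_cycle() {
    # $1 = interface, $2 = host expected to answer pings
    if ! "${PING:-ping}" -c 1 -t 2 "$2" >/dev/null 2>&1; then
        echo "watchdog: no reply from $2, cycling $1"
        "${IFCONFIG:-ifconfig}" "$1" down && "${IFCONFIG:-ifconfig}" "$1" up
    fi
}

# From cron, something like:  * * * * *  /root/nfe-watchdog.sh
# with the script ending in:  check_and_cycle nfe0 172.31.3.129
```

Obviously this only masks the problem (and briefly interrupts established flows while the interface cycles), but it would keep the link usable until the real cause turns up.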
pfsync, CARP and a GRE tunnel all work correctly over the nfe interface for long periods of time (weeks or more); it appears to be the addition of other traffic to the mix that results in the interface 'hanging'.

If I move the traffic from nfe to the other bge interface (the one shared with the LOM), everything is stable and works correctly. I have not been able to reproduce this using test loads, and the interface worked correctly under iperf testing prior to deployment. For legal reasons I unfortunately can't provide a traffic trace up to the time it occurs, though everything in it looks normal to me.

The FreeBSD 7 X2100 lists the following from pciconf:

nfe0@pci0:0:8:0: class=0x068000 card=0x534c108e chip=0x037310de rev=0xa3 hdr=0x00
    vendor = 'Nvidia Corp'
    device = 'MCP55 Ethernet'
    class  = bridge
nfe1@pci0:0:9:0: class=0x068000 card=0x534c108e chip=0x037310de rev=0xa3 hdr=0x00
    vendor = 'Nvidia Corp'
    device = 'MCP55 Ethernet'
    class  = bridge

The FreeBSD 8 X2200 lists essentially the same thing (only the card ID differs):

nfe0@pci0:0:8:0: class=0x068000 card=0x534b108e chip=0x037310de rev=0xa3 hdr=0x00
    vendor = 'Nvidia Corp'
    device = 'MCP55 Ethernet'
    class  = bridge
nfe1@pci0:0:9:0: class=0x068000 card=0x534b108e chip=0x037310de rev=0xa3 hdr=0x00
    vendor = 'Nvidia Corp'
    device = 'MCP55 Ethernet'
    class  = bridge

Here are the two obvious tests (both from the FreeBSD 7 box); the ICMP response and the mbuf stats are very much the same on both boxes.
ping 172.31.3.129
PING 172.31.3.129 (172.31.3.129): 56 data bytes
ping: sendto: No buffer space available
ping: sendto: No buffer space available
^C
--- 172.31.3.129 ping statistics ---
2 packets transmitted, 0 packets received, 100.0% packet loss

netstat -m
852/678/1530 mbufs in use (current/cache/total)
818/448/1266/25600 mbuf clusters in use (current/cache/total/max)
817/317 mbuf+clusters out of packet secondary zone in use (current/cache)
0/362/362/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
1879K/2513K/4392K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

From the other machine, after the problem has occurred and an ifconfig down/up cycle has been done (i.e. when the interface is working):

vmstat -z
mbuf_packet:      256,      0,  1033,  1783,  330792410,  0
mbuf:             256,      0,     5,  1664,  395145472,  0
mbuf_cluster:    2048,  25600,  2818,  1690,   13234653,  0
mbuf_jumbo_page: 4096,  12800,     0,   336,     297749,  0
mbuf_jumbo_9k:   9216,   6400,     0,     0,          0,  0
mbuf_jumbo_16k: 16384,   3200,     0,     0,          0,  0
mbuf_ext_refcnt:    4,      0,     0,     0,          0,  0

Although I failed to keep a copy, I don't believe there is a kmem problem.

I'm at a complete loss as to what to try next :(

All suggestions very gratefully received! The 7.0 box is live so can't really be played with, but I can occasionally run tests on the other box.

Thank you :)
Mel
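P.S. In case I'm misreading the mbuf numbers, this is the quick check I'm applying to the netstat -m output above. It's just a sketch comparing in-use clusters against the max; with the figures above it reports 818/25600 in use, so the ENOBUFS doesn't look like simple cluster exhaustion.

```shell
#!/bin/sh
# Sketch: flag mbuf cluster pressure from `netstat -m` output.
# Usage: netstat -m | check_clusters
check_clusters() {
    awk '/mbuf clusters in use/ {
        split($1, f, "/")               # current/cache/total/max
        printf "clusters: %d/%d in use\n", f[1], f[4]
        if (f[1] > 0.9 * f[4]) print "WARNING: near mbuf cluster limit"
    }'
}
```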