Date: Thu, 2 Sep 2010 09:13:46 +0100
From: Melissa Jenkins <melissa-freebsd@littlebluecar.co.uk>
To: freebsd-net@freebsd.org
Subject: NFE adapter 'hangs'
Message-ID: <5C261F16-6530-47EE-B1C1-BA38CD6D8B01@littlebluecar.co.uk>
Hiya,

I've been having trouble with two different machines (FBSD 8.0p3 & FBSD 7.0p5) using the NFE network adapter. The machines are, respectively, a Sun X2200 (AMD64) and a Sun X2100M2 (AMD64), and both are running the amd64 kernel.

Basically, what appears to happen is that traffic stops flowing through the interface, 'No buffer space available' errors are produced when trying to send ICMP packets, and all established connections appear to hang.

The machines are running as packet routers, and nfe0 is acting as the 'LAN' side. PF is being used for filtering, NAT, BINAT and RDR. The same PF configuration works correctly on two other servers using different network adapters; one of them is configured with pfsync & CARP, but the other one isn't.

The problem happens even with a fairly light number of sessions (< 100 active states in PF), though the more states there are, the quicker it occurs. It may be related to packet rates, as adding high-bandwidth clients brings the problem on very quickly (within several minutes). This is reinforced by the fact that the problem first manifested when we upgraded one of the leased lines.

Executing 'ifconfig nfe0 down && ifconfig nfe0 up' restarts traffic flow (a rough sketch of automating this as a watchdog is appended at the end of this message).

Neither box is very highly loaded, generally around ~1.5 Mb/s, and it doesn't appear to be related to the amount of traffic: I have tried re-routing 95% of the traffic around the server without any improvement. The traffic profile is fairly random, a mix of TCP and UDP, mostly flowing OUT of nfe0. It is all L3, and there are fewer than 5 hosts on the segment attached to the nfe interface.

The two boxes are in different locations and are connected to different types of Cisco switches. Both appear to autonegotiate correctly, and the switch ports show no status changes.

pfsync, CARP and a GRE tunnel appear to work correctly over the NFE interface for long periods of time (weeks+); it seems to be something to do with adding other traffic to the mix that results in the interface 'hanging'.

If I move the traffic from NFE to the other BGE interface (the one shared with the LOM), everything is stable and works correctly. I have not been able to reproduce this using test loads, and the interface worked correctly with iperf testing prior to deployment. Unfortunately I can't provide a traffic trace up to the time it occurs (for legal reasons), though everything in it looks normal to me.

The FreeBSD 7 X2100 lists the following from pciconf:

nfe0@pci0:0:8:0:  class=0x068000 card=0x534c108e chip=0x037310de rev=0xa3 hdr=0x00
    vendor = 'Nvidia Corp'
    device = 'MCP55 Ethernet'
    class  = bridge
nfe1@pci0:0:9:0:  class=0x068000 card=0x534c108e chip=0x037310de rev=0xa3 hdr=0x00
    vendor = 'Nvidia Corp'
    device = 'MCP55 Ethernet'
    class  = bridge

The FreeBSD 8 X2200 lists the same thing:

nfe0@pci0:0:8:0:  class=0x068000 card=0x534b108e chip=0x037310de rev=0xa3 hdr=0x00
    vendor = 'Nvidia Corp'
    device = 'MCP55 Ethernet'
    class  = bridge
nfe1@pci0:0:9:0:  class=0x068000 card=0x534b108e chip=0x037310de rev=0xa3 hdr=0x00
    vendor = 'Nvidia Corp'
    device = 'MCP55 Ethernet'
    class  = bridge

Here are the two obvious tests (both from the FreeBSD 7 box); the ICMP response and the mbuf stats are very much the same on both boxes.
ping 172.31.3.129
PING 172.31.3.129 (172.31.3.129): 56 data bytes
ping: sendto: No buffer space available
ping: sendto: No buffer space available
^C
--- 172.31.3.129 ping statistics ---
2 packets transmitted, 0 packets received, 100.0% packet loss

netstat -m
852/678/1530 mbufs in use (current/cache/total)
818/448/1266/25600 mbuf clusters in use (current/cache/total/max)
817/317 mbuf+clusters out of packet secondary zone in use (current/cache)
0/362/362/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
1879K/2513K/4392K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

From the other machine, after the problem has occurred and an ifconfig down/up cycle has been done (i.e. when the interface is working again), the mbuf zones from vmstat -z (columns: ITEM, SIZE, LIMIT, USED, FREE, REQUESTS, FAILURES):

vmstat -z
mbuf_packet:        256,      0,  1033,  1783,  330792410,  0
mbuf:               256,      0,     5,  1664,  395145472,  0
mbuf_cluster:      2048,  25600,  2818,  1690,   13234653,  0
mbuf_jumbo_page:   4096,  12800,     0,   336,     297749,  0
mbuf_jumbo_9k:     9216,   6400,     0,     0,          0,  0
mbuf_jumbo_16k:   16384,   3200,     0,     0,          0,  0
mbuf_ext_refcnt:      4,      0,     0,     0,          0,  0

Although I failed to keep a copy, I don't believe there is a kmem problem.

I'm at a complete loss as to what to try next :(

All suggestions very gratefully received! The 7.0 box is live so can't really be played with, but I can occasionally run tests on the other box.

Thank you :)

Mel
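P.S. The watchdog sketch mentioned above, in case it is useful to anyone. It is only a rough sketch and untested as written: it assumes that 172.31.3.129 (the host pinged in the test above) answers ICMP whenever nfe0 is healthy, and the 30-second interval and single-failed-ping threshold are arbitrary placeholders. It also grabs a netstat -m snapshot before cycling the interface, so there is a record to compare afterwards.

#!/bin/sh
# Rough watchdog sketch (untested): ping a host on the nfe0 segment and, if
# it stops answering while the interface is still marked UP, record a
# netstat -m snapshot and apply the same workaround as above:
#   ifconfig nfe0 down && ifconfig nfe0 up
#
# Assumptions/placeholders: the target host always answers ICMP while nfe0
# is healthy, one failed ping counts as a hang, and a 30-second poll is OK.

IFACE="nfe0"
TARGET="172.31.3.129"   # host pinged in the test above
INTERVAL=30             # seconds between checks (arbitrary)

while :; do
    # -c 1: one probe; -t 5: give up after 5 seconds (FreeBSD ping timeout)
    if ! ping -c 1 -t 5 "$TARGET" > /dev/null 2>&1; then
        # Only act if the interface still claims to be UP, so it doesn't
        # fight an operator who has deliberately downed it.
        if ifconfig "$IFACE" | grep -q 'flags=.*<UP'; then
            netstat -m > "/var/tmp/netstat-m.$(date +%Y%m%d-%H%M%S)"
            logger -t nfe-watchdog "no reply from $TARGET, cycling $IFACE"
            ifconfig "$IFACE" down && ifconfig "$IFACE" up
        fi
    fi
    sleep "$INTERVAL"
done

Obviously this just papers over the hang rather than fixing it, but it would at least capture the mbuf state each time it happens.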