Date:      Thu, 2 Sep 2010 09:13:46 +0100
From:      Melissa Jenkins <melissa-freebsd@littlebluecar.co.uk>
To:        freebsd-net@freebsd.org
Subject:   NFE adapter 'hangs'
Message-ID:  <5C261F16-6530-47EE-B1C1-BA38CD6D8B01@littlebluecar.co.uk>

Hiya,

I've been having trouble with two different machines (FreeBSD 8.0-p3 and
FreeBSD 7.0-p5) using the nfe network adapter.  The machines are,
respectively, a Sun X2200 (AMD64) and a Sun X2100 M2 (AMD64), and both
are running the amd64 kernel.

Basically what appears to happen is that traffic stops flowing through
the interface, and 'No buffer space available' errors are produced when
trying to send ICMP packets.  All established connections appear to
hang.
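
For what it's worth, my understanding is that ENOBUFS from sendto()
points at the interface send queue filling up; these are the counters
that should show drops there (assuming netstat's -d column and this
sysctl mean what I think they do):

netstat -ndI nfe0                      # per-interface error/drop counters
sysctl net.inet.ip.intr_queue_drops    # IP input queue drops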

The machines are running as packet routers, and nfe0 is acting as the
'lan' side.  PF is being used for filtering, NAT, BINAT and RDR.  The
same PF configuration works correctly on two other servers using
different network adapters.  One of them is configured with pfsync &
CARP, but the other one isn't.
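
For illustration only, a heavily cut-down sketch of the kind of rules
in play (interface names other than nfe0 and all addresses below are
placeholders, not the production config):

ext_if = "em0"     # placeholder WAN interface
int_if = "nfe0"    # the problem interface, LAN side
nat on $ext_if from $int_if:network to any -> ($ext_if)
binat on $ext_if from 10.0.0.10 to any -> 198.51.100.10
rdr on $ext_if proto tcp from any to ($ext_if) port 80 -> 10.0.0.20 port 80
pass in on $int_if from $int_if:network to any keep state
pass out on $ext_if keep state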

The problem seems to happen with a fairly light number of sessions
(< 100 active states in PF), though the more states there are, the
quicker it occurs.  It is possibly related to packet rates, as putting
high-bandwidth clients behind the box seems to produce the problem very
quickly (within several minutes).  This is reinforced by the fact that
the problem first manifested when we upgraded one of the leased lines.
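
A crude way to pin down the state-count/packet-rate correlation would
be to sample the standard pfctl and netstat counters until the next
hang (log path here is arbitrary):

# sample PF state count and nfe0 packet/byte counters once a minute
while :; do
    date
    pfctl -si | grep -A 3 '^State Table'
    netstat -nbI nfe0
    sleep 60
done >> /var/log/nfe0-watch.log 2>&1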

Executing ifconfig nfe0 down && ifconfig nfe0 up will restart traffic
flow.
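
In case it helps anyone else seeing the same thing, that down/up can
obviously be automated as a stopgap; a minimal sketch (the test address
is just a reliable host on the nfe0 segment):

#!/bin/sh
# bounce nfe0 if a known-good host on the nfe0 segment stops answering
LAN_TEST=172.31.3.129    # placeholder - any reliable host on that segment
if ! ping -q -c 3 -t 5 "$LAN_TEST" > /dev/null 2>&1; then
    logger -t nfe-watch "no reply from $LAN_TEST, bouncing nfe0"
    ifconfig nfe0 down && ifconfig nfe0 up
fi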

Neither box is very highly loaded, generally around ~1.5 Mb/s.  This
doesn't appear to be related to the amount of traffic, as I have tried
re-routing 95% of the traffic around the server without the problem
going away.  The traffic profile is fairly random - a mix of TCP and
UDP, mostly flowing OUT of nfe0.  It is all L3 and there are fewer than
5 hosts on the segment attached to the nfe interface.

Both boxes are in different locations and are connected to different
types of Cisco switches.  Both appear to autonegotiate correctly, and
the switch ports show no status changes.

It appears that pfsync, CARP and a GRE tunnel work correctly over the
nfe interface for long periods of time (weeks plus), and that it is
something to do with adding other traffic to the mix that results in
the interface 'hanging'.

If I move the traffic from nfe to the other bge interface (the one
shared with the LOM), everything is stable and works correctly.  I have
not been able to reproduce this using test loads, and the interface
worked correctly with iperf testing prior to deployment.  Unfortunately
(for legal reasons) I can't provide a traffic trace up to the time it
occurs, though everything looks normal to me.

The FreeBSD 7 X2100 lists the following from pciconf:
nfe0@pci0:0:8:0:        class=0x068000 card=0x534c108e chip=0x037310de rev=0xa3 hdr=0x00
    vendor     = 'Nvidia Corp'
    device     = 'MCP55 Ethernet'
    class      = bridge
nfe1@pci0:0:9:0:        class=0x068000 card=0x534c108e chip=0x037310de rev=0xa3 hdr=0x00
    vendor     = 'Nvidia Corp'
    device     = 'MCP55 Ethernet'
    class      = bridge

The FreeBSD 8 X2200 lists the same thing:
nfe0@pci0:0:8:0:        class=0x068000 card=0x534b108e chip=0x037310de rev=0xa3 hdr=0x00
    vendor     = 'Nvidia Corp'
    device     = 'MCP55 Ethernet'
    class      = bridge
nfe1@pci0:0:9:0:        class=0x068000 card=0x534b108e chip=0x037310de rev=0xa3 hdr=0x00
    vendor     = 'Nvidia Corp'
    device     = 'MCP55 Ethernet'
    class      = bridge


Here are the two obvious tests (both from the FreeBSD 7 box), but the
ICMP response & the mbuf stats are very much the same on both boxes.

ping 172.31.3.129
PING 172.31.3.129 (172.31.3.129): 56 data bytes
ping: sendto: No buffer space available
ping: sendto: No buffer space available
^C

--- 172.31.3.129 ping statistics ---
2 packets transmitted, 0 packets received, 100.0% packet loss

netstat -m
852/678/1530 mbufs in use (current/cache/total)
818/448/1266/25600 mbuf clusters in use (current/cache/total/max)
817/317 mbuf+clusters out of packet secondary zone in use (current/cache)
0/362/362/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
1879K/2513K/4392K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
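
Since the denied/drain counters above are all zero, a trivial loop to
capture them (and the cluster usage) around the moment of the next
hang, in case they only spike briefly:

# log mbuf usage and denial counters every 30 seconds
while :; do
    date
    netstat -m | egrep 'in use|denied|drain'
    sleep 30
done >> /var/log/mbuf-watch.log 2>&1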

From the other machine, after the problem has occurred and an ifconfig
down/up cycle has been done (i.e. when the interface is working):
vmstat -z
mbuf_packet:              256,        0,     1033,     1783, 330792410,        0
mbuf:                     256,        0,        5,     1664, 395145472,        0
mbuf_cluster:            2048,    25600,     2818,     1690, 13234653,        0
mbuf_jumbo_page:         4096,    12800,        0,      336,   297749,        0
mbuf_jumbo_9k:           9216,     6400,        0,        0,        0,        0
mbuf_jumbo_16k:         16384,     3200,        0,        0,        0,        0
mbuf_ext_refcnt:            4,        0,        0,        0,        0,        0


Although I failed to keep a copy, I don't believe there is a kmem
problem.
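
For completeness, checks along these lines should confirm that next
time (assuming these are the right knobs on 7.x/8.x):

sysctl vm.kmem_size vm.kmem_size_max   # configured kernel memory limits
vmstat -m                              # kernel malloc usage by type
vmstat -z                              # UMA zone usage (includes the mbuf zones above)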

I'm at a complete loss as to what to try next :(

All suggestions very gratefully received!!!  The 7.0 box is live so it
can't really be played with, but I can occasionally run tests on the
other box.
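
A couple of things that might be worth trying on that box: disabling
MSI/MSI-X for nfe and turning off the hardware offloads (the tunable
names below are my reading of nfe(4), so corrections welcome):

# /boot/loader.conf (needs a reboot) - assuming these nfe(4) tunables are right
hw.nfe.msi_disable=1
hw.nfe.msix_disable=1

# at runtime, turn off checksum offload and TSO on the problem interface
ifconfig nfe0 -txcsum -rxcsum -tso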

Thank you :)
Mel




