Date: Thu, 20 Mar 2014 09:26:38 -0400
From: Garrett Wollman <wollman@bimajority.org>
To: freebsd-net@freebsd.org, freebsd-stable@freebsd.org
Cc: jackv@freebsd.org
Subject: Network stack returning EFBIG?
Message-ID: <21290.60558.750106.630804@hergotha.csail.mit.edu>

I recently put a new server running 9.2 (with local patches for NFS)
into production, and it's immediately started to fail in an odd way.
Since I pounded this server pretty heavily and never saw the error in
testing, I'm more than a little bit taken aback. We have identical
hardware in production with 9.1, and I have the same kernel running
just peachy on a machine with Chelsio T4 NICs. The problem machine
has ixgbe(4):

ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15> port 0x9c00-0x9c1f mem 0xdef80000-0xdeffffff,0xdef7c000-0xdef7ffff irq 24 at device 0.0 on pci2
ix0: Using MSIX interrupts with 7 vectors
ix0: Ethernet address: 04:7d:7b:a5:87:32
ix0: PCI Express Bus: Speed 5.0GT/s Width x4
ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15> port 0x9880-0x989f mem 0xdee80000-0xdeefffff,0xdee7c000-0xdee7ffff irq 34 at device 0.1 on pci2
ix1: Using MSIX interrupts with 7 vectors
ix1: Ethernet address: 04:7d:7b:a5:87:33
ix1: PCI Express Bus: Speed 5.0GT/s Width x4

(pciconf tells me these are "82599EB 10-Gigabit SFI/SFP+ Network
Connection". It's a bug that the driver doesn't tell me that.)
These are glued together in a lagg(4) using LACP.

Since we put this server into production, random network system calls
have started failing with [EFBIG] or maybe sometimes [EIO]. I've
observed this with a simple ping, but various daemons also log the
errors:

Mar 20 09:22:04 nfs-prod-4 sshd[42487]: fatal: Write failed: File too large [preauth]
Mar 20 09:23:44 nfs-prod-4 nrpe[42492]: Error: Could not complete SSL handshake. 5

The machine eventually becomes unreachable and has to be rebooted from
the console.

So, can anyone tell me how this is possible, and what changed between
9.1 and 9.2 to cause it?

-GAWollman