From: Garrett Wollman
To: freebsd-net@freebsd.org, freebsd-stable@freebsd.org
Date: Thu, 20 Mar 2014 09:26:38 -0400
Message-ID: <21290.60558.750106.630804@hergotha.csail.mit.edu>
Subject: Network stack returning EFBIG?
Cc: jackv@freebsd.org

I recently put a new server running 9.2 (with local patches for NFS) into production, and it's immediately started to fail in an odd way. Since I pounded this server pretty heavily and never saw the error in testing, I'm more than a little bit taken aback. We have identical hardware in production with 9.1, and I have the same kernel running just peachy on a machine with Chelsio T4 NICs. The problem machine has ixgbe(4):

ix0: port 0x9c00-0x9c1f mem 0xdef80000-0xdeffffff,0xdef7c000-0xdef7ffff irq 24 at device 0.0 on pci2
ix0: Using MSIX interrupts with 7 vectors
ix0: Ethernet address: 04:7d:7b:a5:87:32
ix0: PCI Express Bus: Speed 5.0GT/s Width x4
ix1: port 0x9880-0x989f mem 0xdee80000-0xdeefffff,0xdee7c000-0xdee7ffff irq 34 at device 0.1 on pci2
ix1: Using MSIX interrupts with 7 vectors
ix1: Ethernet address: 04:7d:7b:a5:87:33
ix1: PCI Express Bus: Speed 5.0GT/s Width x4

(pciconf tells me these are "82599EB 10-Gigabit SFI/SFP+ Network Connection". It's a bug that the driver doesn't tell me that.) These are glued together in a lagg(4) using LACP.

Since we put this server into production, random network system calls have started failing with [EFBIG] or maybe sometimes [EIO].
I've observed this with a simple ping, but various daemons also log the errors:

Mar 20 09:22:04 nfs-prod-4 sshd[42487]: fatal: Write failed: File too large [preauth]
Mar 20 09:23:44 nfs-prod-4 nrpe[42492]: Error: Could not complete SSL handshake. 5

The machine eventually becomes unreachable and has to be rebooted from the console.

So, can anyone tell me how this is possible, and what changed between 9.1 and 9.2 to cause it?

-GAWollman
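
For readers matching the log text to the errno names: the "File too large" in the sshd line is the standard strerror() text for EFBIG. A minimal sketch in Python (not part of the original report; the helper name `describe` is made up for illustration) that prints the symbolic name and message for the two codes mentioned:

```python
import errno
import os


def describe(code):
    """Return (symbolic name, strerror text) for an errno value.

    Hypothetical helper for illustration only.
    """
    return errno.errorcode.get(code, "?"), os.strerror(code)


# The two errors mentioned in the report.
efbig = describe(errno.EFBIG)
eio = describe(errno.EIO)

print(efbig)
print(eio)
```

This only confirms that the log message corresponds to EFBIG coming back from a write on the socket; it says nothing about which layer of the stack generated the error.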