From: Markus Gebert
Date: Thu, 20 Mar 2014 16:21:56 +0100
Subject: Re: Network stack returning EFBIG?
To: wollman@bimajority.org
Cc: jfv@freebsd.org, freebsd-net@freebsd.org, freebsd-stable@freebsd.org
In-Reply-To: <201403201351.s2KDpghe080116@hergotha.csail.mit.edu>

On 20.03.2014, at 14:51, wollman@bimajority.org wrote:

> In article <21290.60558.750106.630804@hergotha.csail.mit.edu>, I wrote:
>
>> Since we put this server into production, random network system calls
>> have started failing with [EFBIG] or maybe sometimes [EIO].
>> I've observed this with a simple ping, but various daemons also log
>> the errors:
>> Mar 20 09:22:04 nfs-prod-4 sshd[42487]: fatal: Write failed: File too large [preauth]
>> Mar 20 09:23:44 nfs-prod-4 nrpe[42492]: Error: Could not complete SSL handshake. 5
>
> I found at least one call stack where this happens and it does get
> returned all the way to userspace:
>
>  17  15547  _bus_dmamap_load_buffer:return
>               kernel`_bus_dmamap_load_mbuf_sg+0x5f
>               kernel`bus_dmamap_load_mbuf_sg+0x38
>               kernel`ixgbe_xmit+0xcf
>               kernel`ixgbe_mq_start_locked+0x94
>               kernel`ixgbe_mq_start+0x12a
>               if_lagg.ko`lagg_transmit+0xc4
>               kernel`ether_output_frame+0x33
>               kernel`ether_output+0x4fe
>               kernel`ip_output+0xd74
>               kernel`tcp_output+0xfea
>               kernel`tcp_usr_send+0x325
>               kernel`sosend_generic+0x3f6
>               kernel`soo_write+0x5e
>               kernel`dofilewrite+0x85
>               kernel`kern_writev+0x6c
>               kernel`sys_write+0x64
>               kernel`amd64_syscall+0x5ea
>               kernel`0xffffffff808443c7

This looks pretty similar to what we've seen when we got EFBIG:

   3  28502  _bus_dmamap_load_buffer:return
               kernel`_bus_dmamap_load_mbuf_sg+0x5f
               kernel`bus_dmamap_load_mbuf_sg+0x38
               kernel`ixgbe_xmit+0xcf
               kernel`ixgbe_mq_start_locked+0x94
               kernel`ixgbe_mq_start+0x12a
               kernel`ether_output_frame+0x33
               kernel`ether_output+0x4fe
               kernel`ip_output+0xd74
               kernel`rip_output+0x229
               kernel`sosend_generic+0x3f6
               kernel`kern_sendit+0x1a3
               kernel`sendit+0xdc
               kernel`sys_sendto+0x4d
               kernel`amd64_syscall+0x5ea
               kernel`0xffffffff80d35667

In our case it looks like some of the ixgbe tx queues get stuck and some
don't. You can test whether your server shows the same symptoms with this
command:

# for CPU in {0..7}; do echo "CPU${CPU}"; cpuset -l ${CPU} ping -i 0.5 -c 2 -W 1 10.0.0.1 | grep sendto; done

We also use 82599EB-based ixgbe controllers on the affected systems.
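The per-CPU test above can also be wrapped in a small script that prints one summary line per CPU, which is easier to read when only a few queues are stuck. This is only a sketch under stated assumptions: the function names (pin_and_run, count_sendto_errors, probe_cpu, report_cpus) and the output format are made up for illustration. The ping flags and the target 10.0.0.1 are copied from the command above, and the CPU ids are passed explicitly so the script does not depend on brace expansion.

```shell
#!/bin/sh
# Sketch only: helper names and output format are hypothetical,
# not part of the original one-liner.

# pin_and_run CPU CMD...: run CMD restricted to one CPU via cpuset(1).
pin_and_run() {
    pinned_cpu=$1
    shift
    cpuset -l "$pinned_cpu" "$@"
}

# count_sendto_errors: read command output on stdin and print how many
# lines mention "sendto" (ping reports failed sends as "ping: sendto: ...").
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_sendto_errors() {
    grep -c sendto || true
}

# probe_cpu CPU TARGET: ping TARGET while pinned to CPU and print the
# number of sendto errors. stderr is merged into the pipe because ping
# writes the error message there.
probe_cpu() {
    pin_and_run "$1" ping -i 0.5 -c 2 -W 1 "$2" 2>&1 | count_sendto_errors
}

# report_cpus TARGET CPU...: print one summary line per CPU.
report_cpus() {
    target=$1
    shift
    for cpu in "$@"; do
        errors=$(probe_cpu "$cpu" "$target")
        if [ "$errors" -gt 0 ]; then
            echo "CPU${cpu}: ${errors} sendto error(s)"
        else
            echo "CPU${cpu}: ok"
        fi
    done
}

# Example invocation (as root, like the original command):
#   report_cpus 10.0.0.1 0 1 2 3 4 5 6 7
```

A CPU that reports sendto errors while others report ok would match the "some tx queues stuck, some not" pattern described above, assuming the pinned CPU determines which tx queue the packet ends up on.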
Also see these two threads on freebsd-net:

http://lists.freebsd.org/pipermail/freebsd-net/2014-February/037967.html
http://lists.freebsd.org/pipermail/freebsd-net/2014-March/038061.html

I started the second one, and it has some more details of what we were
seeing, in case you're interested.

Then there is:

http://www.freebsd.org/cgi/query-pr.cgi?pr=183390

and:

https://bugs.freenas.org/issues/4560


Markus