Date: Thu, 20 Mar 2014 16:21:56 +0100
From: Markus Gebert <markus.gebert@hostpoint.ch>
To: wollman@bimajority.org
Cc: jfv@freebsd.org, freebsd-net@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: Network stack returning EFBIG?
Message-ID: <B0A8C00F-67C7-40B2-94DE-9449574FF63F@hostpoint.ch>
In-Reply-To: <201403201351.s2KDpghe080116@hergotha.csail.mit.edu>
References: <201403201351.s2KDpghe080116@hergotha.csail.mit.edu>
On 20.03.2014, at 14:51, wollman@bimajority.org wrote:

> In article <21290.60558.750106.630804@hergotha.csail.mit.edu>, I wrote:
>
>> Since we put this server into production, random network system calls
>> have started failing with [EFBIG] or maybe sometimes [EIO]. I've
>> observed this with a simple ping, but various daemons also log the
>> errors:
>> Mar 20 09:22:04 nfs-prod-4 sshd[42487]: fatal: Write failed: File too
>> large [preauth]
>> Mar 20 09:23:44 nfs-prod-4 nrpe[42492]: Error: Could not complete SSL
>> handshake. 5
>
> I found at least one call stack where this happens and it does get
> returned all the way to userspace:
>
> 17 15547 _bus_dmamap_load_buffer:return
>    kernel`_bus_dmamap_load_mbuf_sg+0x5f
>    kernel`bus_dmamap_load_mbuf_sg+0x38
>    kernel`ixgbe_xmit+0xcf
>    kernel`ixgbe_mq_start_locked+0x94
>    kernel`ixgbe_mq_start+0x12a
>    if_lagg.ko`lagg_transmit+0xc4
>    kernel`ether_output_frame+0x33
>    kernel`ether_output+0x4fe
>    kernel`ip_output+0xd74
>    kernel`tcp_output+0xfea
>    kernel`tcp_usr_send+0x325
>    kernel`sosend_generic+0x3f6
>    kernel`soo_write+0x5e
>    kernel`dofilewrite+0x85
>    kernel`kern_writev+0x6c
>    kernel`sys_write+0x64
>    kernel`amd64_syscall+0x5ea
>    kernel`0xffffffff808443c7

This looks pretty similar to what we've seen when we got EFBIG:

  3 28502 _bus_dmamap_load_buffer:return
     kernel`_bus_dmamap_load_mbuf_sg+0x5f
     kernel`bus_dmamap_load_mbuf_sg+0x38
     kernel`ixgbe_xmit+0xcf
     kernel`ixgbe_mq_start_locked+0x94
     kernel`ixgbe_mq_start+0x12a
     kernel`ether_output_frame+0x33
     kernel`ether_output+0x4fe
     kernel`ip_output+0xd74
     kernel`rip_output+0x229
     kernel`sosend_generic+0x3f6
     kernel`kern_sendit+0x1a3
     kernel`sendit+0xdc
     kernel`sys_sendto+0x4d
     kernel`amd64_syscall+0x5ea
     kernel`0xffffffff80d35667

In our case it looks like some of the ixgbe tx queues get stuck, and some don't.
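Both traces show the error surfacing as the return from _bus_dmamap_load_buffer under ixgbe_xmit. The following is a hypothetical model, not ixgbe's code, of one way a busdma load can yield EFBIG. busdma returns EFBIG when a buffer chain needs more scatter/gather segments than the DMA tag allows, and a transmit path that defragments once and retries still fails if the packed payload is too big. The segment limit, cluster size and all function names below are assumptions chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical limits for this model; not taken from the ixgbe driver. */
#define MODEL_MAX_SEGS      32    /* segments the DMA tag accepts */
#define MODEL_CLUSTER_BYTES 2048  /* buffer size after defragmenting */

/* Model of a busdma load: one segment per buffer, EFBIG when the
 * chain needs more segments than the tag allows. */
int
model_dma_load(size_t nbufs)
{
	assert(nbufs > 0);
	return (nbufs > MODEL_MAX_SEGS) ? EFBIG : 0;
}

/* Model of a transmit path that, on EFBIG, copies the payload into as
 * few clusters as possible (like m_defrag) and retries exactly once. */
int
model_xmit(size_t nbufs, size_t payload_bytes)
{
	int error = model_dma_load(nbufs);

	if (error == EFBIG) {
		size_t packed = (payload_bytes + MODEL_CLUSTER_BYTES - 1) /
		    MODEL_CLUSTER_BYTES;

		if (packed == 0)
			packed = 1;
		error = model_dma_load(packed);
	}
	/* A nonzero value here is what write()/sendto() would report. */
	return (error);
}

/* Print the outcome the way a userland caller would see it. */
void
model_report(int error)
{
	if (error != 0)
		printf("write failed: %s\n", strerror(error));
	else
		printf("packet queued\n");
}
```

With these assumed limits, model_xmit(40, 4000) succeeds because the payload packs into 2 clusters, while model_xmit(40, 70000) still needs 35 clusters and returns EFBIG. model_report(EFBIG) then prints "write failed: File too large", the same strerror() text that appears in the sshd log line quoted above.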
You can test whether your server shows the same symptoms with this command:

# for CPU in {0..7}; do echo "CPU${CPU}"; cpuset -l ${CPU} ping -i 0.5 -c 2 -W 1 10.0.0.1 | grep sendto; done

We also use 82599EB-based ixgbe controllers on affected systems.

Also see these two threads on freebsd-net:

http://lists.freebsd.org/pipermail/freebsd-net/2014-February/037967.html
http://lists.freebsd.org/pipermail/freebsd-net/2014-March/038061.html

I started the second one; it has more details of what we were seeing, in case you're interested.

There is also:

http://www.freebsd.org/cgi/query-pr.cgi?pr=183390

and:

https://bugs.freenas.org/issues/4560

Markus
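The loop pins ping to one CPU at a time with cpuset -l. Here is a sketch of why that can probe individual transmit queues. It assumes the driver picks a queue as (current CPU) % (number of queues) for packets that carry no flow ID, which is an assumption based on how ixgbe_mq_start selected queues at the time and is not confirmed in this thread. The 8 CPUs, 4 queues, the stuck[] array and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumption: a packet without a flow ID goes to queue
 * (current CPU) % (number of queues). */
int
tx_queue_for_cpu(int cpu, int nqueues)
{
	assert(cpu >= 0 && nqueues > 0);
	return (cpu % nqueues);
}

/* Hypothetical helper: list the CPUs whose ping would hit a queue
 * marked true in stuck[], and return how many there are. */
int
affected_cpus(int ncpus, int nqueues, const bool stuck[])
{
	int count = 0;

	for (int cpu = 0; cpu < ncpus; cpu++) {
		int q = tx_queue_for_cpu(cpu, nqueues);

		if (stuck[q]) {
			printf("CPU%d -> queue %d: expect sendto errors\n", cpu, q);
			count++;
		}
	}
	return (count);
}
```

With 8 CPUs, 4 queues and only queue 1 stuck, CPUs 1 and 5 map to that queue, so under this assumption only those two iterations of the loop would print sendto errors while the others stay silent. The CPU range in the loop should match the CPU count of the host being tested.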