Subject: Re: Network stack returning EFBIG?
From: Markus Gebert
Date: Fri, 21 Mar 2014 11:32:27 +0100
To: Rick Macklem
Cc: jfv@freebsd.org, freebsd-net@freebsd.org, freebsd-stable@freebsd.org,
    wollman@bimajority.org, Christopher Forgeron
In-Reply-To: <429006400.647323.1395369915529.JavaMail.root@uoguelph.ca>

On 21.03.2014, at 03:45, Rick Macklem wrote:

> Markus Gebert wrote:
>>
>> On 20.03.2014, at 14:51, wollman@bimajority.org wrote:
>>
>>> In article <21290.60558.750106.630804@hergotha.csail.mit.edu>, I
>>> wrote:
>>>
>>>> Since we put this server into production, random network system
>>>> calls have started failing with [EFBIG] or maybe sometimes [EIO].
>>>> I've observed this with a simple ping, but various daemons also
>>>> log the errors:
>>>> Mar 20 09:22:04 nfs-prod-4 sshd[42487]: fatal: Write failed: File too large [preauth]
>>>> Mar 20 09:23:44 nfs-prod-4 nrpe[42492]: Error: Could not complete SSL handshake. 5
>>>
>>> I found at least one call stack where this happens and it does get
>>> returned all the way to userspace:
>>>
>>>  17  15547 _bus_dmamap_load_buffer:return
>>>      kernel`_bus_dmamap_load_mbuf_sg+0x5f
>>>      kernel`bus_dmamap_load_mbuf_sg+0x38
>>>      kernel`ixgbe_xmit+0xcf
>>>      kernel`ixgbe_mq_start_locked+0x94
>>>      kernel`ixgbe_mq_start+0x12a
>>>      if_lagg.ko`lagg_transmit+0xc4
>>>      kernel`ether_output_frame+0x33
>>>      kernel`ether_output+0x4fe
>>>      kernel`ip_output+0xd74
>>>      kernel`tcp_output+0xfea
>>>      kernel`tcp_usr_send+0x325
>>>      kernel`sosend_generic+0x3f6
>>>      kernel`soo_write+0x5e
>>>      kernel`dofilewrite+0x85
>>>      kernel`kern_writev+0x6c
>>>      kernel`sys_write+0x64
>>>      kernel`amd64_syscall+0x5ea
>>>      kernel`0xffffffff808443c7
>>
>> This looks pretty similar to what we've seen when we got EFBIG:
>>
>>   3  28502 _bus_dmamap_load_buffer:return
>>      kernel`_bus_dmamap_load_mbuf_sg+0x5f
>>      kernel`bus_dmamap_load_mbuf_sg+0x38
>>      kernel`ixgbe_xmit+0xcf
>>      kernel`ixgbe_mq_start_locked+0x94
>>      kernel`ixgbe_mq_start+0x12a
>>      kernel`ether_output_frame+0x33
>>      kernel`ether_output+0x4fe
>>      kernel`ip_output+0xd74
>>      kernel`rip_output+0x229
>>      kernel`sosend_generic+0x3f6
>>      kernel`kern_sendit+0x1a3
>>      kernel`sendit+0xdc
>>      kernel`sys_sendto+0x4d
>>      kernel`amd64_syscall+0x5ea
>>      kernel`0xffffffff80d35667
>>
>> In our case it looks like some of the ixgbe tx queues get stuck, and
>> some don't. You can test whether your server shows the same symptoms
>> with this command:
>>
>> # for CPU in {0..7}; do echo "CPU${CPU}"; cpuset -l ${CPU} ping -i 0.5 -c 2 -W 1 10.0.0.1 | grep sendto; done
>>
>> We also use 82599EB based ixgbe controllers on affected systems.
>>
>> Also see these two threads on freebsd-net:
>>
>> http://lists.freebsd.org/pipermail/freebsd-net/2014-February/037967.html
>> http://lists.freebsd.org/pipermail/freebsd-net/2014-March/038061.html
>>
>> I have started the second one, and there are some more details of
>> what we were seeing in case you're interested.
>>
>> Then there is:
>>
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=183390
>> and:
>> https://bugs.freenas.org/issues/4560
>>
> Well, the "before" printf() from my patch is indicating a packet > 65535
> and that will definitely result in an EFBIG. (There is no way that m_defrag()
> can squeeze > 64K into 32 MCLBYTES mbufs.)

Makes sense.

> Note that the EFBIG will be returned by the call that dequeues this packet
> and tries to transmit it (not necessarily the one that generated/queued the
> packet). This was pointed out by Ryan in a previous discussion of this.

I remember that email, and it also explains why a ping could fail when
it happens to be on the same queue. On the other hand, would it explain
why every single ping on certain queues starts to fail, while other
queues are unaffected? Of course it could be that whatever triggers the
problem resends the huge segment immediately over the same TCP
connection, and blocks one queue for some time by repeating this over
and over, quickly enough to kill every single ping packet. However, this
sounds unlikely to me. And once we saw the problem, I unmounted all NFS
shares and therefore eliminated all sources of huge packets, and the
problem persisted. So, in my opinion, there must be more to it than just
a packet that is too big once in a while.

> The code snippet from sys/netinet/tcp_output.c looks pretty straightforward:
>         /*
> 772      * Limit a burst to t_tsomax minus IP,
> 773      * TCP and options length to keep ip->ip_len
> 774      * from overflowing or exceeding the maximum
> 775      * length allowed by the network interface.
> 776      */
> 777     if (len > tp->t_tsomax - hdrlen) {
> 778             len = tp->t_tsomax - hdrlen;
> 779             sendalot = 1;
> 780     }
> If it is a TSO segment of > 65535, at a glance it would seem that this "if"
> is busted. Just to see, you could try replacing line# 777-778 with
>         if (len > IP_MAXPACKET - hdrlen) {
>                 len = IP_MAXPACKET - hdrlen;
> which was what it was in 9.1. (Maybe t_tsomax isn't set correctly or somehow
> screws up the calculation?)

I cannot answer your question, but this is an interesting catch. I'll
get this and your printfs into our 9.2 kernel as soon as I can.


Markus


> rick
>
>>
>> Markus
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to
>> "freebsd-net-unsubscribe@freebsd.org"
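
The arithmetic behind the tcp_output.c clamp and the "32 MCLBYTES mbufs" remark discussed in this thread can be sketched in userland C. This is only an illustration, not the kernel or ixgbe code. It assumes the default cluster size of 2048 bytes, that defragmentation packs data into full clusters, and that a 14-byte Ethernet header is prepended below IP. TX_MAX_SEGS, clamp_tso_len(), clusters_needed() and fits_in_tx_map() are hypothetical names made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define MCLBYTES      2048u   /* assumed default mbuf cluster size */
#define IP_MAXPACKET  65535u  /* largest IPv4 datagram */
#define ETHER_HDR_LEN 14u     /* assumed link-layer header added below IP */
#define TX_MAX_SEGS   32u     /* hypothetical name; "32" is taken from the thread */

/*
 * Same shape as the quoted clamp: cap the payload so that headers plus
 * payload never exceed `limit` (t_tsomax in 9.2, IP_MAXPACKET in 9.1).
 */
static inline unsigned
clamp_tso_len(unsigned len, unsigned limit, unsigned hdrlen, bool *sendalot)
{
	assert(hdrlen <= limit);	/* otherwise the subtraction wraps */
	if (len > limit - hdrlen) {
		len = limit - hdrlen;
		*sendalot = true;	/* more data remains for another pass */
	}
	return (len);
}

/* Clusters needed to hold nbytes once the chain is fully compacted. */
static inline unsigned
clusters_needed(unsigned nbytes)
{
	return ((nbytes + MCLBYTES - 1) / MCLBYTES);
}

/* True if a frame of this size can be mapped into TX_MAX_SEGS segments. */
static inline bool
fits_in_tx_map(unsigned frame_bytes)
{
	return (clusters_needed(frame_bytes) <= TX_MAX_SEGS);
}
```

Under these assumptions 32 clusters hold 32 * 2048 = 65536 bytes, so a 65535-byte IP packet on its own still fits, while 65535 + 14 = 65549 bytes needs 33 clusters and does not. That would be consistent with an EFBIG for a packet reported as > 65535, but it is a sketch of the size limit only and does not explain queues that stay stuck after the large packets are gone.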