Subject: Re: Network stack returning EFBIG?
From: Markus Gebert <markus.gebert@hostpoint.ch>
In-Reply-To: <429006400.647323.1395369915529.JavaMail.root@uoguelph.ca>
Date: Fri, 21 Mar 2014 11:32:27 +0100
Content-Transfer-Encoding: quoted-printable
Message-Id: <E0FD7069-2F95-49F1-B3EC-49F7922477DC@hostpoint.ch>
References: <429006400.647323.1395369915529.JavaMail.root@uoguelph.ca>
To: Rick Macklem <rmacklem@uoguelph.ca>
X-Mailer: Apple Mail (2.1874)
Cc: jfv@freebsd.org, freebsd-net@freebsd.org, freebsd-stable@freebsd.org,
 wollman@bimajority.org, Christopher Forgeron <csforgeron@gmail.com>


On 21.03.2014, at 03:45, Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Markus Gebert wrote:
>>
>> On 20.03.2014, at 14:51, wollman@bimajority.org wrote:
>>
>>> In article <21290.60558.750106.630804@hergotha.csail.mit.edu>, I
>>> wrote:
>>>
>>>> Since we put this server into production, random network system calls
>>>> have started failing with [EFBIG] or maybe sometimes [EIO].  I've
>>>> observed this with a simple ping, but various daemons also log the
>>>> errors:
>>>> Mar 20 09:22:04 nfs-prod-4 sshd[42487]: fatal: Write failed: File too large [preauth]
>>>> Mar 20 09:23:44 nfs-prod-4 nrpe[42492]: Error: Could not complete SSL handshake. 5
>>>
>>> I found at least one call stack where this happens and it does get
>>> returned all the way to userspace:
>>>
>>> 17  15547   _bus_dmamap_load_buffer:return
>>>             kernel`_bus_dmamap_load_mbuf_sg+0x5f
>>>             kernel`bus_dmamap_load_mbuf_sg+0x38
>>>             kernel`ixgbe_xmit+0xcf
>>>             kernel`ixgbe_mq_start_locked+0x94
>>>             kernel`ixgbe_mq_start+0x12a
>>>             if_lagg.ko`lagg_transmit+0xc4
>>>             kernel`ether_output_frame+0x33
>>>             kernel`ether_output+0x4fe
>>>             kernel`ip_output+0xd74
>>>             kernel`tcp_output+0xfea
>>>             kernel`tcp_usr_send+0x325
>>>             kernel`sosend_generic+0x3f6
>>>             kernel`soo_write+0x5e
>>>             kernel`dofilewrite+0x85
>>>             kernel`kern_writev+0x6c
>>>             kernel`sys_write+0x64
>>>             kernel`amd64_syscall+0x5ea
>>>             kernel`0xffffffff808443c7
>>
>> This looks pretty similar to what we've seen when we got EFBIG:
>>
>> 3  28502   _bus_dmamap_load_buffer:return
>>              kernel`_bus_dmamap_load_mbuf_sg+0x5f
>>              kernel`bus_dmamap_load_mbuf_sg+0x38
>>              kernel`ixgbe_xmit+0xcf
>>              kernel`ixgbe_mq_start_locked+0x94
>>              kernel`ixgbe_mq_start+0x12a
>>              kernel`ether_output_frame+0x33
>>              kernel`ether_output+0x4fe
>>              kernel`ip_output+0xd74
>>              kernel`rip_output+0x229
>>              kernel`sosend_generic+0x3f6
>>              kernel`kern_sendit+0x1a3
>>              kernel`sendit+0xdc
>>              kernel`sys_sendto+0x4d
>>              kernel`amd64_syscall+0x5ea
>>              kernel`0xffffffff80d35667
>>
>> In our case it looks like some of the ixgbe tx queues get stuck, and
>> some don't. You can test whether your server shows the same symptoms
>> with this command:
>>
>> # for CPU in {0..7}; do echo "CPU${CPU}"; cpuset -l ${CPU} ping -i 0.5 -c 2 -W 1 10.0.0.1 | grep sendto; done
>>
>> We also use 82599EB based ixgbe controllers on affected systems.
>>
>> Also see these two threads on freebsd-net:
>>
>> http://lists.freebsd.org/pipermail/freebsd-net/2014-February/037967.html
>> http://lists.freebsd.org/pipermail/freebsd-net/2014-March/038061.html
>>
>> I have started the second one, and there are some more details of
>> what we were seeing in case you're interested.
>>
>> Then there is:
>>
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=183390
>> and:
>> https://bugs.freenas.org/issues/4560
>>
> Well, the "before" printf() from my patch is indicating a packet > 65535
> and that will definitely result in an EFBIG. (There is no way that m_defrag()
> can squeeze > 64K into 32 MCLBYTES mbufs.)

Makes sense.
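
To spell out the arithmetic, here is a rough sketch. It assumes MCLBYTES is 2048, as on a stock kernel, and takes the limit of 32 segments from the quoted text. The SKETCH_* constants and helper names are made up for illustration and do not exist in the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values: 2048 is the stock MCLBYTES, 32 is the segment
 * limit mentioned in the quoted text. Both names are hypothetical. */
#define SKETCH_MCLBYTES 2048
#define SKETCH_MAX_SEGS 32

/* Clusters needed to hold pkt_len bytes in a perfectly compacted
 * chain, i.e. pkt_len / cluster_size rounded up. */
static size_t
clusters_needed(size_t pkt_len, size_t cluster_size)
{
	assert(cluster_size > 0);
	return (pkt_len + cluster_size - 1) / cluster_size;
}

/* Returns 1 if even an ideal defragmentation stays within the
 * segment limit, 0 otherwise. */
static int
fits_after_defrag(size_t pkt_len)
{
	return clusters_needed(pkt_len, SKETCH_MCLBYTES) <= SKETCH_MAX_SEGS;
}
```

With these numbers, 32 clusters hold exactly 65536 bytes, so anything larger needs a 33rd segment no matter how well the chain is compacted.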


> Note that the EFBIG will be returned by the call that dequeues this packet
> and tries to transmit it (not necessarily the one that generated/queued the
> packet). This was pointed out by Ryan in a previous discussion of this.

I remember that email, and it also explains why a ping could fail when it
happens to be on the same queue. On the other hand, would it explain why
every single ping on certain queues starts to fail, while other queues are
unaffected? Of course, it could be that whatever triggers the problem
resends the huge segment immediately over the same TCP connection and
blocks one queue for some time by repeating this quickly enough to kill
every single ping packet. However, this sounds unlikely to me. And once we
saw the problem, I unmounted all NFS shares and thereby eliminated all
sources of huge packets, yet the problem persisted. So, in my opinion, there
must be more to it than just an occasional oversized packet.
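
To illustrate what I mean, here is a toy model of a single transmit queue. It is not the ixgbe code: the struct, the function names, the 65536-byte limit, and the choice to drop the oversized packet are all assumptions made for the sketch. Whoever drains the queue gets the error for the packet at its head.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_QLEN      8	/* hypothetical queue depth */
#define SKETCH_MAX_BYTES 65536	/* hypothetical per-packet mapping limit */

/* Minimal ring of packet lengths standing in for one tx queue. */
struct sketch_txq {
	size_t len[SKETCH_QLEN];
	int head;
	int count;
};

/* Queue a packet without transmitting anything. */
static int
sketch_enqueue(struct sketch_txq *q, size_t pkt_len)
{
	assert(q != NULL);
	if (q->count == SKETCH_QLEN)
		return ENOBUFS;
	q->len[(q->head + q->count) % SKETCH_QLEN] = pkt_len;
	q->count++;
	return 0;
}

/* Queue our packet, then drain the queue. The first transmit error
 * is returned to this caller, whoever queued the offending packet.
 * The oversized packet is dropped (an assumption of this model). */
static int
sketch_send(struct sketch_txq *q, size_t pkt_len)
{
	int err = sketch_enqueue(q, pkt_len);

	if (err != 0)
		return err;
	while (q->count > 0) {
		size_t next = q->len[q->head];

		q->head = (q->head + 1) % SKETCH_QLEN;
		q->count--;
		if (next > SKETCH_MAX_BYTES)
			return EFBIG;
	}
	return 0;
}
```

In this model, a small packet sent right after an oversized one has been queued fails with EFBIG, but the next small packet on the same queue goes through. A single huge packet therefore accounts for one failed ping, not for a queue on which every ping keeps failing.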


> The code snippet from sys/netinet/tcp_output.c looks pretty straightforward:
>       /*
> 772 	* Limit a burst to t_tsomax minus IP,
> 773 	* TCP and options length to keep ip->ip_len
> 774 	* from overflowing or exceeding the maximum
> 775 	* length allowed by the network interface.
> 776 	*/
> 777 	if (len > tp->t_tsomax - hdrlen) {
> 778 	   len = tp->t_tsomax - hdrlen;
> 779 	   sendalot = 1;
> 780 	}
> If it is a TSO segment of > 65535, at a glance it would seem that this "if"
> is busted. Just to see, you could try replacing line# 777-778 with
>       if (len > IP_MAXPACKET - hdrlen) {
>           len = IP_MAXPACKET - hdrlen;
> which was what it was in 9.1. (Maybe t_tsomax isn't set correctly or somehow
> screws up the calculation?)


I cannot answer your question, but this is an interesting catch. I'll
get this and your printfs into our 9.2 kernel as soon as I can.
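
Purely as a thought experiment on how the calculation could go wrong, here is a user-space sketch of the two clamps. It assumes that len is a long and that t_tsomax and hdrlen are unsigned ints. The function names and the zero value for tsomax are hypothetical, and I have no evidence that t_tsomax is actually unset on our systems.

```c
#include <assert.h>

#define SKETCH_IP_MAXPACKET 65535	/* same value as IP_MAXPACKET */

/* Clamp in the style of the 9.1 code quoted above. */
static long
clamp_ip_maxpacket(long len, unsigned int hdrlen)
{
	assert(hdrlen <= SKETCH_IP_MAXPACKET);
	if (len > (long)(SKETCH_IP_MAXPACKET - hdrlen))
		len = SKETCH_IP_MAXPACKET - hdrlen;
	return len;
}

/* Clamp in the style of lines 777-778. The subtraction is done in
 * unsigned arithmetic, so if tsomax is smaller than hdrlen the limit
 * wraps around to a huge value and the clamp never triggers. */
static long
clamp_tsomax(long len, unsigned int tsomax, unsigned int hdrlen)
{
	if (len > tsomax - hdrlen)
		len = tsomax - hdrlen;
	return len;
}
```

With a sane tsomax of 65535 and 52 bytes of headers, both variants limit a 100000-byte burst to 65483 bytes. With a hypothetical tsomax of 0, the second variant leaves the burst at 100000 bytes.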


Markus


> rick
>
>>
>> Markus
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to
>> "freebsd-net-unsubscribe@freebsd.org"
>>