From owner-freebsd-net@FreeBSD.ORG Fri Feb 28 17:09:03 2014
Subject: Re: Network loss
From: Markus Gebert
In-Reply-To: <1673358278.14528789.1393542781747.JavaMail.root@uoguelph.ca>
Date: Fri, 28 Feb 2014 18:08:59 +0100
Message-Id: <2FDC6123-5891-4DDA-AC41-FE4B639C0042@hostpoint.ch>
References: <1673358278.14528789.1393542781747.JavaMail.root@uoguelph.ca>
To: Rick Macklem
Cc: Johan Kooijman, freebsd-net@freebsd.org, Jack Vogel, John Baldwin
List-Id: Networking and TCP/IP with FreeBSD

On 28.02.2014, at 00:13, Rick Macklem wrote:

> Markus Gebert wrote:
>>
>> On 27.02.2014, at 02:00, Rick Macklem wrote:
>>
>>> John Baldwin wrote:
>>>> On Tuesday, February 25, 2014 2:19:01 am Johan Kooijman wrote:
>>>>> Hi all,
>>>>>
>>>>> I have a weird situation here that I
can't get my head around.
>>>>>
>>>>> One FreeBSD 9.2-STABLE ZFS/NFS box, multiple Linux clients. Once
>>>>> in a while the Linux clients lose their NFS connection:
>>>>>
>>>>> Feb 25 06:24:09 hv3 kernel: nfs: server 10.0.24.1 not responding,
>>>>> timed out
>>>>>
>>>>> Not all boxes, just one out of the cluster. The weird part is
>>>>> that when I try to ping a Linux client from the FreeBSD box, I
>>>>> have between 10 and 30% packet loss - all day long, no specific
>>>>> timeframe. If I ping the Linux clients - no loss. If I ping back
>>>>> from the Linux clients to the FBSD box - no loss.
>>>>>
>>>>> The error I get when pinging a Linux client is this one:
>>>>> ping: sendto: File too large
>>
>> We were facing similar problems when upgrading to 9.2 and have
>> stayed with 9.1 on affected systems for now. We've seen this on HP
>> G8 blades with 82599EB controllers:
>>
>> ix0@pci0:4:0:0: class=0x020000 card=0x18d0103c chip=0x10f88086
>>                 rev=0x01 hdr=0x00
>>     vendor   = 'Intel Corporation'
>>     device   = '82599EB 10 Gigabit Dual Port Backplane Connection'
>>     class    = network
>>     subclass = ethernet
>>
>> We didn't find a way to trigger the problem reliably. But when it
>> occurs, it usually affects only one interface. Symptoms include:
>>
>> - socket functions return the 'File too large' error mentioned by
>>   Johan
>> - socket functions return 'No buffer space available'
>> - heavy to full packet loss on the affected interface
>> - "stuck" TCP connections, i.e.
>> ESTABLISHED TCP connections that should have timed out stick
>> around forever (the socket on the other side could have been
>> closed hours ago)
>> - userland programs using the corresponding sockets usually got
>>   stuck too (can't find kernel traces right now, but always in
>>   network-related syscalls)
>>
>> The network is only lightly loaded on the affected systems (usually
>> 5-20 mbit, capped at 200 mbit, per server), and netstat never
>> showed any indication of resource shortage (like mbufs).
>>
>> What made the problem go away temporarily was to ifconfig down/up
>> the affected interface.
>>
>> We tested a 9.2 kernel with the 9.1 ixgbe driver, which was not
>> really stable. Also, we tested a few revisions between 9.1 and 9.2
>> to find out when the problem started. Unfortunately, the ixgbe
>> driver turned out to be mostly unstable on our systems between
>> these releases, worse than on 9.2. The instability was introduced
>> shortly after 9.1 and fixed only very shortly before the 9.2
>> release. So no luck there. We ended up using 9.1 with backports of
>> the 9.2 features we really need.
>>
>> What we can't tell is whether it's the 9.2 kernel or the 9.2 ixgbe
>> driver or a combination of both that causes these problems.
>> Unfortunately we ran out of time (and ideas).
>>
>>>> EFBIG is sometimes used by drivers when a packet takes too many
>>>> scatter/gather entries. Since you mentioned NFS, one thing you
>>>> can try is to disable TSO on the interface you are using for NFS
>>>> to see if that "fixes" it.
>>>>
>>> And please email if you try it and let us know if it helps.
>>>
>>> I think I've figured out how 64K NFS read replies can do this,
>>> but I'll admit "ping" is a mystery? (Doesn't it just send a single
>>> packet that would be in a single mbuf?)
>>>
>>> I think the EFBIG is replied by bus_dmamap_load_mbuf_sg(), but I
>>> don't know if it can happen for an mbuf chain with < 32 entries?
>>
>> We don't use the nfs server on our systems, but they're
>> (new)nfs clients. So I don't think our problem is nfs-related,
>> unless the default rsize/wsize for client mounts is not 8K, which I
>> thought it was. Can you confirm this, Rick?
>>
> Well, if you don't specify any mount options, it will be
> min(64K, what-the-server-specifies).
>
> "nfsstat -m" should show you what it actually is using, for 9.2 or
> later.

Thanks for your answer and the command. I knew there is a new option
to nfsstat, but I couldn't find it in the 9.1 man page, of course ;-).

> 8K would be used if you specified "udp".

I see. We're using tcp, so that would not be relevant then. Guess my
look at the mount vfsop was too quick...

> For the client, it would be write requests that could be 64K.
> You could try "wsize=32768,rsize=32768" (it is actually the
> wsize that matters for this case, but you might as well set
> rsize at the same time). With these options specified, you
> know what the maximum value is (it will still be reduced for
> udp or if the server wants it smaller).

Ok. I checked this on a system that is still running 9.2 (we had to
downgrade most production systems), and bingo, 64K wsize. So our nfs
server (a NetApp cluster) seems to prefer 64K.

This means NFS is still a potential trigger of the problem. Let's
pretend it is, even though I think I already tried disabling TSO.
Would this explain all the symptoms we were seeing? Why would ifconfig
down/up help? Do you have a theory on that?


Markus
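
For readers landing on this thread later, the diagnostics and
workarounds discussed above map to roughly the following commands.
This is only a sketch: the interface name ix0 comes from the pciconf
output quoted earlier, while the server export 10.0.24.1:/vol0 and
mount point /mnt are placeholder names, not paths from this thread.

```shell
# Show the rsize/wsize each NFS mount actually negotiated
# (the "nfsstat -m" option Rick mentions, available from FreeBSD 9.2 on)
nfsstat -m

# Mount with an explicit 32K wsize/rsize, per Rick's suggestion;
# the server can still negotiate these values down, but not up
# (placeholder export path and mount point)
mount -t nfs -o tcp,wsize=32768,rsize=32768 10.0.24.1:/vol0 /mnt

# Disable TSO on the NFS-facing interface, per John's suggestion,
# to rule out oversized scatter/gather lists causing EFBIG
ifconfig ix0 -tso

# The temporary workaround Markus describes: bounce the interface
ifconfig ix0 down && ifconfig ix0 up
```

These are one-shot commands; to make the TSO setting or the mount
options survive a reboot they would go into rc.conf and fstab
respectively.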