From: Rick Macklem
To: Steve Polyack
Cc: yanefbsd@gmail.com, freebsd-stable@freebsd.org
Date: Wed, 1 Sep 2010 12:05:47 -0400 (EDT)
Subject: Re: NFS 75 second stall
Message-ID: <1767168849.374184.1283357147943.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <4C7E743A.1040506@comcast.net>

> On 07/01/10 15:23, Garrett Cooper wrote:
> > On Thu, Jul 1, 2010 at 11:51 AM,
> > alan bryan wrote:
> >>
> >> --- On Thu, 7/1/10, Garrett Cooper wrote:
> >>
> >>> From: Garrett Cooper
> >>> Subject: Re: NFS 75 second stall
> >>> To: "alan bryan"
> >>> Cc: freebsd-stable@freebsd.org
> >>> Date: Thursday, July 1, 2010, 11:13 AM
> >>> On Thu, Jul 1, 2010 at 11:01 AM, alan bryan wrote:
> >>>> Setup:
> >>>>
> >>>> server - FreeBSD 8-stable from today. 2 UFS dirs exported via NFS.
> >>>> client - FreeBSD 8.0-Release. Running a test php script that copies
> >>>> around various files to/from 2 separate NFS mounts.
> >>>>
> >>>> Situation:
> >>>>
> >>>> The script is started (forked to do 20 simultaneous runs) and 20 1GB
> >>>> files are copied to the NFS dir, which works fine. When it then
> >>>> switches to reading those files back and simultaneously writing to
> >>>> the other NFS mount, I see a hang of 75 seconds. If I do an "ls -l"
> >>>> on the NFS mount, it hangs too. After 75 seconds the client has
> >>>> reported:
> >>>>
> >>>> nfs server 192.168.10.133:/usr/local/export1: not responding
> >>>> nfs server 192.168.10.133:/usr/local/export1: is alive again
> >>>> nfs server 192.168.10.133:/usr/local/export1: not responding
> >>>> nfs server 192.168.10.133:/usr/local/export1: is alive again
> >>>>
> >>>> and then things start working again. The server was originally
> >>>> FreeBSD 8.0-Release also but was upgraded to the latest stable to
> >>>> see if this issue could be avoided.
> >>>>
> >>>> # nfsstat -s -W -w 1
> >>>> GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
> >>>>      0      0      0    222    257      0      0      0
> >>>>      0      0      0    178    135      0      0      0
> >>>>      0      0      0     85    127      0      0      0
> >>>>      0      0      0      0      0      0      0      0
> >>>>      0      0      0      0      0      0      0      0
> >>>>      0      0      0      0      0      0      0      0
> >>>>      0      0      0      0      0      0      0      0
> >>>>      0      0      0      0      0      0      0      0
> >>>> ... for 75 rows of all zeros
> >>>>
> >>>>      0      0      0    272    266      0      0      0
> >>>>      0      0      0    167    165      0      0      0
> >>>>
> >>>> I also tried runs with 15 simultaneous processes and 25.
> >>>> 15 processes gave only about a 5 second stall, but 25 gave again
> >>>> the same 75 second stall.
> >>>>
> >>>> Further, I tested with 2 mounts to the same server but from ZFS
> >>>> filesystems, with the exact same stall/timeout periods. So, it
> >>>> doesn't appear to matter what the underlying filesystem is - it's
> >>>> something in NFS or networking code.
> >>>>
> >>>> Any ideas on what's going on here? What's causing the complete
> >>>> stall period of zero NFS activity? Any flaws with my testing
> >>>> methods?
> >>>>
> >>>> Thanks for any and all help/ideas.
> >>> What network driver are you using? Have you tried tcpdumping the
> >>> packets?
> >>> -Garrett
> >>>
> >> I'm using igb currently but have also used em. I have not tried
> >> tcpdumping the packets yet on this test. Any suggestions on things
> >> to look out for? (I'm not that familiar with that whole process.)
> >>
> >> Which brings up another point - I'm using TCP connections for NFS,
> >> not UDP.
> > Is the net.inet.tcp.tso sysctl enabled or not? What about
> > rxcsum and txcsum?
> > Thanks,
> > -Garrett
>
> We're occasionally seeing these same types of stalls (+ repeated "is
> not responding" "is alive again" messages in quick succession). We're
> seeing it only on our 8.1-RELEASE systems against a variety of NFS
> servers (6.3-RELEASE, 7.2-RELEASE, and 8-STABLE from before the
> release of 8.1). We also see it happen with a variety of client
> hardware and network adapters (em, bce, bge); the only common
> denominator is 8.1-RELEASE on the clients.
>
You could try the attached patch. It won't fix anything, but it should
print out the errno that is causing a TCP reconnect, which might give
us a hint w.r.t. what is going on.
rick

Attachment: clnt_rc.patch

--- rpc/clnt_rc.c.sav	2010-09-01 10:56:56.000000000 -0400
+++ rpc/clnt_rc.c	2010-09-01 11:00:49.000000000 -0400
@@ -264,6 +264,7 @@
 			mtx_unlock(&rc->rc_lock);
 			stat = clnt_reconnect_connect(cl);
 			if (stat == RPC_SYSTEMERROR) {
+printf("recon err=%d\n", rpc_createerr.cf_error.re_errno);
 				error = tsleep(&fake_wchan,
 				    rc->rc_intr ? PCATCH | PBDRY : 0, "rpccon",
 				    hz);