From: Steve Polyack
Date: Wed, 01 Sep 2010 11:41:46 -0400
To: freebsd-stable@freebsd.org, Rick Macklem
Subject: Re: NFS 75 second stall
Message-ID: <4C7E743A.1040506@comcast.net>
References: <538823.39365.qm@web50508.mail.re2.yahoo.com>
List-Id: Production branch of FreeBSD source code

On 07/01/10 15:23, Garrett Cooper wrote:
> On Thu, Jul 1, 2010 at 11:51 AM, alan bryan wrote:
>>
>> --- On Thu, 7/1/10, Garrett Cooper wrote:
>>
>>> From: Garrett Cooper
>>> Subject: Re: NFS 75 second stall
>>> To: "alan bryan"
>>> Cc: freebsd-stable@freebsd.org
>>> Date: Thursday, July 1, 2010, 11:13 AM
>>> On Thu, Jul 1, 2010 at 11:01 AM, alan bryan wrote:
>>>> Setup:
>>>>
>>>> server - FreeBSD 8-stable from today. 2 UFS dirs exported via NFS.
>>>> client - FreeBSD 8.0-Release. Running a test php script that copies
>>>> around various files to/from 2 separate NFS mounts.
>>>> Situation:
>>>>
>>>> script is started (forked to do 20 simultaneous runs) and 20 1GB files
>>>> are copied to the NFS dir which works fine. When it then switches to
>>>> reading those files back and simultaneously writing to the other NFS
>>>> mount I see a hang of 75 seconds. If I do an "ls -l" on the NFS mount
>>>> it hangs too. After 75 seconds the client has reported:
>>>> nfs server 192.168.10.133:/usr/local/export1: not responding
>>>> nfs server 192.168.10.133:/usr/local/export1: is alive again
>>>> nfs server 192.168.10.133:/usr/local/export1: not responding
>>>> nfs server 192.168.10.133:/usr/local/export1: is alive again
>>>> and then things start working again. The server was originally FreeBSD
>>>> 8.0-Release also but was upgraded to the latest stable to see if this
>>>> issue could be avoided.
>>>> # nfsstat -s -W -w 1
>>>>  GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
>>>>       0      0      0    222    257      0      0      0
>>>>       0      0      0    178    135      0      0      0
>>>>       0      0      0     85    127      0      0      0
>>>>       0      0      0      0      0      0      0      0
>>>>       0      0      0      0      0      0      0      0
>>>>       0      0      0      0      0      0      0      0
>>>>       0      0      0      0      0      0      0      0
>>>>       0      0      0      0      0      0      0      0
>>>> ... for 75 rows of all zeros
>>>>
>>>>       0      0      0    272    266      0      0      0
>>>>       0      0      0    167    165      0      0      0
>>>> I also tried runs with 15 simultaneous processes and 25. 15 processes
>>>> gave only about a 5 second stall but 25 gave again the same 75 second
>>>> stall.
>>>> Further, I tested with 2 mounts to the same server but from ZFS
>>>> filesystems with the exact same stall/timeout periods. So, it doesn't
>>>> appear to matter what the underlying filesystem is - it's something in
>>>> NFS or networking code.
>>>> Any ideas on what's going on here? What's causing the complete stall
>>>> period of zero NFS activity? Any flaws with my testing methods?
>>>> Thanks for any and all help/ideas.
>>> What network driver are you using? Have you tried tcpdumping the
>>> packets?
>>> -Garrett
>>>
>> I'm using igb currently but have also used em. I have not tried
>> tcpdumping the packets yet on this test. Any suggestions on things to
>> look out for (I'm not that familiar with that whole process).
>>
>> Which brings up another point - I'm using TCP connections for NFS, not
>> UDP.
> Is the net.inet.tcp.tso sysctl enabled or not? What about rxcsum and
> txcsum?
> Thanks,
> -Garrett

We're occasionally seeing these same types of stalls (plus repeated
"is not responding" / "is alive again" messages in quick succession).
We're seeing it only on our 8.1-RELEASE systems, against a variety of
NFS servers (6.3-RELEASE, 7.2-RELEASE, and 8-STABLE from before the
release of 8.1). It also happens with a variety of client hardware and
network adapters (em, bce, bge); the only common denominator is
8.1-RELEASE on the clients.
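
For anyone who wants to put a number on these stalls rather than count
rows by eye, here is a rough Python sketch that scans saved
"nfsstat -s -W -w 1" output for runs of all-zero samples. The function
name zero_runs and the SAMPLE text are made up for illustration (the
sample just reuses the figures quoted above), and it assumes raw
nfsstat output with one sample per interval, not the ">"-quoted copy
in this mail.

```python
def zero_runs(lines, interval=1):
    """Return (first_row, seconds) for each run of all-zero samples.

    `lines` is any iterable of text lines from `nfsstat -w <interval>`;
    `interval` is the sampling period in seconds (assumed 1 here).
    """
    runs = []
    run_start = None
    run_len = 0
    row = 0  # index of the current numeric sample row
    for line in lines:
        fields = line.split()
        # Skip blank lines and the column header nfsstat prints periodically.
        if not fields or not all(f.isdigit() for f in fields):
            continue
        if all(int(f) == 0 for f in fields):
            if run_len == 0:
                run_start = row
            run_len += 1
        elif run_len:
            # Activity resumed: close out the current idle run.
            runs.append((run_start, run_len * interval))
            run_len = 0
        row += 1
    if run_len:
        runs.append((run_start, run_len * interval))
    return runs


# Illustrative sample only, shortened to five idle rows.
SAMPLE = """\
 GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
      0      0      0    222    257      0      0      0
      0      0      0    178    135      0      0      0
      0      0      0     85    127      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0    272    266      0      0      0
      0      0      0    167    165      0      0      0
"""

print(zero_runs(SAMPLE.splitlines()))  # prints [(3, 5)]
```

Since zero_runs takes any iterable of lines, passing it an open file or
sys.stdin should work just as well as the split string. Note that a
server that is simply idle looks the same as a stalled one here, so
this only means something while the test load is running.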