From owner-freebsd-stable@FreeBSD.ORG  Wed Sep  1 15:56:50 2010
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 640EE1065674
	for <freebsd-stable@freebsd.org>; Wed,  1 Sep 2010 15:56:50 +0000 (UTC)
	(envelope-from korvus@comcast.net)
Received: from mx04.pub.collaborativefusion.com
	(mx04.pub.collaborativefusion.com [206.210.72.84])
	by mx1.freebsd.org (Postfix) with ESMTP id 1A5058FC08
	for <freebsd-stable@freebsd.org>; Wed,  1 Sep 2010 15:56:48 +0000 (UTC)
Received: from [192.168.2.164] ([206.210.89.202])
	by mx04.pub.collaborativefusion.com (StrongMail Enterprise
	4.1.1.4(4.1.1.4-47689)); Wed, 01 Sep 2010 11:25:47 -0400
X-VirtualServerGroup: Default
X-MailingID: 00000::00000::00000::00000::::2215
X-SMHeaderMap: mid="X-MailingID"
X-Destination-ID: freebsd-stable@freebsd.org
X-SMFBL: ZnJlZWJzZC1zdGFibGVAZnJlZWJzZC5vcmc=
Message-ID: <4C7E743A.1040506@comcast.net>
Date: Wed, 01 Sep 2010 11:41:46 -0400
From: Steve Polyack <korvus@comcast.net>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.7) Gecko/20100805 Thunderbird/3.1.1
MIME-Version: 1.0
To: freebsd-stable@freebsd.org, Rick Macklem <rmacklem@uoguelph.ca>
References: <AANLkTilNvy3FYUNjjiJ85eWrF7jTAvJJ9E7Q2eqhhQj6@mail.gmail.com>	<538823.39365.qm@web50508.mail.re2.yahoo.com>
	<AANLkTillzgI775xETcZcmyj4TyTVihZJ5tSznxOoWE_r@mail.gmail.com>
In-Reply-To: <AANLkTillzgI775xETcZcmyj4TyTVihZJ5tSznxOoWE_r@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: 
Subject: Re: NFS 75 second stall
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 01 Sep 2010 15:56:50 -0000

  On 07/01/10 15:23, Garrett Cooper wrote:
> On Thu, Jul 1, 2010 at 11:51 AM, alan bryan<alan.bryan@yahoo.com>  wrote:
>>
>> --- On Thu, 7/1/10, Garrett Cooper<yanefbsd@gmail.com>  wrote:
>>
>>> From: Garrett Cooper<yanefbsd@gmail.com>
>>> Subject: Re: NFS 75 second stall
>>> To: "alan bryan"<alan.bryan@yahoo.com>
>>> Cc: freebsd-stable@freebsd.org
>>> Date: Thursday, July 1, 2010, 11:13 AM
>>> On Thu, Jul 1, 2010 at 11:01 AM, alan
>>> bryan<alan.bryan@yahoo.com>
>>> wrote:
>>>> Setup:
>>>>
>>>> server - FreeBSD 8-stable from today.  2 UFS dirs
>>> exported via NFS.
>>>> client - FreeBSD 8.0-Release.  Running a test php
>>> script that copies around various files to/from 2 separate
>>> NFS mounts.
>>>> Situation:
>>>>
>>>> script is started (forked to do 20 simultaneous runs)
>>> and 20 1GB files are copied to the NFS dir which works
>>> fine.  When it then switches to reading those files back
>>> and simultaneously writing to the other NFS mount I see a
>>> hang of 75 seconds.  If I do an "ls -l" on the NFS mount it
>>> hangs too.  After 75 seconds the client has reported:
>>>> nfs server 192.168.10.133:/usr/local/export1: not
>>> responding
>>>> nfs server 192.168.10.133:/usr/local/export1: is alive
>>> again
>>>> nfs server 192.168.10.133:/usr/local/export1: not
>>> responding
>>>> nfs server 192.168.10.133:/usr/local/export1: is alive
>>> again
>>>> and then things start working again.  The server was
>>> originally FreeBSD 8.0-Release also but was upgraded to the
>>> latest stable to see if this issue could be avoided.
>>>> # nfsstat -s -W -w 1
>>>>   GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
>>>>        0      0      0    222    257      0      0      0
>>>>        0      0      0    178    135      0      0      0
>>>>        0      0      0     85    127      0      0      0
>>>>        0      0      0      0      0      0      0      0
>>>>        0      0      0      0      0      0      0      0
>>>>        0      0      0      0      0      0      0      0
>>>>        0      0      0      0      0      0      0      0
>>>>        0      0      0      0      0      0      0      0
>>>> ... for 75 rows of all zeros
>>>>
>>>>        0      0      0    272    266      0      0      0
>>>>        0      0      0    167    165      0      0      0
>>>> I also tried runs with 15 simultaneous processes and
>>> 25.  15 processes gave only about a 5 second stall but 25
>>> gave again the same 75 second stall.
>>>> Further, I tested with 2 mounts to the same server but
>>> from ZFS filesystems with the exact same stall/timeout
>>> periods.  So, it doesn't appear to matter what the
>>> underlying filesystem is - it's something in NFS or
>>> networking code.
>>>> Any ideas on what's going on here?  What's causing
>>> the complete stall period of zero NFS activity?   Any flaws
>>> with my testing methods?
>>>> Thanks for any and all help/ideas.
>>> What network driver are you using? Have you tried
>>> tcpdumping the packets?
>>> -Garrett
>>>
>> I'm using igb currently but have also used em.  I have not tried tcpdumping the packets yet on this test.  Any suggestions on things to look out for (I'm not that familiar with that whole process).
>>
>> Which brings up another point - I'm using TCP connections for NFS, not UDP.
>      Is the net.inet.tcp.tso sysctl enabled or not? What about rxcsum and txcsum?
> Thanks,
> -Garrett

We're occasionally seeing the same kind of stall, along with repeated
"not responding" / "is alive again" messages in quick succession.  We
see it only on our 8.1-RELEASE clients, against a variety of NFS
servers (6.3-RELEASE, 7.2-RELEASE, and 8-STABLE from before the release
of 8.1).  It happens across a variety of client hardware and network
adapters (em, bce, bge); the only common denominator is 8.1-RELEASE on
the clients.
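
For anyone who wants to compare stall lengths between setups, here is a rough sketch of how the all-zero stretch in output like the "nfsstat -s -W -w 1" listing quoted above could be measured.  The function name longest_stall is made up for illustration, and the script assumes the output was saved to a file with one sample per line.  With a one-second interval, the count is roughly the stall length in seconds.

```python
def longest_stall(lines):
    """Return the longest run of consecutive all-zero sample rows.

    Assumes each sample row is a whitespace-separated list of integer
    counters; anything else (blank lines, header rows) is skipped.
    """
    longest = 0
    current = 0
    for line in lines:
        fields = line.split()
        # Skip blank lines and the periodically repeated header row.
        if not fields or not all(f.isdigit() for f in fields):
            continue
        if all(int(f) == 0 for f in fields):
            # Another idle sample: extend the current stall.
            current += 1
            longest = max(longest, current)
        else:
            # Activity resumed: the stall (if any) has ended.
            current = 0
    return longest
```

It can be called as longest_stall(open("nfsstat.log")), where nfsstat.log is a placeholder name for the saved output.  Note that a genuinely idle server also produces all-zero rows, so the number only means something while a test load is running.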