Date: Thu, 1 Jul 2010 12:23:49 -0700
From: Garrett Cooper <yanefbsd@gmail.com>
To: alan bryan <alan.bryan@yahoo.com>
Cc: freebsd-stable@freebsd.org
Subject: Re: NFS 75 second stall
Message-ID: <AANLkTillzgI775xETcZcmyj4TyTVihZJ5tSznxOoWE_r@mail.gmail.com>
In-Reply-To: <538823.39365.qm@web50508.mail.re2.yahoo.com>
References: <AANLkTilNvy3FYUNjjiJ85eWrF7jTAvJJ9E7Q2eqhhQj6@mail.gmail.com> <538823.39365.qm@web50508.mail.re2.yahoo.com>

On Thu, Jul 1, 2010 at 11:51 AM, alan bryan <alan.bryan@yahoo.com> wrote:
>
>
> --- On Thu, 7/1/10, Garrett Cooper <yanefbsd@gmail.com> wrote:
>
>> From: Garrett Cooper <yanefbsd@gmail.com>
>> Subject: Re: NFS 75 second stall
>> To: "alan bryan" <alan.bryan@yahoo.com>
>> Cc: freebsd-stable@freebsd.org
>> Date: Thursday, July 1, 2010, 11:13 AM
>> On Thu, Jul 1, 2010 at 11:01 AM, alan bryan <alan.bryan@yahoo.com> wrote:
>> > Setup:
>> >
>> > server - FreeBSD 8-stable from today. 2 UFS dirs exported via NFS.
>> > client - FreeBSD 8.0-Release. Running a test php script that copies around various files to/from 2 separate NFS mounts.
>> >
>> > Situation:
>> >
>> > script is started (forked to do 20 simultaneous runs) and 20 1GB files are copied to the NFS dir which works fine. When it then switches to reading those files back and simultaneously writing to the other NFS mount I see a hang of 75 seconds. If I do an "ls -l" on the NFS mount it hangs too. After 75 seconds the client has reported:
>> >
>> > nfs server 192.168.10.133:/usr/local/export1: not responding
>> > nfs server 192.168.10.133:/usr/local/export1: is alive again
>> > nfs server 192.168.10.133:/usr/local/export1: not responding
>> > nfs server 192.168.10.133:/usr/local/export1: is alive again
>> >
>> > and then things start working again. The server was originally FreeBSD 8.0-Release also but was upgraded to the latest stable to see if this issue could be avoided.
>> >
>> > # nfsstat -s -W -w 1
>> > GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
>> >      0      0      0    222    257      0      0      0
>> >      0      0      0    178    135      0      0      0
>> >      0      0      0     85    127      0      0      0
>> >      0      0      0      0      0      0      0      0
>> >      0      0      0      0      0      0      0      0
>> >      0      0      0      0      0      0      0      0
>> >      0      0      0      0      0      0      0      0
>> >      0      0      0      0      0      0      0      0
>> >
>> > ... for 75 rows of all zeros
>> >
>> >      0      0      0    272    266      0      0      0
>> >      0      0      0    167    165      0      0      0
>> >
>> > I also tried runs with 15 simultaneous processes and 25. 15 processes gave only about a 5 second stall but 25 gave again the same 75 second stall.
>> >
>> > Further, I tested with 2 mounts to the same server but from ZFS filesystems with the exact same stall/timeout periods. So, it doesn't appear to matter what the underlying filesystem is - it's something in NFS or networking code.
>> >
>> > Any ideas on what's going on here? What's causing the complete stall period of zero NFS activity? Any flaws with my testing methods?
>> >
>> > Thanks for any and all help/ideas.
>>
>> What network driver are you using? Have you tried tcpdumping the packets?
>> -Garrett
>>
>
> I'm using igb currently but have also used em. I have not tried tcpdumping the packets yet on this test. Any suggestions on things to look out for? (I'm not that familiar with that whole process.)
>
> Which brings up another point - I'm using TCP connections for NFS, not UDP.
Is the net.inet.tcp.tso sysctl enabled or not? What about rxcsum and txcsum?
Thanks,
-Garrett
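
A rough sketch of how the settings asked about above could be checked, in case it helps. It reads the net.inet.tcp.tso sysctl and looks for RXCSUM, TXCSUM and TSO4 in the "options=" line that FreeBSD's ifconfig prints. The interface name igb0, the helper names, and the SAMPLE text are assumptions for illustration only; SAMPLE is hand-written and is not output from the machines in this thread.

```python
import re
import subprocess

# Capability names as they appear in FreeBSD's ifconfig "options=" line.
OFFLOAD_FLAGS = ("RXCSUM", "TXCSUM", "TSO4")


def parse_ifconfig_options(ifconfig_output):
    """Return the set of capability names from the 'options=...<A,B,...>' line."""
    match = re.search(r"^\s*options=[0-9a-fA-F]+<([^>]*)>",
                      ifconfig_output, re.MULTILINE)
    if not match:
        return set()
    return {name for name in match.group(1).split(",") if name}


def offload_status(ifconfig_output):
    """Map each offload flag of interest to True (enabled) or False."""
    enabled = parse_ifconfig_options(ifconfig_output)
    return {flag: flag in enabled for flag in OFFLOAD_FLAGS}


def read_live_status(interface="igb0"):
    """Query a running FreeBSD host. The interface name is an assumption."""
    tso = subprocess.run(["sysctl", "-n", "net.inet.tcp.tso"],
                         capture_output=True, text=True, check=True).stdout.strip()
    ifc = subprocess.run(["ifconfig", interface],
                         capture_output=True, text=True, check=True).stdout
    status = offload_status(ifc)
    status["net.inet.tcp.tso"] = (tso == "1")
    return status


# Hand-written sample in the shape of ifconfig output (illustrative only).
SAMPLE = (
    "igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500\n"
    "\toptions=13b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,TSO4>\n"
)

if __name__ == "__main__":
    print(offload_status(SAMPLE))
```

On the client and server, read_live_status("igb0") would report the live values. If offload is suspected, one way to test is to turn the features off temporarily (for example "ifconfig igb0 -tso -rxcsum -txcsum") and rerun the copy test to see whether the stall changes.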
