Date: Thu, 1 Jul 2010 13:36:12 -0700 (PDT)
From: alan bryan <alan.bryan@yahoo.com>
To: Garrett Cooper <yanefbsd@gmail.com>
Cc: freebsd-stable@freebsd.org
Subject: Re: NFS 75 second stall
Message-ID: <119072.59868.qm@web50504.mail.re2.yahoo.com>
In-Reply-To: <AANLkTikxnw7sQ_cWCekS-qI3mP1Ui3dPjK1KAVqRg239@mail.gmail.com>

--- On Thu, 7/1/10, Garrett Cooper <yanefbsd@gmail.com> wrote:

> From: Garrett Cooper <yanefbsd@gmail.com>
> Subject: Re: NFS 75 second stall
> To: "alan bryan" <alan.bryan@yahoo.com>
> Cc: freebsd-stable@freebsd.org
> Date: Thursday, July 1, 2010, 1:28 PM
> On Thu, Jul 1, 2010 at 1:18 PM, alan bryan <alan.bryan@yahoo.com> wrote:
> >
> > --- On Thu, 7/1/10, Garrett Cooper <yanefbsd@gmail.com> wrote:
> >
> >> From: Garrett Cooper <yanefbsd@gmail.com>
> >> Subject: Re: NFS 75 second stall
> >> To: "alan bryan" <alan.bryan@yahoo.com>
> >> Cc: freebsd-stable@freebsd.org
> >> Date: Thursday, July 1, 2010, 12:23 PM
> >> On Thu, Jul 1, 2010 at 11:51 AM, alan bryan <alan.bryan@yahoo.com> wrote:
> >> >
> >> > --- On Thu, 7/1/10, Garrett Cooper <yanefbsd@gmail.com> wrote:
> >> >
> >> >> From: Garrett Cooper <yanefbsd@gmail.com>
> >> >> Subject: Re: NFS 75 second stall
> >> >> To: "alan bryan" <alan.bryan@yahoo.com>
> >> >> Cc: freebsd-stable@freebsd.org
> >> >> Date: Thursday, July 1, 2010, 11:13 AM
> >> >> On Thu, Jul 1, 2010 at 11:01 AM, alan bryan <alan.bryan@yahoo.com> wrote:
> >> >> > Setup:
> >> >> >
> >> >> > server - FreeBSD 8-stable from today. 2 UFS dirs exported via NFS.
> >> >> > client - FreeBSD 8.0-Release. Running a test php script that copies
> >> >> > around various files to/from 2 separate NFS mounts.
> >> >> >
> >> >> > Situation:
> >> >> >
> >> >> > The script is started (forked to do 20 simultaneous runs) and 20 1GB
> >> >> > files are copied to the NFS dir, which works fine. When it then
> >> >> > switches to reading those files back and simultaneously writing to
> >> >> > the other NFS mount, I see a hang of 75 seconds. If I do an "ls -l"
> >> >> > on the NFS mount it hangs too. After 75 seconds the client has
> >> >> > reported:
> >> >> >
> >> >> > nfs server 192.168.10.133:/usr/local/export1: not responding
> >> >> > nfs server 192.168.10.133:/usr/local/export1: is alive again
> >> >> > nfs server 192.168.10.133:/usr/local/export1: not responding
> >> >> > nfs server 192.168.10.133:/usr/local/export1: is alive again
> >> >> >
> >> >> > and then things start working again. The server was originally
> >> >> > FreeBSD 8.0-Release also but was upgraded to the latest stable to
> >> >> > see if this issue could be avoided.
> >> >> >
> >> >> > # nfsstat -s -W -w 1
> >> >> >  GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
> >> >> >       0      0      0    222    257      0      0      0
> >> >> >       0      0      0    178    135      0      0      0
> >> >> >       0      0      0     85    127      0      0      0
> >> >> >       0      0      0      0      0      0      0      0
> >> >> >       0      0      0      0      0      0      0      0
> >> >> >       0      0      0      0      0      0      0      0
> >> >> >       0      0      0      0      0      0      0      0
> >> >> >       0      0      0      0      0      0      0      0
> >> >> >
> >> >> > ... for 75 rows of all zeros
> >> >> >
> >> >> >       0      0      0    272    266      0      0      0
> >> >> >       0      0      0    167    165      0      0      0
> >> >> >
> >> >> > I also tried runs with 15 simultaneous processes and 25. 15
> >> >> > processes gave only about a 5 second stall, but 25 gave the same
> >> >> > 75 second stall again.
> >> >> >
> >> >> > Further, I tested with 2 mounts to the same server but from ZFS
> >> >> > filesystems, with the exact same stall/timeout periods. So it
> >> >> > doesn't appear to matter what the underlying filesystem is - it's
> >> >> > something in the NFS or networking code.
> >> >> >
> >> >> > Any ideas on what's going on here? What's causing the complete
> >> >> > stall period of zero NFS activity? Any flaws with my testing
> >> >> > methods?
> >> >> >
> >> >> > Thanks for any and all help/ideas.
> >> >>
> >> >> What network driver are you using? Have you tried tcpdumping the
> >> >> packets?
> >> >> -Garrett
> >> >>
> >> >
> >> > I'm using igb currently but have also used em. I have not tried
> >> > tcpdumping the packets yet on this test. Any suggestions on things to
> >> > look out for? (I'm not that familiar with that whole process.)
> >> >
> >> > Which brings up another point - I'm using TCP connections for NFS,
> >> > not UDP.
> >>
> >>     Is the net.inet.tcp.tso sysctl enabled or not? What about rxcsum
> >> and txcsum?
> >> Thanks,
> >> -Garrett
> >>
> >
> > I haven't intentionally/explicitly set any of this, so it's "default":
> >
> > # sysctl net.inet.tcp.tso
> > net.inet.tcp.tso: 1
> >
> > igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> >         options=13b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,TSO4>
> >         ether 00:30:48:c3:26:94
> >         inet 192.168.10.133 netmask 0xffffff00 broadcast 192.168.10.255
> >         media: Ethernet autoselect (1000baseT <full-duplex>)
> >         status: active
>
> Devise all of the available permutations that you need to use to test
> this out; there are a total of 3 on/off variables, so 8 (2^3)
> permutations, but you've already `tested one', so that makes the
> permutation count 7.
> Example:
>
> TXCSUM=off, RXCSUM=on, TSO=on
> TXCSUM=on, RXCSUM=off, TSO=on
> TXCSUM=on, RXCSUM=off, TSO=off
>
> ...
>
> Try executing the permutations on the client first, keeping the server
> constant, then make the client constant and make the server variable,
> and finally do both to the server and client.
>
> Be sure to take measurements for each permutation to ensure that
> things make functional sense.
>
> The reason why I'm suggesting this is that there were issues with
> em(4) [and igb(4) too I think, since it uses common code] with various
> hardware offload bits on 8.0-RELEASE (IIRC disabling txcsum did the
> trick, but you may have to do more than that in order to get things to
> work).
>
> Here's a similar thread with a different driver:
> http://lists.freebsd.org/pipermail/freebsd-current/2009-June/008264.html
> (just to illustrate the thought process used to determine the source
> of failure).
>
> Thanks,
> -Garrett
>

Thanks for the detailed test plan!

Is it also fair to assume, then, that updating the NFS client machine to
the latest 8-stable should fix this issue? (Both machines would then be
running the latest 8-stable code.) These are not in production, so I can
test or upgrade with no issues.

Thanks again.
--Alan
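Garrett's test plan can be scripted so that no combination is skipped. Three on/off settings give 2^3 = 8 combinations, one of which (everything enabled) is the default already tested. This is a minimal sketch, not a verified procedure: it assumes the interface is `igb0` (from the ifconfig output in the thread) and uses the `txcsum`, `rxcsum` and `tso` flags of FreeBSD's ifconfig(8), where a leading `-` disables the capability. The `IFACE` variable and the `flag`/`combos` function names are hypothetical. It only prints commands; nothing is applied. The host-wide `net.inet.tcp.tso` sysctl shown in the thread is a separate knob that this sketch does not touch.

```shell
#!/bin/sh
# Sketch: print one ifconfig(8) command per on/off combination of the
# three offload settings discussed in this thread (txcsum, rxcsum, tso).
# Assumption: IFACE defaults to igb0, taken from the ifconfig output above.
# Nothing is applied; run each printed line by hand (as root), rerun the
# NFS copy test, and note the stall length before trying the next one.

IFACE=${IFACE:-igb0}

# flag NAME BIT -> "NAME" when BIT is 1 (enable), "-NAME" when 0 (disable)
flag() {
    if [ "$2" -eq 1 ]; then
        printf '%s' "$1"
    else
        printf '%s' "-$1"
    fi
}

# Emit all 2^3 = 8 combinations; the first line is the all-on default.
combos() {
    for tx in 1 0; do
        for rx in 1 0; do
            for tso in 1 0; do
                printf 'ifconfig %s %s %s %s\n' "$IFACE" \
                    "$(flag txcsum "$tx")" \
                    "$(flag rxcsum "$rx")" \
                    "$(flag tso "$tso")"
            done
        done
    done
}

combos
```

Following the plan, you would work through the printed lines on the client first with the server held constant, then swap roles, and finally vary both.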
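To make the per-combination measurements comparable, the stall length can be read directly from the `nfsstat -s -W -w 1` output. At a one-second interval, each all-zero row is roughly one second with no server activity, which is how the "75 rows of all zeros" line up with the 75-second stall. This is a minimal awk sketch. It assumes data rows are whitespace-separated integers and that header lines (including any repeated headers) can be skipped without ending a run. The function name `longest_zero_run` is hypothetical.

```shell
#!/bin/sh
# Sketch: report the longest run of consecutive all-zero rows in saved
# `nfsstat -s -W -w 1` output read from stdin. With a 1-second interval,
# the run length approximates the stall duration in seconds.
# Assumption: non-numeric lines (headers) are ignored and do not reset
# the current run.

longest_zero_run() {
    awk '
        # Only look at rows made entirely of integers.
        /^[ \t]*[0-9]+([ \t]+[0-9]+)*[ \t]*$/ {
            zero = 1
            for (i = 1; i <= NF; i++)
                if ($i != 0) { zero = 0; break }
            if (zero) {
                run++
                if (run > max) max = run
            } else {
                run = 0
            }
        }
        # "+ 0" prints 0 rather than an empty string when nothing matched.
        END { print max + 0 }
    '
}
```

For example, `longest_zero_run < nfsstat.log` (file name hypothetical) on the saved output of one run gives a single number to record next to each offload combination.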