Date: Wed, 14 Oct 2009 14:40:04 GMT
From: Burt Rosenberg <burt@cs.miami.edu>
To: freebsd-net@FreeBSD.org
Subject: Re: kern/130628: [nfs] NFS / rpc.lockd deadlock on 7.1-R
Message-ID: <200910141440.n9EEe4nn088580@freefall.freebsd.org>
The following reply was made to PR kern/130628; it has been noted by GNATS.

From: Burt Rosenberg <burt@cs.miami.edu>
To: bug-followup@freebsd.org, Joe Marcus Clarke <marcus@marcuscom.com>
Cc:
Subject: Re: kern/130628: [nfs] NFS / rpc.lockd deadlock on 7.1-R
Date: Wed, 14 Oct 2009 10:31:45 -0400

The patch, which helped but did not entirely fix the deadlock, is not in
7.2-p4 (i386).

Furthermore, we now have a deadlock on an NFS mount between a FreeBSD 7.2-p3
machine and a Linux 2.6.18-164.el5 SMP (i686 athlon i386) machine. In this
situation there is a Cisco ASA 5220 between the Linux and FreeBSD boxes, and
we run NFS over TCP.

On Thu, Sep 3, 2009 at 2:40 PM, Burt Rosenberg <burt@cs.miami.edu> wrote:

> It seems that the problem described in
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/130628
>
> appears in 7.2-R-p3. With this kernel, against Fedora 8 clients:
>
> Linux prism09.cs.miami.edu 2.6.26.8-57.fc8 #1 SMP Thu Dec 18 18:59:49 EST
> 2008 x86_64 x86_64 x86_64 GNU/Linux
>
> which use NFS (TCP) to mount home directories from the FreeBSD server, the
> server becomes unresponsive on the network during the graphical login of
> a client.
>
> Applying the patch given in the article
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/130628 seems at present to
> fix the problem. Under 7.2-R-p3 we can reproduce the problem in a few
> minutes; under the same kernel with the patches described in the article,
> applied as diffs against the current source, we have not yet seen the
> problem.
>
> When the problem appears, the server cannot be pinged, and other network
> connections are halted.
>
> On the server, for instance, top shows:
>
> Proc, state, pri
> --------------------
> pc.lockd   *tcpin   -68
> nfsd       -          4
> rpcbind    select    44
> ntpd       select    44
> nfsd       select    44
> ... etc...
>
> Also,
>
> ./lockd restart
> Stopping lockd.
> Waiting for PIDS: 1114, 1114, 1114, 1114,....
>
> kill -9 1114 is also ineffective.
>
> So it seems to be something spinning in lockd.
>
> I think this is a serious issue and would like to see it resolved. Our
> setup is available if you would like to send instrumented code. I attach
> diffs.
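
For anyone triaging a similar hang from saved top output: as far as I know, FreeBSD's top prefixes the state with `*` when a thread is blocked on a kernel lock, which is what the `*tcpin` entry in the report suggests. The following is only a rough sketch, assuming rows reduced to the three columns shown in the report (name, state, priority); the function name `lock_blocked` and the `SAMPLE` text are made up for illustration and are not part of any FreeBSD tool.

```python
# Rows copied from the top excerpt in the report: name, state, priority.
SAMPLE = """\
pc.lockd   *tcpin   -68
nfsd       -          4
rpcbind    select    44
ntpd       select    44
nfsd       select    44
"""


def lock_blocked(table):
    """Return (name, lock, priority) for rows whose state starts with '*'.

    Assumption: a leading '*' in the state column marks a thread that is
    blocked on a kernel lock, and the rest of the field is the lock name
    as top printed it (possibly truncated).
    """
    blocked = []
    for line in table.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed rows
        name, state, pri = fields
        if state.startswith("*"):
            blocked.append((name, state[1:], int(pri)))
    return blocked


if __name__ == "__main__":
    for name, lock, pri in lock_blocked(SAMPLE):
        print(f"{name} is blocked on lock '{lock}' at priority {pri}")
```

This only flags candidates; it says nothing about which thread holds the lock, so it is a starting point for collecting details to attach to the PR rather than a diagnosis.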