From owner-freebsd-net@FreeBSD.ORG Wed Oct 14 14:40:04 2009
Date: Wed, 14 Oct 2009 14:40:04 GMT
Message-Id: <200910141440.n9EEe4nn088580@freefall.freebsd.org>
To: freebsd-net@FreeBSD.org
From: Burt Rosenberg
Subject: Re: kern/130628: [nfs] NFS / rpc.lockd deadlock on 7.1-R
List-Id: Networking and TCP/IP with FreeBSD

The following reply was made to PR kern/130628; it has been noted by GNATS.

From: Burt Rosenberg
To: bug-followup@freebsd.org, Joe Marcus Clarke
Subject: Re: kern/130628: [nfs] NFS / rpc.lockd deadlock on 7.1-R
Date: Wed, 14 Oct 2009 10:31:45 -0400

The patch, which helped but did not entirely fix the deadlock, is not in 7.2-p4 (i386).

Furthermore, we now have a deadlock on an NFS mount between a FreeBSD 7.2-p3 machine and a Linux 2.6.18-164.el5 SMP (i686 athlon i386) machine. In this setup there is a Cisco ASA 5220 between the Linux and FreeBSD boxes, and we run NFS over TCP.

On Thu, Sep 3, 2009 at 2:40 PM, Burt Rosenberg wrote:

> It seems that the problem described in
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/130628
>
> appears in 7.2-R-p3. With this kernel, against Fedora 8 clients:
>
> Linux prism09.cs.miami.edu 2.6.26.8-57.fc8 #1 SMP Thu Dec 18 18:59:49 EST
> 2008 x86_64 x86_64 x86_64 GNU/Linux
>
> which use NFS (TCP) to mount home directories from the FreeBSD server,
> the server becomes unresponsive from the network during graphical login
> of a client.
>
> Applying the patch given in the article
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/130628 seems at present to
> fix the problem. Under 7.2-R-p3, we can reproduce the problem in a few
> minutes. Under the same kernel with the patches described in the article,
> provided as diffs against the current source, we have not yet seen the
> problem.
>
> When the problem appears, the server cannot be pinged, and other network
> connections are halted.
>
> On the server, for instance, top shows:
>
> Proc,      state,   pri
> -----------------------
> rpc.lockd  *tcpin   -68
> nfsd       -          4
> rpcbind    select    44
> ntpd       select    44
> nfsd       select    44
> ... etc...
>
> Also,
>
> ./lockd restart
> Stopping lockd.
> Waiting for PIDS: 1114, 1114, 1114, 1114,....
>
> kill -9 1114 is also ineffective.
>
> So it seems to be something spinning in lockd.
>
> I think this is a serious issue and would like to see it resolved. Our
> setup is available if you would like to send instrumented code. I attach
> diffs.
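
For anyone reading the quoted top listing: on FreeBSD, top prefixes the STATE column with "*" when a thread is blocked waiting on a kernel lock, and the text after the "*" is the (possibly truncated) lock name. That would suggest the lock daemon is blocked on a lock rather than spinning, though this is only a reading of the listing and not a confirmed diagnosis. A minimal sketch in Python that picks such rows out of a saved listing follows. The three-column "name state priority" layout, the function name lock_blocked, and the sample text are assumptions made for illustration; real top output has more columns.

```python
# Sketch: flag processes whose top(1) STATE column begins with "*".
# Assumption: each row is "name state priority", as in the quoted listing.
# lock_blocked is a hypothetical helper, not part of any FreeBSD tool.

def lock_blocked(rows):
    """Return (name, lock) pairs for rows whose state starts with '*'."""
    blocked = []
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip rulers, blank lines, and other non-row text
        name, state, _priority = fields
        if state.startswith("*"):
            blocked.append((name, state[1:]))
    return blocked


# Sample rows copied from the listing in the quoted message.
LISTING = """\
rpc.lockd  *tcpin  -68
nfsd       -         4
rpcbind    select   44
ntpd       select   44
nfsd       select   44
"""

for name, lock in lock_blocked(LISTING):
    print(f"{name} is blocked on lock '{lock}'")
```

Run against the sample rows, only the lock daemon is reported; the processes in "select" or "-" are ordinary sleepers and are skipped.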