From owner-freebsd-current@FreeBSD.ORG Tue Nov 8 07:40:13 2005 Return-Path: X-Original-To: current@freebsd.org Delivered-To: freebsd-current@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0ECB016A420 for ; Tue, 8 Nov 2005 07:40:13 +0000 (GMT) (envelope-from mohan_srinivasan@yahoo.com) Received: from web31803.mail.mud.yahoo.com (web31803.mail.mud.yahoo.com [68.142.207.66]) by mx1.FreeBSD.org (Postfix) with SMTP id 045F043D45 for ; Tue, 8 Nov 2005 07:40:11 +0000 (GMT) (envelope-from mohan_srinivasan@yahoo.com) Received: (qmail 60590 invoked by uid 60001); 8 Nov 2005 07:40:11 -0000 DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=Message-ID:Received:Date:From:Subject:To:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding; b=kIzYhCV8wkCYiMZ9FNPXviGIUUXwI9O90R8uhFYpj8mABmxlYHc3n9npUD+zb1qK5z3FhQ069KyIKONynDdKX9NASzU7LznT4kMMnxIE8PgGY6b2EtqrCIF7L9Uwzg/1vkC4D7C9sBXAlhmawfuEcN2QB56AKkmGXeNiBAOgH1M= ; Message-ID: <20051108074011.60588.qmail@web31803.mail.mud.yahoo.com> Received: from [71.139.4.142] by web31803.mail.mud.yahoo.com via HTTP; Mon, 07 Nov 2005 23:40:11 PST Date: Mon, 7 Nov 2005 23:40:11 -0800 (PST) From: Mohan Srinivasan To: TOMITA Yoshinori , current@freebsd.org In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit Cc: Subject: Re: FreeBSD 6.0 TCP NFS client stuck X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Nov 2005 07:40:13 -0000 Great. I'll commit the fix to current shortly and it will get MFC'ed to the 6.x branch soon. thanks ! mohan --- TOMITA Yoshinori wrote: > Thanks, Mohan. > > With your patch, it seems working fine. > > I could confirm FreeBSD6 NFS client will create a new TCP connection > and old session was cleaned up. > > > [tcpdump log] > > 15:36:48.138599 IP CLIENT.3523215360 > SERVER.nfs: 40 null > 15:36:48.138832 IP SERVER.nfs > CLIENT.3523215360: reply ok 24 null > 15:36:49.188099 IP SERVER.nfsd > CLIENT.706: F 86953281:86953281(0) ack 61240098 win 10136 > > 15:36:49.188169 IP CLIENT.706 > SERVER.nfsd: . ack 1 win 65535 797285900> > > *closed* > > 15:37:18.148158 IP CLIENT.3791650816 > SERVER.nfs: 40 null > 15:37:18.148393 IP SERVER.nfs > CLIENT.3791650816: reply ok 24 null > 15:37:48.157804 IP CLIENT.4060086272 > SERVER.nfs: 40 null > 15:37:48.158054 IP SERVER.nfs > CLIENT.4060086272: reply ok 24 null > > *try NFS access* > > 15:38:02.734963 IP CLIENT.2121656486 > SERVER.nfs: 132 access [|nfs] > 15:38:02.735001 IP CLIENT.706 > SERVER.nfsd: F 133:133(0) ack 1 win 65535 558510 797285900> > 15:38:02.735127 IP CLIENT.3946253056 > SERVER.nfs: 0 proc-215011 > 15:38:02.735211 IP SERVER.nfsd > CLIENT.706: . ack 134 win 10136 558509> > 15:38:02.735354 IP SERVER.nfs > CLIENT.3560442624: reply ERR 0 > 15:38:02.735444 IP CLIENT.980 > SERVER.nfsd: . ack 169844227 win 33304 797293252> > 15:38:02.735508 IP CLIENT.2121656486 > SERVER.nfs: 132 access [|nfs] > 15:38:02.735698 IP SERVER.nfsd > CLIENT.980: . ack 132 win 10004 558510> > 15:38:02.737331 IP SERVER.nfs > CLIENT.2121656486: reply ok 124 access [|nfs] > 15:38:02.739255 IP CLIENT.2121656493 > SERVER.nfs: 132 access [|nfs] > 15:38:02.739477 IP SERVER.nfsd > CLIENT.980: . ack 264 win 10136 558514> > 15:38:02.739612 IP SERVER.nfs > CLIENT.2121656493: reply ok 124 access [|nfs] > 15:38:02.739971 IP CLIENT.2121656494 > SERVER.nfs: 128 fsstat [|nfs] > 15:38:02.740322 IP SERVER.nfs > CLIENT.2121656494: reply ok 172 fsstat [|nfs] > 15:38:02.740553 IP CLIENT.2121656495 > SERVER.nfs: 148 readdir [|nfs] > 15:38:02.741950 IP SERVER.nfs > CLIENT.2121656495: reply ok 1448 readdir [|nfs] > 15:38:02.742072 IP SERVER.nfs > CLIENT.0: reply ERR 1448 > 15:38:02.742132 IP CLIENT.980 > SERVER.nfsd: . ack 3317 win 64906 797293252> > 15:38:02.742192 IP SERVER.nfs > CLIENT.1: reply ERR 1448 > 15:38:02.742315 IP SERVER.nfs > CLIENT.2316: reply ok 1448 > 15:38:02.742370 IP CLIENT.980 > SERVER.nfsd: . ack 6213 win 63458 797293252> > 15:38:02.742639 IP SERVER.nfs > CLIENT.779122034: reply ERR 1448 > 15:38:02.742724 IP SERVER.nfs > CLIENT.4020: reply ok 952 > 15:38:02.742787 IP CLIENT.980 > SERVER.nfsd: . ack 8613 win 65535 797293252> > 15:38:02.743621 IP CLIENT.2121656496 > SERVER.nfs: 148 readdir [|nfs] > 15:38:02.744928 IP SERVER.nfs > CLIENT.2121656496: reply ok 1448 readdir [|nfs] > 15:38:02.745043 IP SERVER.nfs > CLIENT.777151841: reply ERR 1448 > 15:38:02.745103 IP CLIENT.980 > SERVER.nfsd: . ack 11509 win 64906 797293253> > ...snip... > > > > [netstat at NFS server] > > 1. first time > > SERVER.nfsd CLIENT.706 131070 0 10136 0 ESTABLISHED > > 2. closed > > SERVER.nfsd CLIENT.706 131070 0 10136 0 FIN_WAIT_2 > > 3. after reconnect > > SERVER.nfsd CLIENT.706 131070 0 10136 0 TIME_WAIT > SERVER.nfsd CLIENT.980 131070 0 10136 0 ESTABLISHED > > > >> On Mon, 7 Nov 2005 21:41:22 -0800 (PST), Mohan Srinivasan said: > > M> Hi, > M> Can you try the change attached below and tell me if it > M> fixes your problem ? Sorry, I haven't tested this change > M> (I don't have a Solaris server handy to test against and > M> would have to hack the FreeBSD server to close down the > M> connection after a period of inactivity, like Solaris seems > M> to do). > > M> The change should cause the client to detect that the > M> server has FIN'ed, and should force a reconnect. Let me > M> know if this works for you and I will commit. > > M> thanks > > M> mohan > > M> ==== //depot/yahoo/ybsd_6/src/sys/nfsclient/nfs_socket.c#10 - /homes/mohans/p4/yahoo/ybsd > M> _6/src/sys/nfsclient/nfs_socket.c ==== > M> @@ -521,6 +521,17 @@ > M> return (error); > M> } > > M> +static inline int > M> +nfs_cantrecvmore(struct socket *so) > M> +{ > M> + int ret; > M> + > M> + SOCKBUF_LOCK(&so->so_rcv); > M> + ret = (so->so_rcv.sb_state & SBS_CANTRCVMORE); > M> + SOCKBUF_UNLOCK(&so->so_rcv); > M> + return ret; > M> +} > M> + > M> int > M> nfs_reply(struct nfsreq *rep) > M> { > M> @@ -550,7 +561,7 @@ > M> } > M> so = rep->r_nmp->nm_so; > M> mtx_lock(&rep->r_nmp->nm_nfstcpstate.mtx); > M> - if (!so || > M> + if (!so || nfs_cantrecvmore(so) || > M> (rep->r_nmp->nm_nfstcpstate.flags & NFS_TCP_FORCE_RECONNECT)) { > M> mtx_unlock(&rep->r_nmp->nm_nfstcpstate.mtx); > M> error = nfs_reconnect(rep); > > > M> --- TOMITA Yoshinori wrote: > > >> I saw a NFS client of FreeBSD 6.0-RC1 gets NFS stuck using TCP NFS and > >> amd -w LARGE_NUMBER, where -w specifies dismount interval such as > >> 3600. > >> > >> 6.0-STABLE cvsuped 2005-11-7 has same problem. > >> It has always been OK on FreeBSD 4.X to 5.4-STABLE. > >> > >> FreeBSD 6.0 NFS client seems keep trying NFS access forever after NFS > >> TCP connection is (half?) closed. > >> > >> I am not sure amd and -w argument are the key of this problem, but they > >> raise probability from my experience. > >> > >> > >> > >> NFS Server: > >> Solaris 2.6 > >> > >> NFS Client: > >> FreeBSD 6.0-RC1 (cvsuped 2005-11-02) > >> > >> Starting amd at FreeBSD: > >> /usr/sbin/amd -p -w 3600 -a /.amd_mnt -l syslog /host /etc/amd.map /net /etc/amd.map > >> > >> /etc/amd.map: > >> # changed proto=udp to proto=tcp > >> ------ > >> /defaults type:=host;fs:=${autodir}/${rhost}/host;rhost:=${key} > >> * opts:=rw,grpid,resvport,vers=3,proto=tcp,nosuid,nodev > >> ------ > >> > >> > >> > >> tcpdump log: > >> -------------------------------------------------- > >> > >> [several minutes passed after NFS auto-mounted] > >> > >> 11:09:38.852928 IP CLIENT.3254779904 > SERVER.nfs: 40 null > >> 11:09:38.853261 IP SERVER.nfs > CLIENT.3254779904: reply ok 24 null > >> 11:10:08.851290 IP CLIENT.3523215360 > SERVER.nfs: 40 null > >> 11:10:08.851529 IP SERVER.nfs > CLIENT.3523215360: reply ok 24 null > >> > >> [FIN from NFS server, FreeBSD NFS client sends ACK] > >> > >> 11:10:08.854469 IP SERVER.nfsd > CLIENT.820: F 341:341(0) ack 200 win 10136 > >> 787046144 183809> > >> 11:10:08.854555 IP CLIENT.820 > SERVER.nfsd: . ack 342 win 65535 787046144> > >> 11:10:08.864321 IP SERVER.nfsd > CLIENT.679: F 341:341(0) ack 200 win 10136 > >> 787046145 183817> > >> 11:10:08.864383 IP CLIENT.679 > SERVER.nfsd: . ack 342 win 65535 787046145> > >> 11:10:08.895133 IP SERVER.nfsd > CLIENT.735: F 40141:40141(0) ack 1900 win 10136 > >> > >> 11:10:08.895203 IP CLIENT.735 > SERVER.nfsd: . ack 40142 win 65535 787046148> > >> > >> [still NFS NULL continues from FreeBSD NFS client] > >> > >> 11:10:38.849614 IP CLIENT.3791650816 > SERVER.nfs: 40 null > >> 11:10:38.849906 IP SERVER.nfs > CLIENT.3791650816: reply ok 24 null > >> 11:11:08.847947 IP CLIENT.4060086272 > SERVER.nfs: 40 null > >> 11:11:08.848289 IP SERVER.nfs > CLIENT.4060086272: reply ok 24 null > >> 11:11:38.846275 IP CLIENT.33619968 > SERVER.nfs: 40 null > >> 11:11:38.846555 IP SERVER.nfs > CLIENT.33619968: reply ok 24 null > >> > >> [do some NFS access, such as /bin/ls /host/SERVER/SOMEWHERE on FreeBSD] > >> > >> 11:12:08.413414 IP CLIENT.302055424 > SERVER.nfs: 40 null > >> 11:12:08.413797 IP SERVER.nfs > CLIENT.302055424: reply ok 24 null > >> 11:12:08.414096 IP CLIENT.1563723127 > SERVER.nfs: 132 access [|nfs] > >> 11:12:08.507632 IP SERVER.nfsd > CLIENT.735: . ack 2032 win 10136 787058109 > 663316> > >> 11:12:10.083443 IP CLIENT.1563723127 > SERVER.nfs: 132 access [|nfs] > >> 11:12:10.177666 IP SERVER.nfsd > CLIENT.735: . ack 2164 win 10136 787058276 > 664986> > >> 11:12:11.743299 IP CLIENT.1563723127 > SERVER.nfs: 132 access [|nfs] > >> 11:12:11.837696 IP SERVER.nfsd > CLIENT.735: . ack 2296 win 10136 787058442 > 666646> > >> 11:12:15.063018 IP CLIENT.1563723127 > SERVER.nfs: 132 access [|nfs] > >> 11:12:15.157790 IP SERVER.nfsd > CLIENT.735: . ack 2428 win 10136 787058774 > 669966> > >> 11:12:21.702417 IP CLIENT.1563723127 > SERVER.nfs: 132 access [|nfs] > >> 11:12:21.798028 IP SERVER.nfsd > CLIENT.735: . ack 2560 win 10136 787059438 > 676606> > >> 11:12:34.981251 IP CLIENT.1563723127 > SERVER.nfs: 132 access [|nfs] > >> 11:12:35.078429 IP SERVER.nfsd > CLIENT.735: . ack 2692 win 10136 787060766 > 689886> > >> 11:12:38.411967 IP CLIENT.570490880 > SERVER.nfs: 40 null > >> 11:12:38.412267 IP SERVER.nfs > CLIENT.570490880: reply ok 24 null > >> 11:13:01.538891 IP CLIENT.1563723127 > SERVER.nfs: 132 access [|nfs] > >> > >> [----> message repeats on FreeBSD NFS client] > >> nfs server SERVER:/export/home: not responding > >> > >> -------------------------------------------------- > >> > >> [netstat @ Solaris NFS server, when NFS is OK] > >> > >> SERVER.nfsd CLIENT.820 131070 0 10136 0 ESTABLISHED > >> SERVER.nfsd CLIENT.735 131070 0 10136 0 ESTABLISHED > >> SERVER.nfsd CLIENT.679 131070 0 10136 0 ESTABLISHED > >> > >> [after NFS stuck] > >> > >> SERVER.nfsd CLIENT.820 131070 0 10136 0 FIN_WAIT_2 > >> SERVER.nfsd CLIENT.735 131070 0 10136 0 FIN_WAIT_2 > >> SERVER.nfsd CLIENT.679 131070 0 10136 0 FIN_WAIT_2 > >> > >> [netstat @ FreeBSD NFS client, after NFS stuck] > >> > >> tcp4 0 0 CLIENT.679 SERVER.nfsd CLOSE_WAIT > >> tcp4 0 0 CLIENT.735 SERVER.nfsd CLOSE_WAIT > >> tcp4 0 0 CLIENT.820 SERVER.nfsd CLOSE_WAIT > >> > >> > >> -- > >> --- > >> TOMITA Yoshinori > >> (Fujitsu Laboratories Ltd., Kawasaki, Japan) > >> _______________________________________________ > >> freebsd-current@freebsd.org mailing list > >> http://lists.freebsd.org/mailman/listinfo/freebsd-current > >> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" > >> > > > -- > --- > TOMITA Yoshinori > (Fujitsu Laboratories Ltd., Kawasaki, Japan) >