From owner-freebsd-net@freebsd.org Thu Mar 18 12:58:33 2021
From: tuexen@freebsd.org
To: "Rodney W. Grimes"
Cc: Rick Macklem, Alan Somers, "freebsd-net@freebsd.org"
Subject: Re: NFS Mount Hangs
Date: Thu, 18 Mar 2021 13:58:30 +0100
In-Reply-To: <202103181253.12ICrF35016815@gndrsh.dnsmgr.net>
List-Id: Networking and TCP/IP with FreeBSD

> On 18. Mar 2021, at 13:53, Rodney W. Grimes wrote:
>
> Note I am NOT a TCP expert, but know enough about it to add a comment...
>
>> Alan Somers wrote:
>> [stuff snipped]
>>> Is the 128K limit related to MAXPHYS? If so, it should be greater in 13.0.
>> For the client, yes. For the server, no.
>> For the server, it is just a compile time constant NFS_SRVMAXIO.
>>
>> It's mainly related to the fact that I haven't gotten around to testing larger
>> sizes yet.
>> - kern.ipc.maxsockbuf needs to be several times the limit, which means it would
>>   have to increase for 1Mbyte.
>> - The session code must negotiate a maximum RPC size > 1 Mbyte.
>>   (I think the server code does do this, but it needs to be tested.)
>> And, yes, the client is limited to MAXPHYS.
>>
>> Doing this is on my todo list, rick
>>
>> The client should acquire the attributes that indicate that and set rsize/wsize
>> to that. "# nfsstat -m" on the client should show you what the client
>> is actually using. If it is larger than 128K, set both rsize and wsize to 128K.
>>
>>> Output from the NFS Client when the issue occurs
>>> # netstat -an | grep NFS.Server.IP.X
>>> tcp 0 0 NFS.Client.IP.X:46896 NFS.Server.IP.X:2049 FIN_WAIT2
>> I'm no TCP guy.
>> Hopefully others might know why the client would be
>> stuck in FIN_WAIT2 (I vaguely recall this means it is waiting for a fin/ack,
>> but could be wrong?)
>
> The most common way to get stuck in FIN_WAIT2 is to call
> shutdown(2) on a socket, but never follow up with a
> close(2) after some timeout period. The "client" is still
> connected to the socket and can stay in this shutdown state
> forever; the kernel will not reap the socket as it is
> associated with a process, aka not orphaned. I suspect
> that the Linux client has a corner condition that is leading
> to this socket leak.
>
> If on the Linux client you can look at the sockets to see
> if these are still associated with a process, a la fstat or
> whatever Linux tool does this, that would be helpful.
> If they are in fact connected to a process, it is that
> process that must call close(2) to clean these up.
>
> IIRC the server side socket would be gone at this point
> and there is nothing the server can do that would allow
> a FIN_WAIT2 to close down.

Jason reported that the server is in CLOSE-WAIT. This would mean that
the server received the FIN, ACKed it, but has not initiated the teardown
of the Server->Client direction. So the server side socket is still there
and close has not been called yet.

>
> The real TCP experts can now correct my 30 year old TCP
> stack understanding...

I wouldn't count myself as a real TCP expert, but the behaviour hasn't
changed in the last 30 years, I think...
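
The FIN_WAIT2 / CLOSE-WAIT pairing described above can be reproduced on
loopback. What follows is only a minimal Python sketch of the generic TCP
half-close, not the NFS client or server code; the function name
half_close_demo is made up for illustration. One side calls shutdown(2)
without close(2), and the other side reads the EOF but does not close yet.

```python
import socket


def half_close_demo():
    """Half-close a loopback TCP connection and report what each side sees."""
    # Listening socket on an ephemeral loopback port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.create_connection(listener.getsockname(), timeout=5)
    server, _ = listener.accept()
    server.settimeout(5)

    # Client sends its FIN: shutdown(2) on the write side, no close(2).
    client.shutdown(socket.SHUT_WR)

    # The server reads EOF, i.e. the FIN has arrived and been ACKed by its
    # kernel. From here on the server side should sit in CLOSE_WAIT and the
    # client side in FIN_WAIT_2 until the server closes its socket.
    eof = server.recv(1024)

    # The Server->Client direction is still open and usable.
    server.sendall(b"still open")
    reply = client.recv(1024)

    # Only close(2) on the server sends the second FIN and lets the
    # client leave FIN_WAIT_2.
    server.close()
    final = client.recv(1024)

    client.close()
    listener.close()
    return eof, reply, final


if __name__ == "__main__":
    print(half_close_demo())
```

If the server.close() call is delayed, the two states should be visible
with "netstat -an" on FreeBSD or "ss -tan" on Linux while the script waits.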

Best regards
Michael
>
>>
>>> # cat /sys/kernel/debug/sunrpc/rpc_xprt/*/info
>>> netid: tcp
>>> addr: NFS.Server.IP.X
>>> port: 2049
>>> state: 0x51
>>>
>>> syslog
>>> Mar 4 10:29:27 hostname kernel: [437414.131978] -pid- flgs status -client- --rqstp- -timeout ---ops--
>>> Mar 4 10:29:27 hostname kernel: [437414.133158] 57419 40a1 0 9b723c73 143cfadf 30000 4ca953b5 nfsv4 OPEN_NOATTR a:call_connect_status [sunrpc] q:xprt_pending
>> I don't know what OPEN_NOATTR means, but I assume it is some variant
>> of NFSv4 Open operation.
>> [stuff snipped]
>>> Mar 4 10:29:30 hostname kernel: [437417.110517] RPC: 57419 xprt_connect_status: connect attempt timed out
>>> Mar 4 10:29:30 hostname kernel: [437417.112172] RPC: 57419 call_connect_status (status -110)
>> I have no idea what status -110 means?
>>> Mar 4 10:29:30 hostname kernel: [437417.113337] RPC: 57419 call_timeout (major)
>>> Mar 4 10:29:30 hostname kernel: [437417.114385] RPC: 57419 call_bind (status 0)
>>> Mar 4 10:29:30 hostname kernel: [437417.115402] RPC: 57419 call_connect xprt 00000000e061831b is not connected
>>> Mar 4 10:29:30 hostname kernel: [437417.116547] RPC: 57419 xprt_connect xprt 00000000e061831b is not connected
>>> Mar 4 10:30:31 hostname kernel: [437478.551090] RPC: 57419 xprt_connect_status: connect attempt timed out
>>> Mar 4 10:30:31 hostname kernel: [437478.552396] RPC: 57419 call_connect_status (status -110)
>>> Mar 4 10:30:31 hostname kernel: [437478.553417] RPC: 57419 call_timeout (minor)
>>> Mar 4 10:30:31 hostname kernel: [437478.554327] RPC: 57419 call_bind (status 0)
>>> Mar 4 10:30:31 hostname kernel: [437478.555220] RPC: 57419 call_connect xprt 00000000e061831b is not connected
>>> Mar 4 10:30:31 hostname kernel: [437478.556254] RPC: 57419 xprt_connect xprt 00000000e061831b is not connected
>> Is it possible that the client is trying to (re)connect using the same client port#?
>> I would normally expect the client to create a new TCP connection using a
>> different client port# and then retry the outstanding RPCs.
>> --> Capturing packets when this happens would show us what is going on.
>>
>> If there is a problem on the FreeBSD end, it is most likely a broken
>> network device driver.
>> --> Try disabling TSO, LRO.
>> --> Try a different driver for the net hardware on the server.
>> --> Try a different net chip on the server.
>> If you can capture packets when (not after) the hang
>> occurs, then you can look at them in wireshark and see
>> what is actually happening. (Ideally on both client and
>> server, to check that your network hasn't dropped anything.)
>> --> I know, if the hangs aren't easily reproducible, this isn't
>>     easily done.
>> --> Try a newer Linux kernel and see if the problem persists.
>>     The Linux folk will get more interested if you can reproduce
>>     the problem on 5.12. (Recent bakeathon testing of the 5.12
>>     kernel against the FreeBSD server did not find any issues.)
>>
>> Hopefully the network folk have some insight w.r.t. why
>> the TCP connection is sitting in FIN_WAIT2.
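
On the "status -110" in the sunrpc log quoted above: the Linux kernel
reports errors as negated errno values, and errno 110 on Linux is
ETIMEDOUT, which fits the "connect attempt timed out" message logged just
before it. A small Python sketch for decoding such values follows. The
helper name decode_rpc_status and the choice of table entries are made up
for illustration; the table is hard-coded with Linux numbers because
FreeBSD numbers these errors differently (ETIMEDOUT is 60 there).

```python
# Linux errno numbers for a few network-related errors. These are
# hard-coded on purpose: the local errno module would give FreeBSD
# numbers when run on a FreeBSD host.
LINUX_ERRNO = {
    32: "EPIPE",
    104: "ECONNRESET",
    107: "ENOTCONN",
    110: "ETIMEDOUT",
    111: "ECONNREFUSED",
    113: "EHOSTUNREACH",
}


def decode_rpc_status(status: int) -> str:
    """Map a Linux kernel style status (0 or negated errno) to a name."""
    if status == 0:
        return "OK"
    if status > 0:
        return f"positive value {status}"
    return LINUX_ERRNO.get(-status, f"unknown errno {-status}")


if __name__ == "__main__":
    for value in (0, -110):
        print(value, decode_rpc_status(value))
```

So the client's connect attempts are timing out rather than being
refused or reset, at least as far as this log excerpt shows.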
>>
>> rick
>>
>> Jason Breitman
>>
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>
> --
> Rod Grimes                                                 rgrimes@freebsd.org