Date:      Wed, 17 Mar 2021 21:58:25 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Alan Somers <asomers@freebsd.org>
Cc:        Jason Breitman <jbreitman@tildenparkcapital.com>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject:   Re: NFS Mount Hangs
Message-ID:  <YQXPR0101MB09681291684FC684A3319D2ADD6A9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <CAOtMX2gQFMWbGKBzLcPW4zOBpQ3YR5=9DRpTyTDi2SC%2BhE8Ehw@mail.gmail.com>
References:  <C643BB9C-6B61-4DAC-8CF9-CE04EA7292D0@tildenparkcapital.com> <3750001D-3F1C-4D9A-A9D9-98BCA6CA65A4@tildenparkcapital.com> <33693DE3-7FF8-4FAB-9A75-75576B88A566@tildenparkcapital.com> <YQXPR0101MB0968DC18E00833DE2969C636DD6A9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>, <CAOtMX2gQFMWbGKBzLcPW4zOBpQ3YR5=9DRpTyTDi2SC%2BhE8Ehw@mail.gmail.com>

Alan Somers wrote:
[stuff snipped]
>Is the 128K limit related to MAXPHYS?  If so, it should be greater in 13.0.
For the client, yes. For the server, no.
For the server, it is just a compile-time constant, NFS_SRVMAXIO.

It's mainly related to the fact that I haven't gotten around to testing larger
sizes yet.
- kern.ipc.maxsockbuf needs to be several times the limit, which means it would
   have to be increased for 1 Mbyte.
- The session code must negotiate a maximum RPC size > 1 Mbyte.
   (I think the server code does do this, but it needs to be tested.)
And, yes, the client is limited to MAXPHYS.
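A worked example of the kern.ipc.maxsockbuf point, as a minimal Python sketch.
"Several times" is not quantified here, so the factor of 4 below
(ASSUMED_MULTIPLIER) is a hypothetical placeholder, not a value taken from the
FreeBSD NFS code. The current setting can be read on the server with
"sysctl -n kern.ipc.maxsockbuf".

```python
# Sketch: how large kern.ipc.maxsockbuf would need to be for a given NFS I/O size.
# ASSUMED_MULTIPLIER is a hypothetical stand-in for "several times the limit".
ASSUMED_MULTIPLIER = 4

KIB = 1024
MIB = 1024 * KIB


def required_maxsockbuf(max_io_bytes: int, multiplier: int = ASSUMED_MULTIPLIER) -> int:
    """Smallest kern.ipc.maxsockbuf that is `multiplier` times the max I/O size."""
    if max_io_bytes <= 0 or multiplier <= 0:
        raise ValueError("sizes must be positive")
    return max_io_bytes * multiplier


def maxsockbuf_ok(current: int, max_io_bytes: int,
                  multiplier: int = ASSUMED_MULTIPLIER) -> bool:
    """True if the current kern.ipc.maxsockbuf meets the assumed requirement."""
    return current >= required_maxsockbuf(max_io_bytes, multiplier)


if __name__ == "__main__":
    # Compare today's 128K limit with the 1 Mbyte size discussed above.
    for io_size in (128 * KIB, 1 * MIB):
        print(f"max I/O {io_size:>8} bytes -> maxsockbuf >= {required_maxsockbuf(io_size)}")
```

With that assumed factor, going from 128K to 1 Mbyte moves the requirement
from 512K to 4 Mbytes, which is why the tunable has to grow along with the
I/O size.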

Doing this is on my todo list, rick

The client should acquire the attributes that indicate the server's maximum
read/write size and set rsize/wsize to that. "# nfsstat -m" on the client
should show you what the client is actually using. If it is larger than 128K,
set both rsize and wsize to 128K.
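A minimal Python sketch of that check. It assumes the mount options appear as
comma-separated rsize=N / wsize=N tokens in the "nfsstat -m" output (the Linux
client prints them on a "Flags:" line) and only looks at the first mount listed.
SAMPLE is made-up illustrative text, not output from this thread, and
"nfs-server" is a hypothetical host name.

```python
import re

LIMIT = 128 * 1024  # the 128K server limit discussed above

# Illustrative text only (hypothetical host and values), shaped like a
# Linux client's "nfsstat -m" output.
SAMPLE = """/mnt from nfs-server:/export
 Flags:\trw,relatime,vers=4.1,rsize=1048576,wsize=1048576,hard,proto=tcp
"""


def io_sizes(nfsstat_text: str) -> dict:
    """Pull the rsize/wsize values the client is actually using."""
    sizes = {}
    for key in ("rsize", "wsize"):
        match = re.search(rf"\b{key}=(\d+)", nfsstat_text)
        if match:
            sizes[key] = int(match.group(1))
    return sizes


def clamp_sizes(sizes: dict, limit: int = LIMIT) -> dict:
    """Cap each size at the server limit; smaller values are left alone."""
    return {key: min(value, limit) for key, value in sizes.items()}


def mount_options(sizes: dict) -> str:
    """Format the sizes as an rsize=...,wsize=... mount option string."""
    return ",".join(f"{key}={value}" for key, value in sorted(sizes.items()))


if __name__ == "__main__":
    current = io_sizes(SAMPLE)
    print("in use:     ", current)
    print("mount with: ", mount_options(clamp_sizes(current)))
```

The resulting string is what would go after "-o" when remounting, e.g.
"-o rsize=131072,wsize=131072".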

>Output from the NFS Client when the issue occurs
># netstat -an | grep NFS.Server.IP.X
>tcp        0      0 NFS.Client.IP.X:46896      NFS.Server.IP.X:2049       FIN_WAIT2
I'm no TCP guy. Hopefully others might know why the client would be
stuck in FIN_WAIT2 (I vaguely recall this means it is waiting for a fin/ack,
but could be wrong?)
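For reference, in the TCP state machine (RFC 793) FIN_WAIT2 means the local end
has sent its FIN, received the ACK for it, and is still waiting for the peer's
FIN. Below is a small Python sketch for spotting such connections in
"netstat -an" output. It assumes the Linux column order (Proto, Recv-Q, Send-Q,
Local Address, Foreign Address, State); the sample line is the one quoted above.

```python
NFS_PORT = "2049"

# The line quoted above, from "netstat -an | grep NFS.Server.IP.X" on the client.
SAMPLE = ("tcp        0      0 NFS.Client.IP.X:46896      "
          "NFS.Server.IP.X:2049       FIN_WAIT2")


def nfs_connections_in_state(netstat_text: str, state: str = "FIN_WAIT2") -> list:
    """Return (local, remote) pairs for TCP connections to port 2049 in `state`."""
    found = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, conn_state = fields[3], fields[4], fields[5]
        if conn_state == state and remote.rsplit(":", 1)[-1] == NFS_PORT:
            found.append((local, remote))
    return found


if __name__ == "__main__":
    for local, remote in nfs_connections_in_state(SAMPLE):
        print(f"{local} -> {remote} stuck in FIN_WAIT2")
```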

># cat /sys/kernel/debug/sunrpc/rpc_xprt/*/info
>netid: tcp
>addr:  NFS.Server.IP.X
>port:  2049
>state: 0x51
>
>syslog
>Mar  4 10:29:27 hostname kernel: [437414.131978] -pid- flgs status -client- --rqstp- -timeout ---ops--
>Mar  4 10:29:27 hostname kernel: [437414.133158] 57419 40a1      0 9b723c73 143cfadf    30000 4ca953b5 nfsv4 OPEN_NOATTR a:call_connect_status [sunrpc] q:xprt_pending
I don't know what OPEN_NOATTR means, but I assume it is some variant
of the NFSv4 Open operation.
[stuff snipped]
>Mar  4 10:29:30 hostname kernel: [437417.110517] RPC: 57419 xprt_connect_status: connect attempt timed out
>Mar  4 10:29:30 hostname kernel: [437417.112172] RPC: 57419 call_connect_status (status -110)
I have no idea what status -110 means.
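On Linux, the status values in these RPC debug messages are negative errno
values, and errno 110 on Linux is ETIMEDOUT, which is consistent with the
"connect attempt timed out" message just before it. Errno numbers are OS
specific (FreeBSD's ETIMEDOUT is 60), so a minimal Python sketch for decoding
such a status has to run on the Linux client:

```python
import errno
import os


def decode_status(status: int) -> str:
    """Translate a kernel status such as -110 into this host's errno name and text."""
    code = abs(status)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # On a Linux host the name reported for -110 is ETIMEDOUT.
    print(decode_status(-110))
```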
>Mar  4 10:29:30 hostname kernel: [437417.113337] RPC: 57419 call_timeout (major)
>Mar  4 10:29:30 hostname kernel: [437417.114385] RPC: 57419 call_bind (status 0)
>Mar  4 10:29:30 hostname kernel: [437417.115402] RPC: 57419 call_connect xprt 00000000e061831b is not connected
>Mar  4 10:29:30 hostname kernel: [437417.116547] RPC: 57419 xprt_connect xprt 00000000e061831b is not connected
>Mar  4 10:30:31 hostname kernel: [437478.551090] RPC: 57419 xprt_connect_status: connect attempt timed out
>Mar  4 10:30:31 hostname kernel: [437478.552396] RPC: 57419 call_connect_status (status -110)
>Mar  4 10:30:31 hostname kernel: [437478.553417] RPC: 57419 call_timeout (minor)
>Mar  4 10:30:31 hostname kernel: [437478.554327] RPC: 57419 call_bind (status 0)
>Mar  4 10:30:31 hostname kernel: [437478.555220] RPC: 57419 call_connect xprt 00000000e061831b is not connected
>Mar  4 10:30:31 hostname kernel: [437478.556254] RPC: 57419 xprt_connect xprt 00000000e061831b is not connected
Is it possible that the client is trying to (re)connect using the same client
port#?
I would normally expect the client to create a new TCP connection using a
different client port# and then retry the outstanding RPCs.
--> Capturing packets when this happens would show us what is going on.
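If a capture is available, one way to answer the port question is to list the
client source port of every SYN sent to port 2049, for example with
tshark -r nfs.pcap -Y "tcp.flags.syn==1 && tcp.flags.ack==0 && tcp.dstport==2049" -T fields -e tcp.srcport
(nfs.pcap is a hypothetical file name), and then check for repeats. A minimal
Python sketch of that check; the port list is made up for illustration, apart
from 46896, which is the port in the netstat output above.

```python
def reused_ports(srcport_lines):
    """Return client ports that appear on more than one SYN, in first-seen order.

    Each input line is expected to hold one port number, as printed by
    tshark's "-T fields -e tcp.srcport" output.
    """
    seen = set()
    reused = []
    for line in srcport_lines:
        line = line.strip()
        if not line:
            continue
        port = int(line)
        if port in seen and port not in reused:
            reused.append(port)
        seen.add(port)
    return reused


if __name__ == "__main__":
    # Hypothetical SYN source ports; only 46896 comes from the netstat output above.
    syn_ports = ["46896", "46896", "51022"]
    repeats = reused_ports(syn_ports)
    if repeats:
        print("SYNs reuse client port(s):", repeats)
    else:
        print("each SYN uses a new client port")
```

A repeat is only a hint: SYN retransmissions within a single connect attempt
also carry the same port, so the timestamps (e.g. "-e frame.time_relative")
should show whether the repeats line up with the separate reconnect attempts
in the log, roughly a minute apart.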

If there is a problem on the FreeBSD end, it is most likely a broken
network device driver.
--> Try disabling TSO and LRO.
--> Try a different driver for the net hardware on the server.
--> Try a different net chip on the server.
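A minimal Python sketch for the TSO/LRO suggestion on the FreeBSD server. It
checks which offload flags the interface reports on the "options=" line of
ifconfig output and builds the "ifconfig <if> -tso -lro" command that turns
them off. The interface name ix0 and the sample options line are hypothetical,
and the command is only executed when dry_run is False (which needs root).

```python
import re
import subprocess

OFFLOAD_FLAGS = {"TSO4", "TSO6", "LRO"}

# Hypothetical "options=" line, shaped like FreeBSD ifconfig output.
SAMPLE = "\toptions=e507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,TSO4,TSO6,LRO>"


def enabled_offloads(ifconfig_text: str) -> set:
    """Return the TSO/LRO flags listed between <...> on the options= line."""
    match = re.search(r"options=[0-9a-fA-F]+<([^>]*)>", ifconfig_text)
    if not match:
        return set()
    return OFFLOAD_FLAGS & set(match.group(1).split(","))


def disable_offloads(iface: str, dry_run: bool = True) -> list:
    """Build (and optionally run) the FreeBSD command that turns off TSO and LRO."""
    cmd = ["ifconfig", iface, "-tso", "-lro"]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    print("offloads enabled:", sorted(enabled_offloads(SAMPLE)))
    print("would run:", " ".join(disable_offloads("ix0")))
```

The ifconfig change does not survive a reboot; to make it persistent,
"-tso -lro" can be appended to the interface's ifconfig_ix0 line in
/etc/rc.conf.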
If you can capture packets when (not after) the hang
occurs, then you can look at them in Wireshark and see
what is actually happening. (Ideally on both client and
server, to check that your network hasn't dropped anything.)
--> I know, if the hangs aren't easily reproducible, this isn't
    easily done.
--> Try a newer Linux kernel and see if the problem persists.
    The Linux folk will get more interested if you can reproduce
    the problem on 5.12. (Recent bakeathon testing of the 5.12
    kernel against the FreeBSD server did not find any issues.)
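For the packet capture suggestion, a minimal Python sketch that builds a
tcpdump command limited to NFS traffic (port 2049) with the peer. Because the
hangs are hard to reproduce, it uses tcpdump's -C/-W options to keep a rotating
ring of capture files, so the capture can run until the hang happens and then
be stopped. The interface name, file prefix and the 100 x 10 sizing are
assumptions; running the matching command on both client and server gives the
two views needed to see whether anything was dropped.

```python
import subprocess


def capture_cmd(iface: str, peer: str, prefix: str = "nfs-hang.pcap",
                file_mb: int = 100, files: int = 10) -> list:
    """Build a tcpdump ring-buffer capture of NFS traffic to/from `peer`.

    -s 0      capture whole packets
    -w        write raw packets that Wireshark can open
    -C / -W   rotate through `files` files of about `file_mb` million bytes each
    """
    return ["tcpdump", "-i", iface, "-s", "0",
            "-C", str(file_mb), "-W", str(files), "-w", prefix,
            "host", peer, "and", "port", "2049"]


def start_capture(cmd: list) -> subprocess.Popen:
    """Start tcpdump in the background (needs root); stop it after the hang."""
    return subprocess.Popen(cmd)


if __name__ == "__main__":
    # Hypothetical interface and address; run the matching command on each end.
    print(" ".join(capture_cmd("eth0", "NFS.Server.IP.X")))
```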

Hopefully the network folk have some insight w.r.t. why
the TCP connection is sitting in FIN_WAIT2.

rick


Jason Breitman

_______________________________________________
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"


