From: Jason Breitman <jbreitman@tildenparkcapital.com>
Subject: Re: NFS Mount Hangs
Date: Wed, 17 Mar 2021 19:15:58 -0400
To: Peter Eriksson
Cc: "freebsd-net@freebsd.org"
List-Id: Networking and TCP/IP with FreeBSD

We are using the Intel Ethernet Network Adapter X722.
Jason Breitman

On Mar 17, 2021, at 6:48 PM, Peter Eriksson wrote:

CLOSE_WAIT on the server side usually indicates that the kernel has sent the ACK to the client's FIN (start of a shutdown) packet but hasn't sent its own FIN packet - something that usually happens when the server has read all data queued up from the client and taken whatever actions it needs to shut down its service…

Here's some ASCII art. It probably needs to be viewed in a monospaced font :-)

Client   ESTABLISHED --> FIN-WAIT-1 ------> FIN-WAIT-2 ------> TIME-WAIT --> CLOSED
              :               ^                  ^                  :
             FIN             ACK                FIN                ACK
              v               :                  :                  v
Server   ESTABLISHED ------> CLOSE-WAIT --...--> LAST-ACK --------> CLOSED

TSO/LRO and/or "intelligence" in some smart network cards can cause all kinds of interesting bugs. What ethernet cards are you using?

(TSO/LRO seems to be working better these days for our Intel X710 cards, but a couple of years ago they would freeze up on us, so we had to disable it.)

Hmm.. Perhaps the NFS server is waiting for some locks to be released before it can close down its end of the TCP link? Reservations?

But I'd suspect something else, since we've been running NFSv4.1/Kerberos on our FreeBSD 11.3/12.2 servers for a long time with many Linux clients, and most issues we've seen (the last couple of years) have been on the Linux end of things… like the bugs in the Linux gss daemons, their single-threaded mount() syscall, or the automounter freezing up... and other fun bugs.

- Peter

> On 17 Mar 2021, at 23:17, Jason Breitman wrote:
> 
> Thank you for the responses.
> The NFS Client does properly negotiate down to 128K for the rsize and wsize.
> 
> The client port should be changing, as we are using the noresvport option.
> 
> On the NFS Client
> cat /proc/mounts
> nfs-server.domain.com:/data /mnt/data nfs4 rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,hard,noresvport,proto=tcp,timeo=600,retrans=2,sec=krb5,clientaddr=NFS.Client.IP.X,lookupcache=pos,local_lock=none,addr=NFS.Server.IP.X 0 0
> 
> When the issue occurs, this is what I see on the NFS Server.
> tcp4       0      0 NFS.Server.IP.X.2049    NFS.Client.IP.X.51550    CLOSE_WAIT
> 
> Capturing packets right before the issue is a great idea, but I am concerned about running tcpdump for such an extended period of time on an active server.
> I have gone 9 days with no issue, which would be a lot of data and overhead.
> 
> I will look into disabling the TSO and LRO options and will let the group know how it goes.
> Below are the current options on the NFS Server.
> lagg0: flags=8943 metric 0 mtu 1500
> 	options=e507bb
> 
> Please share other ideas if you have them.
> 
> Jason Breitman
> 
> 
> On Mar 17, 2021, at 5:58 PM, Rick Macklem wrote:
> 
> Alan Somers wrote:
> [stuff snipped]
>> Is the 128K limit related to MAXPHYS? If so, it should be greater in 13.0.
> For the client, yes. For the server, no.
> For the server, it is just a compile-time constant, NFS_SRVMAXIO.
> 
> It's mainly related to the fact that I haven't gotten around to testing larger sizes yet.
> - kern.ipc.maxsockbuf needs to be several times the limit, which means it would have to increase for 1Mbyte.
> - The session code must negotiate a maximum RPC size > 1Mbyte.
>   (I think the server code does do this, but it needs to be tested.)
> And, yes, the client is limited to MAXPHYS.
> 
> Doing this is on my todo list, rick
> 
> The client should acquire the attributes that indicate that and set rsize/wsize to that. "# nfsstat -m" on the client should show you what the client is actually using. If it is larger than 128K, set both rsize and wsize to 128K.
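As a concrete version of the check Rick describes, the negotiated sizes can be pulled out of the client's mount information with grep. The sample line below is hypothetical, modeled on the mount options earlier in this thread; on a real client you would pipe the output of "nfsstat -m" through the same filter:

```shell
# Extract the negotiated rsize/wsize from NFS mount information.
# The sample line is an assumption modeled on this thread's mount options;
# on a real FreeBSD client, replace the printf with: nfsstat -m
printf '%s\n' 'nfsv4,minorversion=1,tcp,hard,sec=krb5,rsize=131072,wsize=131072' |
  grep -Eo '(r|w)size=[0-9]+'
# prints:
# rsize=131072
# wsize=131072
```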
> 
>> Output from the NFS Client when the issue occurs
>> # netstat -an | grep NFS.Server.IP.X
>> tcp        0      0 NFS.Client.IP.X:46896    NFS.Server.IP.X:2049    FIN_WAIT2
> I'm no TCP guy. Hopefully others might know why the client would be stuck in FIN_WAIT2 (I vaguely recall this means it is waiting for a fin/ack, but could be wrong?)
> 
>> # cat /sys/kernel/debug/sunrpc/rpc_xprt/*/info
>> netid: tcp
>> addr: NFS.Server.IP.X
>> port: 2049
>> state: 0x51
>> 
>> syslog
>> Mar 4 10:29:27 hostname kernel: [437414.131978] -pid- flgs status -client- --rqstp- ->timeout ---ops--
>> Mar 4 10:29:27 hostname kernel: [437414.133158] 57419 40a1 0 9b723c73 143cfadf 30000 4ca953b5 nfsv4 OPEN_NOATTR a:call_connect_status [sunrpc] q:xprt_pending
> I don't know what OPEN_NOATTR means, but I assume it is some variant of the NFSv4 Open operation.
> [stuff snipped]
>> Mar 4 10:29:30 hostname kernel: [437417.110517] RPC: 57419 xprt_connect_status: connect attempt timed out
>> Mar 4 10:29:30 hostname kernel: [437417.112172] RPC: 57419 call_connect_status (status -110)
> I have no idea what status -110 means?
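For what it's worth, the Linux sunrpc code reports status as a negative errno value, and on Linux errno 110 is ETIMEDOUT, which is consistent with the "connect attempt timed out" messages in the log. A quick way to look it up (assuming python3 is available on the client):

```shell
# Look up errno 110 on a Linux box; -110 from sunrpc is a negated errno.
python3 -c 'import errno, os; print(errno.errorcode[110], "=", os.strerror(110))'
# prints (on Linux): ETIMEDOUT = Connection timed out
```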
>> Mar 4 10:29:30 hostname kernel: [437417.113337] RPC: 57419 call_timeout (major)
>> Mar 4 10:29:30 hostname kernel: [437417.114385] RPC: 57419 call_bind (status 0)
>> Mar 4 10:29:30 hostname kernel: [437417.115402] RPC: 57419 call_connect xprt 00000000e061831b is not connected
>> Mar 4 10:29:30 hostname kernel: [437417.116547] RPC: 57419 xprt_connect xprt 00000000e061831b is not connected
>> Mar 4 10:30:31 hostname kernel: [437478.551090] RPC: 57419 xprt_connect_status: connect attempt timed out
>> Mar 4 10:30:31 hostname kernel: [437478.552396] RPC: 57419 call_connect_status (status -110)
>> Mar 4 10:30:31 hostname kernel: [437478.553417] RPC: 57419 call_timeout (minor)
>> Mar 4 10:30:31 hostname kernel: [437478.554327] RPC: 57419 call_bind (status 0)
>> Mar 4 10:30:31 hostname kernel: [437478.555220] RPC: 57419 call_connect xprt 00000000e061831b is not connected
>> Mar 4 10:30:31 hostname kernel: [437478.556254] RPC: 57419 xprt_connect xprt 00000000e061831b is not connected
> Is it possible that the client is trying to (re)connect using the same client port#?
> I would normally expect the client to create a new TCP connection using a different client port# and then retry the outstanding RPCs.
> --> Capturing packets when this happens would show us what is going on.
> 
> If there is a problem on the FreeBSD end, it is most likely a broken network device driver.
> --> Try disabling TSO and LRO.
> --> Try a different driver for the net hardware on the server.
> --> Try a different net chip on the server.
> If you can capture packets when (not after) the hang occurs, then you can look at them in wireshark and see what is actually happening. (Ideally on both client and server, to check that your network hasn't dropped anything.)
> --> I know, if the hangs aren't easily reproducible, this isn't easily done.
> --> Try a newer Linux kernel and see if the problem persists.
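For reference, disabling TSO and LRO as suggested above is done per interface with ifconfig on FreeBSD, and on a lagg the flags must be cleared on the member ports. The interface names below (ixl0/ixl1, the driver typically used by the X710/X722 family) are assumptions; substitute the actual members of lagg0:

```shell
# Clear TSO and LRO on each lagg member at runtime.
# Interface names are assumptions; check "ifconfig lagg0" for the real members.
ifconfig ixl0 -tso -lro
ifconfig ixl1 -tso -lro

# To make the change persistent across reboots, add the flags in /etc/rc.conf:
#   ifconfig_ixl0="up -tso -lro"
#   ifconfig_ixl1="up -tso -lro"
```

Note that clearing the flags briefly resets the interface, so it is best done in a maintenance window.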
> The Linux folk will get more interested if you can reproduce the problem on 5.12. (Recent bakeathon testing of the 5.12 kernel against the FreeBSD server did not find any issues.)
> 
> Hopefully the network folk have some insight w.r.t. why the TCP connection is sitting in FIN_WAIT2.
> 
> rick
> 
> Jason Breitman
> 
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"