From owner-freebsd-net@freebsd.org  Sat Apr 10 16:12:44 2021
Return-Path: <owner-freebsd-net@freebsd.org>
Delivered-To: freebsd-net@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.nyi.freebsd.org (Postfix) with ESMTP id B33605D0EEC
 for <freebsd-net@mailman.nyi.freebsd.org>;
 Sat, 10 Apr 2021 16:12:44 +0000 (UTC)
 (envelope-from tuexen@freebsd.org)
Received: from drew.franken.de (drew.ipv6.franken.de
 [IPv6:2001:638:a02:a001:20e:cff:fe4a:feaa])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "*.franken.de",
 Issuer "Sectigo RSA Domain Validation Secure Server CA" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 4FHg583wgHz3kdm
 for <freebsd-net@freebsd.org>; Sat, 10 Apr 2021 16:12:44 +0000 (UTC)
 (envelope-from tuexen@freebsd.org)
Received: from [IPv6:2a02:8109:1140:c3d:1507:c609:f682:ea59] (unknown
 [IPv6:2a02:8109:1140:c3d:1507:c609:f682:ea59])
 (Authenticated sender: macmic)
 by mail-n.franken.de (Postfix) with ESMTPSA id EB9F87058918C;
 Sat, 10 Apr 2021 18:12:40 +0200 (CEST)
Content-Type: text/plain;
	charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.60.0.2.21\))
Subject: Re: NFS Mount Hangs
From: tuexen@freebsd.org
In-Reply-To: <YQXPR0101MB096894FBD385DB9A42C1399FDD729@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
Date: Sat, 10 Apr 2021 18:12:40 +0200
Cc: "Scheffenegger, Richard" <Richard.Scheffenegger@netapp.com>,
 Youssef GHORBAL <youssef.ghorbal@pasteur.fr>,
 "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <3980F368-098D-4EE4-B213-4113C2CAFE7D@freebsd.org>
References: <C643BB9C-6B61-4DAC-8CF9-CE04EA7292D0@tildenparkcapital.com>
 <3750001D-3F1C-4D9A-A9D9-98BCA6CA65A4@tildenparkcapital.com>
 <33693DE3-7FF8-4FAB-9A75-75576B88A566@tildenparkcapital.com>
 <D67AF317-D238-4EC0-8C7F-22D54AD5144C@pasteur.fr>
 <YQXPR0101MB09684AB7BEFA911213604467DD669@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <C87066D3-BBF1-44E1-8398-E4EB6903B0F2@tildenparkcapital.com>
 <8E745920-1092-4312-B251-B49D11FE8028@pasteur.fr>
 <YQXPR0101MB0968C44C7C82A3EB64F384D0DD7B9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <DEF8564D-0FE9-4C2C-9F3B-9BCDD423377C@freebsd.org>
 <YQXPR0101MB0968E0A17D8BCACFAF132225DD7A9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <SN4PR0601MB3728E392BCA494EAD49605FE86789@SN4PR0601MB3728.namprd06.prod.outlook.com>
 <YQXPR0101MB09686B4F921B96DCAFEBF874DD789@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <765CE1CD-6AAB-4BEF-97C6-C2A1F0FF4AC5@freebsd.org>
 <YQXPR0101MB096876B44F33BAD8991B62C8DD789@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <2B189169-C0C9-4DE6-A01A-BE916F10BABA@freebsd.org>
 <YQXPR0101MB09688645194907BBAA6E7C7ADD789@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <BF5D23D3-5DBD-4E29-9C6B-F4CCDC205353@freebsd.org>
 <YQXPR0101MB096826445C85921C8F6410A2DD779@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <E4A51EAD-8F9A-49BB-8852-F9D61BDD9EA4@freebsd.org>
 <YQXPR0101MB09682F230F25FBF3BC427135DD729@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <SN4PR0601MB3728AF2554FDDFB4EEF2C95B86729@SN4PR0601MB3728.namprd06.prod.outlook.com>
 <077ECE2B-A84C-440D-AAAB-00293C841F14@freebsd.org>
 <SN4PR0601MB37287855390FB8A989381CFE86729@SN4PR0601MB3728.namprd06.prod.outlook.com>
 <YQXPR0101MB096894FBD385DB9A42C1399FDD729@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
To: Rick Macklem <rmacklem@uoguelph.ca>
X-Mailer: Apple Mail (2.3654.60.0.2.21)
X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 URIBL_BLOCKED autolearn=disabled version=3.4.1
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on mail-n.franken.de
X-Rspamd-Queue-Id: 4FHg583wgHz3kdm
X-Spamd-Bar: /
Authentication-Results: mx1.freebsd.org;
	none
X-Spamd-Result: default: False [0.00 / 15.00];
 ASN(0.00)[asn:680, ipnet:2001:638::/32, country:DE];
 local_wl_from(0.00)[freebsd.org]
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 10 Apr 2021 16:12:44 -0000

> On 10. Apr 2021, at 17:56, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>
> Scheffenegger, Richard <Richard.Scheffenegger@netapp.com> wrote:
>>> Rick wrote:
>>> Hi Rick,
>>>
>>>> Well, I have some good news and some bad news (the bad is mostly for Richard).
>>>>
>>>> The only message logged is:
>>>> tcpflags 0x4<RST>; tcp_do_segment: Timestamp missing, segment processed normally
>>>>
> Btw, I did get one additional message during further testing (with r367492 reverted):
> tcpflags 0x4<RST>; syncache_chkrst: Our SYN|ACK was rejected, connection attempt aborted
>   by remote endpoint
>
> This only happened once in several test cycles.
That is OK.
>
>>>> But...the RST battle no longer occurs. Just one RST that works and then the SYN gets SYN,ACK'd by the FreeBSD end and off it goes...
>>>>
>>>> So, what is different?
>>>>
>>>> r367492 is reverted from the FreeBSD server.
>>>> I did the revert because I think it might be what is causing otis@'s hang. (In his case, the Recv-Q grows on the socket for the stuck Linux client, while others work.)
>>>>
>>>> Why does reverting fix this?
>>>> My only guess is that the krpc gets the upcall right away and sees an EPIPE when it does soreceive(), which results in soshutdown(SHUT_WR).
> This was bogus and incorrect. The diagnostic printf() I saw was generated for the
> back channel, and that would have occurred after the socket was shut down.
>
>>>
>>> With r367492 you don't get the upcall with the same error state? Or you don't get an error on a write() call, when there should be one?
> If Send-Q is 0 when the network is partitioned, after healing, the krpc sees no activity on
> the socket (until it acquires/processes an RPC it will not do a sosend()).
> Without the 6 minute timeout, the RST battle goes on "forever" (I've never actually
> waited more than 30 minutes, which is close enough to "forever" for me).
> --> With the 6 minute timeout, the "battle" stops after 6 minutes, when the timeout
>      causes a soshutdown(..SHUT_WR) on the socket.
>      (Since the soshutdown() patch is not yet in "main" (I got comments, but no "reviewed"
>       on it), the 6 minute timer won't help if enabled in main. The soclose() won't happen
>       for TCP connections with the back channel enabled, such as Linux NFSv4.1/4.2 ones.)
I'm confused. So you are saying that if the Send-Q is empty when you partition the
network, and the peer starts to send SYNs after the healing, FreeBSD responds
with a challenge ACK which triggers the sending of a RST by Linux. This RST is
ignored multiple times.
Is that true? Even with my patch for the bug I introduced?
What version of the kernel are you using?
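
For illustration, the exchange described above can be modeled as follows. This is a minimal sketch of how a receiver following RFC 5961 classifies a SYN or RST that arrives on an established connection. It is not the FreeBSD code: the names `Action`, `in_window` and `classify` are made up for this example, and details such as challenge ACK rate limiting are left out.

```python
from enum import Enum

MOD = 1 << 32  # TCP sequence numbers wrap at 2**32


class Action(Enum):
    CHALLENGE_ACK = "send challenge ACK"
    RESET = "reset connection"
    DROP = "silently drop"


def in_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if RCV.NXT <= seq < RCV.NXT + RCV.WND, modulo 2**32."""
    return (seq - rcv_nxt) % MOD < rcv_wnd


def classify(flag: str, seq: int, rcv_nxt: int, rcv_wnd: int) -> Action:
    """Classify a SYN or RST arriving on a synchronized connection."""
    if flag == "SYN":
        # RFC 5961, section 4: answer any SYN with an ACK, never reset on it.
        return Action.CHALLENGE_ACK
    if flag == "RST":
        # RFC 5961, section 3: only an exact match of RCV.NXT resets.
        if seq == rcv_nxt:
            return Action.RESET
        if in_window(seq, rcv_nxt, rcv_wnd):
            return Action.CHALLENGE_ACK
        return Action.DROP
    raise ValueError(f"unsupported flag: {flag}")
```

In this model, a RST that takes its sequence number from the challenge ACK lands in the RESET case, while a RST with any other in-window sequence number only triggers another challenge ACK.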

Best regards
Michael
>
> If Send-Q is non-empty when the network is partitioned, the battle will not happen.
>
>>
>> My understanding is that he needs this error indication when calling shutdown().
> There are several ways the krpc notices that a TCP connection is no longer functional.
> - An error return like EPIPE from either sosend() or soreceive().
> - A return of 0 from soreceive() with no data (normal EOF from other end).
> - A 6 minute timeout on the server end, when no activity has occurred on the
>  connection. This timer is currently disabled for NFSv4.1/4.2 mounts in "main",
>  but I enabled it for this testing, to stop the "RST battle goes on forever"
>  during testing. I am thinking of enabling it on "main", but this crude bandaid
>  shouldn't be thought of as a "fix for the RST battle".
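
The three cases in the list above have a simple userland analogue, sketched here with an ordinary socket. This is only an illustration and not the krpc code, which uses sosend()/soreceive() inside the kernel; the function name `probe` and its return strings are invented for the example, and the timeout is passed in instead of being fixed at 6 minutes.

```python
import socket


def probe(sock: socket.socket, idle_timeout: float) -> str:
    """Wait for data on sock and report how the connection looks."""
    sock.settimeout(idle_timeout)
    try:
        data = sock.recv(4096)
    except socket.timeout:
        # No activity within the idle period (the 6 minute timer case).
        return "idle-timeout"
    except OSError:
        # An error return such as EPIPE or ECONNRESET.
        return "error"
    # A zero-length read is a normal EOF from the other end.
    return "eof" if data == b"" else "data"
```

Note that a connection with nothing queued to send and a silent peer only ever reaches the "idle-timeout" branch, which matches the empty Send-Q case described in the thread.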
>
>>>
>>> From what you describe, this is on writes, isn't it? (I'm asking, as the original problem that was fixed with r367492 occurs in the read path (draining of the so_rcv buffer in the upcall right away, which subsequently influences the ACK sent by the stack).)
>>>
>>> I only added the so_snd buffer after some discussion about whether the WAKESOR shouldn't have a symmetric equivalent on WAKESOW....
>>>
>>> Thus a partial backout (leaving the WAKESOR part inside, but reverting the WAKESOW part) would still fix my initial problem about erroneous DSACKs (which can also lead to extremely poor performance with Linux clients), but possibly address this issue...
>>>
>>> Can you perhaps take MAIN and apply https://reviews.freebsd.org/D29690 for the revert only on the so_snd upcall?
> Since the krpc only uses receive upcalls, I don't see how reverting the send side would have
> any effect?
>
>> Since the release of 13.0 is almost done, can we try to fix the issue instead of reverting the commit?
> I think it has already shipped broken.
> I don't know if an errata is possible, or if it will be broken until 13.1.
>
> --> I am much more concerned with the otis@ stuck client problem than this RST battle that only
>       occurs after a network partitioning, especially if it is 13.0 specific.
>       I did this testing to try to reproduce Jason's stuck client (with connection in CLOSE_WAIT)
>       problem, which I failed to reproduce.
>
> rick
>
> Rs: agree, a good understanding of where the interaction between stack, socket and in-kernel TCP user breaks is needed;
>
>>
>> If this doesn't help, some major surgery will be necessary to prevent NFS sessions with SACK enabled from transmitting DSACKs...
>
> My understanding is that the problem is related to getting a local error indication after
> receiving a RST segment too late or not at all.
>
> Rs: but the move of the upcall should not materially change that; I don’t have a PC here to see if any upcall actually happens on RST...
>
> Best regards
> Michael
>>
>>
>>> I know from a printf that this happened, but whether it caused the RST battle to not happen, I don't know.
>>>
>>> I can put r367492 back in and do more testing if you'd like, but I think it probably needs to be reverted?
>>
>> Please, I don't quite understand why the exact timing of the upcall would be that critical here...
>>
>> A comparison of the soxxx calls and errors between the "good" and the "bad" would be perfect. I don't know if this is easy to do though, as these calls appear to be scattered all around the RPC / NFS source paths.
>>
>>> This does not explain the original hung Linux client problem, but does shed light on the RST war I could create by doing a network partitioning.
>>>
>>> rick
>>
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"