Date:      Sun, 11 Apr 2021 22:49:49 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        "Scheffenegger, Richard" <Richard.Scheffenegger@netapp.com>, "tuexen@freebsd.org" <tuexen@freebsd.org>
Cc:        Youssef GHORBAL <youssef.ghorbal@pasteur.fr>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject:   Re: NFS Mount Hangs
Message-ID:  <YQXPR0101MB0968D6FF1E1BAEA1C63E81D8DD719@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <SN4PR0601MB3728AE4FA60C69E7D717FAAC86719@SN4PR0601MB3728.namprd06.prod.outlook.com>
References:  <C643BB9C-6B61-4DAC-8CF9-CE04EA7292D0@tildenparkcapital.com> <3750001D-3F1C-4D9A-A9D9-98BCA6CA65A4@tildenparkcapital.com> <33693DE3-7FF8-4FAB-9A75-75576B88A566@tildenparkcapital.com> <D67AF317-D238-4EC0-8C7F-22D54AD5144C@pasteur.fr> <YQXPR0101MB09684AB7BEFA911213604467DD669@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <C87066D3-BBF1-44E1-8398-E4EB6903B0F2@tildenparkcapital.com> <8E745920-1092-4312-B251-B49D11FE8028@pasteur.fr> <YQXPR0101MB0968C44C7C82A3EB64F384D0DD7B9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <DEF8564D-0FE9-4C2C-9F3B-9BCDD423377C@freebsd.org> <YQXPR0101MB0968E0A17D8BCACFAF132225DD7A9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <SN4PR0601MB3728E392BCA494EAD49605FE86789@SN4PR0601MB3728.namprd06.prod.outlook.com> <YQXPR0101MB09686B4F921B96DCAFEBF874DD789@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <765CE1CD-6AAB-4BEF-97C6-C2A1F0FF4AC5@freebsd.org> <YQXPR0101MB096876B44F33BAD8991B62C8DD789@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <2B189169-C0C9-4DE6-A01A-BE916F10BABA@freebsd.org> <YQXPR0101MB09688645194907BBAA6E7C7ADD789@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <BF5D23D3-5DBD-4E29-9C6B-F4CCDC205353@freebsd.org> <YQXPR0101MB096826445C85921C8F6410A2DD779@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <E4A51EAD-8F9A-49BB-8852-F9D61BDD9EA4@freebsd.org> <YQXPR0101MB09682F230F25FBF3BC427135DD729@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <SN4PR0601MB3728AF2554FDDFB4EEF2C95B86729@SN4PR0601MB3728.namprd06.prod.outlook.com> <077ECE2B-A84C-440D-AAAB-00293C841F14@freebsd.org> <SN4PR0601MB37287855390FB8A989381CFE86729@SN4PR0601MB3728.namprd06.prod.outlook.com> <YQXPR0101MB096894FBD385DB9A42C1399FDD729@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <3980F368-098D-4EE4-B213-4113C2CAFE7D@freebsd.org> <YQXPR0101MB0968359DC371C306EB462657DD729@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>, <23F49FD9-A8B6-460F-9CD2-BBC3181A058F@freebsd.org>, <SN4PR0601MB3728AE4FA60C69E7D717FAAC86719@SN4PR0601MB3728.namprd06.prod.outlook.com>

I should be able to test D29690 in about a week.
Note that I will not be able to tell if it fixes otis@'s
hung Linux client problem.

rick

________________________________________
From: Scheffenegger, Richard <Richard.Scheffenegger@netapp.com>
Sent: Sunday, April 11, 2021 12:54 PM
To: tuexen@freebsd.org; Rick Macklem
Cc: Youssef GHORBAL; freebsd-net@freebsd.org
Subject: Re: NFS Mount Hangs


From what I understand of Rick's statement about the socket state changing
before the upcall, I can only speculate that the RST fight is over the new
sessions the client attempts with the same 5-tuple, while on the server side
the original session persists, because the NFS server never closes or shuts
down the session.

But a debug-logged version of the socket upcall used by the NFS server should
reveal any differences in socket state at the time of the upcall.

I would very much like to know whether D29690 addresses that problem (if it
was due to releasing the lock before the upcall), or whether there are still
differences between the behavior prior to my central upcall change, after
that change, and with D29690 ...

________________________________
From: tuexen@freebsd.org <tuexen@freebsd.org>
Sent: Sunday, April 11, 2021 2:30:09 PM
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: Scheffenegger, Richard <Richard.Scheffenegger@netapp.com>; Youssef GHORBAL <youssef.ghorbal@pasteur.fr>; freebsd-net@freebsd.org <freebsd-net@freebsd.org>
Subject: Re: NFS Mount Hangs


> On 10. Apr 2021, at 23:59, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>
> tuexen@freebsd.org wrote:
>> Rick wrote:
> [stuff snipped]
>>>> With r367492 you don't get the upcall with the same error state? Or you
>>>> don't get an error on a write() call, when there should be one?
>> If Send-Q is 0 when the network is partitioned, after healing, the krpc sees no activity on
>> the socket (until it acquires/processes an RPC it will not do a sosend()).
>> Without the 6-minute timeout, the RST battle goes on "forever" (I've never actually
>> waited more than 30 minutes, which is close enough to "forever" for me).
>> --> With the 6-minute timeout, the "battle" stops after 6 minutes, when the timeout
>>     causes a soshutdown(..SHUT_WR) on the socket.
>>     (Since the soshutdown() patch is not yet in "main" (I got comments, but no "reviewed"
>>      on it), the 6-minute timer won't help if enabled in main. The soclose() won't happen
>>      for TCP connections with the back channel enabled, such as Linux NFSv4.1/4.2 ones.)
>> I'm confused. So you are saying that if the Send-Q is empty when you partition the
>> network, and the peer starts to send SYNs after the healing, FreeBSD responds
>> with a challenge ACK which triggers the sending of a RST by Linux. This RST is
>> ignored multiple times.
>> Is that true? Even with my patch for the bug I introduced?
> Yes and yes.
> Go take another look at linuxtofreenfs.pcap
> ("fetch https://people.freebsd.org/~rmacklem/linuxtofreenfs.pcap" if you don't
>  already have it.)
> Look at packet #1949->2069. I use wireshark, but you'll have your favourite.
> You'll see the "RST battle" that ends after
> 6 minutes at packet #2069. If there is no 6-minute timeout enabled in the
> server side krpc, then the battle just continues (I once let it run for about
> 30 minutes before giving up). The 6-minute timeout is not currently enabled
> in main, etc.
Hmm. I don't understand why r367492 can impact the processing of the RST, which
basically destroys the TCP connection.

Richard: Can you explain that?

Best regards
Michael
>
>> What version of the kernel are you using?
> "main" dated Dec. 23, 2020 + your bugfix + assorted NFS patches that
> are not relevant + 2 small krpc related patches.
> --> The two small krpc related patches enable the 6-minute timeout and
>       add a soshutdown(..SHUT_WR) call when the 6-minute timeout is
>       triggered. These have no effect until the 6 minutes is up and, without
>       them, the "RST battle" goes on forever.
>
> Add to the above a revert of r367492 and the RST battle goes away and things
> behave as expected. The recovery happens quickly after the network is
> unpartitioned, with either 0 or 1 RSTs.
>
> rick
> ps: Once the irrelevant NFS patches make it into "main", I will upgrade to
>     main bits du jour for testing.
>
> Best regards
> Michael
>>
>> If Send-Q is non-empty when the network is partitioned, the battle will not happen.
>>
>>>
>>> My understanding is that he needs this error indication when calling shutdown().
>> There are several ways the krpc notices that a TCP connection is no longer functional.
>> - An error return like EPIPE from either sosend() or soreceive().
>> - A return of 0 from soreceive() with no data (normal EOF from other end).
>> - A 6-minute timeout on the server end, when no activity has occurred on the
>> connection. This timer is currently disabled for NFSv4.1/4.2 mounts in "main",
>> but I enabled it for this testing, to stop the "RST battle goes on forever"
>> during testing. I am thinking of enabling it on "main", but this crude bandaid
>> shouldn't be thought of as a "fix for the RST battle".
>>
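The three detection paths listed above have a rough userland analogue. What
follows is only a sketch in Python on ordinary sockets, not krpc code: the
names check_connection, try_send and IDLE_TIMEOUT are made up for
illustration, and the real krpc does this in the kernel via sosend(),
soreceive() and its own timer.

```python
import errno
import socket
import time

# Assumed value, mirroring the 6-minute server-side timer discussed above.
IDLE_TIMEOUT = 6 * 60  # seconds


def check_connection(sock, last_activity, now=None):
    """Return why a non-blocking socket looks dead, or None if it looks alive."""
    now = time.monotonic() if now is None else now
    # Path 3: no activity for the whole idle period.
    if now - last_activity >= IDLE_TIMEOUT:
        return "idle-timeout"
    try:
        # Peek so that the probe does not consume application data.
        data = sock.recv(1, socket.MSG_PEEK)
    except BlockingIOError:
        return None  # nothing to read yet; the connection may still be fine
    except OSError as e:
        # Path 1: an error return such as EPIPE or ECONNRESET.
        return "error:" + errno.errorcode.get(e.errno, str(e.errno))
    # Path 2: a zero-length read is a normal EOF from the other end.
    return "eof" if data == b"" else None


def try_send(sock, payload):
    """Return an error string if sending fails, else None."""
    try:
        sock.sendall(payload)
    except OSError as e:
        return "error:" + errno.errorcode.get(e.errno, str(e.errno))
    return None


if __name__ == "__main__":
    a, b = socket.socketpair()
    a.setblocking(False)
    t = time.monotonic()
    print(check_connection(a, t, t))        # peer still open
    b.close()
    print(check_connection(a, t, t))        # peer has closed
    print(check_connection(a, t, t + 400))  # idle for more than 6 minutes
    print(try_send(a, b"x"))                # send after the peer closed
    a.close()
```

Note that none of these paths fires on its own if the socket is idle and no
error has been posted to it, which is the Send-Q == 0 case described earlier
in the thread; only the idle timer eventually catches that one.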
>>>>
>>>> From what you describe, this is on writes, isn't it? (I'm asking, as the
>>>> original problem that was fixed with r367492 occurs in the read path:
>>>> draining of the so_rcv buffer in the upcall right away, which subsequently
>>>> influences the ACK sent by the stack.)
>>>>
>>>> I only added the so_snd buffer after some discussion about whether the WAKESOR
>>>> shouldn't have a symmetric equivalent on WAKESOW....
>>>>
>>>> Thus a partial backout (leaving the WAKESOR part in, but reverting the
>>>> WAKESOW part) would still fix my initial problem with erroneous DSACKs
>>>> (which can also lead to extremely poor performance with Linux clients), but
>>>> possibly address this issue...
>>>>
>>>> Can you perhaps take MAIN and apply https://reviews.freebsd.org/D29690 for the
>>>> revert only on the so_snd upcall?
>> Since the krpc only uses receive upcalls, I don't see how reverting the send side would have
>> any effect?
>>
>>> Since the release of 13.0 is almost done, can we try to fix the issue instead of
>>> reverting the commit?
>> I think it has already shipped broken.
>> I don't know if an errata is possible, or if it will be broken until 13.1.
>>
>> --> I am much more concerned with the otis@ stuck client problem than this RST battle that only
>>      occurs after a network partitioning, especially if it is 13.0 specific.
>>      I did this testing to try to reproduce Jason's stuck client (with connection in CLOSE_WAIT)
>>      problem, which I failed to reproduce.
>>
>> rick
>>
>> Rs: agree, a good understanding of where the interaction between stack, socket
>> and in-kernel TCP user breaks is needed;
>>
>>>
>>> If this doesn't help, some major surgery will be necessary to prevent NFS sessions
>>> with SACK enabled from transmitting DSACKs...
>>
>> My understanding is that the problem is related to getting a local error indication after
>> receiving a RST segment too late or not at all.
>>
>> Rs: but the move of the upcall should not materially change that; I don't have a PC
>> here to see if any upcall actually happens on RST...
>>
>> Best regards
>> Michael
>>>
>>>
>>>> I know from a printf that this happened, but whether it caused the RST battle to
>>>> not happen, I don't know.
>>>>
>>>> I can put r367492 back in and do more testing if you'd like, but I think it
>>>> probably needs to be reverted?
>>>
>>> Please, I don't quite understand why the exact timing of the upcall would be that
>>> critical here...
>>>
>>> A comparison of the soxxx calls and errors between the "good" and the "bad" would
>>> be perfect. I don't know if this is easy to do though, as these calls appear to be
>>> scattered all around the RPC / NFS source paths.
>>>
>>>> This does not explain the original hung Linux client problem, but does shed light
>>>> on the RST war I could create by doing a network partitioning.
>>>>
>>>> rick
>>>
>>> _______________________________________________
>>> freebsd-net@freebsd.org mailing list
>>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
>>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"



