Date: Sat, 10 Apr 2021 18:12:40 +0200
From: tuexen@freebsd.org
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: "Scheffenegger, Richard" <Richard.Scheffenegger@netapp.com>,
    Youssef GHORBAL <youssef.ghorbal@pasteur.fr>,
    "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject: Re: NFS Mount Hangs
Message-ID: <3980F368-098D-4EE4-B213-4113C2CAFE7D@freebsd.org>
In-Reply-To: <YQXPR0101MB096894FBD385DB9A42C1399FDD729@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
References: <C643BB9C-6B61-4DAC-8CF9-CE04EA7292D0@tildenparkcapital.com>
 <3750001D-3F1C-4D9A-A9D9-98BCA6CA65A4@tildenparkcapital.com>
 <33693DE3-7FF8-4FAB-9A75-75576B88A566@tildenparkcapital.com>
 <D67AF317-D238-4EC0-8C7F-22D54AD5144C@pasteur.fr>
 <YQXPR0101MB09684AB7BEFA911213604467DD669@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <C87066D3-BBF1-44E1-8398-E4EB6903B0F2@tildenparkcapital.com>
 <8E745920-1092-4312-B251-B49D11FE8028@pasteur.fr>
 <YQXPR0101MB0968C44C7C82A3EB64F384D0DD7B9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <DEF8564D-0FE9-4C2C-9F3B-9BCDD423377C@freebsd.org>
 <YQXPR0101MB0968E0A17D8BCACFAF132225DD7A9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <SN4PR0601MB3728E392BCA494EAD49605FE86789@SN4PR0601MB3728.namprd06.prod.outlook.com>
 <YQXPR0101MB09686B4F921B96DCAFEBF874DD789@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <765CE1CD-6AAB-4BEF-97C6-C2A1F0FF4AC5@freebsd.org>
 <YQXPR0101MB096876B44F33BAD8991B62C8DD789@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <2B189169-C0C9-4DE6-A01A-BE916F10BABA@freebsd.org>
 <YQXPR0101MB09688645194907BBAA6E7C7ADD789@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <BF5D23D3-5DBD-4E29-9C6B-F4CCDC205353@freebsd.org>
 <YQXPR0101MB096826445C85921C8F6410A2DD779@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <E4A51EAD-8F9A-49BB-8852-F9D61BDD9EA4@freebsd.org>
 <YQXPR0101MB09682F230F25FBF3BC427135DD729@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <SN4PR0601MB3728AF2554FDDFB4EEF2C95B86729@SN4PR0601MB3728.namprd06.prod.outlook.com>
 <077ECE2B-A84C-440D-AAAB-00293C841F14@freebsd.org>
 <SN4PR0601MB37287855390FB8A989381CFE86729@SN4PR0601MB3728.namprd06.prod.outlook.com>
 <YQXPR0101MB096894FBD385DB9A42C1399FDD729@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>

> On 10. Apr 2021, at 17:56, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> 
> Scheffenegger, Richard <Richard.Scheffenegger@netapp.com> wrote:
>>> Rick wrote:
>>> Hi Rick,
>>> 
>>>> Well, I have some good news and some bad news (the bad is mostly for Richard).
>>>> 
>>>> The only message logged is:
>>>> tcpflags 0x4<RST>; tcp_do_segment: Timestamp missing, segment processed normally
>>>> 
> Btw, I did get one additional message during further testing (with r367492 reverted):
> tcpflags 0x4<RST>; syncache_chkrst: Our SYN|ACK was rejected, connection attempt aborted
> by remote endpoint
> 
> This only happened once out of several test cycles. That is OK.
> 
>>>> But...the RST battle no longer occurs. Just one RST that works, and then the SYN gets SYN,ACK'd by the FreeBSD end and off it goes...
>>>> 
>>>> So, what is different?
>>>> 
>>>> r367492 is reverted from the FreeBSD server.
>>>> I did the revert because I think it might be what is causing the otis@ hang. (In his case, the Recv-Q grows on the socket for the stuck Linux client, while others work.)
>>>> 
>>>> Why does reverting fix this?
>>>> My only guess is that the krpc gets the upcall right away and sees an EPIPE when it does soreceive(), which results in soshutdown(SHUT_WR).
> This was bogus and incorrect. The diagnostic printf() I saw was generated for the
> back channel, and that would have occurred after the socket was shut down.
> 
>>> 
>>> With r367492 you don't get the upcall with the same error state? Or you don't get an error on a write() call, when there should be one?
> If Send-Q is 0 when the network is partitioned, then after healing, the krpc sees no activity on
> the socket (until it acquires/processes an RPC it will not do a sosend()).
> Without the 6-minute timeout, the RST battle goes on "forever" (I've never actually
> waited more than 30 minutes, which is close enough to "forever" for me).
> --> With the 6-minute timeout, the "battle" stops after 6 minutes, when the timeout
>     causes a soshutdown(..SHUT_WR) on the socket.
>     (Since the soshutdown() patch is not yet in "main" (I got comments, but no "reviewed"
>     on it), the 6-minute timer won't help if enabled in main. The soclose() won't happen
>     for TCP connections with the back channel enabled, such as Linux 4.1/4.2 ones.)

I'm confused. So you are saying that if the Send-Q is empty when you partition the network,
and the peer starts to send SYNs after the healing, FreeBSD responds with a challenge ACK,
which triggers the sending of an RST by Linux. This RST is ignored multiple times.
Is that true? Even with my patch for the bug I introduced?
What version of the kernel are you using?

Best regards
Michael

> 
> If Send-Q is non-empty when the network is partitioned, the battle will not happen.
> 
>> 
>> My understanding is that he needs this error indication when calling shutdown().
> There are several ways the krpc notices that a TCP connection is no longer functional:
> - An error return like EPIPE from either sosend() or soreceive().
> - A return of 0 from soreceive() with no data (normal EOF from the other end).
> - A 6-minute timeout on the server end, when no activity has occurred on the
>   connection. This timer is currently disabled for NFSv4.1/4.2 mounts in "main",
>   but I enabled it for this testing, to stop the RST battle from going on forever
>   during testing. I am thinking of enabling it on "main", but this crude bandaid
>   shouldn't be thought of as a "fix for the RST battle".
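
For readers following the socket-level details, the three detection paths listed above map
onto ordinary socket calls. What follows is a minimal userland sketch, not the in-kernel
krpc code (which uses sosend()/soreceive() and socket upcalls); the descriptor "s", the
"last_activity" timestamp, and the 6-minute constant are assumptions made for the example,
and real data handling is omitted.

/*
 * Minimal userland sketch of the three detection paths described above.
 * Illustration only; the krpc does the equivalent with sosend(),
 * soreceive() and soshutdown() inside the kernel.
 */
#include <sys/types.h>
#include <sys/socket.h>

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define	IDLE_LIMIT	(6 * 60)	/* 6-minute idle limit, as in the test setup */

static int
connection_dead(int s, time_t last_activity)
{
	char buf[512];
	ssize_t n;

	/* 1. An error return (ECONNRESET, EPIPE, ...) from a receive or send. */
	n = recv(s, buf, sizeof(buf), MSG_DONTWAIT);
	if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK) {
		fprintf(stderr, "recv: %s\n", strerror(errno));
		return (1);
	}

	/* 2. A return of 0 with no data: normal EOF from the other end. */
	if (n == 0)
		return (1);

	/*
	 * 3. No activity for 6 minutes: shut down the send side, which is
	 *    roughly what the server-side timer does via soshutdown(SHUT_WR).
	 */
	if (time(NULL) - last_activity > IDLE_LIMIT) {
		(void)shutdown(s, SHUT_WR);
		return (1);
	}
	return (0);
}

In the scenario described above (empty Send-Q when the partition happens), neither of the
first two paths ever fires after healing, so it is the timeout path that eventually ends
the RST battle.
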
> 
>>> 
>>> From what you describe, this is on writes, isn't it? (I'm asking because the original problem that was fixed with r367492 occurs in the read path: draining of the so_rcv buffer in the upcall right away, which subsequently influences the ACK sent by the stack.)
>>> 
>>> I only added the so_snd buffer after some discussion about whether the WAKESOR shouldn't have a symmetric equivalent in WAKESOW....
>>> 
>>> Thus a partial backout (leaving the WAKESOR part inside, but reverting the WAKESOW part) would still fix my initial problem about erroneous DSACKs (which can also lead to extremely poor performance with Linux clients), but possibly address this issue...
>>> 
>>> Can you perhaps take MAIN and apply https://reviews.freebsd.org/D29690 for the revert only on the so_snd upcall?
> Since the krpc only uses receive upcalls, I don't see how reverting the send side would have
> any effect?
> 
>> Since the release of 13.0 is almost done, can we try to fix the issue instead of reverting the commit?
> I think it has already shipped broken.
> I don't know if an errata is possible, or if it will be broken until 13.1.
> 
> --> I am much more concerned with the otis@ stuck client problem than this RST battle, which only
>     occurs after a network partitioning, especially if it is 13.0 specific.
>     I did this testing to try to reproduce Jason's stuck client (with connection in CLOSE_WAIT)
>     problem, which I failed to reproduce.
> 
> rick
> 
> Rs: agree, a good understanding of where the interaction between the stack, the socket and the in-kernel TCP user breaks is needed;
> 
>> 
>> If this doesn't help, some major surgery will be necessary to prevent NFS sessions with SACK enabled from transmitting DSACKs...
> 
> My understanding is that the problem is related to getting a local error indication after
> receiving an RST segment too late or not at all.
> 
> Rs: but the move of the upcall should not materially change that; I don't have a PC here to see if any upcall actually happens on RST...
> 
> Best regards
> Michael
>> 
>> 
>>> I know from a printf that this happened, but whether it caused the RST battle to not happen, I don't know.
>>> 
>>> I can put r367492 back in and do more testing if you'd like, but I think it probably needs to be reverted?
>> 
>> Please, I don't quite understand why the exact timing of the upcall would be that critical here...
>> 
>> A comparison of the soxxx calls and errors between the "good" and the "bad" would be perfect. I don't know if this is easy to do though, as these calls appear to be scattered all around the RPC / NFS source paths.
>> 
>>> This does not explain the original hung Linux client problem, but it does shed light on the RST war I could create by doing a network partitioning.
>>> 
>>> rick
>> 
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
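
On the point above about a local error indication arriving too late or not at all after an
RST: at the plain-sockets level, a reset that has actually been processed by the stack shows
up on the next read and write calls. The sketch below is an illustration for that discussion,
not the krpc code path, and the exact errno values can vary with timing and stack.

/*
 * Sketch of how a processed RST surfaces to an ordinary socket user.
 * Illustration only; errno values can vary with timing and stack.
 */
#include <sys/socket.h>

#include <errno.h>
#include <stdio.h>
#include <string.h>

static void
report_reset(int s)
{
	char buf[1] = { 0 };

	/*
	 * A read on a connection that took an RST typically fails with
	 * ECONNRESET (or returns 0/EOF if the pending error was already
	 * consumed).
	 */
	if (recv(s, buf, sizeof(buf), 0) < 0)
		printf("recv: %s\n", strerror(errno));

	/*
	 * A subsequent write typically fails with EPIPE; MSG_NOSIGNAL
	 * suppresses the SIGPIPE that would otherwise be delivered.
	 */
	if (send(s, buf, sizeof(buf), MSG_NOSIGNAL) < 0)
		printf("send: %s\n", strerror(errno));
}

If the RST is never processed locally (or arrives too late), neither call reports an error,
which matches the behaviour described above after a partition heals with an empty Send-Q.
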