Date: Sat, 10 Apr 2021 21:59:51 +0000
From: Rick Macklem <rmacklem@uoguelph.ca>
To: "tuexen@freebsd.org" <tuexen@freebsd.org>
Cc: "Scheffenegger, Richard" <Richard.Scheffenegger@netapp.com>, Youssef GHORBAL <youssef.ghorbal@pasteur.fr>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject: Re: NFS Mount Hangs
Message-ID: <YQXPR0101MB0968359DC371C306EB462657DD729@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <3980F368-098D-4EE4-B213-4113C2CAFE7D@freebsd.org>
References: <C643BB9C-6B61-4DAC-8CF9-CE04EA7292D0@tildenparkcapital.com>
 <3750001D-3F1C-4D9A-A9D9-98BCA6CA65A4@tildenparkcapital.com>
 <33693DE3-7FF8-4FAB-9A75-75576B88A566@tildenparkcapital.com>
 <D67AF317-D238-4EC0-8C7F-22D54AD5144C@pasteur.fr>
 <YQXPR0101MB09684AB7BEFA911213604467DD669@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <C87066D3-BBF1-44E1-8398-E4EB6903B0F2@tildenparkcapital.com>
 <8E745920-1092-4312-B251-B49D11FE8028@pasteur.fr>
 <YQXPR0101MB0968C44C7C82A3EB64F384D0DD7B9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <DEF8564D-0FE9-4C2C-9F3B-9BCDD423377C@freebsd.org>
 <YQXPR0101MB0968E0A17D8BCACFAF132225DD7A9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <SN4PR0601MB3728E392BCA494EAD49605FE86789@SN4PR0601MB3728.namprd06.prod.outlook.com>
 <YQXPR0101MB09686B4F921B96DCAFEBF874DD789@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <765CE1CD-6AAB-4BEF-97C6-C2A1F0FF4AC5@freebsd.org>
 <YQXPR0101MB096876B44F33BAD8991B62C8DD789@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <2B189169-C0C9-4DE6-A01A-BE916F10BABA@freebsd.org>
 <YQXPR0101MB09688645194907BBAA6E7C7ADD789@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <BF5D23D3-5DBD-4E29-9C6B-F4CCDC205353@freebsd.org>
 <YQXPR0101MB096826445C85921C8F6410A2DD779@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <E4A51EAD-8F9A-49BB-8852-F9D61BDD9EA4@freebsd.org>
 <YQXPR0101MB09682F230F25FBF3BC427135DD729@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <SN4PR0601MB3728AF2554FDDFB4EEF2C95B86729@SN4PR0601MB3728.namprd06.prod.outlook.com>
 <077ECE2B-A84C-440D-AAAB-00293C841F14@freebsd.org>
 <SN4PR0601MB37287855390FB8A989381CFE86729@SN4PR0601MB3728.namprd06.prod.outlook.com>
 <YQXPR0101MB096894FBD385DB9A42C1399FDD729@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>,
 <3980F368-098D-4EE4-B213-4113C2CAFE7D@freebsd.org>
tuexen@freebsd.org wrote:
>Rick wrote:
[stuff snipped]
>>> With r367492 you don't get the upcall with the same error state? Or you don't get an error on a write() call, when there should be one?
> If Send-Q is 0 when the network is partitioned, after healing, the krpc sees no activity on
> the socket (until it acquires/processes an RPC it will not do a sosend()).
> Without the 6minute timeout, the RST battle goes on "forever" (I've never actually
> waited more than 30minutes, which is close enough to "forever" for me).
> --> With the 6minute timeout, the "battle" stops after 6minutes, when the timeout
>     causes a soshutdown(..SHUT_WR) on the socket.
>     (Since the soshutdown() patch is not yet in "main" (I got comments, but no "reviewed"
>     on it), the 6minute timer won't help if enabled in main. The soclose() won't happen
>     for TCP connections with the back channel enabled, such as Linux 4.1/4.2 ones.)
>I'm confused. So you are saying that if the Send-Q is empty when you partition the
>network, and the peer starts to send SYNs after the healing, FreeBSD responds
>with a challenge ACK which triggers the sending of a RST by Linux. This RST is
>ignored multiple times.
>Is that true? Even with my patch for the bug I introduced?
Yes and yes.
Go take another look at linuxtofreenfs.pcap
("fetch https://people.freebsd.org/~rmacklem/linuxtofreenfs.pcap" if you don't
already have it.)
Look at packet #1949->2069. I use wireshark, but you'll have your favourite.
You'll see the "RST battle" that ends after
6minutes at packet#2069. If there is no 6minute timeout enabled in the
server side krpc, then the battle just continues (I once let it run for about
30minutes before giving up). The 6minute timeout is not currently enabled
in main, etc.

>What version of the kernel are you using?
"main" dated Dec. 23, 2020 + your bugfix + assorted NFS patches that
are not relevant + 2 small krpc related patches.
--> The two small krpc related patches enable the 6minute timeout and
    add a soshutdown(..SHUT_WR) call when the 6minute timeout is
    triggered. These have no effect until the 6minutes is up and, without
    them, the "RST battle" goes on forever.

Add to the above a revert of r367492 and the RST battle goes away and things
behave as expected. The recovery happens quickly after the network is
unpartitioned, with either 0 or 1 RSTs.

rick
ps: Once the irrelevant NFS patches make it into "main", I will upgrade to
    main bits-du-jour for testing.

Best regards
Michael
>
> If Send-Q is non-empty when the network is partitioned, the battle will not happen.
>
>>
>> My understanding is that he needs this error indication when calling shutdown().
> There are several ways the krpc notices that a TCP connection is no longer functional.
> - An error return like EPIPE from either sosend() or soreceive().
> - A return of 0 from soreceive() with no data (normal EOF from other end).
> - A 6minute timeout on the server end, when no activity has occurred on the
>   connection. This timer is currently disabled for NFSv4.1/4.2 mounts in "main",
>   but I enabled it for this testing, to stop the "RST battle goes on forever"
>   during testing. I am thinking of enabling it on "main", but this crude bandaid
>   shouldn't be thought of as a "fix for the RST battle".
>
>>>
>>> From what you describe, this is on writes, isn't it? (I'm asking, as the original problem that was fixed with r367492 occurs in the read path (draining of the so_rcv buffer in the upcall right away, which subsequently influences the ACK sent by the stack).)
>>>
>>> I only added the so_snd buffer after some discussion, if the WAKESOR shouldn't have a symmetric equivalent on WAKESOW....
>>>
>>> Thus a partial backout (leaving the WAKESOR part inside, but reverting the WAKESOW part) would still fix my initial problem about erroneous DSACKs (which can also lead to extremely poor performance with Linux clients), but possibly address this issue...
>>>
>>> Can you perhaps take MAIN and apply https://reviews.freebsd.org/D29690 for the revert only on the so_snd upcall?
> Since the krpc only uses receive upcalls, I don't see how reverting the send side would have
> any effect?
>
>> Since the release of 13.0 is almost done, can we try to fix the issue instead of reverting the commit?
> I think it has already shipped broken.
> I don't know if an errata is possible, or if it will be broken until 13.1.
>
> --> I am much more concerned with the otis@ stuck client problem than this RST battle that only
>     occurs after a network partitioning, especially if it is 13.0 specific.
>     I did this testing to try to reproduce Jason's stuck client (with connection in CLOSE_WAIT)
>     problem, which I failed to reproduce.
>
> rick
>
> Rs: agree, a good understanding where the interaction btwn stack, socket and in kernel tcp user breaks is needed;
>
>>
>> If this doesn't help, some major surgery will be necessary to prevent NFS sessions with SACK enabled, to transmit DSACKs...
>
> My understanding is that the problem is related to getting a local error indication after
> receiving a RST segment too late or not at all.
>
> Rs: but the move of the upcall should not materially change that; i don't have a pc here to see if any upcall actually happens on rst...
>
> Best regards
> Michael
>>
>>
>>> I know from a printf that this happened, but whether it caused the RST battle to not happen, I don't know.
>>>
>>> I can put r367492 back in and do more testing if you'd like, but I think it probably needs to be reverted?
>>
>> Please, I don't quite understand why the exact timing of the upcall would be that critical here...
>>
>> A comparison of the soxxx calls and errors between the "good" and the "bad" would be perfect. I don't know if this is easy to do though, as these calls appear to be scattered all around the RPC / NFS source paths.
>>
>>> This does not explain the original hung Linux client problem, but does shed light on the RST war I could create by doing a network partitioning.
>>>
>>> rick
>>
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
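For readers following the thread, the three ways the krpc is said above to notice a dead TCP connection (an error return, a zero-length receive meaning EOF, and the 6-minute idle timeout followed by a shutdown of the send side) can be illustrated with a rough userspace sketch. This is not krpc code: Python's recv() and shutdown() stand in for soreceive() and soshutdown(), and the names classify_receive and IDLE_LIMIT_SECONDS are hypothetical, chosen only for this illustration.

```python
import errno
import socket
import time

# Hypothetical constant mirroring the 6-minute timeout discussed in the thread.
IDLE_LIMIT_SECONDS = 6 * 60


def classify_receive(sock, last_activity, now=None, idle_limit=IDLE_LIMIT_SECONDS):
    """Try one non-blocking receive and report what it says about the connection.

    Returns (status, data, last_activity), where status is one of
    "data", "eof", "idle", "idle-timeout" or "error:<ERRNO NAME>".
    """
    now = time.monotonic() if now is None else now
    try:
        data = sock.recv(4096)
    except (BlockingIOError, InterruptedError):
        # No data pending: only the idle timer can declare the connection dead.
        if now - last_activity >= idle_limit:
            try:
                # Half-close the send side, loosely analogous to
                # soshutdown(so, SHUT_WR) in the thread.
                sock.shutdown(socket.SHUT_WR)
            except OSError:
                pass  # already shut down or reset; nothing more to do
            return "idle-timeout", b"", last_activity
        return "idle", b"", last_activity
    except OSError as exc:
        # Hard error from the socket layer (for example ECONNRESET or EPIPE).
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return "error:" + name, b"", last_activity
    if data == b"":
        # A zero-length read is the normal EOF from the other end.
        return "eof", b"", last_activity
    # Real traffic arrived, so the idle timer restarts from "now".
    return "data", data, now
```

The caller is assumed to have put the socket in non-blocking mode and to pass back the returned last_activity on the next call. Note what the sketch cannot show: in the scenario described in this thread the receive side sees neither an error nor EOF, which is why only the idle timer ends the RST battle.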