Date: Sat, 10 Apr 2021 14:40:24 +0000
From: "Scheffenegger, Richard" <Richard.Scheffenegger@netapp.com>
To: "tuexen@freebsd.org" <tuexen@freebsd.org>
Cc: Rick Macklem <rmacklem@uoguelph.ca>, Youssef GHORBAL <youssef.ghorbal@pasteur.fr>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject: Re: NFS Mount Hangs
Message-ID: <SN4PR0601MB37287855390FB8A989381CFE86729@SN4PR0601MB3728.namprd06.prod.outlook.com>
In-Reply-To: <077ECE2B-A84C-440D-AAAB-00293C841F14@freebsd.org>
References: <C643BB9C-6B61-4DAC-8CF9-CE04EA7292D0@tildenparkcapital.com>
 <3750001D-3F1C-4D9A-A9D9-98BCA6CA65A4@tildenparkcapital.com>
 <33693DE3-7FF8-4FAB-9A75-75576B88A566@tildenparkcapital.com>
 <D67AF317-D238-4EC0-8C7F-22D54AD5144C@pasteur.fr>
 <YQXPR0101MB09684AB7BEFA911213604467DD669@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <C87066D3-BBF1-44E1-8398-E4EB6903B0F2@tildenparkcapital.com>
 <8E745920-1092-4312-B251-B49D11FE8028@pasteur.fr>
 <YQXPR0101MB0968C44C7C82A3EB64F384D0DD7B9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <DEF8564D-0FE9-4C2C-9F3B-9BCDD423377C@freebsd.org>
 <YQXPR0101MB0968E0A17D8BCACFAF132225DD7A9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <SN4PR0601MB3728E392BCA494EAD49605FE86789@SN4PR0601MB3728.namprd06.prod.outlook.com>
 <YQXPR0101MB09686B4F921B96DCAFEBF874DD789@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <765CE1CD-6AAB-4BEF-97C6-C2A1F0FF4AC5@freebsd.org>
 <YQXPR0101MB096876B44F33BAD8991B62C8DD789@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <2B189169-C0C9-4DE6-A01A-BE916F10BABA@freebsd.org>
 <YQXPR0101MB09688645194907BBAA6E7C7ADD789@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <BF5D23D3-5DBD-4E29-9C6B-F4CCDC205353@freebsd.org>
 <YQXPR0101MB096826445C85921C8F6410A2DD779@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <E4A51EAD-8F9A-49BB-8852-F9D61BDD9EA4@freebsd.org>
 <YQXPR0101MB09682F230F25FBF3BC427135DD729@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <SN4PR0601MB3728AF2554FDDFB4EEF2C95B86729@SN4PR0601MB3728.namprd06.prod.outlook.com>,
 <077ECE2B-A84C-440D-AAAB-00293C841F14@freebsd.org>
________________________________
From: tuexen@freebsd.org <tuexen@freebsd.org>
Sent: Saturday, April 10, 2021 2:19 PM
To: Scheffenegger, Richard
Cc: Rick Macklem; Youssef GHORBAL; freebsd-net@freebsd.org
Subject: Re: NFS Mount Hangs

> On 10. Apr 2021, at 11:19, Scheffenegger, Richard <Richard.Scheffenegger@netapp.com> wrote:
>
> Hi Rick,
>
>> Well, I have some good news and some bad news (the bad is mostly for Richard).
>>
>> The only message logged is:
>> tcpflags 0x4<RST>; tcp_do_segment: Timestamp missing, segment processed normally
>>
>> But...the RST battle no longer occurs. Just one RST that works and then the SYN gets SYN,ACK'd by the FreeBSD end and off it goes...
>>
>> So, what is different?
>>
>> r367492 is reverted from the FreeBSD server.
>> I did the revert because I think it might be what otis@'s hang is being caused by. (In his case, the Recv-Q grows on the socket for the stuck Linux client, while others work.)
>>
>> Why does reverting fix this?
>> My only guess is that the krpc gets the upcall right away and sees an EPIPE when it does soreceive()->results in soshutdown(SHUT_WR).
>
> With r367492 you don't get the upcall with the same error state? Or you don't get an error on a write() call, when there should be one?
My understanding is that he needs this error indication when calling shutdown().
>
> From what you describe, this is on writes, isn't it? (I'm asking, as the original problem that was fixed with r367492 occurs in the read path (draining of the so_rcv buffer in the upcall right away, which subsequently influences the ACK sent by the stack).)
>
> I only added the so_snd buffer after some discussion, if the WAKESOR shouldn't have a symmetric equivalent on WAKESOW....
>
> Thus a partial backout (leaving the WAKESOR part inside, but reverting the WAKESOW part) would still fix my initial problem about erroneous DSACKs (which can also lead to extremely poor performance with Linux clients), but possibly address this issue...
>
> Can you perhaps take MAIN and apply https://reviews.freebsd.org/D29690 for the revert only on the so_snd upcall?
Since the release of 13.0 is almost done, can we try to fix the issue instead of reverting the commit?

Rs: Agreed, a good understanding of where the interaction between stack, socket and in-kernel TCP user breaks is needed;
>
> If this doesn't help, some major surgery will be necessary to prevent NFS sessions with SACK enabled, to transmit DSACKs...
My understanding is that the problem is related to getting a local error indication after receiving a RST segment too late or not at all.

Rs: But the move of the upcall should not materially change that; I don't have a PC here to see if any upcall actually happens on RST...

Best regards
Michael
>
>
>> I know from a printf that this happened, but whether it caused the RST battle to not happen, I don't know.
>>
>> I can put r367492 back in and do more testing if you'd like, but I think it probably needs to be reverted?
>
> Please, I don't quite understand why the exact timing of the upcall would be that critical here...
>
> A comparison of the soxxx calls and errors between the "good" and the "bad" would be perfect. I don't know if this is easy to do though, as these calls appear to be scattered all around the RPC / NFS source paths.
>
>> This does not explain the original hung Linux client problem, but does shed light on the RST war I could create by doing a network partitioning.
>>
>> rick
>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"