From: tuexen@freebsd.org
Date: Sat, 10 Apr 2021 14:19:05 +0200
Subject: Re: NFS Mount Hangs
To: "Scheffenegger, Richard"
Cc: Rick Macklem, Youssef GHORBAL, "freebsd-net@freebsd.org"
Message-Id: <077ECE2B-A84C-440D-AAAB-00293C841F14@freebsd.org>
List-Id: Networking and TCP/IP with FreeBSD

> On 10. Apr 2021, at 11:19, Scheffenegger, Richard wrote:
>
> Hi Rick,
>
>> Well, I have some good news and some bad news (the bad is mostly for Richard).
>>
>> The only message logged is:
>> tcpflags 0x4; tcp_do_segment: Timestamp missing, segment processed normally
>>
>> But...the RST battle no longer occurs. Just one RST that works and then the SYN gets SYN,ACK'd by the FreeBSD end and off it goes...
>>
>> So, what is different?
>>
>> r367492 is reverted from the FreeBSD server.
>> I did the revert because I think it might be what otis@'s hang is being caused by. (In his case, the Recv-Q grows on the socket for the stuck Linux client, while others work.)
>>
>> Why does reverting fix this?
>> My only guess is that the krpc gets the upcall right away and sees an EPIPE when it does soreceive(), which results in soshutdown(SHUT_WR).
>
> With r367492 you don't get the upcall with the same error state? Or you don't get an error on a write() call, when there should be one?

My understanding is that he needs this error indication when calling shutdown().

>
> From what you describe, this is on writes, isn't it? (I'm asking, as the original problem that was fixed with r367492 occurs in the read path: draining of the so_rcv buffer in the upcall right away, which subsequently influences the ACK sent by the stack.)
>
> I only added the so_snd buffer after some discussion about whether the WAKESOR shouldn't have a symmetric equivalent on WAKESOW....
>
> Thus a partial backout (leaving the WAKESOR part in, but reverting the WAKESOW part) would still fix my initial problem with erroneous DSACKs (which can also lead to extremely poor performance with Linux clients), but possibly address this issue...
>
> Can you perhaps take MAIN and apply https://reviews.freebsd.org/D29690 for the revert only on the so_snd upcall?

Since the release of 13.0 is almost done, can we try to fix the issue instead of reverting the commit?

>
> If this doesn't help, some major surgery will be necessary to prevent NFS sessions with SACK enabled from transmitting DSACKs...

My understanding is that the problem is related to getting a local error indication after receiving a RST segment too late or not at all.

Best regards
Michael

>
>
>> I know from a printf that this happened, but whether it caused the RST battle to not happen, I don't know.
>>
>> I can put r367492 back in and do more testing if you'd like, but I think it probably needs to be reverted?
>
> Please, I don't quite understand why the exact timing of the upcall would be that critical here...
>
> A comparison of the soxxx calls and errors between the "good" and the "bad" would be perfect. I don't know if this is easy to do though, as these calls appear to be scattered all around the RPC / NFS source paths.
>
>> This does not explain the original hung Linux client problem, but does shed light on the RST war I could create by doing a network partitioning.
>>
>> rick
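
For readers following the thread, here is a rough userland sketch of the kind of local error indication discussed above: what a socket reports once the peer has sent a RST. It is only an analogy, not the kernel krpc path (soreceive()/soshutdown() and the socket upcalls are not exercised). The function name observe_reset_errors and the use of SO_LINGER with a zero timeout to provoke the RST are assumptions made for this illustration, not something taken from the thread.

```python
import errno
import select
import socket
import struct


def observe_reset_errors():
    """Return the errno names reported by recv() and send() after a peer RST.

    Hypothetical helper for illustration only; it runs over loopback in
    userland and does not touch the in-kernel RPC code discussed above.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    peer, _ = listener.accept()

    # SO_LINGER with l_onoff=1 and l_linger=0 makes close() abort the
    # connection, so the peer side emits a RST instead of a FIN.
    peer.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))
    peer.close()
    listener.close()

    # Wait (up to 5 s) until the reset has been processed locally; a socket
    # with a pending error is reported as readable.
    select.select([client], [], [], 5.0)

    seen = {}
    try:
        client.recv(1)
        seen["recv"] = None  # no error was reported on the read side
    except OSError as exc:
        seen["recv"] = errno.errorcode.get(exc.errno)
    try:
        client.send(b"x")
        seen["send"] = None  # no error was reported on the write side
    except OSError as exc:
        seen["send"] = errno.errorcode.get(exc.errno)
    client.close()
    return seen


if __name__ == "__main__":
    print(observe_reset_errors())
```

The point of the sketch is only that the read side and the write side can surface the reset as different errors, and that the caller learns about it when it next touches the socket. Whether the kernel consumer sees the error early enough is exactly the timing question raised in the thread.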