From: Rick Macklem
To: "Scheffenegger, Richard", "tuexen@freebsd.org"
CC: Youssef GHORBAL, "freebsd-net@freebsd.org"
Subject: Re: NFS Mount Hangs
Date: Sat, 10 Apr 2021 15:56:33 +0000
List-Id: Networking and TCP/IP with FreeBSD

Scheffenegger, Richard wrote:
>>Rick wrote:
>> Hi Rick,
>>
>>> Well, I have some good news and
>>> some bad news (the bad is mostly for Richard).
>>>
>>> The only message logged is:
>>> tcpflags 0x4; tcp_do_segment: Timestamp missing, segment processed normally
>>>
Btw, I did get one additional message during further testing (with r367492 reverted):
tcpflags 0x4; syncache_chkrst: Our SYN|ACK was rejected, connection attempt aborted
   by remote endpoint

This only happened once in several test cycles.

>>> But...the RST battle no longer occurs. Just one RST that works and then the SYN gets SYN,ACK'd by the FreeBSD end and off it goes...
>>>
>>> So, what is different?
>>>
>>> r367492 is reverted from the FreeBSD server.
>>> I did the revert because I think it might be what otis@'s hang is being caused by. (In his case, the Recv-Q grows on the socket for the stuck Linux client, while others work.)
>>>
>>> Why does reverting fix this?
>>> My only guess is that the krpc gets the upcall right away and sees an EPIPE when it does soreceive() -> results in soshutdown(SHUT_WR).
This was bogus and incorrect. The diagnostic printf() I saw was generated for the
back channel, and that would have occurred after the socket was shut down.

>>
>> With r367492 you don't get the upcall with the same error state? Or you don't get an error on a write() call, when there should be one?
If Send-Q is 0 when the network is partitioned, after healing, the krpc sees no activity on
the socket (until it acquires/processes an RPC it will not do a sosend()).
Without the 6-minute timeout, the RST battle goes on "forever" (I've never actually
waited more than 30 minutes, which is close enough to "forever" for me).
--> With the 6-minute timeout, the "battle" stops after 6 minutes, when the timeout
    causes a soshutdown(..SHUT_WR) on the socket.
    (Since the soshutdown() patch is not yet in "main".
    I got comments, but no "reviewed"
    on it, the 6-minute timer won't help if enabled in main. The soclose() won't happen
    for TCP connections with the back channel enabled, such as Linux NFSv4.1/4.2 ones.)

If Send-Q is non-empty when the network is partitioned, the battle will not happen.

>
>My understanding is that he needs this error indication when calling shutdown().
There are several ways the krpc notices that a TCP connection is no longer functional.
- An error return like EPIPE from either sosend() or soreceive().
- A return of 0 from soreceive() with no data (normal EOF from other end).
- A 6-minute timeout on the server end, when no activity has occurred on the
  connection. This timer is currently disabled for NFSv4.1/4.2 mounts in "main",
  but I enabled it for this testing, to stop the "RST battle goes on forever"
  during testing. I am thinking of enabling it on "main", but this crude bandaid
  shouldn't be thought of as a "fix for the RST battle".

>>
>> From what you describe, this is on writes, isn't it?
>> (I'm asking, as the original problem that was fixed with r367492 occurs in the read path (draining of the so_rcv buffer in the upcall right away, which subsequently influences the ACK sent by the stack).)
>>
>> I only added the so_snd buffer after some discussion of whether the WAKESOR shouldn't have a symmetric equivalent on WAKESOW....
>>
>> Thus a partial backout (leaving the WAKESOR part inside, but reverting the WAKESOW part) would still fix my initial problem about erroneous DSACKs (which can also lead to extremely poor performance with Linux clients), but possibly address this issue...
>>
>> Can you perhaps take MAIN and apply https://reviews.freebsd.org/D29690 for the revert only on the so_snd upcall?
Since the krpc only uses receive upcalls, I don't see how reverting the send side would have
any effect?

>Since the release of 13.0 is almost done, can we try to fix the issue instead of reverting the commit?
I think it has already shipped broken.
I don't know if an errata is possible, or if it will be broken until 13.1.

--> I am much more concerned with the otis@ stuck client problem than this RST battle that only
    occurs after a network partitioning, especially if it is 13.0 specific.
    I did this testing to try to reproduce Jason's stuck client (with connection in CLOSE_WAIT)
    problem, which I failed to reproduce.

rick

Rs: agree, a good understanding of where the interaction between stack, socket and in-kernel TCP user breaks is needed;

>
> If this doesn't help, some major surgery will be necessary to prevent NFS sessions with SACK enabled from transmitting DSACKs...

My understanding is that the problem is related to getting a local error indication after
receiving a RST segment too late or not at all.

Rs: but the move of the upcall should not materially change that; I don't have a PC here to see if any upcall actually
happens on RST...

Best regards
Michael
>
>
>> I know from a printf that this happened, but whether it caused the RST battle to not happen, I don't know.
>>
>> I can put r367492 back in and do more testing if you'd like, but I think it probably needs to be reverted?
>
> Please, I don't quite understand why the exact timing of the upcall would be that critical here...
>
> A comparison of the soxxx calls and errors between the "good" and the "bad" would be perfect. I don't know if this is easy to do though, as these calls appear to be scattered all around the RPC / NFS source paths.
>
>> This does not explain the original hung Linux client problem, but does shed light on the RST war I could create by doing a network partitioning.
>>
>> rick
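
[Editor's sketch] The three ways the krpc is said above to notice a dead TCP connection (an error such as EPIPE from sosend()/soreceive(), a return of 0 from soreceive() with no data, and the 6-minute idle timeout) can be illustrated in plain user-space C. This is not the krpc code: the names conn_event, conn_verdict, conn_check() and IDLE_TIMEOUT_SECS are hypothetical, and treating EWOULDBLOCK as non-fatal is an assumption made only for this illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical constant: the 6-minute idle timeout mentioned in the message. */
#define IDLE_TIMEOUT_SECS (6 * 60)

enum conn_verdict {
    CONN_OK,            /* nothing suggests the connection is dead */
    CONN_ERROR,         /* send/receive returned an error such as EPIPE */
    CONN_EOF,           /* receive returned 0 with no data (peer closed) */
    CONN_IDLE_TIMEOUT   /* no activity for IDLE_TIMEOUT_SECS */
};

/* Hypothetical summary of the most recent socket operation, if any. */
struct conn_event {
    int    error;         /* errno-style result of the last send/receive, 0 if none */
    bool   did_receive;   /* true if the last operation was a receive */
    size_t nbytes;        /* bytes moved by the last operation */
    time_t last_activity; /* time of the last activity on the connection */
};

/* Classify the connection using the three checks listed in the message. */
enum conn_verdict
conn_check(const struct conn_event *ev, time_t now)
{
    /* 1. Hard error from send or receive (assumption: EWOULDBLOCK is benign). */
    if (ev->error != 0 && ev->error != EWOULDBLOCK)
        return CONN_ERROR;

    /* 2. Successful receive that returned no data: normal EOF from the peer. */
    if (ev->error == 0 && ev->did_receive && ev->nbytes == 0)
        return CONN_EOF;

    /* 3. No send or receive has happened for the whole timeout period. */
    if (difftime(now, ev->last_activity) >= IDLE_TIMEOUT_SECS)
        return CONN_IDLE_TIMEOUT;

    return CONN_OK;
}
```

With an empty Send-Q after a partition, no send or receive takes place, so in this sketch checks 1 and 2 never fire and only check 3 can end the connection. That matches the observation in the message that the battle stopped only once the 6-minute timer was enabled.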