From: Rick Macklem <rmacklem@uoguelph.ca>
To: Jason Breitman, Youssef GHORBAL
CC: "freebsd-net@freebsd.org"
Subject: Re: NFS Mount Hangs
Date: Tue, 23 Mar 2021 00:49:57 +0000

I am going to create a FreeBSD PR for this, so that this does not get forgotten.
If anyone has a problem with me cutting/pasting their comments in this thread into the PR, please email me soon.
(If I don't hear from you soon, I'll assume you are ok with it.)
The same goes for a post to linux-nfs@vger.kernel.org at some point.

I think the recently posted patch *might* work around the problem.
The underlying cause will likely be a mystery for some time, I think?

Thanks everyone for your comments, rick

________________________________________
From: owner-freebsd-net@freebsd.org on behalf of Jason Breitman
Sent: Monday, March 22, 2021 9:24 AM
To: Youssef GHORBAL
Cc: freebsd-net@freebsd.org
Subject: Re: NFS Mount Hangs

Agreed. I had made the changes on the FreeBSD server side and was suggesting that a new TCP connection needed to be established between the client and server for the settings to take effect.
I rebooted all of my Debian clients on Sunday to achieve that goal, establishing a new NFSv4 TCP connection with the file server, and will let the group know if I see another hang.

Jason Breitman

On Mar 22, 2021, at 7:27 AM, Youssef GHORBAL wrote:

> On 21 Mar 2021, at 14:41, Jason Breitman wrote:
>
> Thanks for sharing as this sounds exactly like my issue.
>
> I had implemented the change below on 3/8/2021 and have experienced the NFS hang after that.
> Do I need to reboot or umount / mount all of the clients and then I will be ok?
>
> I had not rebooted the clients, but would do so to get out of this situation.
> It is logical that a new TCP session over 2049 needs to be reestablished for the changes to take effect.
>
> net.inet.tcp.fast_finwait2_recycle=1
> net.inet.tcp.finwait2_timeout=1000

In my case, those were implemented on the server (FreeBSD side), since the BSD box was the one closing the connection and the FIN_WAIT_2 state was on its side. In your case the FIN_WAIT_2 is on the client side.
I don't know if these sysctls are even available on Linux.

> I can also confirm that the iptables solution that you use on the client to get out of the hung mount without a reboot works for me.
> #!/bin/sh
>
> progName="nfsClientFix"
> delay=15
> nfs_ip=NFS.Server.IP.X
>
> # Succeeds (returns 0) if a connection to the NFS server is in FIN_WAIT2.
> nfs_fin_wait2_state() {
>     /usr/bin/netstat -an | /usr/bin/grep -F "${nfs_ip}:2049" | /usr/bin/grep FIN_WAIT2 > /dev/null 2>&1
>     return $?
> }
>
> nfs_fin_wait2_state
> result=$?
> if [ ${result} -eq 0 ] ; then
>     /usr/bin/logger -s -i -p local7.error -t ${progName} "NFS Connection is in FIN_WAIT2!"
>     /usr/bin/logger -s -i -p local7.error -t ${progName} "Enabling firewall to block ${nfs_ip}!"
>     # Drop everything from the server until the stuck socket goes away.
>     /usr/sbin/iptables -A INPUT -s "${nfs_ip}" -j DROP
>
>     while true
>     do
>         /usr/bin/sleep ${delay}
>         nfs_fin_wait2_state
>         result=$?
>         if [ ${result} -ne 0 ] ; then
>             /usr/bin/logger -s -i -p local7.notice -t ${progName} "NFS Connection is OK."
>             /usr/bin/logger -s -i -p local7.error -t ${progName} "Disabling firewall to allow access to ${nfs_ip}!"
>             # Remove the rule added above so NFS traffic can flow again.
>             /usr/sbin/iptables -D INPUT -s "${nfs_ip}" -j DROP
>             break
>         fi
>     done
> fi
>
>
> Jason Breitman
>
>
> On Mar 19, 2021, at 8:40 PM, Youssef GHORBAL wrote:
>
> Hi Jason,
>
>> On 17 Mar 2021, at 18:17, Jason Breitman wrote:
>>
>> Please review the details below and let me know if there is a setting that I should apply to my FreeBSD NFS Server or if there is a bug fix that I can apply to resolve my issue.
>> I shared this information with the linux-nfs mailing list and they believe the issue is on the server side.
>>
>> Issue
>> NFSv4 mounts periodically hang on the NFS Client.
>>
>> During this time, it is possible to manually mount from another NFS Server on the NFS Client having issues.
>> Also, other NFS Clients are successfully mounting from the NFS Server in question.
>> Rebooting the NFS Client appears to be the only solution.
>
> I had experienced a similar weird situation with periodically stuck Linux NFS clients mounting Isilon NFS servers (Isilon is FreeBSD based, but they seem to have their own nfsd).
> We've had better luck, and we did manage to get packet captures on both sides during the issue. The gist of it goes as follows:
>
> - Data flows correctly between SERVER and CLIENT.
> - At some point SERVER starts decreasing its TCP Receive Window until it reaches 0.
> - CLIENT (eager to send data) can only ACK data sent by SERVER.
> - When SERVER is done sending data, CLIENT starts sending TCP Window Probes, hoping that the TCP Window opens again so it can flush its buffers.
> - SERVER responds with a TCP Zero Window to those probes.
> - After 6 minutes (the NFS server default idle timeout) SERVER gracefully closes the TCP connection by sending a FIN packet (still with a TCP Window of 0).
> - CLIENT ACKs that FIN.
> - SERVER goes into the FIN_WAIT_2 state.
> - CLIENT closes its half of the socket and goes into the LAST_ACK state.
> - FIN is never sent by CLIENT, since there is still data in its SendQ and the receiver's TCP Window is still 0. At this stage CLIENT starts sending TCP Window Probes again and again, hoping that SERVER opens its TCP Window so it can flush its buffers and terminate its side of the socket.
> - SERVER keeps responding with a TCP Zero Window to those probes.
> => The last two steps go on and on for hours/days, freezing the NFS mount bound to that TCP session.
>
> If we had a situation where CLIENT was responsible for closing the TCP Window (and initiating the TCP FIN first) and SERVER wanted to send data, we would end up in the same state as you, I think.
>
> We never found the root cause of why SERVER decided to close the TCP Window and no longer accept data. The fix on the Isilon side was to recycle the FIN_WAIT_2 sockets more aggressively (net.inet.tcp.fast_finwait2_recycle=1 & net.inet.tcp.finwait2_timeout=5000).
> Once the socket is recycled, at the next occurrence of a CLIENT TCP Window Probe SERVER sends a RST, triggering the teardown of the session on the client side, a new TCP handshake, etc., and traffic flows again (NFS starts responding).
>
> To avoid rebooting the client (and before the aggressive FIN_WAIT_2 recycling was implemented on the Isilon side) we added a check script on the client that detects LAST_ACK sockets and, through an iptables rule, enforces a TCP RST. Something like:
>   -A OUTPUT -p tcp -d $nfs_server_addr --sport $local_port -j REJECT --reject-with tcp-reset
> (the script removes this iptables rule as soon as the LAST_ACK disappears).
>
> The bottom line would be to get a packet capture during the outage (client and/or server side); it will show you at least the shape of the TCP exchange when NFS is stuck.
>
> Youssef
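
For anyone who wants to try the LAST_ACK check described above, here is a rough sketch of how it might look. This is not Youssef's actual script, which was not posted. It assumes Linux net-tools `netstat -tan` output (columns Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), and the server address 192.0.2.10 is only a placeholder. The function names last_ack_ports and reset_rules are made up for illustration. The only thing taken from the thread is the iptables rule itself, and the sketch prints it rather than running it.

```shell
#!/bin/sh
# Sketch only: the address below is a placeholder (TEST-NET-1), not a real server.
nfs_server_addr="${NFS_SERVER_ADDR:-192.0.2.10}"

# Read `netstat -tan` output on stdin and print the local port of every
# TCP socket to the NFS server (port 2049) that is in the LAST_ACK state.
# Assumes net-tools column order: local address is $4, peer is $5, state is $6.
last_ack_ports() {
    awk -v peer="${nfs_server_addr}:2049" '
        $1 ~ /^tcp/ && $5 == peer && $6 == "LAST_ACK" {
            n = split($4, parts, ":")   # port is the last ":"-separated field
            print parts[n]
        }'
}

# For each local port read from stdin, print (do not execute) the
# tcp-reset rule quoted in the thread.
reset_rules() {
    while read -r local_port; do
        echo "iptables -A OUTPUT -p tcp -d ${nfs_server_addr} --sport ${local_port} -j REJECT --reject-with tcp-reset"
    done
}
```

Something like `netstat -tan | last_ack_ports | reset_rules` would then list the rules to add. A real script would also have to delete each rule (same arguments with -D instead of -A) once the LAST_ACK socket disappears, as described above.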