From: Rick Macklem
To: Alan Somers
CC: Jason Breitman, "freebsd-net@freebsd.org"
Subject: Re: NFS Mount Hangs
Date: Wed, 17 Mar 2021 21:58:25 +0000

Alan Somers wrote:
[stuff snipped]
>Is the 128K limit related to MAXPHYS? If so, it should be greater in 13.0.
For the client, yes.
For the server, no.
For the server, it is just a compile time constant NFS_SRVMAXIO.

It's mainly related to the fact that I haven't gotten around to testing larger
sizes yet.
- kern.ipc.maxsockbuf needs to be several times the limit, which means it would
  have to increase for 1 Mbyte.
- The session code must negotiate a maximum RPC size > 1 Mbyte.
  (I think the server code does do this, but it needs to be tested.)
And, yes, the client is limited to MAXPHYS.

Doing this is on my todo list, rick

The client should acquire the attributes that indicate that and set rsize/wsize
to that. "# nfsstat -m" on the client should show you what the client
is actually using. If it is larger than 128K, set both rsize and wsize to 128K.

>Output from the NFS Client when the issue occurs
># netstat -an | grep NFS.Server.IP.X
>tcp 0 0 NFS.Client.IP.X:46896 NFS.Server.IP.X:2049 FIN_WAIT2
I'm no TCP guy. Hopefully others might know why the client would be
stuck in FIN_WAIT2 (I vaguely recall this means it is waiting for a fin/ack,
but could be wrong?)

># cat /sys/kernel/debug/sunrpc/rpc_xprt/*/info
>netid: tcp
>addr: NFS.Server.IP.X
>port: 2049
>state: 0x51
>
>syslog
>Mar 4 10:29:27 hostname kernel: [437414.131978] -pid- flgs status -client- --rqstp- -timeout ---ops--
>Mar 4 10:29:27 hostname kernel: [437414.133158] 57419 40a1 0 9b723c73 143cfadf 30000 4ca953b5 nfsv4 OPEN_NOATTR a:call_connect_status [sunrpc] q:xprt_pending
I don't know what OPEN_NOATTR means, but I assume it is some variant
of NFSv4 Open operation.
[stuff snipped]
>Mar 4 10:29:30 hostname kernel: [437417.110517] RPC: 57419 xprt_connect_status: connect attempt timed out
>Mar 4 10:29:30 hostname kernel: [437417.112172] RPC: 57419 call_connect_status (status -110)
I have no idea what status -110 means?
>Mar 4 10:29:30 hostname kernel: [437417.113337] RPC: 57419 call_timeout (major)
>Mar 4 10:29:30 hostname kernel: [437417.114385] RPC: 57419 call_bind (status 0)
>Mar 4 10:29:30 hostname kernel: [437417.115402] RPC: 57419 call_connect xprt 00000000e061831b is not connected
>Mar 4 10:29:30 hostname kernel: [437417.116547] RPC: 57419 xprt_connect xprt 00000000e061831b is not connected
>Mar 4 10:30:31 hostname kernel: [437478.551090] RPC: 57419 xprt_connect_status: connect attempt timed out
>Mar 4 10:30:31 hostname kernel: [437478.552396] RPC: 57419 call_connect_status (status -110)
>Mar 4 10:30:31 hostname kernel: [437478.553417] RPC: 57419 call_timeout (minor)
>Mar 4 10:30:31 hostname kernel: [437478.554327] RPC: 57419 call_bind (status 0)
>Mar 4 10:30:31 hostname kernel: [437478.555220] RPC: 57419 call_connect xprt 00000000e061831b is not connected
>Mar 4 10:30:31 hostname kernel: [437478.556254] RPC: 57419 xprt_connect xprt 00000000e061831b is not connected
Is it possible that the client is trying to (re)connect using the same client port#?
I would normally expect the client to create a new TCP connection using a
different client port# and then retry the outstanding RPCs.
--> Capturing packets when this happens would show us what is going on.

If there is a problem on the FreeBSD end, it is most likely a broken
network device driver.
--> Try disabling TSO, LRO.
--> Try a different driver for the net hardware on the server.
--> Try a different net chip on the server.
If you can capture packets when (not after) the hang
occurs, then you can look at them in wireshark and see
what is actually happening. (Ideally on both client and
server, to check that your network hasn't dropped anything.)
--> I know, if the hangs aren't easily reproducible, this isn't
    easily done.
--> Try a newer Linux kernel and see if the problem persists.
    The Linux folk will get more interested if you can reproduce
    the problem on 5.12. (Recent bakeathon testing of the 5.12
    kernel against the FreeBSD server did not find any issues.)

Hopefully the network folk have some insight w.r.t. why
the TCP connection is sitting in FIN_WAIT2.

rick

Jason Breitman

_______________________________________________
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
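
On the "status -110" in the quoted Linux log: the Linux kernel reports errors
as negated errno values, and errno 110 on Linux is ETIMEDOUT ("Connection timed
out"), which agrees with the "connect attempt timed out" line logged just
before it. A minimal Python sketch for decoding such values follows. The table
is a hand-picked subset of Linux errno numbers, and decode_linux_status is a
hypothetical helper name, not part of any tool mentioned in this thread.

```python
# Subset of Linux errno values (from asm-generic/errno.h and errno-base.h).
# FreeBSD numbers these differently, so the Linux values are hard-coded here
# on purpose rather than taken from the local platform.
LINUX_ERRNO = {
    32: ("EPIPE", "Broken pipe"),
    104: ("ECONNRESET", "Connection reset by peer"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
    110: ("ETIMEDOUT", "Connection timed out"),
    111: ("ECONNREFUSED", "Connection refused"),
    113: ("EHOSTUNREACH", "No route to host"),
}


def decode_linux_status(status):
    """Map a sunrpc-style status (0 or a negated errno) to (name, text)."""
    if status == 0:
        return ("OK", "no error")
    return LINUX_ERRNO.get(-status, ("UNKNOWN", "not in this table"))


print(decode_linux_status(-110))
```

So the repeated "(status -110)" entries say only that each connect attempt
timed out; they do not say why.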
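
On FIN_WAIT2: in the TCP state machine, a socket enters FIN_WAIT_2 once it has
sent its own FIN and that FIN has been acknowledged. It then waits for the
peer's FIN. So the client has closed its side and has not yet seen a FIN from
the server. A sketch for picking such sockets out of saved "netstat -an" output
from the Linux client follows. It assumes the six-column layout shown in the
quoted output; nfs_conns_in_state is a hypothetical helper, and the second
sample line is made up for illustration.

```python
def nfs_conns_in_state(netstat_text, state="FIN_WAIT2", port="2049"):
    """Return (local, remote) pairs for TCP sockets to `port` in `state`."""
    matches = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto recv-q send-q local remote state
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, sock_state = fields[3], fields[4], fields[5]
        # Linux netstat prints addresses as host:port
        if sock_state == state and remote.rsplit(":", 1)[-1] == port:
            matches.append((local, remote))
    return matches


sample = """\
tcp 0 0 NFS.Client.IP.X:46896 NFS.Server.IP.X:2049 FIN_WAIT2
tcp 0 0 NFS.Client.IP.X:22 Other.Host.IP.X:51000 ESTABLISHED
"""
print(nfs_conns_in_state(sample))
```

Running this periodically while the mount is hung would show whether the same
client port stays in FIN_WAIT2 across the reconnect attempts.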