From owner-freebsd-fs@freebsd.org Sat Aug 13 05:22:10 2016
From: Rick Macklem
To: Marc Goroff, "freebsd-fs@freebsd.org"
Subject: Re: Hanging/stalling mountd on heavily loaded NFS server
Date: Fri, 12 Aug 2016 21:49:45 +0000
References: <98b4db11-8b41-608c-c714-f704a78914b7@quorum.net>
List-Id: Filesystems

Marc Goroff wrote:
>Just to followup on this issue, the patch referenced below seems to have fixed the
>problem.
>
I wonder if this patch should be made a 10.3 update? (At one time, it was only
fixes for security issues that became errata fixes, but that has changed. I'm
not sure what it takes for a patch to qualify?)
It may not affect a lot of people, but it is a simple self contained patch.

Is anyone reading this familiar with the current decision "rules" for errata?

Thanks for testing it, rick

Thanks!

Marc

On 7/27/16 6:41 PM, Rick Macklem wrote:
Marc Goroff wrote:
> From: owner-freebsd-fs@freebsd.org <owner-freebsd-fs@freebsd.org> on behalf of Marc Goroff
> Sent: Wednesday, July 27, 2016 7:04 PM
> To: freebsd-fs@freebsd.org
> Subject: Hanging/stalling mountd on heavily loaded NFS server
>
> We have a large and busy production NFS server running 10.2 that is
> serving approximately 200 ZFS file systems to production VMs. The system
> has been very stable up until last night when we attempted to mount new
> ZFS filesystems on NFS clients.
> The mountd process hung and client mount
> requests timed out. The NFS server continued to serve traffic to
> existing clients during this time. The mountd was hung in state nfsv4lck:
>
> [root@zfs-west1 ~]# ps -axgl|grep mount
> 0 38043 1 0 20 0 63672 17644 nfsv4lck Ds - 0:00.30 /usr/sbin/mountd -r -S /etc/exports /etc/zfs/exports
>
> It remains in this state for an indeterminate amount of time. I once saw
> it continue on after several minutes, but most of the time it seems to
> stay in this state for 15+ minutes. During this time, it does not
> respond to kill -9 but it will eventually exit after many minutes.
> Restarting mountd will allow the existing NFS clients to continue (they
> hang when mountd exits), but any attempt to perform additional NFS
> mounts will push mountd back into the bad state.
>
> This problem seems to be related to the number of NFS mounts off the
> server. If we unmount some of the clients, we can successfully perform
> the NFS mounts of the new ZFS filesystems. However, when we attempt to
> mount all of the production NFS mounts, mountd will hang as above.
>
Stuff snipped for brevity...
>
> Any suggestion on how to resolve this issue? Since this is a production
> server, my options for intrusive debugging are very limited.
>
I think you should try the patch that is r300254 in stable/10. It is a simple
patch you can apply to your kernel without other changes.

http://svnweb.freebsd.org/base/stable/10/sys/fs/nfsserver/nfs_nfsdkrpc.c?r1=291869&r2=300254

It reverses the lock acquisition priority so that mountd doesn't wait until the
nfsd threads are idle before updating exports.

rick

> Thanks.
>
> Marc
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
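
For anyone following the thread who wants a picture of what "reversing the lock acquisition priority" means, here is a rough user-space sketch in Python. It is not the kernel code from r300254 and does not use any FreeBSD API. The class name WriterPriorityLock and all of its methods are made up for illustration. The assumption is that the nfsd threads behave like holders of a shared lock and mountd's export update behaves like a request for the exclusive lock. In the sketch, a pending exclusive request makes new shared requests wait, so the exclusive side only has to wait for the holders already in progress rather than for a moment when no shared holder happens to exist.

```python
import threading


class WriterPriorityLock:
    """Hypothetical shared/exclusive lock where a waiting writer blocks new readers."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0          # threads currently holding the shared lock
        self._writer = False       # True while the exclusive lock is held
        self._writers_waiting = 0  # exclusive requests not yet granted

    def acquire_shared(self, timeout=None):
        """Take the shared lock, deferring to any held or pending exclusive lock."""
        with self._cond:
            ok = self._cond.wait_for(
                lambda: not self._writer and self._writers_waiting == 0,
                timeout,
            )
            if ok:
                self._readers += 1
            return ok

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_exclusive(self, timeout=None):
        """Take the exclusive lock once the current shared holders have drained."""
        with self._cond:
            # Registering first is what gives the writer priority:
            # acquire_shared() refuses new readers while this count is nonzero.
            self._writers_waiting += 1
            try:
                ok = self._cond.wait_for(
                    lambda: not self._writer and self._readers == 0,
                    timeout,
                )
                if ok:
                    self._writer = True
                return ok
            finally:
                self._writers_waiting -= 1
                # If the request timed out, readers that deferred to it can go on.
                self._cond.notify_all()

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

Without the _writers_waiting check in acquire_shared(), a steady stream of readers on a busy server could keep the reader count above zero indefinitely, which is roughly the kind of stall described in the original report.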