Date:      Fri, 12 Aug 2016 21:49:45 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Marc Goroff <marc.goroff@quorum.net>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   Re: Hanging/stalling mountd on heavily loaded NFS server
Message-ID:  <YQBPR01MB040149A4068D9B927AD4BB6BDD1F0@YQBPR01MB0401.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <bd618cab-f46b-e4d6-cd9a-ea18b19e39b2@quorum.net>
References:  <98b4db11-8b41-608c-c714-f704a78914b7@quorum.net> <YTXPR01MB04952015505F90E89CEEDA0EDD000@YTXPR01MB0495.CANPRD01.PROD.OUTLOOK.COM>, <bd618cab-f46b-e4d6-cd9a-ea18b19e39b2@quorum.net>

Marc Goroff wrote:

> Just to follow up on this issue, the patch referenced below seems to have
> fixed the problem.

I wonder if this patch should be made a 10.3 update? (At one time, only fixes
for security issues became errata fixes, but that has changed. I'm not sure
what it takes for a patch to qualify.)


It may not affect a lot of people, but it is a simple, self-contained patch.


Is anyone reading this familiar with the current decision "rules" for errata?


Thanks for testing it, rick

Thanks!


Marc

On 7/27/16 6:41 PM, Rick Macklem wrote:

Marc Goroff wrote:

> From: owner-freebsd-fs@freebsd.org <owner-freebsd-fs@freebsd.org> on behalf of Marc Goroff <marc.goroff@quorum.net>

> Sent: Wednesday, July 27, 2016 7:04 PM
> To: freebsd-fs@freebsd.org
> Subject: Hanging/stalling mountd on heavily loaded NFS server
>
> We have a large and busy production NFS server running 10.2 that is
> serving approximately 200 ZFS file systems to production VMs. The system
> has been very stable up until last night when we attempted to mount new
> ZFS filesystems on NFS clients. The mountd process hung and client mount
> requests timed out. The NFS server continued to serve traffic to
> existing clients during this time. The mountd was hung in state nfsv4lck:
>
> [root@zfs-west1 ~]# ps -axgl|grep mount
>    0 38043     1   0  20  0  63672 17644 nfsv4lck Ds    - 0:00.30 /usr/sbin/mountd -r -S /etc/exports /etc/zfs/exports
>
> It remains in this state for an indeterminate amount of time. I once saw
> it continue on after several minutes, but most of the time it seems to
> stay in this state for 15+ minutes. During this time, it does not
> respond to kill -9 but it will eventually exit after many minutes.
> Restarting mountd will allow the existing NFS clients to continue (they
> hang when mountd exits), but any attempt to perform additional NFS
> mounts will push mountd back into the bad state.
>
> This problem seems to be related to the number of NFS mounts off the
> server. If we unmount some of the clients, we can successfully perform
> the NFS mounts of the new ZFS filesystems. However, when we attempt to
> mount all of the production NFS mounts, mountd will hang as above.
>
Stuff snipped for brevity...
>
> Any suggestion on how to resolve this issue? Since this is a production
> server, my options for intrusive debugging are very limited.
>
I think you should try the patch that is r300254 in stable/10. It is a simple
patch you can apply to your kernel without other changes.

http://svnweb.freebsd.org/base/stable/10/sys/fs/nfsserver/nfs_nfsdkrpc.c?r1=291869&r2=300254

It reverses the lock acquisition priority so that mountd doesn't wait until the
nfsd threads are idle before updating exports.
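
In case a sketch helps to picture what the patch does: the nfsd threads hold the
lock shared while servicing RPCs and mountd needs it exclusive to update the
exports, so on a busy server mountd could wait until the nfsd threads went idle.
What follows is only a rough userland illustration of that idea using pthreads;
it is NOT the actual nfs_nfsdkrpc.c code, and all of the names (prio_lock,
exports_lock, shared_lock(), excl_lock()) are made up for the example. Once an
exclusive request is pending, new shared requests wait, so the exclusive one
only has to wait for the current holders to drain:

/*
 * Rough userland sketch only; NOT the actual FreeBSD kernel code.
 * All names below are invented for illustration.
 */
#include <pthread.h>

struct prio_lock {
    pthread_mutex_t mtx;
    pthread_cond_t  cv;
    int shared_holders;  /* nfsd-style shared holders currently in */
    int excl_waiting;    /* mountd-style exclusive requests pending */
    int excl_held;       /* exclusive lock currently held */
};

static struct prio_lock exports_lock = {
    .mtx = PTHREAD_MUTEX_INITIALIZER,
    .cv  = PTHREAD_COND_INITIALIZER,
};

/* Shared (nfsd-style) acquire: blocks while an exclusive request is pending. */
static void
shared_lock(struct prio_lock *lp)
{
    pthread_mutex_lock(&lp->mtx);
    while (lp->excl_held || lp->excl_waiting > 0)
        pthread_cond_wait(&lp->cv, &lp->mtx);
    lp->shared_holders++;
    pthread_mutex_unlock(&lp->mtx);
}

static void
shared_unlock(struct prio_lock *lp)
{
    pthread_mutex_lock(&lp->mtx);
    if (--lp->shared_holders == 0)
        pthread_cond_broadcast(&lp->cv);
    pthread_mutex_unlock(&lp->mtx);
}

/* Exclusive (mountd-style) acquire: only waits for current holders to drain. */
static void
excl_lock(struct prio_lock *lp)
{
    pthread_mutex_lock(&lp->mtx);
    lp->excl_waiting++;
    while (lp->excl_held || lp->shared_holders > 0)
        pthread_cond_wait(&lp->cv, &lp->mtx);
    lp->excl_waiting--;
    lp->excl_held = 1;
    pthread_mutex_unlock(&lp->mtx);
}

static void
excl_unlock(struct prio_lock *lp)
{
    pthread_mutex_lock(&lp->mtx);
    lp->excl_held = 0;
    pthread_cond_broadcast(&lp->cv);
    pthread_mutex_unlock(&lp->mtx);
}

int
main(void)
{
    /* Toy usage, single threaded, just to show the intended pairing. */
    shared_lock(&exports_lock);    /* nfsd thread servicing an RPC */
    shared_unlock(&exports_lock);
    excl_lock(&exports_lock);      /* mountd updating the exports */
    excl_unlock(&exports_lock);
    return (0);
}

Without the excl_waiting check in shared_lock(), a steady stream of incoming
RPCs keeps shared_holders non-zero on a loaded server, so the exclusive locker
can sit in the nfsv4lck sleep more or less indefinitely, which matches the hang
described above.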

rick

> Thanks.
>
> Marc
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"



