Date: Sat, 28 May 2022 21:27:45 +0000
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Kurt Jaeger <pi@freebsd.org>
Cc: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Re: FreeBSD 12.3/13.1 NFS client hang
Message-ID: <YQBPR0101MB9742D1CC394CB37C135A94E3DDDB9@YQBPR0101MB9742.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <YQBPR0101MB9742B91118878E58691DB94CDDDB9@YQBPR0101MB9742.CANPRD01.PROD.OUTLOOK.COM>
References: <YpEwxdGCouUUFHiE@shipon.lysator.liu.se> <YQBPR0101MB9742280313FC17543132A61CDDD89@YQBPR0101MB9742.CANPRD01.PROD.OUTLOOK.COM> <YpHZb0fgsmbBrxD8@fc.opsec.eu> <YQBPR0101MB9742B91118878E58691DB94CDDDB9@YQBPR0101MB9742.CANPRD01.PROD.OUTLOOK.COM>
Rick Macklem <rmacklem@uoguelph.ca> wrote:
> Kurt Jaeger <pi@freebsd.org> wrote:
> > > > I'm having issues with the NFS clients on FreeBSD 12.3 and 13.1
> >
> > I have it with a 13.0p7 client against a 13.1 server with
> > a hanging soft mount (I tried unmount to change it to a hard mount).
> >
> > 61585  93- D+   0:00.00 umount /office/serv
> > 61635 133  D    0:00.00 umount -f /office/serv
> >  7784 138  D    0:00.00 umount -N /office/serv
> The first umount must be "-N". Once you've hung a non-"-N" umount,
> rebooting is the only option.
> (I have thought of doing a "umount -N -A" (for all NFS mounts), which
> would allow it to kill off all NFS activity without even finding the pathname
> for the mountpoint, but I have not done so.)
I take this back. I just did a fairly trivial test of this and it worked.

Looking at the "ps" output, I don't think your case is an NFS protocol hang.
When I look at the "ps" output, there are no threads waiting on NFS RPCs to
complete. ("umount -N" kills off outstanding RPCs, so the VFS/VOP ops can
complete with an error, which should dismount a hang caused by an unresponsive
NFS server or similar.)

The only threads sleeping in the NFS code are waiting for an NFS vnode lock.
I suspect that some process/thread is hung on something non-NFS while holding
a lock on an NFS vnode. "umount -N" won't know how to unhang this
process/thread. Just a hunch, but I'd suspect one of the threads sleeping on
"vmopar", although I'm not a vm guy.
What I don't know how to do is figure out which thread(s) are holding vnode
locks.

This also implies that switching from soft to hard won't fix the problem.

It would be nice if "umount -N" could handle this case.
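The "umount -N -A" idea quoted above does not exist as a real flag; a rough shell approximation could walk the mount table and issue "umount -N" per NFS mount point. The sketch below is a dry run that only prints the commands; the `list_nfs_umounts` name and the sample input lines are illustrative, and it assumes the fstab-style column order of FreeBSD's "mount -p" (device, mountpoint, fstype, options, dump, pass).

```shell
#!/bin/sh
# Dry-run sketch: emit one "umount -N <mountpoint>" per NFS mount.
# Feed it "mount -p" output on a real client; here we use sample lines.
list_nfs_umounts() {
    while read -r dev mnt fstype rest; do
        case "$fstype" in
        nfs|nfsv4) printf 'umount -N %s\n' "$mnt" ;;
        esac
    done
}

# Illustrative input, not taken from the thread's actual mount table.
printf '%s\n' \
    'server:/export /office/serv nfs rw 0 0' \
    'zroot/ROOT/default / zfs rw 0 0' \
    | list_nfs_umounts
# prints: umount -N /office/serv
```

As the message itself notes, this only helps for hangs caused by outstanding RPCs; it cannot unwedge a thread stuck on a vnode lock held by a non-NFS sleeper.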
I'll look at the VFS code and maybe talk to kib@ to see if there is a way to
mark all NFS vnodes "dead" so that vn_lock() will either return an error or a
locked but VI_DOOMED vnode (if LK_RETRY is specified).

In summary, I don't think your hang is anything like Andreas's, rick

> and procstat:
>
> # procstat -kk 7784
>   PID    TID COMM     TDNAME   KSTACK
>  7784 107226 umount   -        mi_switch+0xc1 sleeplk+0xec lockmgr_xlock_hard+0x345 _vn_lock+0x48 vget_finish+0x21 cache_lookup+0x299 vfs_cache_lookup+0x7b lookup+0x68c namei+0x487 kern_unmount+0x164 amd64_syscall+0x10c fast_syscall_common+0xf8
> # procstat -kk 61635
>   PID    TID COMM     TDNAME   KSTACK
> 61635 775458 umount   -        mi_switch+0xc1 sleeplk+0xec lockmgr_slock_hard+0x382 _vn_lock+0x48 vget_finish+0x21 cache_lookup+0x299 vfs_cache_lookup+0x7b lookup+0x68c namei+0x487 sys_statfs+0xc3 amd64_syscall+0x10c fast_syscall_common+0xf8
> # procstat -kk 61585
>   PID    TID COMM     TDNAME   KSTACK
> 61585 516164 umount   -        mi_switch+0xc1 sleeplk+0xec lockmgr_xlock_hard+0x345 nfs_lock+0x2c vop_sigdefer+0x2b _vn_lock+0x48 vflush+0x151 nfs_unmount+0xc3 vfs_unmount_sigdefer+0x2e dounmount+0x437 kern_unmount+0x332 amd64_syscall+0x10c fast_syscall_common+0xf8
These just show that they are waiting for NFS vnodes. In the "ps" output there
are threads waiting on ZFS vnodes as well.

> ps-axHl can be found at
>
> https://people.freebsd.org/~pi/logs/ps-axHl.txt
I suspect your problem might be related to wired pages. Note that several
threads are sleeping on "vmopar". I'm no vm guy, but I think that might mean
too many pages have become wired.

rick

> > systems hanging when using a CentOS 7 server.
> First, make sure you are using hard mounts. "soft" or "intr" mounts won't
> work and will mess up the session sooner or later.
> (A messed-up session could result in no free slots on the session, and that
> will wedge threads in nfsv4_sequencelookup() as you describe.)
> (This is briefly described in the BUGS section of "man mount_nfs".)
>
> Do a:
> # nfsstat -m
> on the clients and look for "hard".

No output at all for that 8-(

--
pi@FreeBSD.org         +49 171 3101372                    Now what ?
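The "look for hard" check above can be scripted. Below is a sketch that classifies an nfsstat -m style comma-separated option string; the `mount_kind` helper name and the sample option strings are illustrative, not taken from the thread.

```shell
#!/bin/sh
# Classify an NFS option string (as printed by "nfsstat -m") as a hard
# or soft mount. Per the thread, "hard" is what you want; "soft"/"intr"
# can wedge NFSv4 sessions (see the BUGS section of mount_nfs(8)).
mount_kind() {
    case ",$1," in
    *,hard,*) echo hard ;;
    *,soft,*) echo soft ;;
    *)        echo unknown ;;
    esac
}

mount_kind 'nfsv4,minorversion=1,hard,cto,sec=sys'   # hard
mount_kind 'nfsv3,tcp,soft,retrans=2'                # soft
```

On a live client this would be combined with the real command, e.g. piping each line of "nfsstat -m" through the helper; Kurt's report that "nfsstat -m" printed nothing at all is itself a symptom worth chasing.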