Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 27 May 2022 20:59:57 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Andreas Kempe <kempe@lysator.liu.se>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   Re: FreeBSD 12.3/13.1 NFS client hang
Message-ID:  <YQBPR0101MB9742280313FC17543132A61CDDD89@YQBPR0101MB9742.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <YpEwxdGCouUUFHiE@shipon.lysator.liu.se>
References:  <YpEwxdGCouUUFHiE@shipon.lysator.liu.se>

next in thread | previous in thread | raw e-mail | index | archive | help
Andreas Kempe <kempe@lysator.liu.se> wrote:=0A=
> Hello everyone!=0A=
>=0A=
> I'm having issues with the NFS clients on FreeBSD 12.3 and 13.1=0A=
> systems hanging when using a CentOS 7 server.=0A=
First, make sure you are using hard mounts. "soft" or "intr" mounts won't=
=0A=
work and will mess up the session sooner or later. (A messed up session cou=
ld=0A=
result in no free slots on the session and that will wedge threads in=0A=
nfsv4_sequencelookup() as you describe.=0A=
(This is briefly described in the BUGS section of "man mount_nfs".)=0A=
=0A=
Do a:=0A=
# nfsstat -m=0A=
on the clients and look for "hard".=0A=
=0A=
Next, is there anything logged on the console for the 13.1 client(s)?=0A=
(13.1 has some diagnostics for things like a server replying with the=0A=
 wrong session slot#.)=0A=
=0A=
Also, maybe I'm old fashioned, but I find "ps axHl" useful, since it shows=
=0A=
where all the processes are sleeping.=0A=
And "procstat -kk" covers all of the locks.=0A=
=0A=
> Below are procstat kstack $PID invocations showing where the processes=0A=
> have hung. In the nfsv4_sequencelookup it seems hung waiting for=0A=
> nfsess_slots to have an available slot. In the second nfs_lock case,=0A=
> it seems the processes are stuck waiting on vnode locks.=0A=
> =0A=
> These issues seem to appear seemingly at random, but also if=0A=
> operations that open a lot of files or create a lot of file locks are=0A=
> used. An example that can often provoke a hang is performing a=0A=
> recursive grep through a large file hierarchy like the FreeBSD=0A=
> codebase.=0A=
>=0A=
> The NFS code is large and complicated so any advice is appriciated!=0A=
Yea. I'm the author and I don't know exactly what it all does;-)\=0A=
=0A=
> Cordially,=0A=
> Andreas Kempe=0A=
>=0A=
> Hang provoked when calling "grep -R SOME_STRING ." in the FreeBSD code ba=
se.=0A=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=0A=
> =0A=
>  PID    TID COMM                TDNAME              KSTACK=0A=
> 35585 101045 python3.8           -                   mi_switch sleepq_tim=
edwait _sleep nfsv4_sequencelookup nfsv4_setsequence nfscl_reqstart nfsrpc_=
lock nfsrpc_advlock nfs_advlock VOP_ADVLOCK_APV kern_fcntl kern_fcntl_freeb=
sd amd64_syscall fast_syscall_common=0A=
As you noted, this is waiting for a session slot to become available. Norma=
l, so long as other RPCs=0A=
are "in progress" that will release slots when replies are received. (If yo=
ur mount was not hard, sooner or=0A=
later, the client will give up waiting for the reply and the session slot w=
ill not be released. Once all slots=0A=
are "not released" you are hung.=0A=
=0A=
> 35585 101045 python3.8           -                   mi_switch sleepq_tim=
edwait _sleep nfsv4_sequencelookup nfsv4_setsequence nfscl_reqstart nfsrpc_=
lock nfsrpc_advlock nfs_advlock VOP_ADVLOCK_APV kern_fcntl kern_fcntl_freeb=
sd amd64_syscall fast_syscall_common=0A=
Same as above.=0A=
=0A=
> 44046 101189 vim                 -                   mi_switch sleepq_tim=
edwait _sleep nfsv4_sequencelookup nfsv4_setsequence nfscl_reqstart nfsrpc_=
accessrpc nfs34_access_otw nfs_access VOP_ACCESS_APV vn_dir_check_exec nfs_=
lookup VOP_LOOKUP_APV lookup namei kern_statat sys_fstatat amd64_syscall=0A=
Again.=0A=
> 44046 101764 vim                 -                   mi_switch sleepq_cat=
ch_signals sleepq_timedwait_sig _cv_timedwait_sig_sbt seltdwait kern_select=
 sys_select amd64_syscall fast_syscall_common=0A=
> 44046 101853 vim                 -                   mi_switch sleepq_cat=
ch_signals sleepq_wait_sig _sleep umtxq_sleep __umtx_op_sem2_wait sys__umtx=
_op amd64_syscall fast_syscall_common=0A=
> 44046 102164 vim                 -                   mi_switch sleepq_cat=
ch_signals sleepq_wait_sig _sleep umtxq_sleep __umtx_op_sem2_wait sys__umtx=
_op amd64_syscall fast_syscall_common=0A=
> 44046 102165 vim                 -                   mi_switch sleepq_cat=
ch_signals sleepq_wait_sig _sleep umtxq_sleep __umtx_op_sem2_wait sys__umtx=
_op amd64_syscall fast_syscall_common=0A=
> 44046 102457 vim                 -                   mi_switch sleepq_cat=
ch_signals sleepq_wait_sig _sleep umtxq_sleep __umtx_op_sem2_wait sys__umtx=
_op amd64_syscall fast_syscall_common=0A=
> 44046 102472 vim                 -                   mi_switch sleepq_cat=
ch_signals sleepq_wait_sig _sleep umtxq_sleep __umtx_op_sem2_wait sys__umtx=
_op amd64_syscall fast_syscall_common=0A=
I know nothing about umtx, so can't help here.=0A=
=0A=
> 44172 101824 tmux                -                   mi_switch sleepq_tim=
edwait _sleep nfsv4_sequencelookup nfsv4_setsequence nfscl_reqstart nfsrpc_=
accessrpc nfs34_access_otw nfs_access VOP_ACCESS_APV vn_dir_check_exec nfs_=
lookup VOP_LOOKUP_APV lookup namei kern_chdir amd64_syscall fast_syscall_co=
mmon=0A=
Another one waiting for a session slot.=0A=
=0A=
> Hang provoked randomly when trying to save an image in kolourpaint.=0A=
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0A=
>=0A=
> PID    TID COMM                TDNAME              KSTACK=0A=
> 58062 159450 kolourpaint         -                   mi_switch sleeplk lo=
ckmgr_slock_hard nfs_lock vop_sigdefer _vn_lock vfs_cache_root vfs_root_sig=
defer lookup namei kern_statat sys_fstatat amd64_syscall fast_syscall_commo=
n=0A=
Yep, waiting for a vnode lock, I think.=0A=
=0A=
> 58062 176390 kolourpaint         -                   mi_switch sleepq_cat=
ch_signals sleepq_wait_sig _cv_wait_sig seltdwait kern_poll sys_poll amd64_=
syscall fast_syscall_common=0A=
> 58062 176678 kolourpaint         -                   mi_switch sleepq_cat=
ch_signals sleepq_wait_sig _sleep kqueue_kevent kern_kevent_fp kern_kevent_=
generic sys_kevent amd64_syscall fast_syscall_common=0A=
>=0A=
>  PID    TID COMM                TDNAME              KSTACK=0A=
> 34291 101005 fish                -                   mi_switch sleepq_wai=
t sleeplk lockmgr_slock_hard VOP_LOCK1_APV nfs_lock VOP_LOCK1_APV _vn_lock =
vget vfs_hash_get ncl_nget nfs_root lookup namei vn_open_cred kern_openat a=
md64_syscall fast_syscall_common=0A=
Also waiting for a vnode.=0A=
=0A=
> 34291 102492 fish                -                   mi_switch sleepq_wai=
t sleeplk lockmgr_slock_hard VOP_LOCK1_APV nfs_lock VOP_LOCK1_APV _vn_lock =
vget vfs_hash_get ncl_nget nfs_root lookup namei kern_accessat amd64_syscal=
l fast_syscall_common=0A=
Again.=0A=
> 34291 102493 fish                -                   mi_switch sleepq_wai=
t sleeplk lockmgr_slock_hard VOP_LOCK1_APV nfs_lock VOP_LOCK1_APV _vn_lock =
vget vfs_hash_get ncl_nget nfs_root lookup namei kern_accessat amd64_syscal=
l fast_syscall_common=0A=
And again.=0A=
=0A=
>  PID    TID COMM                TDNAME              KSTACK=0A=
>  204 100923 autounmountd        -                   mi_switch sleepq_wait=
 sleeplk lockmgr_xlock_hard VOP_LOCK1_APV nfs_lock VOP_LOCK1_APV _vn_lock v=
get vfs_hash_get ncl_nget nfs_statfs __vfs_statfs kern_getfsstat sys_getfss=
tat amd64_syscall fast_syscall_common=0A=
And again.=0A=
=0A=
Not very useful unless you have all the processes and their locks to try an=
d figure out what is holding=0A=
the vnode locks.=0A=
=0A=
rick=0A=
=0A=
=0A=



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?YQBPR0101MB9742280313FC17543132A61CDDD89>