Date: Sun, 13 Dec 2020 03:09:10 -0500 From: J David <j.david.lists@gmail.com> To: Konstantin Belousov <kostikbel@gmail.com> Cc: Rick Macklem <rmacklem@uoguelph.ca>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org> Subject: Re: Major issues with nfsv4 Message-ID: <CABXB=RRNnW9nNhFCJS1evNUTEX9LNnzyf2gOmZHHGkzAoQxbPw@mail.gmail.com> In-Reply-To: <X9UDArKjUqJVS035@kib.kiev.ua> References: <CABXB=RRB2nUk0pPDisBQPdicUA3ooHpg8QvBwjG_nFU4cHvCYw@mail.gmail.com> <YQXPR0101MB096849ADF24051F7479E565CDDCA0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <CABXB=RSyN%2Bo2yXcpmYw8sCSUUDhN-w28Vu9v_cCWa-2=pLZmHg@mail.gmail.com> <YQXPR0101MB09680D155B6D685442B5E25EDDCA0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <CABXB=RSSE=yOwgOXsnbEYPqiWk5K5NfzLY=D%2BN9mXdVn%2B--qLQ@mail.gmail.com> <YQXPR0101MB0968B17010B3B36C8C41FDE1DDC90@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <X9Q9GAhNHbXGbKy7@kib.kiev.ua> <YQXPR0101MB0968C7629D57CA21319E50C2DDC90@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <X9UDArKjUqJVS035@kib.kiev.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
The output of "procstat -kk -a" is extremely large. So I found what looked like four processes that were having problems, ran procstat -kk in a loop for all four for a couple of minutes, appending the output to a file, then did sort | uniq on the file to find common traces. 353 samples were collected. 11 "<running>" samples had no kernel call stack and were discarded. Of the 342 stack samples, here's a frequency analysis of the entries present in the trace: 451 VOP_LOOKUP_APV 310 kern_statat 309 fast_syscall_common 303 amd64_syscall 234 namei 233 lookup 226 null_lookup 225 freebsd11_stat 220 nfs_lookup 189 __mtx_lock_sleep 156 nfscl_nodeleg 155 mi_switch 136 vputx 134 VOP_INACTIVE_APV 134 vinactive 83 sys_fstatat 68 nfsrpc_close 68 ncl_inactive 67 turnstile_wait 66 vrecycle 66 VOP_RECLAIM_APV 66 vgonel 66 null_reclaim 66 null_inactive 55 VOP_GETATTR_APV 55 nfscl_request 55 newnfs_request 55 clnt_reconnect_call 54 clnt_vc_call 53 sleepq_timedwait 53 _sleep 41 nfsrpc_lookup 30 nfs_getattr 26 nfscl_mustflush 26 ncl_getattrcache 25 null_bypass 24 vn_stat 24 null_getattr 19 ast 14 VOP_LOCK1_APV 13 VOP_ACCESS_APV 13 vn_dir_check_exec 13 nfsrpc_accessrpc 13 nfs34_access_otw 13 nfs_access 13 critical_exit_preempt 12 Xipi_intr_bitmap_handler 12 ipi_bitmap_handler 12 doreti_ast 11 thread_lock_flags_ 9 sigdeferstop_impl 8 nfscl_doclose 6 _vn_lock 5 null_lock 3 sleepq_wait 3 sleeplk 3 nfscl_deleggetmodtime 3 nfs_lock 2 VOP_READLINK_APV 2 VOP_ISLOCKED_APV 2 sigallowstop_impl 2 nfscl_getcl 2 lockmgr_xlock_hard 1 Xtimerint 1 vtnet_txq_mq_start_locked 1 vtnet_txq_mq_start 1 vget 1 tcp_usr_send 1 tcp_output 1 sosend_generic 1 sosend 1 null_nodeget 1 nfsrpc_getattr 1 nfscl_clientrelease 1 ncl_bioread 1 lockmgr_slock_hard 1 ip_output 1 ether_output_frame 1 ether_output 1 cache_lookup (VOP_LOOKUP_APV appears twice on many lines.) The most common trace (91 times, or over 1/4 of all observed) is: __mtx_lock_sleep+0xf8 nfscl_nodeleg+0x207 nfs_lookup+0x314 VOP_LOOKUP_APV+0x75 null_lookup+0x98 VOP_LOOKUP_APV+0x75 lookup+0x451 namei+0x414 kern_statat+0x72 freebsd11_stat+0x30 amd64_syscall+0x387 fast_syscall_common+0xf8 This trace appears roughly uniformly (28/38/34) in 3 of the 4 processes. The full set of traces (sorted by uniq -c) is here: https://pastebin.com/HUqkeMri (This message is already long enough!) The characters stripped off the front of each line are consistently: (pid) (uid) python3 - It happens these were all Python-based jobs. Python seems predisposed to trigger this, but non-Python jobs trigger it as well. Heavy use of stat() does seem to be a common element regardless of job type. Here's the output of "nfsstat -E -c" 60 seconds after running it with -z: Client Info: RPC Counts: Getattr Setattr Lookup Readlink Read Write 1667 111 6376 0 42 153 Create Remove Rename Link Symlink Mkdir 111 7 14 0 0 0 Rmdir Readdir RdirPlus Access Mknod Fsstat 0 0 0 2620 0 160 FSinfo pathConf Commit SetClId SetClIdCf Lock 0 0 113 0 0 48 LockT LockU Open OpenCfr 0 48 320 0 OpenDownGr Close 0 402 RelLckOwn FreeStateID PutRootFH DelegRet GetAcl SetAcl 0 3 0 0 0 0 ExchangeId CreateSess DestroySess DestroyClId LayoutGet GetDevInfo 0 0 0 0 0 0 LayoutCommit LayoutReturn ReclaimCompl ReadDataS WriteDataS CommitDataS 0 0 0 0 0 0 OpenLayout CreateLayout 0 0 OpenOwner Opens LockOwner Locks Delegs LocalOwn 21175 130439 30 6 0 0 LocalOpen LocalLown LocalLock 0 0 0 Rpc Info: TimedOut Invalid X Replies Retries Requests 0 0 0 0 12247 Cache Info: Attr Hits Attr Misses Lkup Hits Lkup Misses 1110054 858 1002829 6361 BioR Hits BioR Misses BioW Hits BioW Misses 2000 54 292 153 BioRL Hits BioRL Misses BioD Hits BioD Misses 6911 0 208 0 DirE Hits DirE Misses 104 0 This does reflect the whole machine, not just those four processes. Finally, when I attempted to kill those four processes with ktrace running on all of them, the system panicked: Fatal trap 12: page fault while in kernel mode cpuid = 2; apic id = 02 fault virtual address = 0x10 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff8272d28f stack pointer = 0x28:0xfffffe008a5f24c0 frame pointer = 0x28:0xfffffe008a5f25f0 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 5627 (node) trap number = 12 panic: page fault cpuid = 2 time = 1607845622 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe008a5f2180 vpanic() at vpanic+0x17b/frame 0xfffffe008a5f21d0 panic() at panic+0x43/frame 0xfffffe008a5f2230 trap_fatal() at trap_fatal+0x391/frame 0xfffffe008a5f2290 trap_pfault() at trap_pfault+0x4f/frame 0xfffffe008a5f22e0 trap() at trap+0x286/frame 0xfffffe008a5f23f0 calltrap() at calltrap+0x8/frame 0xfffffe008a5f23f0 --- trap 0xc, rip = 0xffffffff8272d28f, rsp = 0xfffffe008a5f24c0, rbp = 0xfffffe008a5f25f0 --- null_bypass() at null_bypass+0xaf/frame 0xfffffe008a5f25f0 VOP_ADVLOCK_APV() at VOP_ADVLOCK_APV+0x80/frame 0xfffffe008a5f2620 closef() at closef+0x8f/frame 0xfffffe008a5f26b0 fdescfree_fds() at fdescfree_fds+0x3c/frame 0xfffffe008a5f2700 fdescfree() at fdescfree+0x466/frame 0xfffffe008a5f27c0 exit1() at exit1+0x488/frame 0xfffffe008a5f2820 sigexit() at sigexit+0x159/frame 0xfffffe008a5f2b00 postsig() at postsig+0x2fd/frame 0xfffffe008a5f2bc0 ast() at ast+0x317/frame 0xfffffe008a5f2bf0 doreti_ast() at doreti_ast+0x1f/frame 0x7fffffff67a0 The fsck ate the ktrace.out file on reboot. This system does not have kernel debug symbols. If there's a way to try to figure out what null_bypass+0xaf corresponds to using the machine that built this kernel, which has the kernel, objects, debug symbols, and source, please let me know. It's built from releng/12.2 r368515. Thanks!
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CABXB=RRNnW9nNhFCJS1evNUTEX9LNnzyf2gOmZHHGkzAoQxbPw>