Date: Mon, 14 Dec 2020 00:24:14 +0000 From: Rick Macklem <rmacklem@uoguelph.ca> To: J David <j.david.lists@gmail.com>, Konstantin Belousov <kostikbel@gmail.com> Cc: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org> Subject: Re: Major issues with nfsv4 Message-ID: <YQXPR0101MB096834DB16439960ADAFCE29DDC70@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> In-Reply-To: <CABXB=RRNnW9nNhFCJS1evNUTEX9LNnzyf2gOmZHHGkzAoQxbPw@mail.gmail.com> References: <CABXB=RRB2nUk0pPDisBQPdicUA3ooHpg8QvBwjG_nFU4cHvCYw@mail.gmail.com> <YQXPR0101MB096849ADF24051F7479E565CDDCA0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <CABXB=RSyN%2Bo2yXcpmYw8sCSUUDhN-w28Vu9v_cCWa-2=pLZmHg@mail.gmail.com> <YQXPR0101MB09680D155B6D685442B5E25EDDCA0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <CABXB=RSSE=yOwgOXsnbEYPqiWk5K5NfzLY=D%2BN9mXdVn%2B--qLQ@mail.gmail.com> <YQXPR0101MB0968B17010B3B36C8C41FDE1DDC90@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <X9Q9GAhNHbXGbKy7@kib.kiev.ua> <YQXPR0101MB0968C7629D57CA21319E50C2DDC90@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <X9UDArKjUqJVS035@kib.kiev.ua>, <CABXB=RRNnW9nNhFCJS1evNUTEX9LNnzyf2gOmZHHGkzAoQxbPw@mail.gmail.com>
index | next in thread | previous in thread | raw e-mail
[-- Attachment #1 --] J David wrote: [stuff snipped] >The most common trace (91 times, or over 1/4 of all observed) is: > >__mtx_lock_sleep+0xf8 nfscl_nodeleg+0x207 nfs_lookup+0x314 >VOP_LOOKUP_APV+0x75 null_lookup+0x98 VOP_LOOKUP_APV+0x75 >lookup+0x451 >namei+0x414 kern_statat+0x72 freebsd11_stat+0x30 amd64_syscall+0x387 >fast_syscall_common+0xf8 > >This trace appears roughly uniformly (28/38/34) in 3 of the 4 processes. nfscl_nodeleg() plus two others acquire the mutex to check for a delegation for the file. Servers should only issue delegations when the callback path is working and some never issue delegations. So, if you are not running the nfscbd(8) daemon (there is no need unless you are using delegations or pNFS) a trivial check for the callback path being enabled should determine that no delegation exists if it is not enabled. This avoids acquiring the mutex for this common case. --> You have no "Delegs" according to your "nfsstat -c -E". So, if you set up a test system with the attached patch applied to the kernel and you do not run nfscbd(8) (ie no nfscbd_enable="YES" in your /etc/rc.conf) this patch might help. --> It avoids acquiring the mutex for lookups and getattrs (stat calls). The only reason I haven't done this before is a concern w.r.t. a broken NFSv4 server that erroneously issues delegations when the callback path is not working. (I don't know of such a server and, since you do not have delegations, it should not be an issue for your case.) Might be enough to alleviate your problem. I will work on changing the single open linked list to a hash table of linked lists, but that is probably weeks away from a test patch. Let me know if you can try the patch and if it helps, rick The full set of traces (sorted by uniq -c) is here: https://pastebin.com/HUqkeMri (This message is already long enough!) The characters stripped off the front of each line are consistently: (pid) (uid) python3 - It happens these were all Python-based jobs. Python seems predisposed to trigger this, but non-Python jobs trigger it as well. Heavy use of stat() does seem to be a common element regardless of job type. Here's the output of "nfsstat -E -c" 60 seconds after running it with -z: Client Info: RPC Counts: Getattr Setattr Lookup Readlink Read Write 1667 111 6376 0 42 153 Create Remove Rename Link Symlink Mkdir 111 7 14 0 0 0 Rmdir Readdir RdirPlus Access Mknod Fsstat 0 0 0 2620 0 160 FSinfo pathConf Commit SetClId SetClIdCf Lock 0 0 113 0 0 48 LockT LockU Open OpenCfr 0 48 320 0 OpenDownGr Close 0 402 RelLckOwn FreeStateID PutRootFH DelegRet GetAcl SetAcl 0 3 0 0 0 0 ExchangeId CreateSess DestroySess DestroyClId LayoutGet GetDevInfo 0 0 0 0 0 0 LayoutCommit LayoutReturn ReclaimCompl ReadDataS WriteDataS CommitDataS 0 0 0 0 0 0 OpenLayout CreateLayout 0 0 OpenOwner Opens LockOwner Locks Delegs LocalOwn 21175 130439 30 6 0 0 LocalOpen LocalLown LocalLock 0 0 0 Rpc Info: TimedOut Invalid X Replies Retries Requests 0 0 0 0 12247 Cache Info: Attr Hits Attr Misses Lkup Hits Lkup Misses 1110054 858 1002829 6361 BioR Hits BioR Misses BioW Hits BioW Misses 2000 54 292 153 BioRL Hits BioRL Misses BioD Hits BioD Misses 6911 0 208 0 DirE Hits DirE Misses 104 0 This does reflect the whole machine, not just those four processes. Finally, when I attempted to kill those four processes with ktrace running on all of them, the system panicked: Fatal trap 12: page fault while in kernel mode cpuid = 2; apic id = 02 fault virtual address = 0x10 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff8272d28f stack pointer = 0x28:0xfffffe008a5f24c0 frame pointer = 0x28:0xfffffe008a5f25f0 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 5627 (node) trap number = 12 panic: page fault cpuid = 2 time = 1607845622 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe008a5f2180 vpanic() at vpanic+0x17b/frame 0xfffffe008a5f21d0 panic() at panic+0x43/frame 0xfffffe008a5f2230 trap_fatal() at trap_fatal+0x391/frame 0xfffffe008a5f2290 trap_pfault() at trap_pfault+0x4f/frame 0xfffffe008a5f22e0 trap() at trap+0x286/frame 0xfffffe008a5f23f0 calltrap() at calltrap+0x8/frame 0xfffffe008a5f23f0 --- trap 0xc, rip = 0xffffffff8272d28f, rsp = 0xfffffe008a5f24c0, rbp = 0xfffffe008a5f25f0 --- null_bypass() at null_bypass+0xaf/frame 0xfffffe008a5f25f0 VOP_ADVLOCK_APV() at VOP_ADVLOCK_APV+0x80/frame 0xfffffe008a5f2620 closef() at closef+0x8f/frame 0xfffffe008a5f26b0 fdescfree_fds() at fdescfree_fds+0x3c/frame 0xfffffe008a5f2700 fdescfree() at fdescfree+0x466/frame 0xfffffe008a5f27c0 exit1() at exit1+0x488/frame 0xfffffe008a5f2820 sigexit() at sigexit+0x159/frame 0xfffffe008a5f2b00 postsig() at postsig+0x2fd/frame 0xfffffe008a5f2bc0 ast() at ast+0x317/frame 0xfffffe008a5f2bf0 doreti_ast() at doreti_ast+0x1f/frame 0x7fffffff67a0 The fsck ate the ktrace.out file on reboot. This system does not have kernel debug symbols. If there's a way to try to figure out what null_bypass+0xaf corresponds to using the machine that built this kernel, which has the kernel, objects, debug symbols, and source, please let me know. It's built from releng/12.2 r368515. Thanks! [-- Attachment #2 --] --- sys/fs/nfsclient/nfs_clstate.c.sav 2020-12-13 06:17:30.642752000 -0800 +++ sys/fs/nfsclient/nfs_clstate.c 2020-12-13 06:38:10.819909000 -0800 @@ -4374,7 +4374,7 @@ nfscl_nodeleg(vnode_t vp, int writedeleg) np = VTONFS(vp); nmp = VFSTONFS(vp->v_mount); - if (!NFSHASNFSV4(nmp)) + if (!NFSHASNFSV4(nmp) || nfscl_enablecallb == 0) return (1); NFSLOCKCLSTATE(); clp = nfscl_findcl(nmp); @@ -4730,7 +4730,7 @@ nfscl_delegmodtime(vnode_t vp) struct nfsmount *nmp; nmp = VFSTONFS(vp->v_mount); - if (!NFSHASNFSV4(nmp)) + if (!NFSHASNFSV4(nmp) || nfscl_enablecallb == 0) return; NFSLOCKCLSTATE(); clp = nfscl_findcl(nmp); @@ -4759,7 +4759,7 @@ nfscl_deleggetmodtime(vnode_t vp, struct timespec *mti struct nfsmount *nmp; nmp = VFSTONFS(vp->v_mount); - if (!NFSHASNFSV4(nmp)) + if (!NFSHASNFSV4(nmp) || nfscl_enablecallb == 0) return; NFSLOCKCLSTATE(); clp = nfscl_findcl(nmp);help
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?YQXPR0101MB096834DB16439960ADAFCE29DDC70>
