Date: Mon, 14 Dec 2020 00:24:14 +0000 From: Rick Macklem <rmacklem@uoguelph.ca> To: J David <j.david.lists@gmail.com>, Konstantin Belousov <kostikbel@gmail.com> Cc: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org> Subject: Re: Major issues with nfsv4 Message-ID: <YQXPR0101MB096834DB16439960ADAFCE29DDC70@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> In-Reply-To: <CABXB=RRNnW9nNhFCJS1evNUTEX9LNnzyf2gOmZHHGkzAoQxbPw@mail.gmail.com> References: <CABXB=RRB2nUk0pPDisBQPdicUA3ooHpg8QvBwjG_nFU4cHvCYw@mail.gmail.com> <YQXPR0101MB096849ADF24051F7479E565CDDCA0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <CABXB=RSyN%2Bo2yXcpmYw8sCSUUDhN-w28Vu9v_cCWa-2=pLZmHg@mail.gmail.com> <YQXPR0101MB09680D155B6D685442B5E25EDDCA0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <CABXB=RSSE=yOwgOXsnbEYPqiWk5K5NfzLY=D%2BN9mXdVn%2B--qLQ@mail.gmail.com> <YQXPR0101MB0968B17010B3B36C8C41FDE1DDC90@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <X9Q9GAhNHbXGbKy7@kib.kiev.ua> <YQXPR0101MB0968C7629D57CA21319E50C2DDC90@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <X9UDArKjUqJVS035@kib.kiev.ua>, <CABXB=RRNnW9nNhFCJS1evNUTEX9LNnzyf2gOmZHHGkzAoQxbPw@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
--_002_YQXPR0101MB096834DB16439960ADAFCE29DDC70YQXPR0101MB0968_ Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable J David wrote:=0A= [stuff snipped]=0A= >The most common trace (91 times, or over 1/4 of all observed) is:=0A= >=0A= >__mtx_lock_sleep+0xf8 nfscl_nodeleg+0x207 nfs_lookup+0x314=0A= >VOP_LOOKUP_APV+0x75 null_lookup+0x98 VOP_LOOKUP_APV+0x75 >lookup+0x451=0A= >namei+0x414 kern_statat+0x72 freebsd11_stat+0x30 amd64_syscall+0x387=0A= >fast_syscall_common+0xf8=0A= >=0A= >This trace appears roughly uniformly (28/38/34) in 3 of the 4 processes.= =0A= nfscl_nodeleg() plus two others acquire the mutex to check for a delegation= =0A= for the file. Servers should only issue delegations when the callback=0A= path is working and some never issue delegations.=0A= =0A= So, if you are not running the nfscbd(8) daemon (there is no need=0A= unless you are using delegations or pNFS) a trivial check for the=0A= callback path being enabled should determine that no delegation=0A= exists if it is not enabled. This avoids acquiring the mutex for this=0A= common case.=0A= --> You have no "Delegs" according to your "nfsstat -c -E".=0A= =0A= So, if you set up a test system with the attached patch applied to the=0A= kernel and you do not run nfscbd(8) (ie no nfscbd_enable=3D"YES" in=0A= your /etc/rc.conf) this patch might help.=0A= --> It avoids acquiring the mutex for lookups and getattrs (stat calls).=0A= =0A= The only reason I haven't done this before is a concern w.r.t. a broken=0A= NFSv4 server that erroneously issues delegations when the callback=0A= path is not working. (I don't know of such a server and, since you do not= =0A= have delegations, it should not be an issue for your case.)=0A= =0A= Might be enough to alleviate your problem.=0A= =0A= I will work on changing the single open linked list to a hash table of=0A= linked lists, but that is probably weeks away from a test patch.=0A= =0A= Let me know if you can try the patch and if it helps, rick=0A= =0A= The full set of traces (sorted by uniq -c) is here:=0A= =0A= https://pastebin.com/HUqkeMri=0A= =0A= (This message is already long enough!)=0A= =0A= The characters stripped off the front of each line are consistently:=0A= =0A= (pid) (uid) python3 -=0A= =0A= It happens these were all Python-based jobs. Python seems predisposed=0A= to trigger this, but non-Python jobs trigger it as well. Heavy use of=0A= stat() does seem to be a common element regardless of job type.=0A= =0A= Here's the output of "nfsstat -E -c" 60 seconds after running it with -z:= =0A= =0A= Client Info:=0A= RPC Counts:=0A= Getattr Setattr Lookup Readlink Read Wr= ite=0A= 1667 111 6376 0 42 = 153=0A= Create Remove Rename Link Symlink Mk= dir=0A= 111 7 14 0 0 = 0=0A= Rmdir Readdir RdirPlus Access Mknod Fss= tat=0A= 0 0 0 2620 0 = 160=0A= FSinfo pathConf Commit SetClId SetClIdCf L= ock=0A= 0 0 113 0 0 = 48=0A= LockT LockU Open OpenCfr=0A= 0 48 320 0=0A= OpenDownGr Close=0A= 0 402=0A= RelLckOwn FreeStateID PutRootFH DelegRet GetAcl Set= Acl=0A= 0 3 0 0 0 = 0=0A= ExchangeId CreateSess DestroySess DestroyClId LayoutGet GetDevI= nfo=0A= 0 0 0 0 0 = 0=0A= LayoutCommit LayoutReturn ReclaimCompl ReadDataS WriteDataS CommitDa= taS=0A= 0 0 0 0 0 = 0=0A= OpenLayout CreateLayout=0A= 0 0=0A= OpenOwner Opens LockOwner Locks Delegs Local= Own=0A= 21175 130439 30 6 0 = 0=0A= LocalOpen LocalLown LocalLock=0A= 0 0 0=0A= Rpc Info:=0A= TimedOut Invalid X Replies Retries Requests=0A= 0 0 0 0 12247=0A= Cache Info:=0A= Attr Hits Attr Misses Lkup Hits Lkup Misses=0A= 1110054 858 1002829 6361=0A= BioR Hits BioR Misses BioW Hits BioW Misses=0A= 2000 54 292 153=0A= BioRL Hits BioRL Misses BioD Hits BioD Misses=0A= 6911 0 208 0=0A= DirE Hits DirE Misses=0A= 104 0=0A= =0A= This does reflect the whole machine, not just those four processes.=0A= =0A= Finally, when I attempted to kill those four processes with ktrace=0A= running on all of them, the system panicked:=0A= =0A= Fatal trap 12: page fault while in kernel mode=0A= cpuid =3D 2; apic id =3D 02=0A= fault virtual address =3D 0x10=0A= fault code =3D supervisor read data, page not present=0A= instruction pointer =3D 0x20:0xffffffff8272d28f=0A= stack pointer =3D 0x28:0xfffffe008a5f24c0=0A= frame pointer =3D 0x28:0xfffffe008a5f25f0=0A= code segment =3D base 0x0, limit 0xfffff, type 0x1b=0A= =3D DPL 0, pres 1, long 1, def32 0, gran 1=0A= processor eflags =3D interrupt enabled, resume, IOPL =3D 0=0A= current process =3D 5627 (node)=0A= trap number =3D 12=0A= panic: page fault=0A= cpuid =3D 2=0A= time =3D 1607845622=0A= KDB: stack backtrace:=0A= db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe008a5f2= 180=0A= vpanic() at vpanic+0x17b/frame 0xfffffe008a5f21d0=0A= panic() at panic+0x43/frame 0xfffffe008a5f2230=0A= trap_fatal() at trap_fatal+0x391/frame 0xfffffe008a5f2290=0A= trap_pfault() at trap_pfault+0x4f/frame 0xfffffe008a5f22e0=0A= trap() at trap+0x286/frame 0xfffffe008a5f23f0=0A= calltrap() at calltrap+0x8/frame 0xfffffe008a5f23f0=0A= --- trap 0xc, rip =3D 0xffffffff8272d28f, rsp =3D 0xfffffe008a5f24c0, rbp= =0A= =3D 0xfffffe008a5f25f0 ---=0A= null_bypass() at null_bypass+0xaf/frame 0xfffffe008a5f25f0=0A= VOP_ADVLOCK_APV() at VOP_ADVLOCK_APV+0x80/frame 0xfffffe008a5f2620=0A= closef() at closef+0x8f/frame 0xfffffe008a5f26b0=0A= fdescfree_fds() at fdescfree_fds+0x3c/frame 0xfffffe008a5f2700=0A= fdescfree() at fdescfree+0x466/frame 0xfffffe008a5f27c0=0A= exit1() at exit1+0x488/frame 0xfffffe008a5f2820=0A= sigexit() at sigexit+0x159/frame 0xfffffe008a5f2b00=0A= postsig() at postsig+0x2fd/frame 0xfffffe008a5f2bc0=0A= ast() at ast+0x317/frame 0xfffffe008a5f2bf0=0A= doreti_ast() at doreti_ast+0x1f/frame 0x7fffffff67a0=0A= =0A= The fsck ate the ktrace.out file on reboot. This system does not have=0A= kernel debug symbols. If there's a way to try to figure out what=0A= null_bypass+0xaf corresponds to using the machine that built this=0A= kernel, which has the kernel, objects, debug symbols, and source,=0A= please let me know. It's built from releng/12.2 r368515.=0A= =0A= Thanks!=0A= --_002_YQXPR0101MB096834DB16439960ADAFCE29DDC70YQXPR0101MB0968_ Content-Type: application/octet-stream; name="clstate.patch" Content-Description: clstate.patch Content-Disposition: attachment; filename="clstate.patch"; size=912; creation-date="Mon, 14 Dec 2020 00:24:08 GMT"; modification-date="Mon, 14 Dec 2020 00:24:08 GMT" Content-Transfer-Encoding: base64 LS0tIHN5cy9mcy9uZnNjbGllbnQvbmZzX2Nsc3RhdGUuYy5zYXYJMjAyMC0xMi0xMyAwNjoxNzoz MC42NDI3NTIwMDAgLTA4MDAKKysrIHN5cy9mcy9uZnNjbGllbnQvbmZzX2Nsc3RhdGUuYwkyMDIw LTEyLTEzIDA2OjM4OjEwLjgxOTkwOTAwMCAtMDgwMApAQCAtNDM3NCw3ICs0Mzc0LDcgQEAgbmZz Y2xfbm9kZWxlZyh2bm9kZV90IHZwLCBpbnQgd3JpdGVkZWxlZykKIAogCW5wID0gVlRPTkZTKHZw KTsKIAlubXAgPSBWRlNUT05GUyh2cC0+dl9tb3VudCk7Ci0JaWYgKCFORlNIQVNORlNWNChubXAp KQorCWlmICghTkZTSEFTTkZTVjQobm1wKSB8fCBuZnNjbF9lbmFibGVjYWxsYiA9PSAwKQogCQly ZXR1cm4gKDEpOwogCU5GU0xPQ0tDTFNUQVRFKCk7CiAJY2xwID0gbmZzY2xfZmluZGNsKG5tcCk7 CkBAIC00NzMwLDcgKzQ3MzAsNyBAQCBuZnNjbF9kZWxlZ21vZHRpbWUodm5vZGVfdCB2cCkKIAlz dHJ1Y3QgbmZzbW91bnQgKm5tcDsKIAogCW5tcCA9IFZGU1RPTkZTKHZwLT52X21vdW50KTsKLQlp ZiAoIU5GU0hBU05GU1Y0KG5tcCkpCisJaWYgKCFORlNIQVNORlNWNChubXApIHx8IG5mc2NsX2Vu YWJsZWNhbGxiID09IDApCiAJCXJldHVybjsKIAlORlNMT0NLQ0xTVEFURSgpOwogCWNscCA9IG5m c2NsX2ZpbmRjbChubXApOwpAQCAtNDc1OSw3ICs0NzU5LDcgQEAgbmZzY2xfZGVsZWdnZXRtb2R0 aW1lKHZub2RlX3QgdnAsIHN0cnVjdCB0aW1lc3BlYyAqbXRpCiAJc3RydWN0IG5mc21vdW50ICpu bXA7CiAKIAlubXAgPSBWRlNUT05GUyh2cC0+dl9tb3VudCk7Ci0JaWYgKCFORlNIQVNORlNWNChu bXApKQorCWlmICghTkZTSEFTTkZTVjQobm1wKSB8fCBuZnNjbF9lbmFibGVjYWxsYiA9PSAwKQog CQlyZXR1cm47CiAJTkZTTE9DS0NMU1RBVEUoKTsKIAljbHAgPSBuZnNjbF9maW5kY2wobm1wKTsK --_002_YQXPR0101MB096834DB16439960ADAFCE29DDC70YQXPR0101MB0968_--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?YQXPR0101MB096834DB16439960ADAFCE29DDC70>