Date: Sun, 08 Oct 2023 02:27:19 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 274346] kernel panic/page fault in nfs_commonkrpc.c::newnfs_request(), due to duplicate hostid's
Message-ID: <bug-274346-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=274346

    Bug ID: 274346
   Summary: kernel panic/page fault in nfs_commonkrpc.c::newnfs_request(),
            due to duplicate hostid's
   Product: Base System
   Version: 14.0-STABLE
  Hardware: Any
        OS: Any
    Status: New
  Severity: Affects Only Me
  Priority: ---
 Component: kern
  Assignee: bugs@FreeBSD.org
  Reporter: freebsd@kumba.dev

I have managed to trigger a kernel panic in the NFS subsystem on
14.0-BETA5. It is partly due to my own mistake: I did not change the hostid
of a system that is a clone of another active system. The first system is
running 13.2-RELEASE-p4 and the cloned system is running 14.0-BETA5, newly
upgraded.

Several elements lead to this panic:
 - Both systems have the same hostid
 - Both systems mount the same remote NFS share from a third system
 - Have the 13.2-RELEASE-p4 system start doing a job on the remote share,
   like compiling code (e.g., /usr/ports is on this share)
 - Have the cloned system running 14.0-BETA5 attempt to unmount the remote
   share
 - The 14.0-BETA5 system will crash

I know it's due to duplicate hostids, because the message below is printed
on the console immediately before the kernel crashes:
>
> Initiate recovery. If server has not rebooted, check NFS clients for unique /etc/hostid's
>

The printf() for that exact string is in the crashing function, right where
GDB says the crash happens: nfs_commonkrpc.c, function newnfs_request(),
line 1212. I'm just not sure whether the fault comes from the if statement
immediately preceding the printf() call or from the if statement after it.
The next call in the machine code is memcmp(), so I am assuming a NULL
dereference of some kind.

My kernel is a custom build, but this can be triggered on a GENERIC kernel
as well. My first crash happened on GENERIC, right before I was set to
reboot into my rebuilt custom kernel after doing the second
`freebsd-update install` phase to upgrade to 14.0-BETA5.
At that time, I had crashdumps disabled. So the crash info below is from
that custom kernel, after I enabled crashdumps and re-triggered the crash
(it's at least reproducible...):

> Unread portion of the kernel message buffer:
> [179]
> [179]
> [179] Fatal trap 12: page fault while in kernel mode
> [179] cpuid = 0; apic id = 00
> [179] fault virtual address   = 0x4
> [179] fault code              = supervisor read data, page not present
> [179] instruction pointer     = 0x20:0xffffffff809e9893
> [179] stack pointer           = 0x28:0xfffffe00a233e800
> [179] frame pointer           = 0x28:0xfffffe00a233e800
> [179] code segment            = base 0x0, limit 0xfffff, type 0x1b
> [179]                         = DPL 0, pres 1, long 1, def32 0, gran 1
> [179] processor eflags        = interrupt enabled, resume, IOPL = 0
> [179] current process         = 87256 (umount)
> [179] rdi: fffff800077761e4 rsi: 0000000000000004 rdx: 0000000000000010
> [179] rcx: 0000000000000000  r8: 0000000000000024  r9: fffffe00a233f000
> [179] rax: 0000000000000000 rbx: fffffe00a251b020 rbp: fffffe00a233e800
> [179] r10: 0000000000000585 r11: 000000007ff9687f r12: fffff80007776010
> [180] r13: fffff80003abb800 r14: fffffe00a233ea18 r15: fffff80007776000
> [180] trap number             = 12
> [180] panic: page fault
> [180] cpuid = 0
> [180] time = 1696723338
> [180] KDB: stack backtrace:
> [180] #0 0xffffffff806b5edd at kdb_backtrace+0x5d
> [180] #1 0xffffffff8066aa20 at vpanic+0x130
> [180] #2 0xffffffff8066a8e3 at panic+0x43
> [180] #3 0xffffffff809ee34c at trap_fatal+0x40c
> [180] #4 0xffffffff809ee39e at trap_pfault+0x4e
> [180] #5 0xffffffff809c6288 at calltrap+0x8
> [180] #6 0xffffffff8053f804 at newnfs_request+0x10a4
> [180] #7 0xffffffff8054dbad at nfsrpc_destroysession+0x11d
> [180] #8 0xffffffff80557252 at nfscl_umount+0x312
> [180] #9 0xffffffff80589470 at nfs_unmount+0x70
> [180] #10 0xffffffff8073c4ad at vfs_unmount_sigdefer+0x2d
> [180] #11 0xffffffff80741e37 at dounmount+0x787
> [180] #12 0xffffffff80741645 at kern_unmount+0x2f5
> [180] #13 0xffffffff809eeaf9 at amd64_syscall+0x109
> [180] #14 0xffffffff809c6b9b at fast_syscall_common+0xf8
> [180] Timeout initializing vt_vga
> [180] Uptime: 3m0s
> [180] Dumping 447 out of 8077 MB:..4%..11%..22%..33%..43%..51%..61%..72%..83%..93%
>
> __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
> 57      /usr/src/sys/amd64/include/pcpu_aux.h: No such file or directory.
> (kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
> #1  doadump (textdump=<optimized out>) at ../../../kern/kern_shutdown.c:405
> #2  0xffffffff8066a5b7 in kern_reboot (howto=260)
>     at ../../../kern/kern_shutdown.c:526
> #3  0xffffffff8066aa8d in vpanic (fmt=0xffffffff80a3bcd1 "%s",
>     ap=ap@entry=0xfffffe00a233e680) at ../../../kern/kern_shutdown.c:970
> #4  0xffffffff8066a8e3 in panic (fmt=<unavailable>)
>     at ../../../kern/kern_shutdown.c:894
> #5  0xffffffff809ee34c in trap_fatal (frame=0xfffffe00a233e740, eva=4)
>     at ../../../amd64/amd64/trap.c:952
> #6  0xffffffff809ee39e in trap_pfault (frame=0xfffffe00a233e740,
>     usermode=false, signo=<optimized out>, ucode=<optimized out>)
>     at ../../../amd64/amd64/trap.c:760
> #7  <signal handler called>
> #8  memcmp () at ../../../amd64/amd64/support.S:115
> #9  0xffffffff8053f804 in newnfs_request (nd=nd@entry=0xfffffe00a233ea18,
>     nmp=nmp@entry=0xfffff80003abb800, clp=clp@entry=0x0,
>     nrp=nrp@entry=0xfffff80003abbcd8, vp=vp@entry=0x0,
>     td=td@entry=0xfffffe00a251b020, cred=0xfffff8000765aa00, prog=100003,
>     vers=4, retsum=0x0, toplevel=1, xidp=0x0, dssep=0x0)
>     at ../../../fs/nfs/nfs_commonkrpc.c:1212
> #10 0xffffffff8054dbad in nfsrpc_destroysession (
>     nmp=nmp@entry=0xfffff80003abb800, tsep=0xfffff80007776010,
>     tsep@entry=0x0, cred=cred@entry=0xfffff8000765aa00,
>     p=p@entry=0xfffffe00a251b020) at ../../../fs/nfs/nfs_commonsubs.c:5151
> #11 0xffffffff80557252 in nfscl_umount (nmp=nmp@entry=0xfffff80003abb800,
>     p=p@entry=0xfffffe00a251b020, dhp=dhp@entry=0x0)
>     at ../../../fs/nfsclient/nfs_clstate.c:2094
> #12 0xffffffff80589470 in nfs_unmount (mp=0xfffffe00a4058000,
>     mntflags=<optimized out>) at ../../../fs/nfsclient/nfs_clvfsops.c:1903
> #13 0xffffffff8073c4ad in vfs_unmount_sigdefer (mp=0xfffffe00a4058000,
>     mntflags=134217728) at ../../../kern/vfs_init.c:185
> #14 0xffffffff80741e37 in dounmount (mp=0xfffff800077761e4,
>     mp@entry=0xfffffe00a4058000, flags=flags@entry=134217728,
>     td=td@entry=0xfffffe00a251b020) at ../../../kern/vfs_mount.c:2327
> #15 0xffffffff80741645 in kern_unmount (td=0xfffffe00a251b020,
>     path=<optimized out>, flags=134217728) at ../../../kern/vfs_mount.c:1785
> #16 0xffffffff809eeaf9 in syscallenter (td=0xfffffe00a251b020)
>     at ../../../amd64/amd64/../../kern/subr_syscall.c:187
> #17 amd64_syscall (td=0xfffffe00a251b020, traced=0)
>     at ../../../amd64/amd64/trap.c:1197
> #18 <signal handler called>
> #19 0x0000244bc41489ba in ?? ()
> Backtrace stopped: Cannot access memory at address 0x244bc20f4c18
> (kgdb)

-- 
You are receiving this mail because:
You are the assignee for the bug.
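
One possible reading of the register dump above, offered as a sketch rather than a diagnosis: on amd64 the second and third arguments of memcmp() are passed in rsi and rdx, and the dump shows rsi = 0x4 and rdx = 0x10 with a fault virtual address of 0x4. That pattern is what you would see if a 16-byte field at offset 4 were taken from a NULL structure pointer. The structure and function names below (fake_session, sessionid_addr, sessionid_matches) are hypothetical and are not taken from the FreeBSD sources; they only illustrate the arithmetic and the kind of NULL guard that would avoid the fault.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SESSIONID_LEN 16	/* same length as rdx = 0x10 in the dump */

/* Hypothetical layout for illustration; NOT the real kernel structure. */
struct fake_session {
	uint32_t	flags;				/* bytes 0..3 */
	uint8_t		sessionid[SESSIONID_LEN];	/* starts at offset 4 */
};

/*
 * Address that &base->sessionid would evaluate to, computed with
 * offsetof() so that nothing is dereferenced.
 */
uintptr_t
sessionid_addr(uintptr_t base)
{
	/* The illustration assumes the field sits at offset 4. */
	assert(offsetof(struct fake_session, sessionid) == 4);
	return (base + offsetof(struct fake_session, sessionid));
}

/*
 * Guarded comparison: returns 1 when the IDs match, 0 when they differ
 * or when either pointer is NULL (instead of faulting inside memcmp()).
 */
int
sessionid_matches(const struct fake_session *sep, const uint8_t *id)
{
	if (sep == NULL || id == NULL)
		return (0);
	return (memcmp(sep->sessionid, id, SESSIONID_LEN) == 0);
}
```

With a NULL base, sessionid_addr(0) evaluates to 0x4, which matches both rsi and the fault address. Whether newnfs_request() actually reaches memcmp() this way would need to be confirmed against the source at line 1212.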