Date:      Mon, 23 Sep 2024 15:38:21 -0700
From:      Rick Macklem <rick.macklem@gmail.com>
To:        J David <j.david.lists@gmail.com>
Cc:        FreeBSD FS <freebsd-fs@freebsd.org>
Subject:   Re: panic: nfsv4root ref cnt cpuid = 1
Message-ID:  <CAM5tNy6L7C6f1rN2%2BkUaC_TdfMQkTfCS38BqE=RU60E9VExgww@mail.gmail.com>
In-Reply-To: <CABXB=RQZ4jjWe39Nd26u66ZQURjvybL5eCeGX=n%2Bk3EaJRdfZQ@mail.gmail.com>
References:  <CABXB=RShoxwT3PuPQK9OdJNBbWrShUuYchK7oVnT7gBbLH5D0w@mail.gmail.com> <CABXB=RRKvfiwipfaaNA%2BAuA3Ug1VLyNvxa_o-5hWEq1-qjjTbg@mail.gmail.com> <CAM5tNy5Hh=6b9ZNseeQsRddLSFehiTsYNZOH==CeAGthie5SQw@mail.gmail.com> <CABXB=RRDG6-_NU1rjrmT86Hv7uDRzSAbj-HP5ryd1WQ6ZUZNTA@mail.gmail.com> <CABXB=RQZ4jjWe39Nd26u66ZQURjvybL5eCeGX=n%2Bk3EaJRdfZQ@mail.gmail.com>

On Mon, Sep 23, 2024 at 2:14 PM J David <j.david.lists@gmail.com> wrote:
>
> We've had some other kernel panics that may be related to this. At the
> very least, the call stack is the same up to nfsrpc_lookup+0x87f.
It is probably caused by the same underlying bug.
If you can easily get the source line# for nfsrpc_lookup+0x87f, that
could be helpful.
The only way I know of to do it is with a kernel.debug. What I do is:
nm kernel.debug | fgrep nfsrpc_lookup
--> Then I add 0x87f to the above hex value.
addr2line -e kernel.debug
- enter the hex value for nfsrpc_lookup+0x87f
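
The steps above could be scripted roughly as follows. This is only a sketch: hex_add and resolve are made-up helper names, it assumes nm prints 16-digit hex symbol values (as it does for an amd64 kernel), and the hex addition is done in two 32-bit halves because some sh implementations do not handle constants with the top bit set.

```sh
#!/bin/sh
# hex_add BASE OFFSET (hypothetical helper)
#   BASE:   16-digit hex symbol value as printed by nm, without 0x
#   OFFSET: hex offset from the backtrace, e.g. 0x87f
hex_add() {
    hi=${1%????????}    # upper 8 hex digits
    lo=${1#????????}    # lower 8 hex digits
    sum=$(( 0x$lo + $2 ))
    # Carry any overflow of the low half into the high half.
    printf '0x%08x%08x\n' \
        $(( (0x$hi + (sum >> 32)) & 0xffffffff )) \
        $(( sum & 0xffffffff ))
}

# resolve KERNEL SYMBOL OFFSET (hypothetical helper)
#   Looks up SYMBOL with nm, adds OFFSET, and hands the result to addr2line.
resolve() {
    # Match the symbol name exactly, so longer names containing it are skipped.
    base=$(nm "$1" | awk -v sym="$2" '$3 == sym { print $1; exit }')
    if [ -z "$base" ]; then
        echo "symbol $2 not found in $1" >&2
        return 1
    fi
    addr2line -f -e "$1" "$(hex_add "$base" "$3")"
}
```

With those defined, "resolve kernel.debug nfsrpc_lookup 0x87f" should print the function and the file:line. The kernel.debug has to be the one that matches the kernel that panicked.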

If you do not have a kernel.debug, maybe someone else knows a
way to do this?

But, if you cannot do it, I suspect the patch will deal with these as well.
(The patch disables doing Open in the same RPC as Lookup. It is just
an optimization and is only done for "oneopenown".)

rick

>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address = 0x28
> fault code = supervisor read data, page not present
> instruction pointer = 0x20:0xffffffff809da260
> stack pointer         = 0x28:0xfffffe0111f18438
> frame pointer         = 0x28:0xfffffe0111f18470
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 14676 (sh)
> rdi: 0000000000000028 rsi: fffff80071cee200 rdx: 0000000000000000
> rcx: 0000000000000000  r8: 0000000000000032  r9: fffffe0111f19000
> rax: 0000000000000000 rbx: fffff80161e4b000 rbp: fffffe0111f18470
> r10: 00000000000001f4 r11: fffff8024e1e5760 r12: 0000000000000000
> r13: 0000000000000000 r14: fffff80071cee200 r15: 0000000000000000
> trap number = 12
> panic: page fault
> cpuid = 0
> time = 1727064012
> KDB: stack backtrace:
> #0 0xffffffff80b7fefd at kdb_backtrace+0x5d
> #1 0xffffffff80b32bd1 at vpanic+0x131
> #2 0xffffffff80b32a93 at panic+0x43
> #3 0xffffffff8100091b at trap_fatal+0x40b
> #4 0xffffffff81000966 at trap_pfault+0x46
> #5 0xffffffff80fd6d48 at calltrap+0x8
> #6 0xffffffff809f9eef at nfsrpc_lookup+0x87f
> #7 0xffffffff80a0e2fd at nfs_lookup+0x43d
> #8 0xffffffff80c0341a at vop_sigdefer+0x2a
> #9 0xffffffff8302c3a7 at null_lookup+0xc7
> #10 0xffffffff80c08745 at vfs_lookup+0x425
> #11 0xffffffff80c079b8 at namei+0x238
> #12 0xffffffff80c2d2da at vn_open_cred+0x53a
> #13 0xffffffff80c239a8 at openatfp+0x268
> #14 0xffffffff80c236b8 at sys_open+0x28
> #15 0xffffffff810011c0 at amd64_syscall+0x100
> #16 0xffffffff80fd765b at fast_syscall_common+0xf8
>
> I don't know if this gives any more insight or confirmation to your
> theory about the problem, but it seems worth sharing.
>
> We got three of these panics (and two of the first kind) on different
> machines in the past 24 hours, so I'll definitely at least be
> experimenting with bulk NFS mounts instead of one NFS + nullfs bulk
> mounts in addition to trying the patch.
>
> Thanks!


