Date:      Fri, 9 Feb 2024 17:03:30 -0500
From:      Zaphod Beeblebrox <zbeeble@gmail.com>
To:        Mark Johnston <markj@freebsd.org>
Cc:        "Matthew L. Dailey" <Matthew.L.Dailey@dartmouth.edu>,  "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>
Subject:   Re: FreeBSD panics possibly caused by nfs clients
Message-ID:  <CACpH0MfvdizKo%2BRA0E6jnMVZSayotA2Vn2znZG8qD1K18dsF6g@mail.gmail.com>
In-Reply-To: <ZcaWkUwMlBCZCUhg@nuc>
References:  <c5d44484-8660-4b8b-a379-79423cb208f6@dartmouth.edu> <ZcZNDtN1nNJmo8cS@nuc> <c9eca81a-9eff-4b17-9928-bee2c79cef8f@dartmouth.edu> <b3243928-4d66-4c5e-9745-254d57f1cc5e@dartmouth.edu> <ZcaWkUwMlBCZCUhg@nuc>


Just in case it's relevant, I'm carrying around this patch on my fairly
busy little RISC-V machine.

diff --git a/sys/fs/nfsclient/nfs_clvnops.c b/sys/fs/nfsclient/nfs_clvnops.c
index 0b8c587a542c..85c0ebd7a10f 100644
--- a/sys/fs/nfsclient/nfs_clvnops.c
+++ b/sys/fs/nfsclient/nfs_clvnops.c
@@ -2459,6 +2459,16 @@ nfs_readdir(struct vop_readdir_args *ap)
                return (EINVAL);
        uio->uio_resid -= left;

+       /*
+        * For readdirplus, if starting to read the directory,
+        * purge the name cache, since it will be reloaded by
+        * this directory read.
+        * This removes potentially stale name cache entries.
+        */
+       if (uio->uio_offset == 0 &&
+           (VFSTONFS(vp->v_mount)->nm_flag & NFSMNT_RDIRPLUS) != 0)
+               cache_purge(vp);
+
        /*
         * Call ncl_bioread() to do the real work.
         */
... without it, I can panic.
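
To make the condition in the patch concrete, here is a minimal user-space sketch of the same check. It is only a toy model: toy_dir, toy_should_purge(), toy_readdir() and TOY_MNT_RDIRPLUS are hypothetical stand-ins invented for illustration, not kernel API, and resetting a counter merely imitates what cache_purge(vp) does to the directory's name cache entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for NFSMNT_RDIRPLUS; the real flag is defined in the kernel. */
#define TOY_MNT_RDIRPLUS 0x1u

/* Toy directory state: counts cached names instead of holding a real name cache. */
struct toy_dir {
	unsigned int mnt_flags;    /* models nm_flag of the mount */
	size_t cached_names;       /* models name cache entries under this directory */
	unsigned int purges;       /* how many times the toy cache was purged */
};

/* Same predicate as the patch: reading from offset 0 on a readdirplus mount. */
static bool
toy_should_purge(int64_t offset, unsigned int mnt_flags)
{
	return (offset == 0 && (mnt_flags & TOY_MNT_RDIRPLUS) != 0);
}

/* Models the top of nfs_readdir(): purge first, then let the read reload the cache. */
static void
toy_readdir(struct toy_dir *dp, int64_t offset, size_t entries_returned)
{
	assert(dp != NULL);
	if (toy_should_purge(offset, dp->mnt_flags)) {
		dp->cached_names = 0;  /* stand-in for cache_purge(vp) */
		dp->purges++;
	}
	/* Assumption from the patch comment: a readdirplus read reloads the name cache. */
	if ((dp->mnt_flags & TOY_MNT_RDIRPLUS) != 0)
		dp->cached_names += entries_returned;
}
```

In this model, a read at offset 0 on a readdirplus mount drops whatever was cached and keeps only the entries the read returns. A continuation read at a non-zero offset, or any read on a mount without readdirplus, leaves the existing entries alone.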


On Fri, Feb 9, 2024 at 4:18 PM Mark Johnston <markj@freebsd.org> wrote:

> On Fri, Feb 09, 2024 at 06:23:08PM +0000, Matthew L. Dailey wrote:
> > I had my first kernel panic with a KASAN kernel after only 01:27. This
> > first panic was a "double fault," which isn't anything we've seen
> > previously - usually we've seen trap 9 or trap 12, but sometimes others.
> > Based on the backtrace, it definitely looks like KASAN caught something,
> > but I don't have the expertise to know if this points to anything
> > specific. From the backtrace, it looks like this might have originated
> > in ipfw code.
>
> A double fault is rather unexpected.  I presume you're running
> releng/14.0?  Is it at all possible to test with FreeBSD-CURRENT?
>
> Did you add INVARIANTS etc. to the kernel configuration used here, or
> just KASAN?
>
> > Please let me know what other info I can provide or what I can do to dig
> > deeper.
>
> If you could repeat the test several times, I'd be interested in seeing
> if you always get the same result.  If you're willing to share the
> vmcore (or several), I'd be willing to take a look at it.
>
> > Thanks!!
> >
> > Panic message:
> > [5674] Fatal double fault
> > [5674] rip 0xffffffff812f6e32 rsp 0xfffffe014677afe0 rbp 0xfffffe014677b430
> > [5674] rax 0x1fffffc028cef620 rdx 0xf2f2f2f8f2f2f2f2 rbx 0x1
> > [5674] rcx 0xdffff7c000000000 rsi 0xfffffe004086a4a0 rdi 0xf8f8f8f8f2f2f2f8
> > [5674] r8 0xf8f8f8f8f8f8f8f8 r9 0x162a r10 0x835003002d3a64e1
> > [5674] r11 0 r12 0xfffff78028cef620 r13 0xfffffe004086a440
> > [5674] r14 0xfffffe01488c0560 r15 0x26f40 rflags 0x10006
> > [5674] cs 0x20 ss 0x28 ds 0x3b es 0x3b fs 0x13 gs 0x1b
> > [5674] fsbase 0x95d1d81a130 gsbase 0xffffffff84a14000 kgsbase 0
> > [5674] cpuid = 4; apic id = 08
> > [5674] panic: double fault
> > [5674] cpuid = 4
> > [5674] time = 1707498420
> > [5674] KDB: stack backtrace:
> > [5674] Uptime: 1h34m34s
> >
> > Backtrace:
> > #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
> > #1  doadump (textdump=<optimized out>) at
> > /usr/src/sys/kern/kern_shutdown.c:405
> > #2  0xffffffff8128b7dc in kern_reboot (howto=howto@entry=260)
> >      at /usr/src/sys/kern/kern_shutdown.c:526
> > #3  0xffffffff8128c000 in vpanic (
> >      fmt=fmt@entry=0xffffffff82589a00 <str> "double fault",
> >      ap=ap@entry=0xfffffe0040866de0) at
> > /usr/src/sys/kern/kern_shutdown.c:970
> > #4  0xffffffff8128bd75 in panic (fmt=0xffffffff82589a00 <str> "double
> > fault")
> >      at /usr/src/sys/kern/kern_shutdown.c:894
> > #5  0xffffffff81c4b335 in dblfault_handler (frame=<optimized out>)
> >      at /usr/src/sys/amd64/amd64/trap.c:1012
> > #6  <signal handler called>
> > #7  0xffffffff812f6e32 in sched_clock (td=td@entry=0xfffffe01488c0560,
> >      cnt=cnt@entry=1) at /usr/src/sys/kern/sched_ule.c:2601
> > #8  0xffffffff8119e2a7 in statclock (cnt=cnt@entry=1,
> >      usermode=usermode@entry=0) at /usr/src/sys/kern/kern_clock.c:760
> > #9  0xffffffff8119fb67 in handleevents (now=now@entry=24371855699832,
> >      fake=fake@entry=0) at /usr/src/sys/kern/kern_clocksource.c:195
> > #10 0xffffffff811a10cc in timercb (et=<optimized out>, arg=<optimized out>)
> >      at /usr/src/sys/kern/kern_clocksource.c:353
> > #11 0xffffffff81dcd280 in lapic_handle_timer (frame=0xfffffe014677b750)
> >      at /usr/src/sys/x86/x86/local_apic.c:1343
> > #12 <signal handler called>
> > #13 __asan_load8_noabort (addr=18446741880219689232)
> >      at /usr/src/sys/kern/subr_asan.c:1113
> > #14 0xffffffff851488b8 in ?? () from /boot/thayer/ipfw.ko
> > #15 0xfffffe0100000000 in ?? ()
> > #16 0xffffffff8134dcd5 in pcpu_find (cpuid=1238425856)
> >      at /usr/src/sys/kern/subr_pcpu.c:286
> > #17 0xffffffff85151f6f in ?? () from /boot/thayer/ipfw.ko
> > #18 0x0000000000000000 in ?? ()
>
>



