From nobody Fri Feb 9 22:03:30 2024
From: Zaphod Beeblebrox <zbeeble@gmail.com>
Date: Fri, 9 Feb 2024 17:03:30 -0500
Subject: Re: FreeBSD panics possibly caused by nfs clients
To: Mark Johnston <markj@freebsd.org>
Cc: "Matthew L. Dailey", freebsd-current@freebsd.org
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="000000000000f273500610fa199d"

--000000000000f273500610fa199d
Content-Type: text/plain; charset="UTF-8"

Just in case it's relevant, I'm carrying around this patch on my fairly busy little RISC-V machine.
diff --git a/sys/fs/nfsclient/nfs_clvnops.c b/sys/fs/nfsclient/nfs_clvnops.c
index 0b8c587a542c..85c0ebd7a10f 100644
--- a/sys/fs/nfsclient/nfs_clvnops.c
+++ b/sys/fs/nfsclient/nfs_clvnops.c
@@ -2459,6 +2459,16 @@ nfs_readdir(struct vop_readdir_args *ap)
 		return (EINVAL);
 	uio->uio_resid -= left;

+	/*
+	 * For readdirplus, if starting to read the directory,
+	 * purge the name cache, since it will be reloaded by
+	 * this directory read.
+	 * This removes potentially stale name cache entries.
+	 */
+	if (uio->uio_offset == 0 &&
+	    (VFSTONFS(vp->v_mount)->nm_flag & NFSMNT_RDIRPLUS) != 0)
+		cache_purge(vp);
+
 	/*
 	 * Call ncl_bioread() to do the real work.
 	 */

... without it, I can panic.

On Fri, Feb 9, 2024 at 4:18 PM Mark Johnston <markj@freebsd.org> wrote:

> On Fri, Feb 09, 2024 at 06:23:08PM +0000, Matthew L. Dailey wrote:
> > I had my first kernel panic with a KASAN kernel after only 01:27. This
> > first panic was a "double fault," which isn't anything we've seen
> > previously - usually we've seen trap 9 or trap 12, but sometimes others.
> > Based on the backtrace, it definitely looks like KASAN caught something,
> > but I don't have the expertise to know if this points to anything
> > specific. From the backtrace, it looks like this might have originated
> > in ipfw code.
>
> A double fault is rather unexpected.  I presume you're running
> releng/14.0?  Is it at all possible to test with FreeBSD-CURRENT?
>
> Did you add INVARIANTS etc. to the kernel configuration used here, or
> just KASAN?
>
> > Please let me know what other info I can provide or what I can do to dig
> > deeper.
>
> If you could repeat the test several times, I'd be interested in seeing
> if you always get the same result.  If you're willing to share the
> vmcore (or several), I'd be willing to take a look at it.
>
> > Thanks!!
> >
> > Panic message:
> > [5674] Fatal double fault
> > [5674] rip 0xffffffff812f6e32 rsp 0xfffffe014677afe0 rbp 0xfffffe014677b430
> > [5674] rax 0x1fffffc028cef620 rdx 0xf2f2f2f8f2f2f2f2 rbx 0x1
> > [5674] rcx 0xdffff7c000000000 rsi 0xfffffe004086a4a0 rdi 0xf8f8f8f8f2f2f2f8
> > [5674] r8 0xf8f8f8f8f8f8f8f8 r9 0x162a r10 0x835003002d3a64e1
> > [5674] r11 0 r12 0xfffff78028cef620 r13 0xfffffe004086a440
> > [5674] r14 0xfffffe01488c0560 r15 0x26f40 rflags 0x10006
> > [5674] cs 0x20 ss 0x28 ds 0x3b es 0x3b fs 0x13 gs 0x1b
> > [5674] fsbase 0x95d1d81a130 gsbase 0xffffffff84a14000 kgsbase 0
> > [5674] cpuid = 4; apic id = 08
> > [5674] panic: double fault
> > [5674] cpuid = 4
> > [5674] time = 1707498420
> > [5674] KDB: stack backtrace:
> > [5674] Uptime: 1h34m34s
> >
> > Backtrace:
> > #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
> > #1  doadump (textdump=<optimized out>)
> >     at /usr/src/sys/kern/kern_shutdown.c:405
> > #2  0xffffffff8128b7dc in kern_reboot (howto=howto@entry=260)
> >     at /usr/src/sys/kern/kern_shutdown.c:526
> > #3  0xffffffff8128c000 in vpanic (
> >     fmt=fmt@entry=0xffffffff82589a00 <str> "double fault",
> >     ap=ap@entry=0xfffffe0040866de0)
> >     at /usr/src/sys/kern/kern_shutdown.c:970
> > #4  0xffffffff8128bd75 in panic (fmt=0xffffffff82589a00 <str> "double fault")
> >     at /usr/src/sys/kern/kern_shutdown.c:894
> > #5  0xffffffff81c4b335 in dblfault_handler (frame=<optimized out>)
> >     at /usr/src/sys/amd64/amd64/trap.c:1012
> > #6  <signal handler called>
> > #7  0xffffffff812f6e32 in sched_clock (td=td@entry=0xfffffe01488c0560,
> >     cnt=cnt@entry=1) at /usr/src/sys/kern/sched_ule.c:2601
> > #8  0xffffffff8119e2a7 in statclock (cnt=cnt@entry=1,
> >     usermode=usermode@entry=0) at /usr/src/sys/kern/kern_clock.c:760
> > #9  0xffffffff8119fb67 in handleevents (now=now@entry=24371855699832,
> >     fake=fake@entry=0) at /usr/src/sys/kern/kern_clocksource.c:195
> > #10 0xffffffff811a10cc in timercb (et=<optimized out>, arg=<optimized out>)
> >     at /usr/src/sys/kern/kern_clocksource.c:353
> > #11 0xffffffff81dcd280 in lapic_handle_timer (frame=0xfffffe014677b750)
> >     at /usr/src/sys/x86/x86/local_apic.c:1343
> > #12 <signal handler called>
> > #13 __asan_load8_noabort (addr=18446741880219689232)
> >     at /usr/src/sys/kern/subr_asan.c:1113
> > #14 0xffffffff851488b8 in ?? () from /boot/thayer/ipfw.ko
> > #15 0xfffffe0100000000 in ?? ()
> > #16 0xffffffff8134dcd5 in pcpu_find (cpuid=1238425856)
> >     at /usr/src/sys/kern/subr_pcpu.c:286
> > #17 0xffffffff85151f6f in ?? () from /boot/thayer/ipfw.ko
> > #18 0x0000000000000000 in ?? ()
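
The nfs_clvnops.c patch quoted at the top of this message changes behavior only under two conditions. The read must start at the beginning of the directory (uio_offset == 0), and the mount must use readdirplus. Per its own comment, that read reloads the name cache anyway, so purging first discards entries that may have gone stale. Below is a minimal user-space sketch of that logic, assuming a toy name cache; it is not kernel code. struct model_namecache, model_readdir(), model_cache_enter(), model_cache_purge() and MODEL_RDIRPLUS are hypothetical stand-ins for the vnode name cache, nfs_readdir(), cache_enter(), cache_purge() and NFSMNT_RDIRPLUS.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical user-space model of the readdirplus name-cache purge.
 * Nothing here is a FreeBSD kernel API: MODEL_RDIRPLUS stands in for
 * NFSMNT_RDIRPLUS, model_cache_purge() for cache_purge(vp), and
 * model_readdir() for the start of nfs_readdir().
 */
#define	MODEL_RDIRPLUS	0x1
#define	MODEL_MAXNAMES	16

/* Toy name cache for a single directory: just a list of cached names. */
struct model_namecache {
	const char	*names[MODEL_MAXNAMES];
	int		 count;
};

static bool
model_cache_has(const struct model_namecache *nc, const char *name)
{
	for (int i = 0; i < nc->count; i++)
		if (strcmp(nc->names[i], name) == 0)
			return (true);
	return (false);
}

/* Enter a name unless it is already cached. */
static void
model_cache_enter(struct model_namecache *nc, const char *name)
{
	if (model_cache_has(nc, name))
		return;
	assert(nc->count < MODEL_MAXNAMES);	/* fixed-size toy cache */
	nc->names[nc->count++] = name;
}

/* Stand-in for cache_purge(vp): forget every cached name. */
static void
model_cache_purge(struct model_namecache *nc)
{
	nc->count = 0;
}

/*
 * One directory read starting at 'offset' that returns 'n' entries.
 * The first test mirrors the patch: purge before a readdirplus read
 * that starts at offset 0.  The second models readdirplus reloading
 * the name cache with each returned entry.
 */
static void
model_readdir(struct model_namecache *nc, unsigned int mntflags, long offset,
    const char *const entries[], int n)
{
	if (offset == 0 && (mntflags & MODEL_RDIRPLUS) != 0)
		model_cache_purge(nc);
	if ((mntflags & MODEL_RDIRPLUS) != 0)
		for (int i = 0; i < n; i++)
			model_cache_enter(nc, entries[i]);
}
```

Here is an example. A readdirplus read at offset 0 returns "a" and "old". The server then replaces "old" with "new", and a second read at offset 0 leaves only "a" and "new" cached. Without the purge, "old" would remain cached even though the server no longer has it. A read that continues at a non-zero offset does not purge.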
--000000000000f273500610fa199d--