Date:      Wed, 21 Aug 2024 07:45:31 -0700
From:      Rick Macklem <rick.macklem@gmail.com>
To:        "Matthew L. Dailey" <Matthew.L.Dailey@dartmouth.edu>
Cc:        "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>
Subject:   Re: Kernel panics with vfs.nfsd.enable_locallocks=1 and nfs clients doing hdf5 file operations
Message-ID:  <CAM5tNy4EjOZhtd=SjvY1E2rgerBeFDW2eSrHsbufzf%2BMu%2BROhQ@mail.gmail.com>
In-Reply-To: <3ed15b15-6f1c-4290-a552-aaafef7cc82e@dartmouth.edu>
References:  <3ed15b15-6f1c-4290-a552-aaafef7cc82e@dartmouth.edu>

Please create a PR for this and include at least
one backtrace. I will try and figure out how
locallocks could cause it.

I suspect few use locallocks=1.

rick

On Wed, Aug 21, 2024 at 7:29 AM Matthew L. Dailey <Matthew.L.Dailey@dartmouth.edu> wrote:

> Hi all,
>
> I posted messages to this list back in February and March
> (https://lists.freebsd.org/archives/freebsd-current/2024-February/005546.html)
> regarding kernel panics we were having with nfs clients doing hdf5 file
> operations. After a hiatus in troubleshooting, I had more time this
> summer and have found the cause - the vfs.nfsd.enable_locallocks sysctl.
>
> When this is set to 1, we can induce either a panic or hung nfs server
> (more rarely) usually within a few hours, but sometimes within several
> days to a week. We have replicated this on 13.0 through 15.0-CURRENT
> (20240725-82283cad12a4-271360). With this set to 0 (default), we are
> unable to replicate the issue, even after several weeks of 24/7 hdf5
> file operations.
>
> One other side-effect of these panics is that on a few occasions it has
> corrupted the root zpool beyond repair. This makes sense since kernel
> memory is getting corrupted, but obviously makes this issue more impactful.
>
> I'm hoping this is enough information to start narrowing down this
> issue. We are specifically using this sysctl because we are also serving
> files via samba and want to ensure consistent locking.
>
> I have provided some core dumps and backtraces previously, but am happy
> to provide more as needed. I also have a writeup of exactly how to
> reproduce this that I can send directly to anyone who is interested.
>
> Thanks so much for any and all help with this tricky problem. I'm happy
> to do whatever I can to help get this squashed.
>
> Best,
> Matt
>

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAM5tNy4EjOZhtd=SjvY1E2rgerBeFDW2eSrHsbufzf%2BMu%2BROhQ>