Date:      Mon, 30 Sep 2024 17:10:03 -0700
From:      Rick Macklem <rick.macklem@gmail.com>
To:        "Matthew L. Dailey" <Matthew.L.Dailey@dartmouth.edu>
Cc:        "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>
Subject:   Re: Kernel panics with vfs.nfsd.enable_locallocks=1 and nfs clients doing hdf5 file operations
Message-ID:  <CAM5tNy68b5nYzwndL6UzsQqB-uqSzK%2BKS%2B8BGurL1tfBT4e7SQ@mail.gmail.com>
In-Reply-To: <43c2f01e-c1b4-493d-8187-11e16d1da851@dartmouth.edu>
References:  <3ed15b15-6f1c-4290-a552-aaafef7cc82e@dartmouth.edu> <CAM5tNy4EjOZhtd=SjvY1E2rgerBeFDW2eSrHsbufzf%2BMu%2BROhQ@mail.gmail.com> <43c2f01e-c1b4-493d-8187-11e16d1da851@dartmouth.edu>

On Wed, Aug 21, 2024 at 8:02 AM Matthew L. Dailey
<Matthew.L.Dailey@dartmouth.edu> wrote:
>
> Hi Rick,
>
> Done - https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280978
Just fyi for everyone, the bugzilla PR now has a patch that has been
committed to main as eb345e05ac66.  Early indications are that it
fixes the race that was causing this problem.
Although testing is still in progress, I committed it so that it can be
MFC'd to stable/14 in time for 14.2.

Thanks go to Matt for reporting this and testing the patch, rick

>
> Thanks!
>
> -Matt
>
> On 8/21/24 10:45 AM, Rick Macklem wrote:
> > Please create a PR for this and include at least
> > one backtrace. I will try and figure out how
> > locallocks could cause it.
> >
> > I suspect few use locallocks=1.
> >
> > rick
> >
> > On Wed, Aug 21, 2024 at 7:29 AM Matthew L. Dailey
> > <Matthew.L.Dailey@dartmouth.edu> wrote:
> >
> >     Hi all,
> >
> >     I posted messages to this list back in February and March
> >     (https://lists.freebsd.org/archives/freebsd-current/2024-February/005546.html)
> >     regarding kernel panics we were having with nfs clients doing hdf5 file
> >     operations. After a hiatus in troubleshooting, I had more time this
> >     summer and have found the cause - the vfs.nfsd.enable_locallocks sysctl.
> >
> >     When this is set to 1, we can induce either a panic or hung nfs server
> >     (more rarely) usually within a few hours, but sometimes within several
> >     days to a week. We have replicated this on 13.0 through 15.0-CURRENT
> >     (20240725-82283cad12a4-271360). With this set to 0 (default), we are
> >     unable to replicate the issue, even after several weeks of 24/7 hdf5
> >     file operations.
> >
> >     One other side-effect of these panics is that on a few occasions it has
> >     corrupted the root zpool beyond repair. This makes sense since kernel
> >     memory is getting corrupted, but obviously makes this issue more
> >     impactful.
> >
> >     I'm hoping this is enough information to start narrowing down this
> >     issue. We are specifically using this sysctl because we are also
> >     serving files via samba and want to ensure consistent locking.
> >
> >     I have provided some core dumps and backtraces previously, but am happy
> >     to provide more as needed. I also have a writeup of exactly how to
> >     reproduce this that I can send directly to anyone who is interested.
> >
> >     Thanks so much for any and all help with this tricky problem. I'm happy
> >     to do whatever I can to help get this squashed.
> >
> >     Best,
> >     Matt
> >


