Date: Mon, 30 Sep 2024 17:10:03 -0700
From: Rick Macklem <rick.macklem@gmail.com>
To: "Matthew L. Dailey" <Matthew.L.Dailey@dartmouth.edu>
Cc: "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>
Subject: Re: Kernel panics with vfs.nfsd.enable_locallocks=1 and nfs clients doing hdf5 file operations
Message-ID: <CAM5tNy68b5nYzwndL6UzsQqB-uqSzK%2BKS%2B8BGurL1tfBT4e7SQ@mail.gmail.com>
In-Reply-To: <43c2f01e-c1b4-493d-8187-11e16d1da851@dartmouth.edu>
References: <3ed15b15-6f1c-4290-a552-aaafef7cc82e@dartmouth.edu>
 <CAM5tNy4EjOZhtd=SjvY1E2rgerBeFDW2eSrHsbufzf%2BMu%2BROhQ@mail.gmail.com>
 <43c2f01e-c1b4-493d-8187-11e16d1da851@dartmouth.edu>

On Wed, Aug 21, 2024 at 8:02 AM Matthew L. Dailey
<Matthew.L.Dailey@dartmouth.edu> wrote:
>
> Hi Rick,
>
> Done - https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280978

Just fyi for everyone, the bugzilla PR now has a patch that has been
committed to main as eb345e05ac66. Early indications are that it fixes
the race that was causing this problem. Although testing is still in
progress, I committed it so that it can be MFC'd to stable/14 in time
for 14.2.

Thanks go to Matt for reporting this and testing the patch, rick

>
> Thanks!
>
> -Matt
>
> On 8/21/24 10:45 AM, Rick Macklem wrote:
> > Please create a PR for this and include at least
> > one backtrace. I will try and figure out how
> > locallocks could cause it.
> >
> > I suspect few use locallocks=1.
> >
> > rick
> >
> > On Wed, Aug 21, 2024 at 7:29 AM Matthew L. Dailey
> > <Matthew.L.Dailey@dartmouth.edu> wrote:
> >
> > Hi all,
> >
> > I posted messages to this list back in February and March
> > (https://lists.freebsd.org/archives/freebsd-current/2024-February/005546.html)
> > regarding kernel panics we were having with nfs clients doing hdf5 file
> > operations. After a hiatus in troubleshooting, I had more time this
> > summer and have found the cause - the vfs.nfsd.enable_locallocks sysctl.
> >
> > When this is set to 1, we can induce either a panic or hung nfs server
> > (more rarely) usually within a few hours, but sometimes within several
> > days to a week. We have replicated this on 13.0 through 15.0-CURRENT
> > (20240725-82283cad12a4-271360). With this set to 0 (default), we are
> > unable to replicate the issue, even after several weeks of 24/7 hdf5
> > file operations.
> >
> > One other side-effect of these panics is that on a few occasions it has
> > corrupted the root zpool beyond repair. This makes sense since kernel
> > memory is getting corrupted, but obviously makes this issue more
> > impactful.
> >
> > I'm hoping this is enough information to start narrowing down this
> > issue. We are specifically using this sysctl because we are also
> > serving files via samba and want to ensure consistent locking.
> >
> > I have provided some core dumps and backtraces previously, but am happy
> > to provide more as needed. I also have a writeup of exactly how to
> > reproduce this that I can send directly to anyone who is interested.
> >
> > Thanks so much for any and all help with this tricky problem. I'm happy
> > to do whatever I can to help get this squashed.
> >
> > Best,
> > Matt