Date: Wed, 21 Aug 2024 07:45:31 -0700
From: Rick Macklem <rick.macklem@gmail.com>
To: "Matthew L. Dailey" <Matthew.L.Dailey@dartmouth.edu>
Cc: "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>
Subject: Re: Kernel panics with vfs.nfsd.enable_locallocks=1 and nfs clients doing hdf5 file operations
Message-ID: <CAM5tNy4EjOZhtd=SjvY1E2rgerBeFDW2eSrHsbufzf%2BMu%2BROhQ@mail.gmail.com>
In-Reply-To: <3ed15b15-6f1c-4290-a552-aaafef7cc82e@dartmouth.edu>
Please create a PR for this and include at least
one backtrace. I will try and figure out how
locallocks could cause it.

I suspect few use locallocks=1.

rick

On Wed, Aug 21, 2024 at 7:29 AM Matthew L. Dailey <Matthew.L.Dailey@dartmouth.edu> wrote:

> Hi all,
>
> I posted messages to this list back in February and March
> (https://lists.freebsd.org/archives/freebsd-current/2024-February/005546.html)
> regarding kernel panics we were having with nfs clients doing hdf5 file
> operations. After a hiatus in troubleshooting, I had more time this
> summer and have found the cause: the vfs.nfsd.enable_locallocks sysctl.
>
> When this is set to 1, we can induce either a panic or, more rarely, a
> hung nfs server, usually within a few hours but sometimes only after
> several days to a week. We have replicated this on 13.0 through
> 15.0-CURRENT (20240725-82283cad12a4-271360). With this set to 0
> (the default), we are unable to replicate the issue, even after several
> weeks of 24/7 hdf5 file operations.
>
> One other side-effect of these panics is that on a few occasions it has
> corrupted the root zpool beyond repair. This makes sense since kernel
> memory is getting corrupted, but it obviously makes this issue more
> impactful.
>
> I'm hoping this is enough information to start narrowing down this
> issue. We are specifically using this sysctl because we are also serving
> files via samba and want to ensure consistent locking.
>
> I have provided some core dumps and backtraces previously, but am happy
> to provide more as needed. I also have a writeup of exactly how to
> reproduce this that I can send directly to anyone who is interested.
>
> Thanks so much for any and all help with this tricky problem. I'm happy
> to do whatever I can to help get this squashed.
>
> Best,
> Matt
