From: Rick Macklem
Date: Wed, 21 Aug 2024 07:45:31 -0700
Subject: Re: Kernel panics with vfs.nfsd.enable_locallocks=1 and nfs clients doing hdf5 file operations
To: "Matthew L. Dailey"
Cc: "freebsd-current@freebsd.org"
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

Please create a PR for this and include at least
one backtrace. I will try and figure out how
locallocks could cause it.

I suspect few use locallocks=1.

rick

On Wed, Aug 21, 2024 at 7:29 AM Matthew L. Dailey <Matthew.L.Dailey@dartmouth.edu> wrote:
Hi all,

I posted messages to this list back in February and March
(https://lists.freebsd.org/archives/freebsd-current/2024-February/005546.html)
regarding kernel panics we were having with nfs clients doing hdf5 file
operations. After a hiatus in troubleshooting, I had more time this
summer and have found the cause - the vfs.nfsd.enable_locallocks sysctl.

When this is set to 1, we can induce either a panic or, more rarely, a hung
nfs server, usually within a few hours, but sometimes within several
days to a week. We have replicated this on 13.0 through 15.0-CURRENT
(20240725-82283cad12a4-271360). With this set to 0 (default), we are
unable to replicate the issue, even after several weeks of 24/7 hdf5
file operations.

One other side-effect of these panics is that on a few occasions it has
corrupted the root zpool beyond repair. This makes sense since kernel
memory is getting corrupted, but obviously makes this issue more impactful.

I'm hoping this is enough information to start narrowing down this
issue. We are specifically using this sysctl because we are also serving
files via samba and want to ensure consistent locking.

I have provided some core dumps and backtraces previously, but am happy
to provide more as needed. I also have a writeup of exactly how to
reproduce this that I can send directly to anyone who is interested.

Thanks so much for any and all help with this tricky problem. I'm happy
to do whatever I can to help get this squashed.

Best,
Matt