From: Rick Macklem
Date: Mon, 30 Sep 2024 17:10:03 -0700
Subject: Re: Kernel panics with vfs.nfsd.enable_locallocks=1 and nfs clients doing hdf5 file operations
To: "Matthew L. Dailey"
Cc: "freebsd-current@freebsd.org"
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On Wed, Aug 21, 2024 at 8:02 AM Matthew L. Dailey wrote:
>
> Hi Rick,
>
> Done - https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280978

Just fyi for everyone, the bugzilla PR now has a patch that has
been committed to main as eb345e05ac66.
Early indications are that it fixes the race that was causing this
problem. Although testing is still in progress, I committed it so
that it can be MFC'd to stable/14 in time for 14.2.

Thanks go to Matt for reporting this and testing the patch, rick

>
> Thanks!
>
> -Matt
>
> On 8/21/24 10:45 AM, Rick Macklem wrote:
> > Please create a PR for this and include at least
> > one backtrace. I will try and figure out how
> > locallocks could cause it.
> >
> > I suspect few use locallocks=1.
> >
> > rick
> >
> > On Wed, Aug 21, 2024 at 7:29 AM Matthew L. Dailey wrote:
> >
> > Hi all,
> >
> > I posted messages to this list back in February and March
> > (https://lists.freebsd.org/archives/freebsd-current/2024-February/005546.html)
> > regarding kernel panics we were having with nfs clients doing hdf5
> > file operations. After a hiatus in troubleshooting, I had more time
> > this summer and have found the cause - the
> > vfs.nfsd.enable_locallocks sysctl.
> >
> > When this is set to 1, we can induce either a panic or hung nfs
> > server (more rarely) usually within a few hours, but sometimes
> > within several days to a week. We have replicated this on 13.0
> > through 15.0-CURRENT (20240725-82283cad12a4-271360). With this set
> > to 0 (default), we are unable to replicate the issue, even after
> > several weeks of 24/7 hdf5 file operations.
> >
> > One other side-effect of these panics is that on a few occasions it
> > has corrupted the root zpool beyond repair. This makes sense since
> > kernel memory is getting corrupted, but obviously makes this issue
> > more impactful.
> >
> > I'm hoping this is enough information to start narrowing down this
> > issue. We are specifically using this sysctl because we are also
> > serving files via samba and want to ensure consistent locking.
> >
> > I have provided some core dumps and backtraces previously, but am
> > happy to provide more as needed. I also have a writeup of exactly
> > how to reproduce this that I can send directly to anyone who is
> > interested.
> >
> > Thanks so much for any and all help with this tricky problem. I'm
> > happy to do whatever I can to help get this squashed.
> >
> > Best,
> > Matt
> >