X-Authentication-Warning: kenobi.freebsd.org: www set sender to bugzilla-noreply@freebsd.org using -f
From: bugzilla-noreply@freebsd.org
To: fs@FreeBSD.org
Subject: [Bug 278958] zfs panic: page fault in sync_dnodes_task
Date: Tue, 14 May 2024 15:45:55 +0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 14.0-RELEASE
X-Bugzilla-Keywords: crash
X-Bugzilla-Severity: Affects Only Me
X-Bugzilla-Who: nunziotocci2000@gmail.com
X-Bugzilla-Status: New
X-Bugzilla-Resolution:
X-Bugzilla-Priority: ---
X-Bugzilla-Assigned-To: fs@FreeBSD.org
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
Auto-Submitted: auto-generated
List-Id: Filesystems
List-Archive: https://lists.freebsd.org/archives/freebsd-fs
Sender: owner-freebsd-fs@FreeBSD.org
MIME-Version: 1.0
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=278958
--- Comment #1 from nunziotocci2000@gmail.com ---
Another panic occurred this morning at 3:33 AM with an identical backtrace.
According to core.txt, a backup was running this time as well: there is a
running `zfs` process with `sudo` as its parent and `sshd` as the next
process up the tree.
`zpool status` shows that our jail pool experienced an error:
  pool: zsmtp_jail
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 92K in 00:23:52 with 0 errors on Tue May 14 03:53:36 2024
config:

        NAME          STATE     READ WRITE CKSUM
        zsmtp_jail    ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            nda1p3    ONLINE       0     0     1
            nda2p3    ONLINE       0     0     0
It seems to me that this was likely caused by the panic corrupting something
on that drive. The `smartctl -a /dev/nvme1` output doesn't look out of the
ordinary:
=== START OF INFORMATION SECTION ===
Model Number: Force MP600
Serial Number: 2006823000012856205C
Firmware Version: EGFM11.3
PCI Vendor/Subsystem ID: 0x1987
IEEE OUI Identifier: 0x6479a7
Total NVM Capacity: 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity: 0
Controller ID: 1
NVMe Version: 1.3
Number of Namespaces: 1
Namespace 1 Size/Capacity: 2,000,398,934,016 [2.00 TB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 6479a7 2fc124183d
Local Time is: Tue May 14 10:42:58 2024 CDT
Firmware Updates (0x12): 1 Slot, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005d): Comp DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x08): Telmtry_Lg
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 90 Celsius
Critical Comp. Temp. Threshold: 95 Celsius
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     9.78W       -        -    0  0  0  0        0       0
 1 +     6.75W       -        -    1  1  1  1        0       0
 2 +     5.23W       -        -    2  2  2  2        0       0
 3 -   0.0490W       -        -    3  3  3  3     2000    2000
 4 -   0.0018W       -        -    4  4  4  4    25000   25000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 39 Celsius
Available Spare: 100%
Available Spare Threshold: 5%
Percentage Used: 3%
Data Units Read: 295,807,203 [151 TB]
Data Units Written: 98,425,650 [50.3 TB]
Host Read Commands: 1,499,250,766
Host Write Commands: 2,297,088,561
Controller Busy Time: 5,756
Power Cycles: 108
Power On Hours: 33,640
Unsafe Shutdowns: 67
Media and Data Integrity Errors: 0
Error Information Log Entries: 543
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Error Information (NVMe Log 0x01, 16 of 63 entries)
No Errors Logged
Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
Num  Test_Description  Status                     Power_on_Hours  Failing_LBA  NSID Seg SCT Code
 0   Short             Completed without error             30728            -     -   -   -    -
I will run a SMART self test and report the results.
-- 
You are receiving this mail because:
You are the assignee for the bug.