From: bugzilla-noreply@freebsd.org
To: fs@FreeBSD.org
Subject: [Bug 278958] zfs panic: page fault in sync_dnodes_task
Date: Tue, 14 May 2024 15:45:55 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 14.0-RELEASE
X-Bugzilla-Keywords: crash
X-Bugzilla-Severity: Affects Only Me
X-Bugzilla-Status: New

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=278958

--- Comment #1 from nunziotocci2000@gmail.com ---
Another panic this morning at 3:33 AM with an identical backtrace. Looking at
core.txt, a backup was running this time as well: there is a running `zfs`
process with `sudo` as its parent and `sshd` as the next process in the tree.

A `zpool status` shows that our jail pool experienced an error:

  pool: zsmtp_jail
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 92K in 00:23:52 with 0 errors on Tue May 14 03:53:36 2024
config:

        NAME        STATE     READ WRITE CKSUM
        zsmtp_jail  ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            nda1p3  ONLINE       0     0     1
            nda2p3  ONLINE       0     0     0

It seems to me that this was likely caused by the panic corrupting something
on that drive. `smartctl -a /dev/nvme1` doesn't seem out of the ordinary:

=== START OF INFORMATION SECTION ===
Model Number:                       Force MP600
Serial Number:                      2006823000012856205C
Firmware Version:                   EGFM11.3
PCI Vendor/Subsystem ID:            0x1987
IEEE OUI Identifier:                0x6479a7
Total NVM Capacity:                 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      1
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,000,398,934,016 [2.00 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            6479a7 2fc124183d
Local Time is:                      Tue May 14 10:42:58 2024 CDT
Firmware Updates (0x12):            1 Slot, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005d):     Comp DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x08):         Telmtry_Lg
Maximum Data Transfer Size:         512 Pages
Warning Comp. Temp. Threshold:      90 Celsius
Critical Comp. Temp. Threshold:     95 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     9.78W       -        -    0  0  0  0        0       0
 1 +     6.75W       -        -    1  1  1  1        0       0
 2 +     5.23W       -        -    2  2  2  2        0       0
 3 -   0.0490W       -        -    3  3  3  3     2000    2000
 4 -   0.0018W       -        -    4  4  4  4    25000   25000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        39 Celsius
Available Spare:                    100%
Available Spare Threshold:          5%
Percentage Used:                    3%
Data Units Read:                    295,807,203 [151 TB]
Data Units Written:                 98,425,650 [50.3 TB]
Host Read Commands:                 1,499,250,766
Host Write Commands:                2,297,088,561
Controller Busy Time:               5,756
Power Cycles:                       108
Power On Hours:                     33,640
Unsafe Shutdowns:                   67
Media and Data Integrity Errors:    0
Error Information Log Entries:      543
Warning Comp. Temperature Time:     0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 63 entries)
No Errors Logged

Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
Num  Test_Description  Status                   Power_on_Hours  Failing_LBA  NSID Seg SCT Code
 0   Short             Completed without error           30728            -     -   -   -    -

I will run a SMART self-test and report the results.