Date: Fri, 28 Feb 2025 13:14:40 +0300 (MSK)
From: Dmitry Morozovsky <woozle@woozle.net>
To: Ronald Klop <ronald-lists@klop.ws>
Cc: freebsd-fs@FreeBSD.org, mm@FreeBSD.org
Subject: Re: zfs: non-redundant zpool suspended till hard boot on any transient error
Message-ID: <alpine.BSF.2.00.2502281257030.11003@woozle.rinet.ru>
In-Reply-To: <431556233.2998.1740664261847@localhost>
References: <431556233.2998.1740664261847@localhost>

On Thu, 27 Feb 2025, Ronald Klop wrote:

> See man zpoolprops and look for failmode.
> Default is "wait".
>
> Does that help?

unfortunately no, actually, it doesn't seem to have any effect:

(bext pool is on external USB SATA disk, which I power-cycled;
the same effect is on USB cable remove/insert, e.g., any device reset
situation):

- no `zpool clear' could be completed after pool suspension in standard
  'wait' mode:

        NAME        STATE     READ WRITE CKSUM
        bext        UNAVAIL      0     0     0  insufficient replicas
          gpt/bext  REMOVED      0     0     0

errors: List of errors unavailable: pool I/O is currently suspended

# zpool online bext /dev/gpt/bext
cannot online /dev/gpt/bext: pool I/O is currently suspended
# zpool clear bext
cannot clear errors for bext: I/O error

- setting mode to continue:

# zpool set failmode=continue bext
# zpool get failmode bext
NAME  PROPERTY  VALUE     SOURCE
bext  failmode  continue  local

[power cycle the drive, kernlog excerpt:

Feb 28 13:05:12 <kern.crit> bat kernel: ugen1.2: <JMicron USB to ATA/ATAPI Bridge> at usbus1 (disconnected)
...
Feb 28 13:05:14 <kern.crit> bat kernel: (da0:umass-sim0:0:0:0): Periph destroyed
...
Feb 28 13:05:18 <kern.crit> bat kernel: ugen1.2: <JMicron USB to ATA/ATAPI Bridge> at usbus1
...
Feb 28 13:05:19 <kern.crit> bat kernel: Solaris: WARNING:
Feb 28 13:05:19 <kern.crit> bat kernel: Pool 'bext' has encountered an uncorrectable I/O failure and has been suspended.
...
Feb 28 13:05:21 <kern.crit> bat kernel: da0: <JMicron Generic 0508> Fixed Direct Access SPC-4 SCSI device
...
Feb 28 13:05:21 <kern.crit> bat kernel: da0: 3815447MB (7814037168 512 byte sectors)
]

# zpool status -v bext
  pool: bext
 state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-JQ
  scan: scrub repaired 0B in 01:31:09 with 0 errors on Wed Feb 12 20:21:49 2025
config:

        NAME        STATE     READ WRITE CKSUM
        bext        UNAVAIL      0     0     0  insufficient replicas
          gpt/bext  REMOVED      0     0     0

errors: List of errors unavailable: pool I/O is currently suspended

# zpool online bext dev/gpt/bext
# zpool status -v bext
  pool: bext
 state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-JQ
  scan: scrub repaired 0B in 01:31:09 with 0 errors on Wed Feb 12 20:21:49 2025
config:

        NAME        STATE     READ WRITE CKSUM
        bext        UNAVAIL      0     0     0  insufficient replicas
          gpt/bext  REMOVED      0     0     0

errors: List of errors unavailable: pool I/O is currently suspended

# zpool clear bext
cannot clear errors for bext: I/O error
# zpool export bext
load: 0.03  cmd: zpool 2507 [tx->tx_sync_done_cv] 1.95r 0.00u 0.00s 0% 8808k
load: 0.04  cmd: zpool 2507 [tx->tx_sync_done_cv] 86.99r 0.00u 0.00s 0% 8808k

(another hard lock till power cycle)

- panic does not seem to be a viable option to me

>
> Regards,
> Ronald.
>
> From: Dmitry Morozovsky <woozle@woozle.net>
> Date: 27 February 2025 09:33
> To: freebsd-fs@freebsd.org
> CC: mm@freebsd.org
> Subject: zfs: non-redundant zpool suspended till hard boot on any transient
> error
>
> > Colleagues,
> >
> > regarding situations like
> > https://forums.freebsd.org/threads/external-drive-zfs-power-loss-insufficient-replicas-pool-suspended-cannot-online-dev-da0-pool-i-o-is-currently-suspended.94141/
> > (non redundant zpool on external drive)
> >
> > as noted, even subsecond disconnect on otherwise idle pool leads to instant
> > and irreversible pool suspension. hard reset (as kernel hangs on I/O
> > requests on suspended pool forever) is the only way to recover
> >
> > also, found this patch, but can't evaluate it myself:
> > https://github.com/openzfs/zfs/pull/11082
> >
> > any thoughts? thanks in advance!
> >
> > --
> > Sincerely,
> > D.Marck [MCK-RIPE]
> > [ FreeBSD committer: marck@FreeBSD.org ]
> > ---------------------------------------------------------------------------
> > *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- woozle@woozle.net ***
> > ---------------------------------------------------------------------------

--
Sincerely,
D.Marck [MCK-RIPE]
[ FreeBSD committer: marck@FreeBSD.org ]
---------------------------------------------------------------------------
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- woozle@woozle.net ***
---------------------------------------------------------------------------
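
For anyone reproducing the report, the suspended condition shown in the transcripts can be detected from the "state:" line of `zpool status` output. The following is only a minimal sketch: the function names pool_state and is_suspended are hypothetical helpers, not part of ZFS, and the parsing assumes the output layout quoted in the message.

```shell
#!/bin/sh
# Hypothetical helpers (not part of ZFS): they only parse text in the
# layout of the `zpool status` output quoted in the message above.

# Print the value of the first "state:" line read from stdin.
pool_state() {
    awk '$1 == "state:" { print $2; exit }'
}

# Return 0 if the status text on stdin reports a suspended pool.
is_suspended() {
    [ "$(pool_state)" = "SUSPENDED" ]
}
```

Possible usage: `zpool status bext | is_suspended && echo "bext is suspended"`. In the transcripts above `zpool status` still returned while the pool was suspended, whereas `zpool export` blocked, so a check like this should rely on status output only.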