Date: Tue, 31 Oct 2023 11:16:01 +0100
From: Alexander Leidinger <alexleidingerde@gmail.com>
To: freebsd-fs@freebsd.org
Subject: ZFS txg rollback: expected timeframe?
Message-ID: <CAJg7qzHONfMeLUm20OE6Jo5uFLt9bY5VVhbY8z%2BoEVcHYwyoXw@mail.gmail.com>
Hi,

yes, the answer to $Subject is hard. I know.

Issue: an overheating CPU may have corrupted a zpool (4 * 4TB in a raidz2 setup) in a way that a normal import of the pool panics the machine with "VERIFY3(l->blk_birth == r->blk_birth) failed (101867360 == 101867222)".

There are backups, but a zpool import with "-N -F -T xxx" should work too and remove the need to restore from a full backup (via USB) plus incrementals (from S3/tarsnap).

During the crash there was a poudriere run of maybe 15 ports active (including qt6-<web-something>); ccache is in use for this. The rest (in amounts of data) is just little stuff.

What is the expected runtime on 5k rpm spinning rust (WD Red)? So far all the disks have been at 100% busy (gstat) for about 26 hours.

On a related note: is there a reason why "-T" is not documented?

After a while I get "Panic String: deadlres_td_sleep_q: possible deadlock detected for 0xfffffe023ae831e0 (l2arc_feed_thread), blocked for 1802817 ticks" during such an import, and I had to set

    debug.deadlkres.blktime_threshold: 1215752191
    debug.deadlkres.slptime_threshold: 1215752191

Setting vfs.zfs.deadman.enabled=0 didn't help (it's still set). Is there something more wrong with my pool than expected, or is this some kind of bug that such an import triggers this panic? The SSDs with L2ARC and ZIL don't show up at all in gstat, and I don't expect them to show up on an import with a rollback to a previous txg, so I was surprised to see such a panic.

Bye,
Alexander.
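[For readers following along: a sketch of the rollback-import procedure the message describes. The pool name "tank" and the txg number are placeholders, not taken from the message; -T is undocumented in zpool-import(8), and rewinding discards every transaction after the given txg, so treat this as a last resort, not a definitive recipe.]

```shell
# Raise the kernel deadlock-resolver thresholds so a long-running import
# is not flagged as a deadlock (values as used in the message; in ticks):
sysctl debug.deadlkres.blktime_threshold=1215752191
sysctl debug.deadlkres.slptime_threshold=1215752191

# Disable the ZFS deadman watchdog as an extra precaution:
sysctl vfs.zfs.deadman.enabled=0

# Attempt the rollback import: -N imports without mounting any datasets,
# -F allows a recovery (rewind) import, -T <txg> rolls back to that txg.
# Pool name and txg are hypothetical placeholders here.
zpool import -N -F -T 101867000 tank

# If the import succeeds, verify the pool before trusting the data:
zpool scrub tank
zpool status tank
```

On slow spinning disks this kind of import can keep all vdevs at 100% busy for a long time, since the rewind has to walk old metadata; the 26+ hours reported above is not obviously abnormal.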